Voyage AI MCP. Get high-precision vectors from text, images, and code.
Works with every AI agent you already use, and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Voyage AI provides high-precision embedding and reranking services for advanced RAG systems. It lets your agent generate vectors from text, code, images, and complex documents.
You can refine search results with cross-encoders or process massive datasets using managed batch jobs.
What your AI agents can do
Creates high-dimensional vectors for plain text, turning readable content into numerical data usable by vector databases.
Combines images and text into a single vector representation, enabling the agent to perform visual searches alongside text queries.
Takes initial search results and reorders them based on relevance score using cross-encoders, ensuring the top hits are the most accurate context for your query.
Manages large-scale data transformation by submitting batch jobs to process thousands of files asynchronously, then monitoring their status.
Generates embeddings for document chunks while preserving metadata about the chunk's origin and surrounding text, which reduces loss of context during retrieval.
Provides tools to upload files for batch jobs, retrieve file metadata (get_file), or download specific content using get_file_content.
Voyage AI (AI Embeddings API) MCP Server: 13 Tools
These tools give your agent full control over the data lifecycle, from uploading files to generating vectors and refining search results.
cancel_batch
Stops an active batch inference job using its unique ID.
create_batch
Starts a new, large-scale batch job to process files or data for embeddings.
create_contextualized_embeddings
Generates vectors for document chunks while preserving the surrounding text context.
create_embeddings
Creates standard embeddings (vectors) from simple text input.
create_multimodal_embeddings
Generates vectors by combining and embedding both text and image data.
delete_file
Removes a file from the server's tracked storage.
get_batch
Checks the current status and progress of an existing batch job.
get_file
Retrieves metadata (like file type or size) for a specific uploaded file.
get_file_content
Downloads the actual raw content of an already uploaded file.
list_batches
Shows a list of all batch jobs that have been created or are pending.
list_files
Lists all files currently stored and tracked on the server.
rerank
Reorders a list of documents based on how relevant they are to a specific query.
upload_file
Uploads one or more files, specifically designating them for future batch processing.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Voyage AI (AI Embeddings API), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
You're building a sophisticated RAG system, and you need reliable vector generation that handles everything—plain text, images, whole documents. This server gives your agent high-precision embeddings and reranking capabilities right out of the gate. It’s built for deep context retrieval.
Generating Vectors and Context
You can start by creating standard embeddings using create_embeddings. You feed it simple text strings, and it spits out high-dimensional vectors that you use in your database. If your data includes images alongside text, don't worry; you'll generate combined vectors using create_multimodal_embeddings, which lets your agent run visual searches just as easily as text queries.
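To make the database step concrete, here is a minimal sketch of how stored vectors are typically compared against a query vector using cosine similarity. The three-dimensional vectors and document names are made up for illustration; real embeddings returned by create_embeddings have many more dimensions, and a vector database would normally run this lookup for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=2):
    """Return the k document ids whose vectors are most similar to query_vec."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query_vec, index[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy vectors standing in for real embeddings (invented for this example).
index = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "login-errors": [0.8, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]
top = nearest(query_vec, index)
```

Here `top` holds the two ids closest to the query direction; the billing entry points elsewhere in the space and is left out.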
For documents, you're gonna want more than basic embeddings. Use create_contextualized_embeddings when you process document chunks. This tool embeds the chunk while keeping track of the surrounding context and metadata; that means when you retrieve it later, you don't lose vital information about where in the document the snippet came from.
If you just run with standard embeddings on large documents, you risk losing this critical contextual depth.
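One way to prepare input for this kind of call is to keep each document's chunks together and in reading order, so every chunk can be embedded alongside its neighbours. The sketch below only builds the arguments. The argument names (`model`, `inputs`), the model name, and the nested-list shape are assumptions for illustration, not the documented schema of create_contextualized_embeddings.

```python
from collections import defaultdict

def build_contextualized_arguments(chunks, model="example-context-model"):
    """Group chunks per document, in reading order.

    chunks: iterable of (doc_id, position, text) tuples.
    Returns (arguments, chunk_keys), where chunk_keys[i][j] identifies the
    chunk at inputs[i][j] so vectors can be mapped back to their source.
    """
    grouped = defaultdict(list)
    for doc_id, position, text in sorted(chunks, key=lambda c: (c[0], c[1])):
        grouped[doc_id].append((position, text))

    inputs, chunk_keys = [], []
    for doc_id, items in grouped.items():
        inputs.append([text for _, text in items])
        chunk_keys.append([(doc_id, position) for position, _ in items])

    # Hypothetical argument names; check the tool's real input schema.
    arguments = {"model": model, "inputs": inputs}
    return arguments, chunk_keys

chunks = [
    ("policy.pdf", 1, "Retention period is seven years."),
    ("handbook.pdf", 0, "Remote work requires manager approval."),
    ("policy.pdf", 0, "This policy covers customer records."),
]
arguments, chunk_keys = build_contextualized_arguments(chunks)
```

Keeping `chunk_keys` next to the request is a design choice: whatever order the vectors come back in, each one can be stored with its document id and position.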
Refining Search Results
Initial search results are only half the battle. You gotta make sure the top hits are actually the best ones. Use rerank to take a list of initial document chunks and reorder them based on how relevant they really are to your specific query. This process, which uses cross-encoders, makes certain that when your agent reads the context, it's reading the most accurate info first.
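The two-stage pattern can be sketched as follows: take the candidates from the first search, score each query and document pair together, and keep the best few. The scorer here is a simple word-overlap function used as a stand-in so the example runs offline; it is not a cross-encoder, and the rerank tool's own scoring would replace it in practice.

```python
def rerank_candidates(query, documents, score_fn, top_k=2):
    """Score every (query, document) pair and return the top_k, best first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def token_overlap(query, document):
    """Stand-in scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Candidates as they might come back from an initial vector search.
candidates = [
    "Rotate API keys from the security settings page",
    "Update user credentials with the PATCH /users endpoint",
    "Quarterly billing reports are emailed on the first",
]
best = rerank_candidates("update user credentials", candidates, token_overlap)
```

After this step the credentials document sits first in `best`, even though it was second in the initial list.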
Handling Massive Data Loads (Batch Processing)
When you’re dealing with thousands of files—say, an entire corporate knowledge base—you can't process them all at once. You start by uploading those assets using upload_file, which designates multiple files for later batch work. To kick off the large-scale transformation, you call create_batch. This submits a job to process your whole dataset asynchronously.
You need to keep an eye on that process. Use list_batches to see every job currently running or waiting in line. If you want to check the progress of one specific job, use get_batch, which tells you exactly where it stands. If a batch job goes sideways or you change your mind, you can shut it down instantly using cancel_batch, provided you have its unique ID.
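The monitoring loop described above can be sketched as a polling function. Everything about the wire format is assumed here: `call_tool` is a hypothetical helper that invokes an MCP tool by name, the `batch_id` argument and `status` field are illustrative, and the status strings are placeholders for whatever the server actually reports.

```python
import time

# Assumed status names; replace with the values the server really returns.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def wait_for_batch(call_tool, batch_id, poll_seconds=5.0, max_polls=120,
                   sleep=time.sleep):
    """Poll get_batch until the job finishes; cancel it if the budget runs out."""
    for _ in range(max_polls):
        status = call_tool("get_batch", {"batch_id": batch_id})["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    call_tool("cancel_batch", {"batch_id": batch_id})
    return "cancelled"

def make_fake_caller(statuses):
    """Offline stand-in for the server that replays a list of statuses."""
    remaining = iter(statuses)
    calls = []
    def call_tool(name, arguments):
        calls.append(name)
        if name == "get_batch":
            return {"status": next(remaining)}
        return {"status": "cancelled"}
    return call_tool, calls

fake_call, calls = make_fake_caller(["queued", "in_progress", "completed"])
final_status = wait_for_batch(fake_call, "batch-123", sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the loop testable without waiting; in real use the default `time.sleep` applies.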
Managing Your Files and Assets
The server tracks everything you upload. To see what files are sitting there waiting for processing, just run list_files. If you need details on a specific file—like its size or file type—use get_file to grab that metadata. Need the raw content of an uploaded file? You'll download it using get_file_content.
And when you’re done with a file and want to clean up your storage, use delete_file to remove it from the server’s tracked space.
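A cleanup pass over uploaded files might look like the sketch below. The response shape (`files`, `id`) and the `file_id` argument are assumptions, and `call_tool` is again a hypothetical helper; an in-memory dictionary stands in for the server so the example runs offline.

```python
def delete_files(call_tool, should_delete):
    """Delete every listed file for which should_delete(entry) is true."""
    listing = call_tool("list_files", {})
    deleted = []
    for entry in listing["files"]:
        if should_delete(entry):
            call_tool("delete_file", {"file_id": entry["id"]})
            deleted.append(entry["id"])
    return deleted

# In-memory stand-in for the server's tracked storage.
store = {
    "file-1": {"id": "file-1", "bytes": 1024},
    "file-2": {"id": "file-2", "bytes": 2048},
}

def fake_call_tool(name, arguments):
    if name == "list_files":
        return {"files": list(store.values())}
    if name == "delete_file":
        store.pop(arguments["file_id"])
        return {"deleted": True}
    raise ValueError(f"unexpected tool: {name}")

removed = delete_files(fake_call_tool, lambda entry: entry["id"] == "file-1")
```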
This suite handles everything: embedding creation for text or images, making sure context stays intact, running massive jobs in the background, and fine-tuning search results until they're perfect.
How Voyage AI MCP Works
- 1 Subscribe to the server and enter your unique Voyage AI API Key.
- 2 Use tools like create_embeddings or create_multimodal_embeddings to transform data into vectors, or use batch tools (create_batch) for large sets of files.
- 3 Your agent receives the resulting vectors or reranked documents, which it uses directly in its response generation process.
The bottom line is that you run complex, multi-step data pipelines—from uploading source material to generating context vectors—all through your single agent workflow.
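Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for create_embeddings. The envelope follows the Model Context Protocol; the `input` argument name is an assumption, since this page does not list the tool's parameters.

```python
import itertools
import json

_request_ids = itertools.count(1)

def tools_call_request(tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "input" is a hypothetical argument name used for illustration.
request = tools_call_request(
    "create_embeddings",
    {"input": ["How do I reset my password?"]},
)
wire_message = json.dumps(request)
```

Most MCP clients build this message for you; seeing it spelled out mainly helps when debugging what the agent actually sent.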
Who Is Voyage AI MCP For?
This server is for AI Engineers and Data Scientists who aren't satisfied with basic keyword search. If your application handles proprietary documents, codebases, or visual data, and retrieval accuracy matters more than speed, you need this. It’s built for people tired of 'fuzzy' searches that miss the mark.
Building production RAG pipelines that require precise vector generation (e.g., using create_contextualized_embeddings) and robust batch management.
Experimenting with multimodal search by vectorizing image-text pairs or refining retrieval accuracy using the rerank tool.
Integrating high-precision, scalable search into a web application without having to manage complex indexing infrastructure themselves.
What Changes When You Connect
- Achieve better search relevance. Instead of just passing initial results to the LLM, use rerank to improve context scoring using cross-encoders. The final answer quality jumps because you're giving it the absolute best material first.
- Handle diverse data types easily. You don't have to write separate logic for images and text. Use create_multimodal_embeddings once, and your agent gets a single vector space that represents both visual and written information.
- Process massive documents without timeouts. Don't process a 500-page PDF in one go. Upload the file with upload_file, then use create_batch to let the server handle the chunking and embedding across millions of tokens.
- Minimize context loss during retrieval. Standard embeddings sometimes forget what was around a key phrase. Use create_contextualized_embeddings so your agent knows not just what the text says, but where it came from in the original document.
- Maintain full visibility over data assets. Before you run anything, use list_files and get_file to check what's actually uploaded and available for processing, making debugging straightforward.
Real-World Use Cases
Building a Codebase Search Tool
A developer needs to find how a specific API function is used across 50 different modules. They use create_embeddings with the codebase files, then ask their agent to run rerank against a query like 'How do I update user credentials?' This gives them ranked snippets of code directly from the most relevant files.
Analyzing Internal Policy Documents
A compliance officer needs to know which policies address both 'remote work' and 'data retention'. Instead of keyword search, they use create_contextualized_embeddings on their document library. The agent can then query the vectors, returning chunks that maintain the context of surrounding clauses.
Visual Question Answering (VQA)
A customer support bot is shown a picture of an error code and asked, 'What does this mean?' The agent uses create_multimodal_embeddings to combine the image vector with the text query. This allows it to understand visual context that simple text search would miss.
Large-Scale Indexing of Manuals
A technical writer needs to index 10,000 pages of product manuals for a new support portal. They use upload_file to stage the source documents and then initiate a job with create_batch. The agent monitors get_batch until the entire dataset is vectorized.
The Tradeoffs
Assuming simple embeddings are enough
The user runs basic create_embeddings on a document chunk, gets results, and passes them to the LLM. The resulting context is vague because surrounding critical text was lost.
→
Use create_contextualized_embeddings for document chunks. This retains metadata about the source block, so each vector stays tied to its location in the document instead of standing alone.
Relying on direct file retrieval
The user tries to pass raw text from get_file_content directly into an embedding model without chunking. The input is too long, and the resulting vector loses meaning.
→
Use upload_file first, then let the server manage the chunks via a batch process (create_batch). Never feed large raw files directly to the endpoint.
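For cases where text is embedded directly rather than through a batch job, splitting it first avoids the over-long input described above. This is a minimal word-based chunker with overlap; the sizes are arbitrary defaults, and it is a generic technique rather than the chunking the server itself applies.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of chunk_size words; neighbours share overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some words twice.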
Stopping at initial search results
The agent runs a search and just picks the top 3 documents by default. One of those three is actually irrelevant, but it confuses the LLM.
→
Run rerank after your initial vector search. It's a dedicated step that filters noise and moves the most relevant context to the top before the LLM writes the final answer.
When It Fits, When It Doesn't
Use this server if retrieval accuracy is critical and your data is complex, meaning it involves code, images, or massive documents. You need high-precision search over simple keyword matching.
Don't use this if:
1. Your goal is basic internal communication (e.g., 'Who was in the meeting?'). Use a dedicated messaging service API instead.
2. You only care about surface-level, direct answers that don't require deep context mining. A simple database query might suffice.
When you must use it: If your retrieval process fails because of data complexity (e.g., 'My search keeps missing the image context,' or 'The document is too big for one call'), this server provides the specialized tools (create_multimodal_embeddings, create_contextualized_embeddings, and batch processing) to fix those specific failures.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Voyage AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 13 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Available Capabilities
Getting good answers shouldn't feel like pulling teeth.
Right now, if you need an AI agent to answer questions based on your internal documents, it's a messy process. You have to manually upload PDFs to one place, copy code snippets from another repo, and then pray the basic vector search hits the right chunk. If the document is massive, or if the key context is buried in an image caption, you lose data.
With Voyage AI MCP Server, that whole pipeline gets streamlined into a few calls. You upload your source material once. Your agent runs `create_embeddings`, then uses `rerank` to filter noise, and finally presents the LLM with only the most relevant, context-rich information. The answer is right there; you just have to ask for it.
Voyage AI (AI Embeddings API) MCP Server: Vectorize everything.
Manual data preparation involves writing separate code paths—one for text files, one for images, and another for managing the chunking logic. This complexity makes maintenance hell. You end up running five different APIs just to build one search feature.
This server abstracts that away. Whether you're dealing with an image or a 500-page manual, you call `create_multimodal_embeddings` or start a batch job. The underlying complexity of vectorization and chunk management is handled; your agent just gets the high-quality vectors it needs.
Common Questions About Voyage AI MCP
How do I generate embeddings for code using Voyage AI (AI Embeddings API)?
Use the create_embeddings tool, ensuring you specify a model optimized for code. This generates vectors that respect programming syntax and structure better than general text models.
Is there a way to process thousands of files at once with Voyage AI (AI Embeddings API)?
Yes, use the batch tools. First, stage your data using upload_file, then initiate the job via create_batch. You monitor progress and status updates using get_batch.
What's the difference between basic embeddings and contextualized ones in Voyage AI (AI Embeddings API)?
Contextualized embeddings (create_contextualized_embeddings) keep track of where a chunk came from. This reduces retrieval errors because the vector reflects its surrounding document context.
How do I use Voyage AI (AI Embeddings API) to search images and text together?
Use create_multimodal_embeddings. This tool converts both visual data and textual descriptions into a single, comparable vector space.
What credentials do I need to set up the Voyage AI embeddings API?
You must provide your specific Voyage AI API Key during setup. This key authenticates every call, ensuring only authorized agents can run jobs and access the models.
How do I use the `rerank` tool to improve search results?
The rerank tool takes your query and the documents returned by the initial search, then scores each query-document pair with a cross-encoder. It reorders the list so the documents most relevant to the user's intent come first.
If I uploaded a file for batch processing, how do I manage it afterward?
Use list_files to find it, then delete_file to remove it from the server's tracked storage once the inference job is complete.
What happens if my batch job fails, and how do I check its status?
Use the get_batch tool with your specific ID. It returns the current status and progress of the job, which is the starting point for debugging a failure.
How does reranking improve my RAG system's accuracy?
By using the rerank tool, your agent can take a list of potentially relevant documents and re-score them using a powerful cross-encoder model. This puts the most semantically relevant pieces of information first, providing better context for the LLM to answer queries.
What is the benefit of using contextualized embeddings?
The create_contextualized_embeddings tool allows you to embed chunks of text while considering the surrounding content of the same document. This reduces the loss of meaning that often happens with standard chunking, which improves retrieval precision.
Can I process images and text in the same vector space?
Yes! With create_multimodal_embeddings, you can provide interleaved sequences of text and image URLs. Voyage AI will generate a single embedding that represents the combined semantic meaning, perfect for visual or hybrid search.
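An interleaved text-and-image input could be assembled as in the sketch below. The payload shape (a `content` list of typed parts), the argument names, the model name, and the example URL are all assumptions for illustration; check the tool's input schema before relying on any of them.

```python
def interleaved_input(parts):
    """Turn ("text", value) and ("image_url", value) pairs into one content list."""
    content = []
    for kind, value in parts:
        if kind == "text":
            content.append({"type": "text", "text": value})
        elif kind == "image_url":
            content.append({"type": "image_url", "image_url": value})
        else:
            raise ValueError(f"unsupported part type: {kind}")
    return {"content": content}

# Hypothetical arguments for create_multimodal_embeddings.
arguments = {
    "model": "example-multimodal-model",
    "inputs": [
        interleaved_input([
            ("text", "Error shown on the device screen:"),
            ("image_url", "https://example.com/error-screen.png"),
            ("text", "What does this code mean?"),
        ])
    ],
}
```

Each entry in `inputs` is meant to produce one vector, so the text and the image in the same entry are embedded together rather than separately.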
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.