Voyage AI MCP. Get high-precision vectors from text, images, and code.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Voyage AI provides high-precision embedding and reranking services for advanced RAG systems. It lets your agent generate vectors from text, code, images, and complex documents.

You can refine search results with cross-encoders or process massive datasets using managed batch jobs.

What your AI agents can do

Generate Standard Embeddings

Creates high-dimensional vectors for plain text, turning readable content into numerical data usable by vector databases.

Vectorize Multimodal Data

Combines images and text into a single vector representation, enabling the agent to perform visual searches alongside text queries.

Refine Search Context (Reranking)

Takes initial search results and reorders them based on relevance score using cross-encoders, ensuring the top hits are the most accurate context for your query.

Process Large Datasets in Batches

Manages large-scale data transformation by submitting batch jobs to process thousands of files asynchronously, then monitoring their status.

Embed Contextually Aware Chunks

Generates embeddings for document chunks that take the surrounding text of the same document into account, which reduces loss of context during retrieval.

Manage File Assets

Provides tools to upload files for batch jobs, retrieve file metadata (get_file), or download specific content using get_file_content.

Free for Subscribers

Voyage AI (AI Embeddings API) MCP Server: 13 Tools

These tools give your agent full control over the data lifecycle, from uploading files to generating vectors and refining search results.

cancel_batch

Stops an active batch inference job using its unique ID.

create_batch

Starts a new, large-scale batch job to process files or data for embeddings.

create_contextualized_embeddings

Generates vectors for document chunks while preserving the surrounding text context.

create_embeddings

Creates standard embeddings (vectors) from simple text input.

create_multimodal_embeddings

Generates vectors by combining and embedding both text and image data.

delete_file

Removes a file from the server's tracked storage.

get_batch

Checks the current status and progress of an existing batch job.

get_file

Retrieves metadata (like file type or size) for a specific uploaded file.

get_file_content

Downloads the actual raw content of an already uploaded file.

list_batches

Shows a list of all batch jobs that have been created or are pending.

list_files

Lists all files currently stored and tracked on the server.

rerank

Reorders a list of documents based on how relevant they are to a specific query.

upload_file

Uploads one or more files, specifically designating them for future batch processing.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Voyage AI (AI Embeddings API), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You're building a sophisticated RAG system, and you need reliable vector generation that handles everything—plain text, images, whole documents. This server gives your agent high-precision embeddings and reranking capabilities right out of the gate. It’s built for deep context retrieval.

Generating Vectors and Context

You can start by creating standard embeddings using create_embeddings. You feed it simple text strings, and it spits out high-dimensional vectors that you use in your database. If your data includes images alongside text, don't worry; you'll generate combined vectors using create_multimodal_embeddings, which lets your agent run visual searches just as easily as text queries.
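Once the vectors are in your database, retrieval comes down to comparing a query vector against the stored ones. Here's a minimal sketch of that comparison using cosine similarity. The three-dimensional vectors and document IDs are made-up stand-ins for real embeddings, and a real vector database would do this step for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 3):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for embeddings returned by create_embeddings.
stored = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
    "update-credentials": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, stored, k=2)
```

With this toy data, the two account-related documents rank above the billing one because their vectors point in nearly the same direction as the query.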

For documents, you're gonna want more than basic embeddings. Use create_contextualized_embeddings when you process document chunks. This tool embeds each chunk while taking the surrounding content of the same document into account; that means the vector you retrieve later still reflects the context the snippet came from, not just the snippet's own words.

If you just run with standard embeddings on large documents, you risk losing this critical contextual depth.
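For the model to see a chunk's neighbors, the chunks of one document have to travel together. Here's a minimal sketch of that preparation step. It assumes the tool accepts one list of chunks per document (a nested list); that shape is an assumption, so check the tool's input schema before relying on it. The helper name and the sample chunks are hypothetical.

```python
def group_chunks_by_document(chunks: list[tuple[str, str]]):
    """Group (doc_id, text) pairs, given in reading order, into one list per document.

    Returns (inputs, origins):
      inputs  - list of lists of chunk texts, one inner list per document
      origins - same shape, holding (doc_id, position) so each returned
                vector can be mapped back to where its chunk came from
    """
    grouped: dict[str, list[str]] = {}
    for doc_id, text in chunks:
        grouped.setdefault(doc_id, []).append(text)

    inputs = list(grouped.values())
    origins = [
        [(doc_id, position) for position in range(len(texts))]
        for doc_id, texts in grouped.items()
    ]
    return inputs, origins


# Hypothetical chunks from two policy documents.
chunks = [
    ("policy-a", "Remote work is permitted up to three days per week."),
    ("policy-a", "Requests must be approved by a line manager."),
    ("policy-b", "Customer records are retained for seven years."),
]
inputs, origins = group_chunks_by_document(chunks)
```

Keeping `origins` on your side is what lets you trace a retrieved vector back to its document and position later.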

Refining Search Results

Initial search results are only half the battle. You gotta make sure the top hits are actually the best ones. Use rerank to take a list of initial document chunks and reorder them based on how relevant they really are to your specific query. This process, which uses cross-encoders, makes certain that when your agent reads the context, it's reading the most accurate info first.

Handling Massive Data Loads (Batch Processing)

When you’re dealing with thousands of files (say, an entire corporate knowledge base), you can't push them all through a single request. You start by uploading those assets using upload_file, which stages them for later batch work. To kick off the large-scale transformation, you call create_batch. This submits a job to process your whole dataset asynchronously.

You need to keep an eye on that process. Use list_batches to see every job currently running or waiting in line. If you want to check the progress of one specific job, use get_batch, which reports its current status. If a batch job goes sideways or you change your mind, you can stop it with cancel_batch, provided you have its unique ID.
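Monitoring a job usually means polling until it reaches a final state. Here's a minimal sketch of that loop. The `get_batch` callable is injected so the sketch runs without a server, and the status names (`completed`, `failed`, `cancelled`, `in_progress`, `validating`) are assumptions; check what the tool actually returns.

```python
import time
from typing import Callable

# Assumed names for states in which a job will not change anymore.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_batch(
    get_batch: Callable[[str], dict],
    batch_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_batch(batch_id) until the job reaches a terminal status."""
    for _ in range(max_polls):
        batch = get_batch(batch_id)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} still running after {max_polls} polls")


# Stand-in for the real tool: walks through a scripted sequence of statuses.
_fake_statuses = iter(["validating", "in_progress", "completed"])


def fake_get_batch(batch_id: str) -> dict:
    return {"id": batch_id, "status": next(_fake_statuses)}


final = wait_for_batch(fake_get_batch, "batch_123", poll_seconds=0, sleep=lambda s: None)
```

The `max_polls` cap keeps a stuck job from blocking your agent forever; at that point you can decide whether to keep waiting or call cancel_batch.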

Managing Your Files and Assets

The server tracks everything you upload. To see what files are sitting there waiting for processing, just run list_files. If you need details on a specific file—like its size or file type—use get_file to grab that metadata. Need the raw content of an uploaded file? You'll download it using get_file_content.

And when you’re done with a file and want to clean up your storage, use delete_file to remove it from the server’s tracked space.

This suite handles everything: embedding creation for text or images, making sure context stays intact, running massive jobs in the background, and fine-tuning search results until they're perfect.

How Voyage AI MCP Works

1 Subscribe to the server and enter your unique Voyage AI API Key.
2 Use tools like create_embeddings or create_multimodal_embeddings to transform data into vectors, or use batch tools (create_batch) for large sets of files.
3 Your agent receives the resulting vectors or reranked documents, which it uses directly in its response generation process.
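Under the hood, step 2 is an MCP `tools/call` request, which your client normally builds for you. Here's a minimal sketch of what that message looks like, in case you want to debug or script it. The JSON-RPC envelope follows the Model Context Protocol; the argument names (`input`, `model`, `input_type`) and the model name are assumptions, so read the tool's published input schema for the real ones.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "create_embeddings",
    {
        # Assumed argument names and model; not confirmed for this server.
        "input": ["def update_credentials(user, password): ..."],
        "model": "voyage-code-3",
        "input_type": "document",
    },
)
payload = json.dumps(request)  # what actually goes over the wire
```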

The bottom line is that you run complex, multi-step data pipelines—from uploading source material to generating context vectors—all through your single agent workflow.

Who Is Voyage AI MCP For?

This server is for AI Engineers and Data Scientists who aren't satisfied with basic keyword search. If your application handles proprietary documents, codebases, or visual data, and retrieval accuracy matters more than speed, you need this. It’s built for people tired of 'fuzzy' searches that miss the mark.

AI Engineer

Building production RAG pipelines that require precise vector generation (e.g., using create_contextualized_embeddings) and robust batch management.

Data Scientist

Experimenting with multimodal search by vectorizing image-text pairs or refining retrieval accuracy using the rerank tool.

Backend Developer

Integrating high-precision, scalable search into a web application without having to manage complex indexing infrastructure themselves.

What Changes When You Connect

Achieve better search relevance. Instead of just passing initial results to the LLM, use rerank to improve context scoring using cross-encoders. The final answer quality jumps because you're giving it the absolute best material first.
Handle diverse data types easily. You don't have to write separate logic for images and text. Use create_multimodal_embeddings once, and your agent gets a single vector space that represents both visual and written information.
Process massive documents without timeouts. Don't process a 500-page PDF in one go. Upload the file with upload_file, then use create_batch to let the server handle the chunking and embedding across millions of tokens.
Minimize context loss during retrieval. Standard embeddings sometimes forget what was around a key phrase. Use create_contextualized_embeddings so each chunk's vector reflects the document around it, not just the chunk's own words.
Maintain full visibility over data assets. Before you run anything, use list_files and get_file to check what's actually uploaded and available for processing, making debugging straightforward.

Real-World Use Cases

Building a Codebase Search Tool

A developer needs to find how a specific API function is used across 50 different modules. They use create_embeddings with the codebase files, then ask their agent to run rerank against a query like 'How do I update user credentials?' This gives them ranked snippets of code directly from the most relevant files.

Analyzing Internal Policy Documents

A compliance officer needs to know which policies address both 'remote work' and 'data retention'. Instead of keyword search, they use create_contextualized_embeddings on their document library. The agent can then query the vectors, returning chunks that maintain the context of surrounding clauses.

Visual Question Answering (VQA)

A customer support bot is shown a picture of an error code and asked, 'What does this mean?' The agent uses create_multimodal_embeddings to combine the image vector with the text query. This allows it to understand visual context that simple text search would miss.

Large-Scale Indexing of Manuals

A technical writer needs to index 10,000 pages of product manuals for a new support portal. They use upload_file to stage the source documents and then initiate a job with create_batch. The agent monitors get_batch until the entire dataset is vectorized.

The Tradeoffs

Assuming simple embeddings are enough

The user runs basic create_embeddings on a document chunk, gets results, and passes them to the LLM. The resulting context is vague because surrounding critical text was lost.

→ Always use create_contextualized_embeddings for document chunks. Each vector then reflects the surrounding content of the same document, not just the isolated chunk. It's the difference between a vector for a stray snippet and a vector that knows what the snippet belongs to.

Relying on direct file retrieval

The user tries to pass raw text from get_file_content directly into an embedding model without chunking. The input is too long, and the resulting vector loses meaning.

→ Use upload_file first, then let the server manage the chunks via a batch process (create_batch). Never feed large raw files directly to the endpoint.
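If you do end up splitting text yourself before embedding it, here's a minimal sketch of a word-based chunker with overlap. The sizes are arbitrary illustration values, not recommendations from Voyage AI, and real pipelines often split on sentences or tokens instead.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, max_words=4, overlap=1)
```

Here the ten sample words become three chunks, each starting with the last word of the previous one.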

Stopping at initial search results

The agent runs a search and just picks the top 3 documents by default. One of those three is actually irrelevant, but it confuses the LLM.

→ Always run rerank after your initial vector search. It's a dedicated step that filters noise and pushes the most relevant context to the top for the final answer.

When It Fits, When It Doesn't

Use this server if retrieval accuracy is critical and your data is complex, meaning it involves code, images, or massive documents. You need high-precision search over simple keyword matching.

Don't use this if:
1. Your goal is basic internal communication (e.g., 'Who was in the meeting?'). Use a dedicated messaging service API instead.
2. You only care about surface-level, direct answers that don't require deep context mining. A simple database query might suffice.

When you must use it: If your retrieval process fails because of data complexity (e.g., 'My search keeps missing the image context,' or 'The document is too big for one call'), then this server provides the specialized tools (create_multimodal_embeddings, create_contextualized_embeddings, and batch processing) to handle that complexity.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Voyage AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 13 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

cancel_batch create_batch create_contextualized_embeddings create_embeddings create_multimodal_embeddings delete_file get_batch get_file get_file_content list_batches list_files rerank upload_file

Getting good answers shouldn't feel like pulling teeth.

Right now, if you need an AI agent to answer questions based on your internal documents, it's a messy process. You have to manually upload PDFs to one place, copy code snippets from another repo, and then pray the basic vector search hits the right chunk. If the document is massive, or if the key context is buried in an image caption, you lose data.

With Voyage AI MCP Server, that whole pipeline gets streamlined into a few calls. You upload your source material once. Your agent runs `create_embeddings`, then uses `rerank` to filter noise, and finally presents the LLM with only the most relevant, context-rich information. The answer is right there; you just have to ask for it.

Voyage AI (AI Embeddings API) MCP Server: Vectorize everything.

Manual data preparation involves writing separate code paths—one for text files, one for images, and another for managing the chunking logic. This complexity makes maintenance hell. You end up running five different APIs just to build one search feature.

This server abstracts that away. Whether you're dealing with an image or a 500-page manual, you call `create_multimodal_embeddings` or start a batch job. The underlying complexity of vectorization and chunk management is handled; your agent just gets the high-quality vectors it needs.

Common Questions About Voyage AI MCP

How do I generate embeddings for code using Voyage AI (AI Embeddings API)?

Use the create_embeddings tool, ensuring you specify a model optimized for code. This generates vectors that respect programming syntax and structure better than general text models.

Is there a way to process thousands of files at once with Voyage AI (AI Embeddings API)?

Yes, use the batch tools. First, stage your data using upload_file, then initiate the job via create_batch. You monitor progress and status updates using get_batch.

What's the difference between basic embeddings and contextualized ones in Voyage AI (AI Embeddings API)?

Contextualized embeddings (create_contextualized_embeddings) embed each chunk together with the surrounding content of its document, whereas basic embeddings see each chunk in isolation. This reduces retrieval errors because the vector carries its document context.

How do I use Voyage AI (AI Embeddings API) to search images and text together?

You must use create_multimodal_embeddings. This tool converts both visual data and textual descriptions into a single, comparable vector space.

What credentials do I need to set up the Voyage AI embeddings API?

You must provide your specific Voyage AI API Key during setup. This key authenticates every call, ensuring only authorized agents can run jobs and access the models.

How do I use the `rerank` tool to improve search results?

The rerank tool takes your query and the text of your initial set of candidate documents (not their vectors) and scores each document against the query. It then returns the documents ordered by relevance score, so the most relevant text reaches the LLM first.

If I uploaded a file for batch processing, how do I manage it afterward?

Use list_files to see what is stored, then delete_file to remove a file from the server's tracked storage once the inference job is complete.

What happens if my batch job fails, and how do I check its status?

Use the get_batch tool with your specific ID. It returns the current operational status and often provides detailed error messages or progress updates for debugging.

How does reranking improve my RAG system's accuracy?

By using the rerank tool, your agent can take a list of potentially relevant documents and re-score them using a powerful cross-encoder model. This ensures that the most semantically relevant pieces of information are ranked first, providing better context for the LLM to answer queries.

What is the benefit of using contextualized embeddings?

The create_contextualized_embeddings tool allows you to embed chunks of text while considering the surrounding content of the same document. This prevents loss of meaning that often happens with standard chunking, leading to much higher retrieval precision.

Can I process images and text in the same vector space?

Yes! With create_multimodal_embeddings, you can provide interleaved sequences of text and image URLs. Voyage AI will generate a single embedding that represents the combined semantic meaning, perfect for visual or hybrid search.
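Here's a minimal sketch of how an interleaved sequence of text and image URLs could be assembled before calling the tool. The `content` list with `type`, `text`, and `image_url` fields is an assumed input shape, and the URL is a placeholder; confirm the real schema from the tool's description.

```python
def build_multimodal_input(parts: list[tuple[str, str]]) -> dict:
    """Turn ("text", value) and ("image_url", value) pairs into one interleaved input.

    The order of `parts` is preserved, so text can sit before, between,
    or after images. The output shape is an assumption for illustration.
    """
    content = []
    for kind, value in parts:
        if kind == "text":
            content.append({"type": "text", "text": value})
        elif kind == "image_url":
            content.append({"type": "image_url", "image_url": value})
        else:
            raise ValueError(f"unsupported part type: {kind!r}")
    return {"content": content}


# Hypothetical support-ticket input: a question followed by a screenshot.
item = build_multimodal_input([
    ("text", "What does this error code mean?"),
    ("image_url", "https://example.com/screenshots/error-code.png"),
])
```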

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)