Vertex AI Vector Search MCP. Search embeddings and manage indices from chat.

Vertex AI Vector Search connects Google's vector matching capabilities right into your agent or IDE. It lets you query billions of semantic embeddings and manage Vertex Index endpoints conversationally.

You can perform nearest neighbor searches, list every index in the project (`list_vector_indexes`), track index build status using `list_vector_operations`, and get configuration details with `get_index_details`.

This is infrastructure management for your AI agent.

What your AI agents can do

Search Semantic Neighbors

Pass an endpoint ID, deployed index ID, and a vector array to find the closest semantic matches within your data.

List All Vector Indexes

Retrieve a list of every vector index configured in your entire Google Cloud project.

Check Index Configuration

Fetch specific metadata and setup details for one named vector index.

List Available Endpoints

Get a list of all active service endpoints defined in your project.

Check Deployed Indexes

See which specific vector indexes are currently deployed and active on a given endpoint.

Monitor Build Jobs

Review the timeline and status of long-running operations, like index deployments or builds.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Vertex AI Vector Search MCP Server: 6 Tools for Index Management

This server gives you the tools needed to list, check, and query vector indices across your Google Cloud project.

get_index_details

Retrieves configuration and metadata for a specific vector index.

list_deployed_indexes

Lists all indexes that are currently deployed to a specified endpoint.

list_index_endpoints

Retrieves a list of every index endpoint set up in the project.

list_vector_indexes

Lists all vector indexes available across the entire Google Cloud project.

list_vector_operations

Lists records of long-running background operations related to vector index management.

search_nearest_neighbors

Performs a nearest neighbor search by comparing a query vector against a deployed index.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Vertex AI Vector Search, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You don't wanna jump into the Cloud Console just to run a vector search. This server plugs Google’s massive vector matching power right into your agent or IDE. It lets you manage billions of semantic embeddings conversationally. You treat it like infrastructure management for your AI client, not manual data entry.

If you need to find the closest thing in your dataset (the nearest neighbor), you use search_nearest_neighbors. All you gotta do is pass an endpoint ID, a deployed index ID, and your query vector as an array of floats. The agent then takes those inputs and returns the closest semantic matches within the data you've indexed.
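As a rough illustration of those three inputs, the sketch below assembles and validates an argument payload in Python. The key names (`endpoint_id`, `deployed_index_id`, `query_vector`, `neighbor_count`) are assumptions made for this example; the tool's real parameter schema may differ, so check the schema your MCP client reports before relying on them.

```python
import json
import math
from typing import Sequence


def build_search_args(endpoint_id: str, deployed_index_id: str,
                      query_vector: Sequence[float],
                      neighbor_count: int = 3) -> dict:
    """Assemble arguments for a search_nearest_neighbors call.

    Every key name below is hypothetical, not the server's confirmed schema.
    """
    if not endpoint_id or not deployed_index_id:
        raise ValueError("endpoint_id and deployed_index_id are required")
    if not query_vector:
        raise ValueError("query_vector must contain at least one value")
    vector = [float(x) for x in query_vector]
    if any(math.isnan(x) or math.isinf(x) for x in vector):
        raise ValueError("query_vector must contain only finite numbers")
    if neighbor_count < 1:
        raise ValueError("neighbor_count must be at least 1")
    return {
        "endpoint_id": endpoint_id,
        "deployed_index_id": deployed_index_id,
        "query_vector": vector,
        "neighbor_count": neighbor_count,
    }


# Placeholder IDs; substitute the values from your own project.
args = build_search_args("1xxx", "deployed_abc_1", [0.015, -0.042, 0.111])
print(json.dumps(args, indent=2))
```

Validating the vector before the call keeps malformed input (an empty list, a NaN) from turning into a confusing remote error.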

When you need to know what indexes exist across your whole Google Cloud project, list_vector_indexes gives you that complete rundown. You can check every single vector index configured, regardless of whether it’s active or not. If you want more detail on a specific index—like its dimensionality or how it's set up—you use get_index_details.

This pulls the full metadata for one named vector index so you know exactly what you're dealing with.

Your agent also needs to track where your services are running. To see all the service endpoints defined in your project, you call list_index_endpoints. These endpoints are critical because they scale your RAG applications and direct traffic. Once you have those endpoints listed, you can narrow it down by checking which specific vector indexes are actually deployed and active on a given endpoint using list_deployed_indexes.

That tells you if the index is ready for production reads.

And what about status? If you're working with multi-terabyte datasets, building or deploying an index isn't instantaneous. You need to track those long-running background operations. For that, use list_vector_operations to review the timeline and current status of tasks—like deployments or builds—so you know when your data is actually available for querying.
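The check described above can be sketched as a small gate that runs before any query. This assumes each operation comes back as a record with `name` and `status` fields, which is a guess at the response shape; only the status values RUNNING and SUCCEEDED are taken from this page.

```python
from typing import List, Mapping, Sequence


def pending_operations(operations: Sequence[Mapping[str, str]]) -> List[str]:
    """Return the names of operations that have not reported SUCCEEDED.

    The "name" and "status" keys are assumed field names for this sketch.
    Anything other than SUCCEEDED (running, failed, unknown) counts as pending.
    """
    return [
        op.get("name", "<unnamed>")
        for op in operations
        if op.get("status") != "SUCCEEDED"
    ]


def safe_to_query(operations: Sequence[Mapping[str, str]]) -> bool:
    """True only when every listed operation has finished successfully."""
    return not pending_operations(operations)


# Illustrative records, not real output from list_vector_operations.
sample = [
    {"name": "operations/build-1", "status": "SUCCEEDED"},
    {"name": "operations/deploy-2", "status": "RUNNING"},
]
print(pending_operations(sample))
print(safe_to_query(sample))
```

Treating every non-SUCCEEDED status as pending is a deliberately conservative choice: a failed build blocks queries just as a running one does.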

Basically, this server gives your agent total visibility into your vector search setup: it lets you list every index (list_vector_indexes), check its config (get_index_details), see which network endpoints are active (list_index_endpoints), confirm what indexes are running on those endpoints (list_deployed_indexes), monitor if the builds finished (list_vector_operations), and finally, run the search itself (search_nearest_neighbors).

It’s a complete toolset for managing vector databases without leaving your chat window.

How Vertex AI Vector Search MCP Works

1 Enable the Google Cloud Vertex AI API for your project.
2 Provide the server with your Project ID, desired Location, and OAuth2 Access Token credentials.
3 Ask your agent to run nearest neighbor searches against your embeddings conversationally, using tools like search_nearest_neighbors.
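For step 2, a minimal sketch of collecting the three settings and failing early if one is missing. `VERTEX_ACCESS_TOKEN` is the name used in this page's FAQ; `VERTEX_PROJECT_ID` and `VERTEX_LOCATION` are hypothetical names chosen for the example, as are the sample values.

```python
from typing import Mapping

# VERTEX_ACCESS_TOKEN appears in this page's FAQ; the other two variable
# names are hypothetical placeholders for this sketch.
REQUIRED_KEYS = ("VERTEX_PROJECT_ID", "VERTEX_LOCATION", "VERTEX_ACCESS_TOKEN")


def load_connection_settings(env: Mapping[str, str]) -> dict:
    """Read the three connection settings, raising if any is missing or blank."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {
        "project_id": env["VERTEX_PROJECT_ID"].strip(),
        "location": env["VERTEX_LOCATION"].strip(),
        "access_token": env["VERTEX_ACCESS_TOKEN"].strip(),
    }


# Illustrative values only; pass os.environ in real use.
example_env = {
    "VERTEX_PROJECT_ID": "my-project",
    "VERTEX_LOCATION": "us-central1",
    "VERTEX_ACCESS_TOKEN": "ya29.example",
}
settings = load_connection_settings(example_env)
print(settings["project_id"], settings["location"])  # never print the token
```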

The bottom line is that you get low-latency access to billion-scale embedding lookups without touching the Cloud Console.

Who Is Vertex AI Vector Search MCP For?

This is for MLOps engineers and data architects who run RAG pipelines. If you're tired of clicking through dashboards just to check if an index build finished, this server lets you manage infrastructure status—from listing indexes to running vector searches—all via chat.

MLOps Engineer

Uses list_vector_operations and get_index_details to track the progress of multi-hour index deployments, ensuring CI/CD pipelines stay green.

Backend Architect

Runs list_index_endpoints and verifies infrastructure configuration details (like shards or node counts) tied to critical vector databases organization-wide.

RAG Data Scientist

Uses search_nearest_neighbors to quickly push experimental float arrays into production endpoints, gauging proximity precision on the fly for testing.

What Changes When You Connect

Find semantic matches instantly. Instead of manually querying data, you pass a query vector to search_nearest_neighbors and get back the most semantically similar results, such as the top three, right away.
Stay on top of infrastructure status. Use list_vector_operations to track multi-hour index build progress without leaving your coding environment or opening multiple tabs.
Know exactly what’s live. Running list_index_endpoints shows you every index endpoint in the project, and list_deployed_indexes shows what is deployed on each, so you never query the wrong deployment version.
Audit your data layers easily. Running list_vector_indexes gives a single view of all indexes in the project, helping verify dimensionality and naming conventions.
Deep dive into configuration. When you run get_index_details, you get verified metadata on a specific index—things like shard count or required dimensions.

Real-World Use Cases

Debugging RAG Context Loss

A data scientist wants to know if the LLM is pulling context from the correct knowledge base. They run list_index_endpoints first, identify the production endpoint ID, and then use search_nearest_neighbors with a test vector. The results prove whether the intended index (document_vault_prod) is active on that specific endpoint.

Verifying Index Build Completion

An MLOps engineer just kicked off an index build for petabytes of data. They don't want to wait. They prompt the agent, which runs list_vector_operations. The chat response confirms the operation ID and status (RUNNING vs. SUCCEEDED), allowing them to move on while monitoring.

Checking Project Scope

A new architect needs a full inventory of all vector search capabilities. They simply ask the agent to run list_vector_indexes. The response provides a clean list, including the name and dimensions for every index in the entire GCP project.

Pinpointing Production Data

A backend developer needs confirmation that only the stable production index is serving traffic. They use list_deployed_indexes against a known endpoint ID, ensuring they aren't accidentally pointing to an experimental staging build.

The Tradeoffs

Assuming global visibility

Asking the agent to just 'show me all my indexes' without knowing if you mean all project indexes or only those connected to a specific endpoint.

→ First, use list_vector_indexes for a full inventory of everything in the project. If you need to verify what’s live on one service, run list_deployed_indexes after identifying the target endpoint ID using list_index_endpoints.

Skipping operational checks

Running a complex query using search_nearest_neighbors immediately after an index build without checking if the operation is finished.

→ Always check for pending tasks first. Run list_vector_operations to ensure any long-running build job has reached SUCCEEDED status before executing live queries.

Mixing up index scope

Confusing the general list of all indexes (list_vector_indexes) with which indexes are actually ready for querying on a specific service endpoint.

→ list_vector_indexes shows existence; list_deployed_indexes shows readiness. Use both tools in sequence to verify the operational state.
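The existence-versus-readiness distinction can be sketched as a set comparison. This assumes you have already reduced the tool responses to plain index IDs, and that each deployed entry can be mapped back to the ID of the index it serves; both are assumptions about the response shape, and the IDs below are made up.

```python
from typing import Dict, Iterable, List


def split_by_deployment(all_index_ids: Iterable[str],
                        deployed_by_endpoint: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Separate indexes that exist from indexes that are actually deployed.

    all_index_ids: IDs gathered from list_vector_indexes (assumed shape).
    deployed_by_endpoint: endpoint ID -> index IDs gathered from
        list_deployed_indexes, one call per endpoint (assumed shape).
    """
    deployed = set()
    for index_ids in deployed_by_endpoint.values():
        deployed.update(index_ids)
    existing = set(all_index_ids)
    return {
        "deployed": sorted(existing & deployed),
        "not_deployed": sorted(existing - deployed),
    }


# Illustrative IDs only.
report = split_by_deployment(
    ["docs_prod", "docs_staging", "images_v2"],
    {"endpoint-a": ["docs_prod"], "endpoint-b": ["images_v2"]},
)
print(report)
```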

When It Fits, When It Doesn't

Use this server if your workflow requires connecting an LLM or agent directly to massive, indexed semantic data without manual dashboard navigation. You need to know what you are looking for:

If you need a list of every index that exists in the project regardless of deployment status, run list_vector_indexes.
If you need to see which specific indexes are active on a known endpoint ID, use list_deployed_indexes paired with list_index_endpoints.
Only run search_nearest_neighbors once the listing tools confirm that both the target index and its deployment endpoint exist.
Don't query an index until list_vector_operations shows that its build or deployment has finished; otherwise, your queries may fail due to incomplete builds.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Vertex AI Vector Search. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_index_details
list_deployed_indexes
list_index_endpoints
list_vector_indexes
list_vector_operations
search_nearest_neighbors

Checking infrastructure status shouldn't require opening five different Cloud Console tabs.

Today, checking the operational status of a vector index is a nightmare. You have to open the Indexing page, then maybe go to the Operations tab for build history, and finally jump over to the Endpoints section just to confirm if an index is actually live on a service. It's click-heavy and slow.

With this MCP server, you ask your agent once. It runs `list_vector_operations` to check the background builds, then uses `list_index_endpoints` for connectivity, giving you all the status updates—build history, index existence, and live connections—in a single chat response.

Using Vertex AI Vector Search MCP Server gives you instant semantic search capability.

Before this server, running a similarity search meant scripting complex API calls with specific IDs and hardcoded vectors. If the index ID changed or the endpoint was updated, your entire script broke until you manually fixed it.

Now, you simply tell your agent to run `search_nearest_neighbors`. The server handles the required parameters—endpoint, index, vector—allowing you to focus purely on the query logic, not the infrastructure plumbing.

Common Questions About Vertex AI Vector Search MCP

How do I find all indexes in my project using list_vector_indexes?

Run list_vector_indexes. This tool gives you a complete inventory of every vector index created across your entire GCP project, regardless of whether it's currently deployed to an endpoint.

Should I use list_deployed_indexes or list_vector_indexes?

Use list_vector_indexes when you need an inventory of every index in the project. Use list_deployed_indexes when you already know the specific endpoint and only want to see which indexes are live on it.

How do I check if an index is finished building using list_vector_operations?

Run list_vector_operations. This shows all long-running tasks. Check the status field: you're good to query when the task reports SUCCEEDED.

What information does search_nearest_neighbors require?

It requires three inputs: the target endpoint ID, the deployed index ID, and your query vector, passed as a JSON array of floats.

What do I need to provide for the `list_index_endpoints` tool to run successfully?

You must supply a valid OAuth2 Access Token and the target Google Cloud Project ID. The agent uses these credentials to authenticate against your cloud resources before listing any deployed endpoints.

If I run `get_index_details` but get an error, what does that usually mean?

The failure is typically due to insufficient permissions or providing a non-existent Index ID. Check your IAM roles first; if access is granted, verify the index name and project scope are correct.
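A hedged sketch of that triage in code. Google Cloud APIs use canonical status names such as PERMISSION_DENIED and NOT_FOUND, but whether this server passes them through unchanged is an assumption; treat the mapping as a starting point, not a description of the server's error format.

```python
def diagnose_index_error(status: str) -> str:
    """Map a status name to a next step for a failed get_index_details call.

    Assumes the error surfaces a Google Cloud canonical status name.
    """
    hints = {
        "PERMISSION_DENIED": "Check the IAM roles granted on the project.",
        "NOT_FOUND": "Verify the index ID, project ID, and location.",
        "UNAUTHENTICATED": "The access token may be missing or expired; issue a new one.",
    }
    return hints.get(
        status.strip().upper(),
        "Unrecognized status; read the full error message.",
    )


print(diagnose_index_error("NOT_FOUND"))
```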

How do I know if my dataset can handle large queries when using `search_nearest_neighbors`?

Query performance depends mainly on two factors: the dimensionality of your vectors and the number of nodes serving the index. Higher-dimensional vectors cost more per lookup, though they can capture finer semantic distinctions.

Beyond just listing an index name, what specific configuration metrics does `get_index_details` provide?

You retrieve comprehensive metadata, including the exact dimensionality of the vectors (e.g., 768 or 1536), the configured shard count, and the current version status of that particular vector index.
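One practical use of that metadata is a pre-flight check that a query vector matches the index's dimensionality. The sketch assumes the details arrive as a mapping with an integer `dimensions` field; that field name is hypothetical and may be nested differently in the real response.

```python
from typing import Mapping, Sequence


def check_vector_dimensions(index_details: Mapping[str, object],
                            query_vector: Sequence[float]) -> None:
    """Raise if the query vector length differs from the index dimensionality.

    "dimensions" is an assumed field name in the get_index_details output.
    """
    expected = index_details.get("dimensions")
    if not isinstance(expected, int):
        raise KeyError("index details do not include an integer 'dimensions' field")
    if len(query_vector) != expected:
        raise ValueError(
            f"query vector has {len(query_vector)} values, index expects {expected}"
        )


# Illustrative details for an index configured with 768 dimensions.
check_vector_dimensions({"dimensions": 768}, [0.0] * 768)
print("dimensions match")
```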

How do I perform a nearest-neighbor similarity test via chat?

Just write: Search my endpoint '1xxx' against index 'deployed_abc_1' looking for 3 nearest neighbors to the vector [0.015, -0.042, 0.111]. The search_nearest_neighbors tool bridges to Vertex and returns the IDs and distances of the closest matches.
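To make "IDs and distances" concrete, here is a brute-force sketch of what a nearest neighbor result contains. It is not how Vertex AI Vector Search works internally (the service uses approximate search at scale), and Euclidean distance is an assumption here; the distance measure for a real index depends on how it was configured.

```python
import math
from typing import Dict, List, Sequence, Tuple


def nearest_neighbors(query: Sequence[float],
                      datapoints: Dict[str, Sequence[float]],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Return the k datapoint IDs closest to the query, with their distances.

    Exhaustive Euclidean comparison, for illustration only.
    """
    scored = [(dp_id, math.dist(query, vec)) for dp_id, vec in datapoints.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Tiny made-up dataset with three-dimensional vectors.
data = {
    "doc-1": [0.01, -0.04, 0.10],
    "doc-2": [0.90, 0.80, -0.70],
    "doc-3": [0.02, -0.05, 0.12],
}
for dp_id, distance in nearest_neighbors([0.015, -0.042, 0.111], data, k=2):
    print(dp_id, round(distance, 4))
```

The exhaustive scan costs one distance computation per stored vector, which is why billion-scale search relies on a purpose-built index rather than this approach.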

Can I check the status of indexes that take hours to build on GCP?

Absolutely. Use the prompt: Check my google cloud vector operations. The list_vector_operations tool reveals all in-flight Cloud operations indicating completion percentages and precise timestamps, allowing you to sidestep the Google Console completely.

Where do I easily find the short-lived VERTEX_ACCESS_TOKEN?

On your terminal with gcloud installed and logged in, simply type gcloud auth print-access-token. Copy the output stream starting with ya29... into your configurations and the integration is ready for connection.
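If you would rather script that step, the sketch below shells out to the same `gcloud auth print-access-token` command. The function name and the injectable `run` parameter are choices made for this example so the logic can be exercised without gcloud installed; the token is short-lived, so expect to call it again when requests start failing authentication.

```python
import subprocess
from typing import Callable


def fetch_access_token(run: Callable = subprocess.run) -> str:
    """Return a fresh access token by invoking the gcloud CLI.

    Requires gcloud to be installed and logged in. The `run` parameter
    exists only so a stand-in can be supplied in tests.
    """
    result = run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud exits non-zero
    )
    token = result.stdout.strip()
    if not token:
        raise RuntimeError("gcloud returned an empty token")
    return token
```

Call `fetch_access_token()` with no arguments in normal use, and avoid logging the returned value.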

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python

Pydantic AI sdk-python

Vercel AI SDK sdk-typescript