# Chroma Vector DB MCP

> Chroma (Vector DB) MCP gives your AI agent direct visibility into your semantic data. List collections, run high-dimensional vector similarity searches, and audit document counts in natural conversation. It lets you inspect private knowledge bases directly from your chat client.

## Overview
- **Category:** loved-by-devs
- **Price:** Free
- **Tags:** embeddings, semantic-search, llm-infrastructure, vector-search, data-retrieval, machine-learning

## Description

When your AI needs to answer questions using proprietary or complex documents, it can't just guess; it needs context. This MCP connects your agent directly to Chroma, giving it visibility into your entire vector data layer. You stop writing boilerplate Python code for debugging and start asking simple questions, like 'How many records are in the staging environment?' or 'Find me all docs related to API authentication.' It's about talking to your knowledge base instead of querying a database schema. By using this MCP through Vinkius, you let your agent look up exactly the context it needs from your vector store, handling everything from listing available collections to retrieving specific documents by ID.

## Tools

### check_heartbeat
Pings the configured Chroma API endpoint to confirm that the server is reachable and responding.

### count_documents
Calculates and reports the total number of documents stored in a specified collection.

### get_collection
Retrieves detailed configuration and metadata for one specific vector collection.

### get_documents
Pulls the actual text content and associated metadata of specific documents in a collection.

### list_collections
Generates a list of all defined vector collections available in your database tenant.

### peek_documents
Shows a limited preview of the documents in a collection, including their metadata, for quick inspection.

### query_embeddings
Performs high-dimensional vector similarity searches based on semantic input queries.
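
To make "vector similarity search" concrete, here is a minimal pure-Python sketch of the idea behind a tool like `query_embeddings`: score every stored vector against a query vector and keep the closest matches. It is an illustration only, not how Chroma or this MCP implements search. The cosine metric, the function names, and the toy 3-dimensional vectors are all assumptions; the metric Chroma actually uses depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (made up); real embeddings have hundreds of dimensions.
docs = {
    "auth-guide": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
    "api-keys": [0.8, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print([doc_id for doc_id, _ in results])  # ['auth-guide', 'api-keys']
```

A real vector store avoids this brute-force scan by using an approximate nearest-neighbor index, but the ranking principle is the same.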

## Prompt Examples

**Prompt:** 
```
List all vector collections
```

**Response:** 
```
I found 3 collections: 'knowledge-base', 'user-embeddings', and 'staging-docs'. Would you like to check the document count for any of them?
```

**Prompt:** 
```
Peek at the first 5 documents in 'knowledge-base'
```

**Response:** 
```
Peeking into 'knowledge-base'... Here are the first 5 documents. They contain technical documentation about our API endpoints and authentication flows. Each has metadata like 'source' and 'last_updated'.
```

**Prompt:** 
```
Is the Chroma server alive?
```

**Response:** 
```
Checking heartbeat... Connection successful! The Chroma instance responded in 12ms and is fully operational.
```

## Capabilities

### Check system health
Confirms that the configured Chroma API endpoint is reachable and responding.

### List all knowledge collections
Retrieves a list of every defined vector collection within your database tenant.

### Count stored documents
Provides an exact count of the documents stored in a specified collection.

### Examine document contents
Pulls specific, raw documents and their associated metadata from a collection.

### Preview limited records
Gives a quick look at the metadata or content of a few documents in a collection without needing to pull everything.

### Perform semantic searches
Finds the documents whose embeddings are most similar to a semantic input query.

## Use Cases

### Verifying staging environment readiness
A PM needs to know if their new documentation set is ready. They ask, 'What collections exist for the Q3 rollout?' The agent runs `list_collections`, and they immediately see whether the expected staging collection exists.

### Debugging a failed search query
A developer suspects the wrong data is being returned. They use `peek_documents` to check the metadata of documents in the 'user-embeddings' collection, confirming that the source and date fields are correctly attached before running `query_embeddings`.
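
The metadata check in this scenario can be sketched as a small helper that scans a preview and reports which documents lack required fields. This is a hedged illustration: the record shape (`id` and `metadata` keys), the helper name `find_missing_metadata`, and the sample values are assumptions, not the documented output of `peek_documents`.

```python
def find_missing_metadata(documents, required_fields):
    """Map document ID -> sorted list of required fields absent from its metadata."""
    problems = {}
    for doc in documents:
        metadata = doc.get("metadata") or {}  # tolerate a missing or null metadata entry
        missing = sorted(f for f in required_fields if f not in metadata)
        if missing:
            problems[doc["id"]] = missing
    return problems

# Made-up preview data in an assumed shape.
preview = [
    {"id": "doc-1", "metadata": {"source": "api.md", "last_updated": "2024-05-01"}},
    {"id": "doc-2", "metadata": {"source": "auth.md"}},
    {"id": "doc-3", "metadata": None},
]
print(find_missing_metadata(preview, ["source", "last_updated"]))
# {'doc-2': ['last_updated'], 'doc-3': ['last_updated', 'source']}
```

An empty result means every previewed document carries the fields the search depends on.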

### Auditing data growth over time
A data engineer needs to prove compliance by tracking records. They run `count_documents` on each production collection, getting precise totals that they can report directly from the chat.

### Checking connectivity before deployment
Before running any complex queries, an ops team member runs `check_heartbeat`. A successful response confirms the instance is reachable and responding before any further tools are called.

## Benefits

- Debugging retrieval logic is fast. Instead of writing a script to test search boundaries, you just run `query_embeddings` through your agent's chat interface.
- You always know what data exists. Use `list_collections` to see every collection and `get_collection` for its specific settings, with no guesswork required.
- Maintain operational confidence by checking system stability with `check_heartbeat`. You get immediate confirmation that the connection is live before running a complex query.
- Understand your data footprint. Run `count_documents` to track volumes across different collections, ensuring you're not running expensive searches on empty ones.
- Inspect raw context easily. Need to see what a collection contains without pulling all the data? Use `peek_documents` for a quick preview.

## How It Works

The bottom line is you get full visibility into your vector embeddings using only natural conversation.

1. First, subscribe to this MCP and provide your Chroma URL (Cloud or self-hosted) and the required API Key.
2. Next, tell your AI agent what you want to check, for instance 'Show me all available collections' or 'Count documents in X.'
3. Your agent executes the necessary tool call and returns the structured data directly into the chat window.
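
Under the hood, step 3 corresponds to an MCP tool call, which the protocol expresses as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `count_documents`. The helper name `build_tool_call` and the argument name `collection_name` are assumptions for illustration; the actual argument schema is defined by this MCP and is not shown here.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "collection_name" is a hypothetical argument name, not a documented one.
request = build_tool_call(1, "count_documents", {"collection_name": "staging-docs"})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends this message for you; you never need to write it by hand.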

## Frequently Asked Questions

**How do I see which vector collections are available using `list_collections`?**
Running `list_collections` returns a clear list of every collection defined in the database. This helps you identify exactly where your data lives before running any other query.

**What is the difference between `count_documents` and `peek_documents`?**
`count_documents` gives you a single number: the total volume of records. `peek_documents` shows you a small, readable sample of the metadata or content attached to those documents.
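
The distinction can be illustrated with an in-memory stand-in: counting returns one integer, while peeking returns a short slice of the records themselves. The list and the helpers `count_records` and `peek_records` below are made up for illustration and are not the tools' implementation.

```python
# Made-up records standing in for a collection's contents.
records = [
    {"id": f"doc-{i}", "document": f"Sample text {i}", "metadata": {"source": "example"}}
    for i in range(1, 8)
]

def count_records(items):
    """Analogous to a count: a single number."""
    return len(items)

def peek_records(items, limit=5):
    """Analogous to a peek: the first `limit` records, contents included."""
    return items[:limit]

print(count_records(records))                       # 7
print([r["id"] for r in peek_records(records, 3)])  # ['doc-1', 'doc-2', 'doc-3']
```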

**Do I need to run `check_heartbeat` before querying embeddings?**
It isn't strictly required, but checking connectivity first is good practice. Running `check_heartbeat` confirms that your network connection is live and the Chroma instance is responding, so you catch an unreachable server before a search fails.

**What if I want to know more about a specific collection using `get_collection`?**
Simply ask for details about the collection by name. The agent uses `get_collection` and returns its full configuration, helping you understand its scope and metadata.

**If I run `query_embeddings` with a vector that is too large or malformed, how does the system handle it?**
The system validates input dimensions first. If the vector doesn't match the expected embedding size for a collection, the query fails immediately. This prevents corrupted data from running through your semantic search pipeline.
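
The kind of check described above can be sketched in a few lines. This is a hypothetical client-side validator, assuming you already know the collection's embedding size; the function name and error messages are illustrative and do not reproduce the server's actual validation.

```python
import math

def validate_query_vector(vector, expected_dim):
    """Return the vector as a list, or raise ValueError if it is unusable."""
    if not isinstance(vector, (list, tuple)) or len(vector) == 0:
        raise ValueError("query vector must be a non-empty list of numbers")
    for value in vector:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric component: {value!r}")
        if not math.isfinite(value):
            raise ValueError("vector contains NaN or infinity")
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return list(vector)

print(validate_query_vector([0.1, 0.2, 0.3], expected_dim=3))  # [0.1, 0.2, 0.3]
```

Validating before sending gives a clearer error message than a failed remote call and avoids a wasted round trip.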

**How do I ensure that my staging environment is isolated when using `get_collection`?**
You must explicitly manage tenant context before calling `get_collection`. Always confirm your API key and connection URL point to the correct database instance. Never assume the current context handles environment switching for you.

**What specific metadata do I receive back when I use the `get_documents` tool?**
You get the full document content, but critically, you also get associated metadata like the source ID, creation timestamp, and any custom fields attached to that record. This lets you trace information back to its origin.

**If my `check_heartbeat` call returns an error, what does that mean for running other commands?**
It means the fundamental connection is broken; no operation will succeed until connectivity is restored. You must address the network or credential issue before attempting to run any data retrieval tools.
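
One way to encode this rule in your own scripts is a guard that runs an operation only after a heartbeat check succeeds. The sketch below uses plain callables as stand-ins; `run_if_alive`, `ChromaUnavailable`, `healthy`, and `unreachable` are hypothetical names, not part of this MCP or of Chroma.

```python
class ChromaUnavailable(RuntimeError):
    """Raised when the heartbeat check fails (hypothetical name)."""

def run_if_alive(check_heartbeat, operation):
    """Call `operation` only if `check_heartbeat` completes without raising."""
    try:
        check_heartbeat()
    except Exception as exc:
        raise ChromaUnavailable(f"heartbeat failed: {exc}") from exc
    return operation()

def healthy():
    """Stand-in for a heartbeat call that succeeds."""
    return {"status": "ok"}

def unreachable():
    """Stand-in for a heartbeat call against a server that is down."""
    raise ConnectionError("connection refused")

print(run_if_alive(healthy, lambda: "query ran"))  # query ran

try:
    run_if_alive(unreachable, lambda: "query ran")
except ChromaUnavailable as err:
    print(err)  # heartbeat failed: connection refused
```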