# Cohere MCP

> Cohere provides an API gateway for enterprise-grade AI models, letting your agent handle everything from advanced chat conversations and document reranking to generating vector embeddings and precise text tokenization. It's a single connection point for complex NLP pipelines.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** llm, embeddings, reranking, natural-language-processing, tokenization, chat-api

## Description

This MCP connects your workflow directly to Cohere’s suite of natural language processing tools. You can use it to run a full information retrieval cycle: check which models are available with the model discovery tool, generate semantic embeddings for raw user input, and rerank documents against a specific query. To check a large prompt against a model's token limit before sending it, use the tokenization tool.

It's built for pipelines. If you’re building an application where data moves from one state to another (for instance, taking raw text, embedding it, and then passing those vectors into a database for retrieval), this MCP lets your agent orchestrate all of that without switching APIs. When you combine it with other specialized services in the Vinkius catalog, you can chain multiple operations together through one AI agent, building automations that span different platforms.

This setup means you stop writing dedicated HTTP calls just to interact with Cohere. Your AI client acts as a single orchestration layer for all your NLP needs.

## Tools

### chat
Sends a message to a Cohere model and returns the text response along with any citations and suggested tool calls.

### detokenize
Reconstructs readable text from an array of token IDs, which helps verify the integrity of tokenization processes.

### embed
Creates vector embeddings for given texts using a specified model and input type, useful for semantic comparisons.

### list_models
Retrieves names, context lengths, and capabilities of all models Cohere offers, allowing you to choose the right tool for the job.

### rerank
Scores a set of documents against a query text and returns them in order of relevance, with confidence scores.

### tokenize
Converts raw text into token IDs and their token strings, which lets you measure token usage accurately before sending prompts. Use `detokenize` for the reverse direction.
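
For orientation, an MCP client invokes any of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for `rerank`. The helper `build_tool_call` is hypothetical, and the argument names (`query`, `documents`, `top_n`) are assumptions; the actual input schema should be read from the server's tool listing.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are assumed, not taken from the server's schema.
request = build_tool_call(
    1,
    "rerank",
    {
        "query": "machine learning models",
        "documents": [
            "Neural networks are inspired by biological neurons.",
            "Python is a popular programming language.",
        ],
        "top_n": 2,
    },
)

print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this request for you; the sketch only shows what travels over the connection.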

## Prompt Examples

**Prompt:** 
```
Send a message to Command R+ asking 'What is the capital of Brazil?'
```

**Response:** 
```
Command R+ responded: 'The capital of Brazil is Brasília. It was purpose-built to replace Rio de Janeiro as the capital in 1960, and is located in the country's central-west region.'
```

**Prompt:** 
```
Rerank these documents for the query 'machine learning models': ['Neural networks are inspired by biological neurons.', 'Python is a popular programming language.', 'Transformers use attention mechanisms for sequence processing.']
```

**Response:** 
```
Reranked results: 1. 'Transformers use attention mechanisms...' (score: 0.95), 2. 'Neural networks are inspired...' (score: 0.72), 3. 'Python is a popular...' (score: 0.12). The transformer and neural network documents are most relevant to ML models.
```

**Prompt:** 
```
Generate embeddings for these texts: ['The weather is nice today.', 'I love programming in Python.'] using embed-v4.
```

**Response:** 
```
Generated embeddings for 2 texts using embed-v4 with input_type 'search_document'. Each embedding is a 1024-dimensional vector. You can use these for semantic search, similarity comparison or vector database storage.
```

## Capabilities

### Conduct conversational chat
Send complex messages to advanced models, receiving responses that include source citations and function call support.

### Generate vector embeddings
Create numerical representations of text for semantic search or similarity comparisons using various input types.

### Improve search relevance
Take a query and a set of documents, then reorder them by calculated relevance score to improve retrieval accuracy.

### Analyze model options
List all available Cohere models, showing their names, context length limits, and capabilities for planning.

### Estimate token counts
Break down text into tokens or reconstruct text from token IDs to accurately predict API costs and manage input size.

## Use Cases

### Building a Q&A bot with source validation
A user asks, 'What was the company's revenue last year?' The agent uses the chat tool to get an answer and simultaneously pulls citations showing exactly which internal document chunk provided that specific figure.

### Improving a complex knowledge base search
Instead of just searching keywords, the system first runs embed on the query. It then retrieves 20 candidate documents and uses rerank to cut that list down to the top 5 most relevant pieces for the user.
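
The first stage of this flow, picking candidates by vector similarity, can be sketched as below. The vectors are made-up three-dimensional stand-ins for `embed` output, and `top_candidates` is a hypothetical helper; nothing here calls Cohere. The resulting candidate list is what you would then hand to `rerank`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_candidates(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query, best first."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.7, 0.7, 0.0],
    "d": [0.0, 0.0, 1.0],
}

candidates = top_candidates(query_vec, doc_vecs, k=2)
print(candidates)  # ['a', 'c']
```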

### Debugging a large prompt payload
A developer needs to send a long document chunk but isn't sure whether it will exceed the token limit. They run tokenize on the text first and compare the token count against the model's limit before calling chat.
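
A minimal sketch of that check, assuming the `tokenize` result has already been reduced to a list of token IDs. The IDs and the limit below are invented for illustration, and `fits_budget` is a hypothetical helper.

```python
def fits_budget(token_ids, context_limit, reserved_for_reply=0):
    """True when the prompt tokens plus the reserved reply tokens fit the limit."""
    return len(token_ids) + reserved_for_reply <= context_limit


# Hypothetical token IDs standing in for `tokenize` output.
token_ids = [101, 2023, 2003, 1037, 3231, 102]

print(fits_budget(token_ids, context_limit=8))                        # True
print(fits_budget(token_ids, context_limit=8, reserved_for_reply=4))  # False
```

Chat formatting may add tokens beyond the raw text, so it is safer to leave some margin than to fill the limit exactly.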

### Creating multi-step content analysis
The agent receives an article, uses embed to create vectors for it, and then passes those vectors to a second MCP's retrieval tool for comparison against other stored data.

## Benefits

- Get accurate source citations directly from the chat tool. When your agent answers a question, it doesn't just guess; it tells you where its information came from.
- Build semantic search indexes efficiently by using the embed tool to turn documents and queries into comparable vector representations.
- Stop relying on simple keyword matching for search results. The rerank tool reorders retrieved documents based on deep relevance scores, making your search feel smarter.
- Accurately predict token usage before running a prompt. Use the tokenize or detokenize tools to test input sizes and prevent costly API overruns.
- Streamline model selection by listing all available Cohere models using list_models; you instantly know which models support embeddings versus chat.

## How It Works

You manage multiple AI models and data tasks through one standard, predictable interface.

1. Subscribe to this MCP in Vinkius and provide your required Cohere API Key.
2. Connect your agent (e.g., Claude, Cursor) once; all tools are available through that single connection.
3. Your agent can then execute the various NLP operations, such as generating embeddings or reranking documents, as part of a larger workflow.

## Frequently Asked Questions

**How do I get a Cohere API Key?**
Log in to the [**Cohere Dashboard**](https://dashboard.cohere.com/api-keys), go to **API Keys** and click **Create API Key**. Copy the key immediately: it is a random string and won't be shown again. The free tier includes trial access with rate limits.

**What models are available?**
Use the `list_models` tool to see all available Cohere models. Key models include command-r-plus (most capable, 128K context), command-r (efficient, 128K context), command-r7b (lightweight, 128K context), embed-v4 (embeddings) and rerank-v3.5 (reranking).

**Can I send multi-turn conversations?**
Yes. Pass a messages array in which each message has a 'role' field ('system', 'user' or 'assistant') and a 'content' field. User and assistant turns alternate; a system message, if you include one, typically comes first. Command models support function calling and will return tool_calls when appropriate.
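
A minimal sketch of such a messages array, using only the roles named above. `make_message` is a hypothetical helper, not part of the MCP, and the exact schema the `chat` tool accepts may differ.

```python
# Roles named in this FAQ; the tool's real schema may allow others.
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role, content):
    """Return one chat message, rejecting roles not listed in VALID_ROLES."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You answer briefly."),
    make_message("user", "What is the capital of Brazil?"),
    make_message("assistant", "Brasília."),
    make_message("user", "When did it become the capital?"),
]

print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```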

**What is reranking and when should I use it?**
Reranking reorders a set of documents by their relevance to a query. Use it after an initial search to improve result quality. The rerank tool takes a query and a list of documents and returns the documents ranked by relevance score. Cohere's rerank models are built for search applications.

**When using the `embed` tool, how do I choose the right input type for my vectors?**
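You must specify the purpose when calling `embed`. Use 'search_document' to index general text for similarity search. Cohere's embed API also defines 'search_query' for the query side of a search, so queries and indexed documents are embedded for their respective roles. Alternatively, use 'classification' if your goal is grouping or labeling documents based on predefined categories.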

**How do I estimate my token count before running a long chat with the `chat` tool?**
Run the `tokenize` tool first. It returns the precise list of token IDs and strings, letting you accurately predict how many tokens your prompt will use for cost estimation or length checks.

**When using the `rerank` tool, how do I ensure I only get the top results?**
Set the optional `top_n` parameter when running `rerank`. This limits the output to at most N documents (fewer if you supplied fewer than N), which saves tokens and keeps your search result display clean.
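
The effect of `top_n` can be illustrated locally. The sketch below assumes each result carries an `index` into the input list and a `relevance_score`, as in Cohere's rerank responses; the shape the MCP tool actually returns may differ, `keep_top_n` is a hypothetical helper, and the scores are borrowed from the prompt example earlier in this document.

```python
def keep_top_n(results, top_n=None):
    """Sort results by relevance_score, best first, and keep at most top_n."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Illustrative scores only; `index` points back into the submitted documents.
results = [
    {"index": 0, "relevance_score": 0.72},
    {"index": 1, "relevance_score": 0.12},
    {"index": 2, "relevance_score": 0.95},
]

top = keep_top_n(results, top_n=2)
print([r["index"] for r in top])  # [2, 0]
```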

**Does the `chat` tool support structured responses or function calling?**
Yes, the `chat` tool supports tool calling. Alongside the conversational text, it returns details of any function calls the model decides are needed.