# Together AI MCP

> Together AI connects your local agent to dozens of open-source models and ML services. You can instantly generate chat completions, create vector embeddings for RAG pipelines, or fine-tune custom LLMs—all through one API endpoint. It lets you query Llama, Mixtral, and more from a single place without leaving your IDE.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** llm, model-inference, fine-tuning, open-source-ai, machine-learning, api-deployment

## Description

Look, you've got an agent running locally, and it needs muscle that doesn't cost a fortune or tie you down to some closed system. This MCP server connects your setup directly to dozens of open-source models and ML services from the Together AI network. It gives you high-speed inference for big language models like Llama 3 and Mixtral. You can run everything—from simple prompts to full custom model training runs—all through one API endpoint, right inside your IDE.

When you need to figure out what's available, start with the `list_available_models` tool. It checks the entire Together AI network and spits back a comprehensive list of every open-source LLM and diffusion model they support. This lets you know exactly which engine—whether it's for natural language processing or image generation—you need to tackle your current task.

For basic text tasks, you've got two ways to go. If you just need a quick answer based on a single prompt, use `text_completion`. You just send over the specific model ID and the prompt, and it spits out the requested text. But if you’re building a chat interface or running a complex dialogue that requires remembering context, you'll want to run a multi-turn conversation using `chat_completion`. This tool handles the entire message history—you pass in the model ID along with an array of previous messages—so your agent doesn't forget what was said two turns ago.

If your goal is building a Retrieval Augmented Generation (RAG) pipeline, you gotta deal with embeddings. Use `generate_embeddings` to convert any list of raw input strings into numerical vector embeddings. You just specify the embedding model ID, and it handles turning that plain text into vectors ready for database indexing. This is how you make your documents searchable.

Need some visual flair? If you're working on anything graphical, `generate_image` uses external diffusion models to create image files. All you gotta do is send over a detailed descriptive text prompt—the more specific you are about what you want the picture to look like, the better it turns out.

For custom AI development, you have two tools managing the entire lifecycle of fine-tuning. First, when your open-source model isn't quite hitting the mark for your niche use case, you kick off a new training run using `create_finetune_job`. This tool takes two key inputs: the base model ID and the specific dataset you want it to train on. That starts the whole process.

Once that job is running in the background—and it will take time—you need to know if it's stuck or done. Use `list_finetune_jobs` to retrieve a list of all your submitted fine-tuning jobs. This lets you check the current status of every single job, giving you visibility into whether they're queued, running, or finished. It covers everything from checking existing runs to listing them for an audit.

## Tools

### chat_completion
Runs a multi-turn conversation using an open-source model, accepting a model ID and message history array.

### create_finetune_job
Starts the training process for a custom LLM by specifying a base model and the dataset to train on.

### generate_embeddings
Converts a list of input strings into numerical vector embeddings using a specified embedding model ID.

### generate_image
Creates an image file by sending a detailed descriptive text prompt to the external diffusion model.

### list_available_models
Returns a list of all LLMs and open-source models currently supported on the Together AI platform.

### list_finetune_jobs
Retrieves a list of all fine-tuning jobs, allowing you to check their current status.

### text_completion
Executes a single text generation request using an open-source model based on a provided prompt and model ID.

## Prompt Examples

**Prompt:** 
```
List all the models currently available on Together AI.
```

**Response:** 
```
I've fetched 132 available models. Here are the top chat models:
- meta-llama/Llama-2-70b-chat-hf
- mistralai/Mixtral-8x7B-Instruct-v0.1
- google/gemma-7b-it
Ask if you want the embedding or image models only.
```

**Prompt:** 
```
Generate an embedding array using model `togethercomputer/m2-bert-80M-8k-retrieval` for the sentence 'The cat sat on the mat'.
```

**Response:** 
```
Embeddings generated successfully. Dimensions: 768. Sample values:
[-0.0124, 0.0411, 0.0812, ... -0.0123]
```

## Capabilities

### List available models
Checks the Together AI network to find all currently supported open-source LLMs and diffusion models.

### Run chat completions
Executes multi-turn conversational cycles using advanced, specified open-source models (e.g., Llama 3).

### Generate text embeddings
Converts input texts into numerical vectors that capture semantic meaning for database indexing.

### Create images from prompts
Uses external diffusion models to generate visual media based on a detailed text description.

### Start fine-tuning jobs
Initiates a custom training run by pointing the system to a base model and your specific dataset file.

### Check job statuses
Retrieves the current status of any existing or previously submitted model fine-tuning jobs.

## Use Cases

### Building a Custom FAQ Bot (RAG)
The ML Engineer has 10,000 pages of PDFs. They feed these into an indexing service to get embeddings using `generate_embeddings`. When a user asks a question later, the agent uses those vectors to retrieve context and then passes that context plus the query into `chat_completion` for a precise answer.

### Creating Marketing Assets from Chat Output
The developer asks their agent to write three product descriptions for a new gadget using `text_completion`. They copy one of those descriptions, and immediately use it as the detailed prompt in the `generate_image` tool to create accompanying marketing art.

### Validating Model Choices Before Commit
The Software Engineer is debating between Llama 3 and Mixtral. Instead of writing two separate scripts, they use `list_available_models` first. Then, they run the same prompt through both models using their respective model IDs in a single chat session to compare performance.

### Archiving Custom Data Models
The Research Scientist has identified a niche domain for an LLM. They use `create_finetune_job` with their specialized dataset and monitor the job progress using `list_finetune_jobs`, all without ever leaving their main agent interface.

## Benefits

- Model Diversity: You don't get locked into one vendor. Use `list_available_models` to see dozens of open-source alternatives (Llama, Mixtral) and test them all within the same chat session.
- Vector Prep on Demand: Need embeddings for a knowledge base? Call `generate_embeddings` with raw text logs; you get vectors ready to load into your analytical database immediately.
- Zero Context Switching for Tuning: Instead of jumping between CLI tools, use `create_finetune_job` and `list_finetune_jobs` right inside your chat environment. It keeps the whole workflow together.
- Full Media Pipeline: Need a visual element? Use `generate_image`. You can generate code from an LLM (`chat_completion`) and then use that output to describe what image you need next.
- Flexible Inference: Whether you're doing simple, single-prompt text generation with `text_completion` or complex multi-turn dialogue with `chat_completion`, the server handles it all.

## How It Works

The bottom line is: it lets your local code talk to dozens of powerful open-source LLMs without you needing separate keys or endpoints for each one.

1. Sign up for the Together AI integration and grab a developer API Key from their control panel.
2. Plug that API key into your agent's configuration, specifying which models you need to access.
3. Your AI client uses the server tools (like `chat_completion` or `generate_embeddings`) to run inference or start jobs directly.

## Frequently Asked Questions

**How do I check which open-source LLMs are available?**
You run the `list_available_models` tool. This gives you a list of every model ID and its capabilities right now, letting you pick the best engine for your job.

**Is `chat_completion` better than `text_completion`?**
`chat_completion` is almost always what you want. It's built to handle message history (the whole conversation), while `text_completion` is only for single, stateless prompts.

**What models can I use for image generation?**
The server uses external diffusion models for this. You just need a detailed text description in the prompt provided to the `generate_image` tool; you don't specify the model ID.

**How do I start training my own LLM?**
Use the `create_finetune_job` tool. You must provide a base model ID and point to your specific dataset file for it to begin.

**If I have a massive dataset, how do I efficiently run `generate_embeddings`?**
You process them in batches. While the tool handles large arrays of strings, we recommend grouping texts into manageable chunks (e.g., 100-500 items) to prevent timeouts and optimize throughput. This method helps you monitor progress and ensures reliable data transfer for your vector database.

**How do I check the status of a fine-tuning job after running `create_finetune_job`?**
You use the `list_finetune_jobs` tool to query all jobs. This returns a list that includes both active and completed runs, showing you the current state (e.g., 'PENDING', 'RUNNING', or 'FAILED') for easy monitoring.

**Can `chat_completion` force the output into JSON format?**
Yes, you can guide the model to output structured data. When providing the prompt and message history, include specific instructions requesting a JSON schema. This ensures your AI client receives predictable, machine-readable results for reliable parsing.

**What parameters should I control when using `generate_image`?**
Beyond the descriptive prompt, you can often specify dimensions or aspect ratios in the tool call. Checking the model's documentation will show supported size constraints (e.g., 1:1 square, 16:9 landscape) to get exactly the format your application requires.

**Where do I obtain my Together AI API Key?**
Log in to the developer portal via `api.together.xyz/settings/api-keys`. If you do not have an existing key, click **Create API Key**. This token enables the execution of remote inferences spanning their hosted clusters securely.

**Do I have to pay to use Together models through the agent?**
Yes. This connector simply routes your instructions to Together AI. Any tokens consumed during chat completion, embeddings, images generation, or fine-tuning workloads are billed directly to your registered Together AI account balance according to their official compute pricing models.

**Can I access free models on Together AI?**
Yes! Together AI frequently offers free tiers for certain open-source models intended for experimentation and research. You can query these directly from your agent without depleting your account balance, though specific free-tier rate limits will apply.