# Coqui TTS MCP

> Coqui TTS (Open Source Speech Studio API) converts text into speech audio. This MCP connects your AI client to self-hosted or cloud Coqui models, so you can list available voices and generate voiceovers directly from an agent conversation. It suits developers who need open-source text-to-speech output without leaving their code editor.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** text-to-speech, tts, coqui-ai, speech-synthesis, voice-generation

## Description

Need to turn written text into speech? This MCP connects your AI client to a Coqui Speech Studio API endpoint. Through Vinkius, your agent gets voice synthesis from models you manage yourself: it discovers the voices available on your server, then synthesizes audio on request during a normal conversation. Whether you're building an app or producing sample voiceovers, you send text and get spoken audio back. You don't have to write separate scripts; your agent handles the whole process.

## Tools

### list_models
Returns the list of text-to-speech models currently available on your Coqui server.

### synthesize_speech
Generates an audio file from a text input using one of your available TTS models.
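
Under the Model Context Protocol, a client invokes a tool by sending a JSON-RPC `tools/call` request that names the tool and passes its arguments. The sketch below builds such requests for the two tools above. The helper name `tool_call` and the `text` argument key are assumptions for illustration; this listing does not document the tools' input schemas, so check what your client reports for them.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

# Monotonic request IDs, as JSON-RPC expects each request to carry one.
_request_ids = count(1)


def tool_call(name: str, arguments: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an MCP `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# list_models is called without arguments here.
print(json.dumps(tool_call("list_models"), indent=2))

# The "text" key is an assumed argument name, not a documented one.
print(json.dumps(
    tool_call("synthesize_speech", {"text": "Welcome to the future of open source AI"}),
    indent=2,
))
```

In practice your AI client builds and sends these requests for you; the sketch only shows what travels over the wire.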

## Prompt Examples

**Prompt:** 
```
List all available TTS models on my Coqui server.
```

**Response:** 
```
I've retrieved the available models from your Coqui server. You have access to models like 'tts_models/en/ljspeech/vits' and 'tts_models/multilingual/multi-dataset/xtts_v2'. Which one would you like to use?
```

**Prompt:** 
```
Synthesize the text 'Welcome to the future of open source AI' into speech.
```

**Response:** 
```
I am generating the audio for that text now... Done! The speech has been synthesized successfully using your default Coqui model. You can find the audio metadata and file details here.
```

**Prompt:** 
```
What models can I use for speech synthesis?
```

**Response:** 
```
Let me check your Coqui server... It currently reports 3 active models: a standard English model, a multilingual XTTS model, and a fast VITS model. Would you like to see the full technical IDs for these?
```

## Capabilities

### Check available voices
You ask what models are ready, and it returns a list of all TTS voices currently loaded on your Coqui server.

### Generate audio from text
It takes any block of text you provide and converts it into synthesized speech.

## Use Cases

### Creating a product tour walkthrough
A technical writer needs to demonstrate how a new feature works. Instead of recording three separate voice tracks, they have their agent run `list_models` first, pick an English model, and then call `synthesize_speech` once per step. Because every step uses the same model, the result is a consistent audio guide.

### Testing localization models
A global product manager wants to check that a new Chinese-language model works correctly. Their agent calls `list_models`, confirms that the model for the right locale is available, and then uses `synthesize_speech` to test a sample phrase.

### Building an automated notification system
A developer builds a CI/CD pipeline that needs to read error logs aloud for quick review. They connect the MCP, confirming model availability with `list_models`, and then pass the log text to `synthesize_speech`.

### Generating sample voiceovers quickly
A content creator has 50 lines of script for a podcast trailer. After confirming model availability with `list_models`, they have the agent feed the lines to `synthesize_speech` in a batch, generating all audio files in minutes.
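
A batch like this usually starts by splitting the script into one job per line. The following is a minimal sketch of that preparation step only; `plan_batch` and the `line_NN` labels are hypothetical names, and the actual synthesis still happens through `synthesize_speech`.

```python
from typing import List, Tuple


def plan_batch(script: str, prefix: str = "line") -> List[Tuple[str, str]]:
    """Split a script into (label, text) jobs, skipping blank lines."""
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    # Zero-pad labels so they sort in script order (at least two digits).
    width = max(2, len(str(len(lines))))
    return [
        (f"{prefix}_{index:0{width}d}", text)
        for index, text in enumerate(lines, start=1)
    ]


script = """
Welcome to the show.

This week: open-source speech synthesis.
"""

for label, text in plan_batch(script):
    print(label, "->", text)
```

Each `(label, text)` pair can then be handed to the agent as one synthesis request, with the label used to keep the resulting audio in order.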

## Benefits

- You get reliable, open-source voice generation. You don't rely on proprietary APIs with usage caps or unpredictable costs.
- Using the `list_models` tool lets you see exactly which voices are active on your server before you write a single line of code.
- The synthesis process is streamlined. Instead of writing boilerplate API calls, your agent handles the text-to-speech conversion for you.
- It’s built for developers who need voice output in their application logic. You just tell the agent to synthesize speech using `synthesize_speech`.
- You keep control of your models. Since this connects to your self-hosted Coqui API, you manage the infrastructure and data.

## How It Works

You point your client at your Coqui server, and the MCP handles voice generation for you.

1. Subscribe to this MCP and enter your Coqui API endpoint URL (`COQUI_SERVER_URL`).
2. Ask your agent to list the available voices with `list_models`.
3. Provide a text string and tell the agent to synthesize speech with `synthesize_speech`; you get back the audio metadata.
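
The order of these steps (list first, then choose a model, then synthesize) can be sketched as follows. Everything here is illustrative: `speak` is a hypothetical helper, the two `fake_*` functions stand in for the real tools, and the metadata fields they return are invented for the example.

```python
from typing import Callable, Dict, List


def speak(
    text: str,
    list_models: Callable[[], List[str]],
    synthesize: Callable[[str, str], Dict[str, str]],
    preferred: str = "",
) -> Dict[str, str]:
    """List models, pick one, then synthesize (hypothetical workflow)."""
    models = list_models()
    if not models:
        raise RuntimeError("the server reported no TTS models")
    # Use the preferred model when the server has it, else the first one listed.
    model = preferred if preferred in models else models[0]
    return synthesize(text, model)


# Stand-ins for the real tools, so the sketch runs without a server.
def fake_list_models() -> List[str]:
    return ["tts_models/en/ljspeech/vits", "tts_models/multilingual/multi-dataset/xtts_v2"]


def fake_synthesize(text: str, model: str) -> Dict[str, str]:
    return {"model": model, "characters": str(len(text))}


print(speak("Welcome to the future of open source AI", fake_list_models, fake_synthesize))
```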

## Frequently Asked Questions

**How can I check which voice models are currently installed on my server?**
You can use the `list_models` tool. Your agent will query the Coqui server and return a list of all available TTS models ready for synthesis.

**Is it possible to generate audio files from a text string directly?**
Yes! Use the `synthesize_speech` tool by providing the text you want to convert. The agent will process it through Coqui and return the audio metadata.

**What do I need to provide to connect my local Coqui instance?**
You only need to provide the `COQUI_SERVER_URL`. This is the base address where your Coqui Speech Studio API is reachable (e.g., `http://localhost:5002`).
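
Before entering the value, it can help to sanity-check it. The sketch below is one way to do that, assuming the URL should be a plain `http` or `https` base address with no trailing slash; the helper names and that normalization rule are assumptions, not behavior documented for this MCP.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def normalize_server_url(raw: str) -> str:
    """Trim whitespace and trailing slashes, then check scheme and host."""
    url = raw.strip().rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable base URL: {raw!r}")
    return url


def load_server_url(env: Mapping[str, str] = os.environ) -> str:
    """Read COQUI_SERVER_URL from the given environment mapping."""
    raw = env.get("COQUI_SERVER_URL")
    if not raw:
        raise ValueError("COQUI_SERVER_URL is not set")
    return normalize_server_url(raw)


# Demonstrated with an explicit mapping so the example does not depend on your shell.
print(load_server_url({"COQUI_SERVER_URL": "http://localhost:5002/"}))
```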

**When I use `list_models`, how do I determine if a model supports a specific language?**
The model name itself indicates compatibility. Model IDs such as 'tts_models/en/ljspeech/vits' include a language segment: look for a code like 'en' for English, or 'multilingual' for models that cover several languages. This helps you select the right voice profile upfront.
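
As a rough illustration, the language segment can be read straight out of the model ID. This sketch assumes the four-part `type/language/dataset/model` layout seen in IDs like 'tts_models/en/ljspeech/vits'; the helper names are hypothetical. Note that it treats any 'multilingual' model as a candidate, which is optimistic: a multilingual model covers a fixed set of languages, so confirm the one you need in that model's documentation.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    kind: str
    language: str
    dataset: str
    name: str


def parse_model_id(model_id: str) -> ModelId:
    """Split an ID of the assumed form type/language/dataset/model."""
    parts = model_id.strip().split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected model ID layout: {model_id!r}")
    return ModelId(*parts)


def may_support(model_id: str, language: str) -> bool:
    """True for an exact language match or for a 'multilingual' model."""
    return parse_model_id(model_id).language in (language, "multilingual")


models = [
    "tts_models/en/ljspeech/vits",
    "tts_models/multilingual/multi-dataset/xtts_v2",
]
print([m for m in models if may_support(m, "en")])
```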

**After calling `synthesize_speech`, how do I retrieve detailed information about the generated audio file?**
The system returns comprehensive metadata immediately after synthesis. You get details on the file ID, model configuration used, and storage location for easy retrieval.

**What happens if my API connection fails during `synthesize_speech`?**
If the service encounters an issue, the agent returns a specific HTTP status code along with an error message. This allows you to quickly debug whether it's a connectivity or input problem.
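
One way to act on that status code is to sort failures into rough buckets first. The sketch below uses the standard HTTP convention (4xx for problems with the request, 5xx for problems on the server) and treats a missing status as a connectivity failure; `classify_failure` and its bucket names are hypothetical, not part of this MCP.

```python
from typing import Optional


def classify_failure(status: Optional[int]) -> str:
    """Map an HTTP status (or None when no response arrived) to a bucket."""
    if status is None:
        # No response at all: wrong URL, server down, or network trouble.
        return "connectivity"
    if 400 <= status < 500:
        # The server rejected the request, e.g. bad text or unknown model.
        return "input"
    if 500 <= status < 600:
        # The server accepted the request but failed while handling it.
        return "server"
    return "unknown"


for status in (None, 400, 404, 500):
    print(status, "->", classify_failure(status))
```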

**Are there any rate limits when I use `synthesize_speech`?**
Rate limiting depends entirely on your Coqui setup. Whoever operates the server (you, for a self-hosted instance, or your cloud provider) controls the throttling, and the agent passes any resulting error codes back to you for handling.
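
A common way to handle throttling errors is to retry with exponential backoff. The sketch below assumes your server signals throttling with HTTP 429 (Too Many Requests) or 503 (Service Unavailable), which is conventional but not confirmed for any particular Coqui deployment; the helper names and default delays are illustrative.

```python
from typing import List


def should_retry(status: int) -> bool:
    """Retry only on statuses that conventionally mean 'try again later'."""
    return status in (429, 503)


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays in seconds that double per attempt, limited to `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(attempts)]


print(should_retry(429), should_retry(400))
print(backoff_delays(4))
```

Waiting for each delay in turn before re-sending the request keeps a batch job from hammering a server that is already overloaded.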

**What file formats can I expect after running `synthesize_speech`?**
The API handles standard audio formats like WAV and MP3. For the definitive list of supported output types, consult the official Coqui documentation, or use `list_models` to check what your models report.