Coqui TTS MCP. Turn text into high-quality speech from your agent.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Coqui TTS (Open Source Speech Studio API) MCP Server lets your AI agent generate high-quality speech from text. It connects to your self-hosted Coqui server to list available models and synthesize audio files instantly.

Use this to add robust, open-source voice generation directly into your agent's workflow.

What your AI agents can do

List Available Models

The agent checks your Coqui server and returns a list of all TTS models currently loaded and ready for use.

Generate Audio from Text

The agent takes a text string and uses the synthesis engine to create a spoken audio file, returning the file's metadata.

Get Audio File Details

After synthesis, the agent reads the metadata returned by synthesize_speech: details of the audio file and the model configuration used.

Free for Subscribers

Coqui TTS (Open Source Speech Studio API) MCP Server: 2 Tools for Voice Generation

Use these two tools to list all available TTS models and convert text into spoken audio directly from your AI agent.

list_models

Lists every TTS model currently loaded and available on your Coqui server.

synthesize_speech

Converts a given text string into spoken audio and returns the file's metadata.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Coqui TTS (Open Source Speech Studio API), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Coqui TTS (Open Source Speech Studio API) MCP Server lets your AI agent turn text into high-quality speech. You hook it up to your own Coqui server, and your agent can list models and spit out audio files on demand. It's built to add solid, open-source voice generation right into your agent's workflow.

Your agent uses the list_models tool to check your Coqui server and get a list of every TTS model loaded and ready to go. The synthesize_speech tool takes a text string and runs the synthesis engine to create a spoken audio file, giving you the file's metadata when it's done.

How Coqui TTS MCP Works

1 Subscribe to the server and provide your specific Coqui Server URL.
2 The AI client sends a request (e.g., 'Use model X to read Y').
3 The agent executes the necessary tool calls (list_models or synthesize_speech), and the results are returned to the client.

The bottom line is, your agent sends a prompt, and the server handles the call to your Coqui API, keeping the process entirely within the chat context.
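
As a rough illustration of step 3, MCP clients issue tool calls as JSON-RPC 2.0 messages using the tools/call method. The sketch below only builds the two request payloads and sends nothing. The argument names model_id and text, and the example values, are assumptions for illustration; this page does not document the tools' input schema.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# list_models is assumed to take no arguments.
list_request = tool_call("list_models")

# "model_id" and "text" are hypothetical argument names, not confirmed by this page.
speak_request = tool_call(
    "synthesize_speech",
    {"model_id": "model-x", "text": "Hello from the agent."},
)

print(json.dumps(speak_request, indent=2))
```

In practice your MCP client builds and sends these messages for you; the sketch is only meant to show what "executes the necessary tool calls" amounts to on the wire.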

Who Is Coqui TTS MCP For?

This is for developers building AI-powered apps, content creators needing quick voiceovers, and researchers testing TTS models. You're someone who needs to embed high-quality, open-source voice generation into a workflow without writing dedicated backend code.

Developer

Integrates voice synthesis into applications, calling synthesize_speech directly from the code editor or agent workflow.

Content Creator

Generates voiceovers and speech samples for scripts or podcasts by running the agent and using the synthesize_speech tool.

AI Researcher

Tests and compares different TTS models by using list_models to discover available configurations and running synthesis tests.

What Changes When You Connect

Generate Voiceovers Instantly: Use synthesize_speech to convert any text into audio. Once you have provided your Coqui server URL at setup, you don't need to manage API keys or endpoints for each request; the agent handles the whole call.
Discover Available Voices: Run list_models to see exactly what models are running on your Coqui server. This ensures your agent uses the right voice for the job.
Open-Source Power: Since it uses Coqui TTS, you get access to open-source, high-quality speech models. It's flexible and customizable, unlike proprietary cloud APIs.
Seamless Integration: The MCP Server connects the voice generation to your agent's conversation flow. You simply ask the agent to speak something, and it runs the necessary tool.
Metadata Tracking: synthesize_speech returns metadata for the generated audio. You know exactly what file was made and which model was used, which is key for auditing and production work.

Real-World Use Cases

Drafting a Podcast Script Voiceover

A content creator writes a script segment. Instead of downloading the text and using a separate program, they ask their agent: 'Generate the following text as a voiceover.' The agent uses synthesize_speech, and the audio file metadata is returned, ready for immediate use.

Testing Model Compatibility

An AI researcher needs to know if their new test script works across three different TTS models. They first ask the agent to run list_models. Once they confirm the models are loaded, they then run synthesize_speech three times, confirming the API handles the model switching easily.

Creating Interactive Tutorials

A developer building a guide needs step-by-step audio instructions. They ask the agent to list available models first. Once they pick a specific voice, they feed the tutorial text to synthesize_speech to get the audio, all within the chat.

Building a Multilingual Bot

A company bot needs to speak multiple languages. The developer uses list_models to ensure the correct language-specific models are active. The bot then uses synthesize_speech with the appropriate model ID to handle multilingual output.
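
A minimal sketch of the model check in this use case, assuming list_models returns names in Coqui's usual "type/language/dataset/model" form (for example tts_models/de/thorsten/vits). Whether this MCP server returns names in exactly that shape is an assumption, and models_for_language is a hypothetical helper, not part of the server.

```python
def models_for_language(model_names, language):
    """Return TTS models for one language, plus any multilingual models."""
    matches = []
    for name in model_names:
        parts = name.split("/")
        # Expect "<type>/<language>/<dataset>/<model>"; skip anything else,
        # including vocoder models, which cannot synthesize speech from text alone.
        if len(parts) != 4 or parts[0] != "tts_models":
            continue
        if parts[1] in (language, "multilingual"):
            matches.append(name)
    return matches


# Example names in Coqui's naming style, standing in for a list_models result.
names = [
    "tts_models/en/ljspeech/vits",
    "tts_models/de/thorsten/vits",
    "tts_models/multilingual/multi-dataset/xtts_v2",
    "vocoder_models/en/ljspeech/hifigan_v2",
]

print(models_for_language(names, "de"))
```

A language with no dedicated model still matches the multilingual entry, which mirrors how the bot would fall back to a model like XTTS.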

The Tradeoffs

Using a single, generic TTS API

Relying on a single, all-in-one cloud service means you're stuck with their model set and pay high rates for basic functionality. You can't test or swap out the underlying engine.

→ Use this Coqui TTS MCP Server. First, run list_models to see the full, open-source options. Then, use synthesize_speech with the specific model you want to test or deploy.

Ignoring model availability

Trying to generate audio using a model ID that isn't loaded on your server. This results in a generic 'Model Not Found' error and forces a manual fix.

→ Always check first. Run list_models to confirm every model ID is active. Then, pass those verified IDs to synthesize_speech to ensure the job runs.
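
The "check first" pattern can be sketched as a small guard around the two tools. Here list_models and synthesize_speech are passed in as plain callables so the example runs on its own; the stub functions and the metadata fields they return are stand-ins, not the server's real response format.

```python
class ModelNotLoaded(ValueError):
    """Raised locally, before any synthesis request is sent."""


def synthesize_checked(model_id, text, list_models, synthesize_speech):
    """Call synthesize_speech only if model_id appears in list_models()."""
    available = set(list_models())
    if model_id not in available:
        raise ModelNotLoaded(
            f"{model_id!r} is not loaded; available: {sorted(available)}"
        )
    return synthesize_speech(model_id, text)


# Stand-in tool implementations for illustration only.
def fake_list_models():
    return [
        "tts_models/en/ljspeech/vits",
        "tts_models/multilingual/multi-dataset/xtts_v2",
    ]


def fake_synthesize(model_id, text):
    # Hypothetical metadata shape; the real tool's fields are not documented here.
    return {"model_id": model_id, "characters": len(text)}


print(synthesize_checked("tts_models/en/ljspeech/vits", "Hi", fake_list_models, fake_synthesize))
```

Failing before the request is sent gives the agent a specific message that names the models it could use instead.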

Manual API calls for every change

Having to write custom code to manually list models and then call synthesis endpoints every time you want to test a new voice.

→ Let your agent handle the workflow. Ask the agent to 'List the models, and then synthesize this paragraph using the English voice.' The agent orchestrates both list_models and synthesize_speech for you.

When It Fits, When It Doesn't

Use this server if your priority is open-source control and model diversity. If you need to test multiple voices or want to avoid vendor lock-in, this is the right tool. You must know your Coqui server URL and have access to the underlying Coqui API.

Don't use this if you just need the simplest, one-click integration with no configuration. If you only need to send simple voice messages without model selection, a basic messaging service might suffice. But if model choice matters, this is your best bet.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Coqui TTS. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 2 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

list_models, synthesize_speech

Getting voice generation into your workflow shouldn't require a new backend microservice.

Before this, you had to build a separate service just to handle text-to-speech. You'd write code to connect to the API, manage model IDs, and handle the audio file streaming. Every time you changed the voice or needed to test a new model, you were writing and testing more code.

Now, you just let your agent call the Coqui TTS MCP Server. You ask it to synthesize text. The server handles the complex API calls, the model selection, and the metadata retrieval. You get audio without touching your core application logic.

Coqui TTS MCP Server: Generate high-quality speech from text.

You no longer have to manually check the API documentation for model names or worry about connection strings. The agent handles the connection details, and the `list_models` tool lets you verify everything in the chat. The synthesis process becomes a simple command to your agent.

The voice generation process is now conversational. It’s an action taken by your agent, not a function you call in a script. That changes everything.

Common Questions About Coqui TTS MCP

How do I connect the Coqui TTS (Open Source Speech Studio API) MCP Server?

You subscribe to the server and provide your specific Coqui Server URL. The agent uses this URL to connect directly to your self-hosted or cloud-based API endpoint.

What models can I list using the list_models tool?

The list_models tool shows all TTS models loaded on your connected Coqui server. You get a list of model names and technical IDs you can then use for synthesis.

Is the synthesized audio from the synthesize_speech tool permanent?

The synthesize_speech tool returns metadata about the generated audio file, including file details and the model configuration used. That metadata is what you use to access the file. The tool itself only handles generation, so how long the file persists is determined by your Coqui server's storage, not by the tool.

Does the Coqui TTS (Open Source Speech Studio API) MCP Server support multiple languages?

Yes. The agent can use list_models to check for multilingual models (like XTTS) and then use synthesize_speech to generate speech in different languages.

What if my Coqui server goes down?

If the API is unreachable, the agent call will fail, providing an immediate error message. The server doesn't mask connectivity issues; it tells you right away.
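
If you want to confirm reachability yourself before blaming the agent, a plain HTTP request against the server's base URL is usually enough. This is a sketch using only the Python standard library; coqui_reachable is a hypothetical helper, and the opener parameter exists only so the check can be exercised without a live server.

```python
import urllib.error
import urllib.request


def coqui_reachable(base_url, timeout=3.0, opener=urllib.request.urlopen):
    """Return (ok, detail) for a plain GET against the server's base URL."""
    try:
        with opener(base_url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a success status.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"unreachable: {exc}"


# Usage (performs a real request, so it is left commented out here):
# ok, detail = coqui_reachable("http://localhost:5002")
# print(ok, detail)
```

An HTTP error status still counts as reachable here, since it shows the server is up and answering.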

How does the synthesize_speech tool handle different voice parameters?

The synthesize_speech tool accepts parameters like model ID and target text. You specify the model you want to use, and the tool converts the provided text into high-quality audio.

What security steps are needed to connect the Coqui TTS (Open Source Speech Studio API) MCP Server?

You must provide a secure API endpoint URL when subscribing. All speech generation requests and the metadata they return are then routed through that private server connection.

Can the list_models tool tell me about the current usage limits of my Coqui server?

No, the list_models tool only lists available TTS models. For usage limits or rate caps, check your Coqui server's dedicated dashboard or documentation.

How can I check which voice models are currently installed on my server?

You can use the list_models tool. Your agent will query the Coqui server and return a list of all available TTS models ready for synthesis.

Is it possible to generate audio files from a text string directly?

Yes! Use the synthesize_speech tool by providing the text you want to convert. The agent will process it through Coqui and return the audio metadata.

What do I need to provide to connect my local Coqui instance?

You only need to provide the COQUI_SERVER_URL. This is the base address where your Coqui Speech Studio API is reachable (e.g., http://localhost:5002).
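
For local scripts that talk to the same Coqui instance, one way to handle this setting is to read COQUI_SERVER_URL from the environment and validate it before use. This is a sketch, not how Vinkius stores the value; the coqui_base_url helper and the choice to fall back to the localhost example address are assumptions.

```python
import os
from urllib.parse import urlparse

DEFAULT_URL = "http://localhost:5002"  # the example address given in the answer above


def coqui_base_url(env=None):
    """Read COQUI_SERVER_URL, validate it, and return it without a trailing slash."""
    env = os.environ if env is None else env
    raw = env.get("COQUI_SERVER_URL", DEFAULT_URL).strip()
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"COQUI_SERVER_URL must be an http(s) URL, got {raw!r}")
    return raw.rstrip("/")


print(coqui_base_url({"COQUI_SERVER_URL": "https://tts.example.com/"}))
```

Rejecting values without a scheme (such as localhost:5002) catches the most common misconfiguration early.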

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)