Play.ht MCP for AI. Convert Text to Professional Audio Files Fast

Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

How this MCP server connects to your AI agent

Play.ht MCP Server turns plain text into professional audio files using a neural voice engine. It lets you discover available voices, including the languages each one supports, and then convert any block of text to audio.

You can also track long-running jobs, so your AI client knows exactly when the final MP3 or WAV file is ready to download.

What AI agents can do with Play.ht (AI Voice Generation & TTS) Automation

Convert tts

Turns input text into an audio file (MP3 or WAV) using a specific voice ID and quality setting.

Get tts status

Checks whether a TTS job is finished, pending, or failed, requiring only the unique request ID for tracking.

Get voices

Retrieves a structured list of all available Play.ht voices, including their unique IDs, languages, and metadata.

What AI agents can do with Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio

Use these three tools to manage the entire text-to-speech process, from discovering available voices to checking job status and generating final audio.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Play.ht (AI Voice Generation & TTS) on Vinkius

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing to configure.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Play.ht integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "playht-ai-voice-generation-tts": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Play.ht tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"playht-ai-voice-generation-tts": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

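Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius ("playht" is a local label; get_tools() is a coroutine)
client = MultiServerMCPClient({
    "playht": {"url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp", "transport": "streamable_http"},
})
tools = asyncio.run(client.get_tools())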

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

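from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (role, goal, and backstory are all required)
    researcher = Agent(
        role='Data Researcher',
        goal='Turn scripts into narrated audio with Play.ht',
        backstory='Prepares audio assets for content teams.',
        tools=vinkius_tools,
    )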

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.
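
The Python setups above embed the token directly in the endpoint URL. A minimal sketch for keeping it out of source code, assuming the token is stored in an environment variable. The variable name VINKIUS_TOKEN and the helper vinkius_endpoint are illustrative choices for this example, not part of Vinkius:

```python
import os

ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def vinkius_endpoint(env_var: str = "VINKIUS_TOKEN") -> str:
    """Build the MCP endpoint URL from a token held in the environment.

    VINKIUS_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your token from cloud.vinkius.com")
    return ENDPOINT_TEMPLATE.format(token=token)
```

The returned string can be passed wherever the setups above expect the endpoint URL.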

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Play.ht (AI Voice Generation & TTS), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Play.ht. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 3 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Getting professional voiceover shouldn't require multiple manual signups and API keys. Solved with Vinkius AI Gateway

Right now, if you need a video narrated, you often have to jump through hoops: first, checking which voices are even available on the platform; second, finding the right endpoint for text submission; and third, figuring out how long you have to wait before the file is actually ready.

With this Play.ht MCP Server, that whole sequence is handled for you. Your agent runs the three steps automatically: `get_voices` validates your choice; `convert_tts` starts the job; and `get_tts_status` monitors it until you get the final asset. It's clean.

The Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio Pipelines

Before this server, running an audio job meant managing multiple state flags and dealing with inconsistent API calls—sometimes the list of voices was separate from the conversion service, creating integration gaps.

Now, you treat voice generation as a single, reliable pipeline. You call `get_voices` for inputs, use `convert_tts` for outputs, and rely on `get_tts_status` to handle all the complexity in between. It just works.

Support: 24/7, support@vinkius.com

Security: Vinkius Trust Center

SLA: Service Level Agreement

Report Listing: Send Report

text-to-speech

ai-voices

speech-synthesis

voice-generation

neural-tts

What your AI can actually do with this

Look, this Play.ht MCP Server handles turning plain text into professional audio files using their neural voice engine. It's built to let your AI client do the heavy lifting—you just point it at the server.

To get started, you first need to check what voices are available. You'll use the get_voices tool; this calls up a structured list of every single voice Play.ht has in its library. It gives you metadata for each one, including their unique IDs and what languages they support. This is how you figure out which voice fits your project.

Once you've got the right Voice ID, you can actually generate the audio. You call convert_tts, feeding it three things: the text you want spoken, that specific Voice ID, and any parameters for quality or format. The tool then starts processing, turning that written script into either an MP3 or a WAV file.

Since these aren't instantaneous jobs, they run in the background. You don't just call convert_tts and assume you got the final file; it's a multi-step process. After submitting your text for conversion, the server gives you a unique request ID. This is key because it tells your AI client what to watch for next.

If you need to know if the audio job finished or if something went wrong, you use get_tts_status. You just pass that unique request ID into this tool, and it checks the status—it'll tell you if the job is still pending, if it's done, or if it failed. This lets your agent wait for confirmation before trying to pull down the final audio file.

It’s basically a three-step loop: first, check get_voices for options; second, run convert_tts with text and voice ID; and third, constantly monitor get_tts_status using the returned request ID until you can grab your finished MP3 or WAV file.

Using this setup means you don't have to manually manage API calls. Your AI client handles the whole sequence. It grabs the list of voices first, making sure it knows all the available IDs and languages before sending a single character of text for conversion. When that job is submitted via convert_tts, your agent gets back that unique tracker ID.

You can then feed that ID into get_tts_status repeatedly. This process keeps your workflow tight because you're never guessing if the file is ready; you just ask the server, and it tells you exactly where it stands.

If you need to build out video narrations, this handles everything from script text to final audio file download. Developers can integrate realistic speech synthesis directly into their own apps without having to deal with manual API scheduling or polling. Accessibility teams find it useful because they can quickly turn large documents or reports into clear, audible speech for people who rely on that format.

It's a complete pipeline: discover voices, submit text, track status, and get the file.

Built · Hosted · Managed by Vinkius Play.ht Voice Generation - TTS MCP Server

Server ID 019e5d47-afa7-7051-87b1-663bcfc37cc3

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

Here's how it actually works

The bottom line is: you discover voices first, send the job to convert text, and then check back on a specific ID until it's done.

Your AI client first calls get_voices to select the correct voice ID and language for the project.

The agent then sends the text and the selected voice ID to convert_tts. A request ID is immediately returned, kicking off the audio generation process.

Finally, your client polls get_tts_status using that request ID until the status confirms completion. You can then access the final audio file.
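
The three steps above can be sketched as one function. This is a rough illustration, not the server's actual contract: call_tool stands in for however your MCP client invokes a tool, and the response keys ("voices", "id", "language_code", "status") and the argument names "content" and "voice" are assumptions. Only transcription_id is named elsewhere on this page.

```python
import time
from typing import Any, Callable, Dict

# call_tool(name, arguments) -> dict is a stand-in for your MCP client's
# tool invocation; its shape is an assumption of this sketch.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def narrate(call_tool: ToolCaller, text: str, language: str,
            poll_seconds: float = 2.0, max_polls: int = 30) -> Dict[str, Any]:
    """Run discover -> convert -> poll and return the final status payload."""
    # Step 1: discover voices and pick the first one matching the language.
    voices = call_tool("get_voices", {})["voices"]  # "voices" key is assumed
    voice = next((v for v in voices if v.get("language_code") == language), None)
    if voice is None:
        raise ValueError(f"no voice found for {language}")

    # Step 2: submit the text; the job runs in the background.
    job = call_tool("convert_tts", {"content": text, "voice": voice["id"]})
    request_id = job["transcription_id"]  # assumed response key

    # Step 3: poll until the job leaves the pending state.
    for _ in range(max_polls):
        status = call_tool("get_tts_status", {"transcription_id": request_id})
        if status["status"] != "pending":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {request_id} still pending after {max_polls} polls")
```

Passing the tool caller in as a parameter keeps the flow testable with a stub before pointing it at a live server.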

What Changes When You Connect

Generate high-quality audio on demand: The convert_tts tool handles the heavy lifting, letting you turn scripts into MP3 or WAV files with a single call. You don't need multiple endpoints.

Know exactly what voices are available: Use get_voices to list every voice option. This prevents guesswork and ensures your agent always selects a valid ID before starting conversion.

Track long jobs with a single ID: The get_tts_status tool lets you check job progress using the unique request ID, so your client code can poll at its own pace and always knows whether a job is pending, complete, or failed.

Fine-tune audio output: You control the quality (from Draft to High) and format of every file. This gives you granular control over the final asset without writing complex post-processing steps.

Speed up content pipelines: By separating discovery (get_voices) from execution, your agent runs a clean, reliable three-step workflow that minimizes chances of failure.

See it in action

01

Creating an e-learning module narration

The L&D team writes new training material. Instead of manually recording it or using a basic TTS tool, the agent runs get_voices to find the ideal educational voice. It then calls convert_tts with the full script and tracks status via get_tts_status, delivering finished audio assets in minutes.

02

Updating product documentation for accessibility

The technical writer needs to make a guide audible immediately. The agent calls get_voices first, then uses convert_tts with the text and a clear voice profile. The resulting audio file is uploaded directly to the help center.

03

Building an automated podcast filler segment

The content manager needs filler audio for episodes that run long. They feed the script into convert_tts. Because it returns a request ID, the agent can wait and confirm completion before packaging the final episode mix.

04

Validating voice choices across multiple regions

The global marketing team needs to ensure they use an American English voice. They run get_voices first to filter for 'en-US' voices, preventing accidental use of incorrect language settings.
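
A small sketch of the 'en-US' filtering step from the last scenario. The record fields used here (id, language_code, gender) mirror the details this page says get_voices returns, but the exact key names are assumptions, and the sample records are made up for illustration:

```python
from typing import Dict, Iterable, List


def voices_for_language(voices: Iterable[Dict[str, str]],
                        language_code: str) -> List[Dict[str, str]]:
    """Keep only voices whose language code matches, ignoring case."""
    wanted = language_code.lower()
    return [v for v in voices if v.get("language_code", "").lower() == wanted]


# Illustrative records; real get_voices output may use different field names.
sample = [
    {"id": "voice-a", "language_code": "en-US", "gender": "female"},
    {"id": "voice-b", "language_code": "en-GB", "gender": "male"},
    {"id": "voice-c", "language_code": "en-us", "gender": "male"},
]
us_voices = voices_for_language(sample, "en-US")
print([v["id"] for v in us_voices])  # ['voice-a', 'voice-c']
```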

The honest tradeoffs

Trying to convert text with a random ID

Anti-pattern

The user just guesses a voice ID and sends the text straight to convert_tts. The conversion fails instantly because the ID doesn't exist or is invalid, forcing them to restart.

The Fix

Always run get_voices first. Use that list to pull a known-good, valid Voice ID. Then pass that specific ID into convert_tts. This guarantees the input parameters are correct.
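
One way to enforce this fix in client code is a guard that rejects any voice ID not present in the get_voices result. The helper name require_known_voice and the "id" key are assumptions made for this sketch:

```python
from typing import Dict, Iterable


def require_known_voice(voice_id: str, voices: Iterable[Dict[str, str]]) -> str:
    """Return voice_id if it appears in the get_voices list, else raise."""
    known = {v["id"] for v in voices}  # "id" is an assumed field name
    if voice_id not in known:
        raise ValueError(
            f"unknown voice ID {voice_id!r}; call get_voices and pick a listed ID"
        )
    return voice_id
```

Calling it right before convert_tts turns a failed remote job into an immediate local error.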

Ignoring job status

Anti-pattern

The agent calls convert_tts and assumes the audio file is ready right away, leading to a timeout error or a failed download because the conversion was still running.

The Fix

After calling convert_tts, you must capture the request ID. Then, use get_tts_status repeatedly until the status reports 'complete'. Don't try to access the file before that.
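
A hedged sketch of that polling loop with a timeout and exponential backoff, so the agent neither gives up too early nor hammers the server. get_status is a placeholder for a call to get_tts_status. The 'complete' status comes from the text above, while the "failed" string, the "status" key, and the delay values are assumptions:

```python
import time
from typing import Any, Callable, Dict


def wait_for_audio(get_status: Callable[[str], Dict[str, Any]], request_id: str,
                   timeout: float = 120.0, first_delay: float = 1.0,
                   max_delay: float = 15.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job is 'complete'; raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        status = get_status(request_id)
        state = status.get("status")
        if state == "complete":
            return status
        if state == "failed":  # assumed status string
            raise RuntimeError(f"conversion {request_id} failed: {status}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"conversion {request_id} not complete after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off: 1s, 2s, 4s, ... capped
```

The sleep parameter is injectable so the loop can be exercised in tests without real waiting.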

Using a generic API wrapper

Anti-pattern

Relying on an older or incomplete library that doesn't distinguish between listing voices and converting text, causing confusion in the workflow.

The Fix

Use the three dedicated tools: get_voices for discovery, convert_tts for execution, and get_tts_status for state management. This clear separation makes your code predictable.

When It Fits, When It Doesn't

You should use this server if your primary goal is turning written text into high-quality, customizable audio assets through a structured workflow. The three tools—get_voices, convert_tts, and get_tts_status—are built for reliability: you list voices to validate inputs; you convert text with the specific parameters; and you check status because TTS is an asynchronous process.

Don't use this if you need to modify audio content (e.g., adding music or effects) or if you only need basic, low-fidelity speech generation. For those cases, look for dedicated audio editing APIs or simpler cloud-based synthesis services that handle the entire job in one call without status tracking.

Questions you might have

How do I find out what voices are available using get_voices?

Call get_voices. This tool returns a list of all Play.ht voices, giving you details like the voice ID, language code, and gender for selection.

What is the difference between convert_tts and get_tts_status?

convert_tts starts the audio generation job and returns a request ID. get_tts_status uses that exact ID to check if the conversion finished or if it's still pending.

Can I use convert_tts with an unknown voice?

No. You must first run get_voices to retrieve a valid, active Voice ID. Passing an incorrect ID will cause the conversion job to fail immediately.

Does Play.ht (AI Voice Generation & TTS) MCP Server support WAV files?

Yes. When using convert_tts, you can specify your desired output format, including MP3 and WAV, giving you control over the final asset type.

How do I authenticate my connection before using `convert_tts`?

You must supply your Play.ht API Key and User ID when setting up this server. Your agent uses these credentials to authorize every call, ensuring you have permission to generate audio assets.

If a conversion fails, how do I debug the issue using `get_tts_status`?

While get_tts_status tracks progress, if an error occurs, the returned status object will contain specific failure codes. Check these details to pinpoint why the job for your request ID isn't completing.

What parameters can I pass to `convert_tts` for fine-tuning the audio output?

You control quality levels (Draft through High) and speaking speed directly within the function call. This lets you adjust the fidelity and pacing of the audio for your text.

Does `convert_tts` handle massive amounts of text, or is there a limit?

For short bursts of copy, convert_tts typically finishes quickly. If you're processing large documents or high volumes, the system may queue requests. Check on their progress by passing the unique ID returned by convert_tts to get_tts_status.

How can I find the right voice ID for my language?

Use the get_voices tool. It returns a complete list of available voices, allowing you to filter by name, language, and gender to find the perfect match for your project.

Can I control the speed and format of the generated audio?

Yes! When using convert_tts, you can specify the speed (from 0.5 to 2.0), the output_format (like mp3 or wav), and the quality level to suit your needs.

What should I do if a conversion takes a long time?

For longer texts, use the get_tts_status tool with your transcription_id. This allows you to check if the audio is still processing or ready for download.
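
The convert_tts options described in the answers above (speed from 0.5 to 2.0, output_format of mp3 or wav, plus a quality level) can be checked locally before a job is submitted. A minimal sketch: the speed and output_format names come from this page, while the "content", "voice", and "quality" keys and the default values are assumptions, not the documented schema.

```python
from typing import Any, Dict

ALLOWED_FORMATS = {"mp3", "wav"}   # formats named in the answers above
SPEED_RANGE = (0.5, 2.0)           # speed bounds named in the answers above


def build_convert_args(text: str, voice_id: str, speed: float = 1.0,
                       output_format: str = "mp3",
                       quality: str = "high") -> Dict[str, Any]:
    """Validate inputs locally and return an arguments dict for convert_tts.

    The keys "content", "voice", and "quality" are assumed names; only
    "speed" and "output_format" appear in the text above.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    low, high = SPEED_RANGE
    if not low <= speed <= high:
        raise ValueError(f"speed must be between {low} and {high}, got {speed}")
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    return {"content": text, "voice": voice_id, "speed": speed,
            "output_format": fmt, "quality": quality}
```

Rejecting out-of-range values on the client side avoids spending a queued job on a request that would fail anyway.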
