Play.ht MCP. Convert Text to Professional Audio Files Fast

Play.ht MCP Server turns plain text into professional audio files using a neural voice engine. It lets you discover available voices, including the languages each one supports, and then submit any block of text for conversion.

You can also track long-running jobs, so your AI client knows exactly when the final MP3 or WAV file is ready to download.

What your AI agents can do

List available voices

Retrieves a structured list of every voice Play.ht offers, including metadata like language and unique IDs.

Convert text to speech

Takes input text and converts it into an audio file format (MP3 or WAV) using a specified voice ID.

Check conversion status

Uses a unique request ID to check if the long-running TTS job is finished, pending, or failed.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio

Use these three tools to manage the entire text-to-speech process, from discovering available voices to checking job status and generating final audio.

convert_tts

Turns input text into an audio file format (MP3 or WAV) using a specific voice ID and quality setting.

get_tts_status

Checks the completion status of a running TTS job, requiring only the unique request ID for tracking.

get_voices

Retrieves a structured list of all available Play.ht voices, including their unique IDs, languages, and metadata.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Play.ht (AI Voice Generation & TTS), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Look, this Play.ht MCP Server handles turning plain text into professional audio files using their neural voice engine. It's built to let your AI client do the heavy lifting—you just point it at the server.

To get started, you first need to check what voices are available. You'll use the get_voices tool; this calls up a structured list of every single voice Play.ht has in its library. It gives you metadata for each one, including their unique IDs and what languages they support. This is how you figure out which voice fits your project.
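As a rough illustration of working with that list, the sketch below groups voice IDs by language. The sample records and the `id`, `language`, and `gender` field names are assumptions made for this example; the real get_voices payload may be shaped differently.

```python
# Hypothetical sample of a get_voices result; the field names are assumed.
SAMPLE_VOICES = [
    {"id": "voice-a", "language": "en-US", "gender": "female"},
    {"id": "voice-b", "language": "en-US", "gender": "male"},
    {"id": "voice-c", "language": "es-ES", "gender": "female"},
]


def group_voice_ids_by_language(voices):
    """Map each language code to the list of voice IDs that support it."""
    grouped = {}
    for voice in voices:
        grouped.setdefault(voice["language"], []).append(voice["id"])
    return grouped


voices_by_language = group_voice_ids_by_language(SAMPLE_VOICES)
print(voices_by_language)
```

Grouping once up front means the agent can answer "which voices speak this language?" without calling the tool again.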

Once you've got the right Voice ID, you can actually generate the audio. You call convert_tts, feeding it three things: the text you want spoken, that specific Voice ID, and any parameters for quality or format. The tool then starts processing, turning that written script into either an MP3 or a WAV file.
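MCP clients invoke tools with a JSON-RPC `tools/call` request, so a convert_tts call might look roughly like the payload built below. This is only a sketch: `output_format` is named in this page's FAQ, while the `text`, `voice`, and `quality` argument names and all values are assumptions for illustration.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "convert_tts",
    {
        "text": "Welcome to the onboarding course.",  # assumed argument name
        "voice": "voice-a",  # assumed argument name, placeholder ID
        "output_format": "mp3",
        "quality": "high",  # assumed value spelling
    },
)
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this request for you; seeing the shape mainly helps when reading logs.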

Since these aren't instantaneous jobs, they run in the background. You don't just call convert_tts and assume you got the final file; it's a multi-step process. After submitting your text for conversion, the server gives you a unique request ID. This is key because it tells your AI client what to watch for next.

If you need to know if the audio job finished or if something went wrong, you use get_tts_status. You just pass that unique request ID into this tool, and it checks the status—it'll tell you if the job is still pending, if it's done, or if it failed. This lets your agent wait for confirmation before trying to pull down the final audio file.
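The waiting step can be sketched as a small polling helper. It assumes the status record is a dict whose `status` field reads `pending`, `complete`, or `failed`; those strings, the `audio_url` field, and the `check_status` callable (a stand-in for however your client calls get_tts_status) are hypothetical.

```python
import time


def wait_for_job(check_status, transcription_id, interval=2.0, max_checks=30, sleep=time.sleep):
    """Poll check_status until the job completes, fails, or checks run out."""
    for _ in range(max_checks):
        result = check_status(transcription_id)
        status = result.get("status")
        if status == "complete":
            return result
        if status == "failed":
            raise RuntimeError(f"TTS job {transcription_id} failed: {result}")
        sleep(interval)  # still pending: wait before asking again
    raise TimeoutError(f"TTS job {transcription_id} not finished after {max_checks} checks")


# Demo with a stand-in for get_tts_status that reports two pending checks first.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "complete", "audio_url": "https://example.invalid/audio.mp3"},
])
final = wait_for_job(lambda _id: next(_responses), "job-123", sleep=lambda s: None)
print(final["status"])
```

Capping the number of checks keeps a stuck job from blocking the agent forever.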

It’s basically a three-step loop: first, check get_voices for options; second, run convert_tts with text and voice ID; and third, poll get_tts_status using the returned request ID until you can grab your finished MP3 or WAV file.

Using this setup means you don't have to manually manage API calls. Your AI client handles the whole sequence. It grabs the list of voices first, making sure it knows all the available IDs and languages before sending a single character of text for conversion. When that job is submitted via convert_tts, your agent gets back that unique tracker ID.

You can then feed that ID into get_tts_status repeatedly. This process keeps your workflow tight because you're never guessing if the file is ready; you just ask the server, and it tells you exactly where it stands.

If you need to build out video narrations, this handles everything from script text to final audio file download. Developers can integrate realistic speech synthesis directly into their own apps without having to deal with manual API scheduling or polling. Accessibility teams find it useful because they can quickly turn large documents or reports into clear, audible speech for people who rely on that format.

It's a complete pipeline: discover voices, submit text, track status, and get the file.

How Play.ht MCP Works

1. Your AI client first calls get_voices to select the correct voice ID and language for the project.
2. The agent then sends the text and the selected voice ID to convert_tts. The call starts audio generation and immediately returns a request ID.
3. Finally, your client polls get_tts_status using that request ID until the status confirms completion. You can then access the final audio file.

The bottom line is: you discover voices first, send the job to convert text, and then check back on a specific ID until it's done.
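The three steps above can be sketched end to end against an in-memory stand-in. Everything here is illustrative: `FakeTtsServer`, the `call_tool` signature, the `text` and `voice` argument names, and the response fields are assumptions, not the server's documented interface. Only the tool names and `transcription_id` come from this page.

```python
class FakeTtsServer:
    """In-memory stand-in for the three tools, used only to exercise the flow."""

    def __init__(self):
        self._checks = 0

    def call_tool(self, name, arguments):
        if name == "get_voices":
            return [{"id": "voice-a", "language": "en-US"}]
        if name == "convert_tts":
            return {"transcription_id": "job-1"}
        if name == "get_tts_status":
            self._checks += 1
            if self._checks < 3:
                return {"status": "pending"}
            return {"status": "complete", "audio_url": "https://example.invalid/job-1.mp3"}
        raise ValueError(f"unknown tool: {name}")


def narrate(call_tool, text, language, max_checks=10):
    """Run discover -> submit -> poll and return the finished status record."""
    voices = call_tool("get_voices", {})
    matches = [v for v in voices if v["language"] == language]
    if not matches:
        raise LookupError(f"no voice for language {language!r}")
    job = call_tool("convert_tts", {"text": text, "voice": matches[0]["id"]})
    for _ in range(max_checks):
        status = call_tool("get_tts_status", {"transcription_id": job["transcription_id"]})
        if status["status"] == "complete":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"conversion failed: {status}")
    raise TimeoutError("job did not finish within max_checks")


server = FakeTtsServer()
result = narrate(server.call_tool, "Hello and welcome.", "en-US")
print(result["audio_url"])
```

A real client would pause between status checks; the fake server makes that unnecessary here.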

Who Is Play.ht MCP For?

Content teams who manage high volumes of multimedia assets. Developers building apps that need voice output without relying on cloud-native TTS services. Anyone who spends time manually managing audio pipelines and needs reliable, structured text-to-speech generation.

Video Editor/Producer

Needs to generate consistent voice narration for dozens of videos a week without hiring freelance voice actors.

Software Engineer

Must integrate high-quality, reliable TTS into a production application or internal tool using structured API calls.

Technical Writer/Educator

Generates audio versions of technical documentation for accessibility features and online training modules.

What Changes When You Connect

Generate high-quality audio on demand: The convert_tts tool handles the heavy lifting, letting you turn scripts into MP3 or WAV files with a single call. You don't need multiple endpoints.
Know exactly what voices are available: Use get_voices to list every voice option. This prevents guesswork and ensures your agent always selects a valid ID before starting conversion.
Track long jobs without guesswork: The get_tts_status tool lets you check job progress using a unique ID, making the entire process reliable and predictable for your client code.
Fine-tune audio output: You control the quality (from Draft to High) and format of every file. This gives you granular control over the final asset without writing complex post-processing steps.
Speed up content pipelines: By separating discovery (get_voices) from execution, your agent runs a clean, reliable three-step workflow that minimizes chances of failure.

Real-World Use Cases

Creating an e-learning module narration

The L&D team writes new training material. Instead of manually recording it or using a basic TTS tool, the agent runs get_voices to find the ideal educational voice. It then calls convert_tts with the full script and tracks status via get_tts_status, delivering finished audio assets in minutes.

Updating product documentation for accessibility

The technical writer needs to make a guide audible immediately. The agent calls get_voices first, then uses convert_tts with the text and a clear voice profile. The resulting audio file is uploaded directly to the help center.

Building an automated podcast filler segment

The content manager needs filler audio for episodes that run long. They feed the script into convert_tts. Because it returns a request ID, the agent can poll get_tts_status and confirm completion before packaging the final episode mix.

Validating voice choices across multiple regions

The global marketing team needs to ensure they use an American English voice. They run get_voices first to filter for 'en-US' voices, preventing accidental use of incorrect language settings.
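A filter like the one this team needs might look as follows. The `language` and `gender` keys and the catalog entries are assumed for the example; adjust them to whatever get_voices actually returns.

```python
def filter_voices(voices, language=None, gender=None):
    """Return the voices that match every filter that is not None."""
    selected = []
    for voice in voices:
        if language is not None and voice.get("language") != language:
            continue
        if gender is not None and voice.get("gender") != gender:
            continue
        selected.append(voice)
    return selected


# Hypothetical catalog standing in for a get_voices result.
catalog = [
    {"id": "voice-a", "language": "en-US", "gender": "female"},
    {"id": "voice-b", "language": "en-GB", "gender": "male"},
    {"id": "voice-c", "language": "en-US", "gender": "male"},
]
us_voices = filter_voices(catalog, language="en-US")
print([voice["id"] for voice in us_voices])
```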

The Tradeoffs

Trying to convert text with a random ID

The user just guesses a voice ID and sends the text straight to convert_tts. The conversion fails instantly because the ID doesn't exist or is invalid, forcing them to restart.

→ Always run get_voices first. Use that list to pull a known-good, valid Voice ID. Then pass that specific ID into convert_tts. This guarantees the input parameters are correct.
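One way to enforce that rule is a small guard that runs before any conversion call. This is a sketch only; `require_known_voice` is a hypothetical helper and the `id` key is an assumed field name.

```python
def require_known_voice(voice_id, voices):
    """Return voice_id if it appears in the get_voices result, else raise."""
    known_ids = {voice["id"] for voice in voices}
    if voice_id not in known_ids:
        raise ValueError(f"unknown voice ID {voice_id!r}; run get_voices and pick a listed ID")
    return voice_id


available = [{"id": "voice-a"}, {"id": "voice-b"}]  # placeholder get_voices result
checked = require_known_voice("voice-a", available)

try:
    require_known_voice("made-up-id", available)
except ValueError as exc:
    rejection = str(exc)
    print(rejection)
```

Failing locally gives a clearer message than waiting for the conversion job to reject the ID.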

Ignoring job status

The agent calls convert_tts and assumes the audio file is ready right away, leading to a timeout error or a failed download because the conversion was still running.

→ After calling convert_tts, you must capture the request ID. Then, use get_tts_status repeatedly until the status reports 'complete'. Don't try to access the file before that.

Using a generic API wrapper

Relying on an older or incomplete library that doesn't distinguish between listing voices and converting text, causing confusion in the workflow.

→ Use the three dedicated tools: get_voices for discovery, convert_tts for execution, and get_tts_status for state management. This clear separation makes your code predictable.

When It Fits, When It Doesn't

You should use this server if your primary goal is turning written text into high-quality, customizable audio assets through a structured workflow. The three tools—get_voices, convert_tts, and get_tts_status—are built for reliability: you list voices to validate inputs; you convert text with the specific parameters; and you check status because TTS is an asynchronous process.

Don't use this if you need to modify audio content (e.g., adding music or effects) or if you only need basic, low-fidelity speech generation. For those cases, look for dedicated audio editing APIs or simpler cloud-based synthesis services that handle the entire job in one call without status tracking.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Play.ht. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 3 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

convert_tts get_tts_status get_voices

Getting professional voiceover shouldn't require multiple manual signups and API keys.

Right now, if you need a video narrated, you often have to jump through hoops: first, checking which voices are even available on the platform; second, finding the right endpoint for text submission; and third, figuring out how long you have to wait before the file is actually ready.

With this Play.ht MCP Server, that whole sequence gets contained. Your agent handles the three steps automatically: `get_voices` validates your choice; `convert_tts` starts the job; and `get_tts_status` monitors it until you get the final asset. It's clean.

The Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio Pipelines

Before this server, running an audio job meant managing multiple state flags and dealing with inconsistent API calls—sometimes the list of voices was separate from the conversion service, creating integration gaps.

Now, you treat voice generation as a single, reliable pipeline. You call `get_voices` for inputs, use `convert_tts` for outputs, and rely on `get_tts_status` to handle all the complexity in between. It just works.

Common Questions About Play.ht MCP

How do I find out what voices are available using get_voices?

Call get_voices. This tool returns a list of all Play.ht voices, giving you details like the voice ID, language code, and gender for selection.

What is the difference between convert_tts and get_tts_status?

convert_tts starts the audio generation job and returns a request ID. get_tts_status uses that exact ID to check if the conversion finished or if it's still pending.

Can I use convert_tts with an unknown voice?

No. You must first run get_voices to retrieve a valid, active Voice ID. Passing an incorrect ID will cause the conversion job to fail immediately.

Does Play.ht (AI Voice Generation & TTS) MCP Server support WAV files?

Yes. When using convert_tts, you can specify your desired output format, including MP3 and WAV, giving you control over the final asset type.

How do I authenticate my connection before using `convert_tts`?

You must supply your Play.ht API Key and User ID when setting up this server. Your agent uses these credentials to authorize every call, ensuring you have permission to generate audio assets.
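If you keep those two values in environment variables before entering them into the server setup, a fail-fast check could look like this. The variable names `PLAYHT_API_KEY` and `PLAYHT_USER_ID` are made up for this sketch; the page does not say how the credentials are stored or named.

```python
import os


def load_credentials(env=os.environ):
    """Read the API key and user ID from env, failing early if either is missing."""
    api_key = env.get("PLAYHT_API_KEY", "").strip()  # hypothetical variable name
    user_id = env.get("PLAYHT_USER_ID", "").strip()  # hypothetical variable name
    pairs = (("PLAYHT_API_KEY", api_key), ("PLAYHT_USER_ID", user_id))
    missing = [name for name, value in pairs if not value]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {"api_key": api_key, "user_id": user_id}


# Demo with an explicit mapping instead of the real environment.
credentials = load_credentials({"PLAYHT_API_KEY": "example-key", "PLAYHT_USER_ID": "example-user"})
print(sorted(credentials))
```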

If a conversion fails, how do I debug the issue using `get_tts_status`?

get_tts_status tracks progress, and if an error occurs, the returned status object will contain specific failure codes. Check these details to pinpoint why the job for your transcription_id isn't completing.
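A small helper can turn a status record into a readable log line. The `status`, `error_code`, and `error_message` field names and the sample error values are assumptions for illustration; check a real failed response for the actual keys.

```python
def summarize_status(status):
    """Turn a status record into a one-line, log-friendly message."""
    state = status.get("status", "unknown")
    if state != "failed":
        return f"job is {state}"
    code = status.get("error_code", "no code")
    message = status.get("error_message", "no message")
    return f"job failed ({code}): {message}"


print(summarize_status({"status": "pending"}))
# Made-up failure record, only to show the formatting.
print(summarize_status({
    "status": "failed",
    "error_code": "EXAMPLE_CODE",
    "error_message": "example message",
}))
```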

What parameters can I pass to `convert_tts` for fine-tuning the audio output?

You control quality levels (Draft through High) and speaking speed directly within the function call. This lets you precisely adjust the audio profile—like making it sound more formal or conversational—for your text.

Does `convert_tts` handle massive amounts of text, or is there a limit?

For short bursts of copy, convert_tts finishes quickly. If you're processing large documents or high volumes, the system may queue requests. To check on their progress, pass the unique ID that convert_tts returned to get_tts_status.

How can I find the right voice ID for my language?

Use the get_voices tool. It returns a complete list of available voices, allowing you to filter by name, language, and gender to find the perfect match for your project.

Can I control the speed and format of the generated audio?

Yes! When using convert_tts, you can specify the speed (from 0.5 to 2.0), the output_format (like mp3 or wav), and the quality level to suit your needs.
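Checking those values before the call avoids a wasted round trip. The sketch below validates the speed range and formats stated above; `build_audio_options` is a hypothetical helper, and quality is passed through unchecked because the level names are not listed on this page.

```python
ALLOWED_FORMATS = {"mp3", "wav"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_audio_options(speed=1.0, output_format="mp3", quality=None):
    """Validate speed and format, then return an options dict for convert_tts."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}, got {output_format!r}")
    options = {"speed": speed, "output_format": fmt}
    if quality is not None:
        options["quality"] = quality  # not validated: level names are unknown here
    return options


options = build_audio_options(speed=1.25, output_format="WAV")
print(options)
```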

What should I do if a conversion takes a long time?

For longer texts, use the get_tts_status tool with your transcription_id. This allows you to check if the audio is still processing or ready for download.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)