Play.ht MCP. Convert Text to Professional Audio Files Fast
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Play.ht MCP Server turns plain text into professional audio files using a neural voice engine. It lets you discover available voices, including the languages they cover, and then converts any block of text to speech.
You can also track long-running jobs, so your AI client knows exactly when the final MP3 or WAV file is ready to download.
What your AI agents can do
Convert tts
Turns input text into an audio file format (MP3 or WAV) using a specific voice ID and quality setting.
Get tts status
Checks the completion status of a running TTS job, requiring only the unique request ID for tracking.
Get voices
Retrieves a structured list of all available Play.ht voices, including their unique IDs, languages, and metadata.
Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio
Use these three tools to manage the entire text-to-speech process, from discovering available voices to checking job status and generating final audio.
convert_tts
Turns input text into an audio file format (MP3 or WAV) using a specific voice ID and quality setting.
get_tts_status
Checks the completion status of a running TTS job, requiring only the unique request ID for tracking.
get_voices
Retrieves a structured list of all available Play.ht voices, including their unique IDs, languages, and metadata.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Play.ht (AI Voice Generation & TTS), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Look, this Play.ht MCP Server handles turning plain text into professional audio files using their neural voice engine. It's built to let your AI client do the heavy lifting—you just point it at the server.
To get started, you first need to check what voices are available. You'll use the get_voices tool; this calls up a structured list of every single voice Play.ht has in its library. It gives you metadata for each one, including their unique IDs and what languages they support. This is how you figure out which voice fits your project.
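Here's a minimal sketch of what picking a voice from that list could look like. The field names (`id`, `name`, `language`, `gender`) and the sample entries are assumptions made up for illustration; check the actual get_voices output before relying on them.

```python
from typing import Any, Optional

# Hypothetical shape of a get_voices result; real field names may differ.
SAMPLE_VOICES: list[dict[str, Any]] = [
    {"id": "voice-a", "name": "Voice A", "language": "en-US", "gender": "female"},
    {"id": "voice-b", "name": "Voice B", "language": "en-GB", "gender": "male"},
    {"id": "voice-c", "name": "Voice C", "language": "en-US", "gender": "male"},
]


def filter_voices(
    voices: list[dict[str, Any]],
    language: str,
    gender: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Return voices matching a language code and, optionally, a gender."""
    matches = []
    for voice in voices:
        # Compare case-insensitively so "en-us" and "en-US" both match.
        if voice.get("language", "").lower() != language.lower():
            continue
        if gender is not None and voice.get("gender", "").lower() != gender.lower():
            continue
        matches.append(voice)
    return matches


us_voices = filter_voices(SAMPLE_VOICES, "en-US")
print([v["id"] for v in us_voices])
```

Filtering on your side keeps the choice explicit: the agent only ever passes an ID that actually appeared in the list.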
Once you've got the right Voice ID, you can actually generate the audio. You call convert_tts, feeding it three things: the text you want spoken, that specific Voice ID, and any parameters for quality or format. The tool then starts processing, turning that written script into either an MP3 or a WAV file.
Since these aren't instantaneous jobs, they run in the background. You don't just call convert_tts and assume you got the final file; it's a multi-step process. After submitting your text for conversion, the server gives you a unique request ID. This is key because it tells your AI client what to watch for next.
If you need to know if the audio job finished or if something went wrong, you use get_tts_status. You just pass that unique request ID into this tool, and it checks the status—it'll tell you if the job is still pending, if it's done, or if it failed. This lets your agent wait for confirmation before trying to pull down the final audio file.
It’s basically a three-step loop: first, check get_voices for options; second, run convert_tts with text and voice ID; and third, constantly monitor get_tts_status using the returned request ID until you can grab your finished MP3 or WAV file.
Using this setup means you don't have to manually manage API calls. Your AI client handles the whole sequence. It grabs the list of voices first, making sure it knows all the available IDs and languages before sending a single character of text for conversion. When that job is submitted via convert_tts, your agent gets back that unique tracker ID.
You can then feed that ID into get_tts_status repeatedly. This process keeps your workflow tight because you're never guessing if the file is ready; you just ask the server, and it tells you exactly where it stands.
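If you want to see what that repeated check might look like in client code, here's a rough sketch. It assumes the status result is a dictionary with a `status` key whose values are "pending", "complete", or "failed"; the key name and the `get_status` callable are placeholders for however your client invokes get_tts_status.

```python
import time
from typing import Any, Callable


def wait_for_audio(
    get_status: Callable[[str], dict[str, Any]],
    request_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job completes; raise if it fails or takes too long."""
    waited = 0.0
    while True:
        status = get_status(request_id)
        state = status.get("status")  # assumed key name
        if state == "complete":
            return status
        if state == "failed":
            raise RuntimeError(f"TTS job {request_id} failed: {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"TTS job {request_id} still pending after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Demo with canned responses and a no-op sleep, so nothing actually waits.
_canned = iter(["pending", "pending", "complete"])
final = wait_for_audio(lambda _id: {"status": next(_canned)}, "job-1", sleep=lambda s: None)
print(final)
```

Passing `sleep` in as a parameter is just a convenience for testing; in real use you'd leave the default so the loop pauses between checks.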
If you need to build out video narrations, this handles everything from script text to final audio file download. Developers can integrate realistic speech synthesis directly into their own apps without having to deal with manual API scheduling or polling. Accessibility teams find it useful because they can quickly turn large documents or reports into clear, audible speech for people who rely on that format.
It's a complete pipeline: discover voices, submit text, track status, and get the file.
How Play.ht MCP Works
1. Your AI client first calls get_voices to select the correct voice ID and language for the project.
2. The agent then sends the text and the selected voice ID to convert_tts. A request ID is immediately returned, kicking off the audio generation process.
3. Finally, your client polls get_tts_status using that request ID until the status confirms completion. You can then access the final audio file.
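The three steps above can be sketched end to end. This is only an illustration: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and the argument and result keys (`text`, `voice`, `output_format`, `transcription_id`, `status`) are assumptions, not a confirmed schema.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def narrate(
    call_tool: ToolCaller,
    text: str,
    language: str,
    max_polls: int = 30,
) -> dict[str, Any]:
    """Run discover -> convert -> poll and return the final status result."""
    # Step 1: discover voices and take the first one for the language.
    voices = call_tool("get_voices", {})
    voice = next((v for v in voices if v.get("language") == language), None)
    if voice is None:
        raise ValueError(f"no voice found for language {language!r}")

    # Step 2: submit the text; an ID for the job comes back right away.
    job = call_tool(
        "convert_tts",
        {"text": text, "voice": voice["id"], "output_format": "mp3"},
    )
    request_id = job["transcription_id"]  # assumed result key

    # Step 3: check status until a final state is reported.
    # A real client would pause between checks; omitted to keep this short.
    for _ in range(max_polls):
        status = call_tool("get_tts_status", {"transcription_id": request_id})
        if status.get("status") == "complete":
            return status
        if status.get("status") == "failed":
            raise RuntimeError(f"conversion failed: {status}")
    raise TimeoutError(f"job {request_id} not finished after {max_polls} checks")
```

In practice your AI client runs this sequence for you; the sketch just makes the order of calls and the hand-off of the ID visible.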
The bottom line is: you discover voices first, send the job to convert text, and then check back on a specific ID until it's done.
Who Is Play.ht MCP For?
Content teams who manage high volumes of multimedia assets. Developers building apps that need voice output without relying on cloud-native TTS services. Anyone who spends time manually managing audio pipelines and needs reliable, structured text-to-speech generation.
Needs to generate consistent voice narration for dozens of videos a week without hiring freelance voice actors.
Must integrate high-quality, reliable TTS into a production application or internal tool using structured API calls.
Generates audio versions of technical documentation for accessibility features and online training modules.
What Changes When You Connect
- Generate high-quality audio on demand: The convert_tts tool handles the heavy lifting, letting you turn scripts into MP3 or WAV files with a single call. You don't need multiple endpoints.
- Know exactly what voices are available: Use get_voices to list every voice option. This prevents guesswork and ensures your agent always selects a valid ID before starting conversion.
- Track long jobs without polling constantly: The get_tts_status tool lets you check job progress using a unique ID, making the entire process reliable and predictable for your client code.
- Fine-tune audio output: You control the quality (from Draft to High) and format of every file. This gives you granular control over the final asset without writing complex post-processing steps.
- Speed up content pipelines: By separating discovery (get_voices) from execution, your agent runs a clean, reliable three-step workflow that minimizes chances of failure.
Real-World Use Cases
Creating an e-learning module narration
The L&D team writes new training material. Instead of manually recording it or using a basic TTS tool, the agent runs get_voices to find the ideal educational voice. It then calls convert_tts with the full script and tracks status via get_tts_status, delivering finished audio assets in minutes.
Updating product documentation for accessibility
The technical writer needs to make a guide audible immediately. The agent calls get_voices first, then uses convert_tts with the text and a clear voice profile. The resulting audio file is uploaded directly to the help center.
Building an automated podcast filler segment
The content manager needs filler audio for episodes that run long. They feed the script into convert_tts. Because it returns a status ID, the agent can wait and confirm completion before packaging the final episode mix.
Validating voice choices across multiple regions
The global marketing team needs to ensure they use an American English voice. They run get_voices first to filter for 'en-US' voices, preventing accidental use of incorrect language settings.
The Tradeoffs
Trying to convert text with a random ID
The user just guesses a voice ID and sends the text straight to convert_tts. The conversion fails instantly because the ID doesn't exist or is invalid, forcing them to restart.
→
Always run get_voices first. Use that list to pull a known-good, valid Voice ID. Then pass that specific ID into convert_tts. This guarantees the input parameters are correct.
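A small guard like the one below is one way to enforce that habit in your own code. It assumes each entry in the get_voices result carries an `id` field, which is a guess at the schema rather than a documented fact.

```python
from typing import Any


def ensure_known_voice(voice_id: str, voices: list[dict[str, Any]]) -> str:
    """Return voice_id unchanged if it appears in the voice list, else raise."""
    known_ids = {v["id"] for v in voices if "id" in v}  # assumed field name
    if voice_id not in known_ids:
        raise ValueError(f"unknown voice ID {voice_id!r}; run get_voices and pick a listed ID")
    return voice_id


# Hypothetical voice list for illustration only.
voices = [{"id": "voice-a"}, {"id": "voice-b"}]
print(ensure_known_voice("voice-a", voices))
```

Failing early on your side is cheaper than submitting a job you already know will be rejected.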
Ignoring job status
The agent calls convert_tts and assumes the audio file is ready right away, leading to a timeout error or a failed download because the conversion was still running.
→
After calling convert_tts, you must capture the request ID. Then, use get_tts_status repeatedly until the status reports 'complete'. Don't try to access the file before that.
Using a generic API wrapper
Relying on an older or incomplete library that doesn't distinguish between listing voices and converting text, causing confusion in the workflow.
→
Use the three dedicated tools: get_voices for discovery, convert_tts for execution, and get_tts_status for state management. This clear separation makes your code predictable.
When It Fits, When It Doesn't
You should use this server if your primary goal is turning written text into high-quality, customizable audio assets through a structured workflow. The three tools—get_voices, convert_tts, and get_tts_status—are built for reliability: you list voices to validate inputs; you convert text with the specific parameters; and you check status because TTS is an asynchronous process.
Don't use this if you need to modify audio content (e.g., adding music or effects) or if you only need basic, low-fidelity speech generation. For those cases, look for dedicated audio editing APIs or simpler cloud-based synthesis services that handle the entire job in one call without status tracking.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Play.ht. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 3 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Available Capabilities
Getting professional voiceover shouldn't require multiple manual signups and API keys.
Right now, if you need a video narrated, you often have to jump through hoops: first, checking which voices are even available on the platform; second, finding the right endpoint for text submission; and third, figuring out how long you have to wait before the file is actually ready.
With this Play.ht MCP Server, that whole sequence gets contained. Your agent handles the three steps automatically: `get_voices` validates your choice; `convert_tts` starts the job; and `get_tts_status` monitors it until you get the final asset. It's clean.
The Play.ht (AI Voice Generation & TTS) MCP Server: 3 Tools for Audio Pipelines
Before this server, running an audio job meant managing multiple state flags and dealing with inconsistent API calls—sometimes the list of voices was separate from the conversion service, creating integration gaps.
Now, you treat voice generation as a single, reliable pipeline. You call `get_voices` for inputs, use `convert_tts` for outputs, and rely on `get_tts_status` to handle all the complexity in between. It just works.
Common Questions About Play.ht MCP
How do I find out what voices are available using get_voices?
Call get_voices. This tool returns a list of all Play.ht voices, giving you details like the voice ID, language code, and gender for selection.
What is the difference between convert_tts and get_tts_status?
convert_tts starts the audio generation job and returns a request ID. get_tts_status uses that exact ID to check if the conversion finished or if it's still pending.
Can I use convert_tts with an unknown voice?
No. You must first run get_voices to retrieve a valid, active Voice ID. Passing an incorrect ID will cause the conversion job to fail immediately.
Does Play.ht (AI Voice Generation & TTS) MCP Server support WAV files?
Yes. When using convert_tts, you can specify your desired output format, including MP3 and WAV, giving you control over the final asset type.
How do I authenticate my connection before using `convert_tts`?
You must supply your Play.ht API Key and User ID when setting up this server. Your agent uses these credentials to authorize every call, ensuring you have permission to generate audio assets.
If a conversion fails, how do I debug the issue using `get_tts_status`?
get_tts_status tracks progress, and if an error occurs, the returned status object will contain specific failure codes. Check these details to pinpoint why the job tied to your request ID isn't completing.
What parameters can I pass to `convert_tts` for fine-tuning the audio output?
You control quality levels (Draft through High) and speaking speed directly within the function call. This lets you precisely adjust the audio profile—like making it sound more formal or conversational—for your text.
Does `convert_tts` handle massive amounts of text, or is there a limit?
For short bursts of copy, convert_tts typically finishes quickly. If you're processing large documents or high volumes, the system may queue requests. Check on their progress by passing the unique ID returned by convert_tts to get_tts_status.
How can I find the right voice ID for my language?
Use the get_voices tool. It returns a complete list of available voices, allowing you to filter by name, language, and gender to find the perfect match for your project.
Can I control the speed and format of the generated audio?
Yes! When using convert_tts, you can specify the speed (from 0.5 to 2.0), the output_format (like mp3 or wav), and the quality level to suit your needs.
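As a rough sketch, you could check those values before the call goes out. The speed range and the `speed` and `output_format` names come from the answer above; the `text`, `voice`, and `quality` keys are assumptions, and quality is passed through unchecked because the full list of levels isn't spelled out here.

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"mp3", "wav"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_convert_args(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    output_format: str = "mp3",
    quality: Optional[str] = None,
) -> dict[str, Any]:
    """Validate inputs and return an argument dict for a convert_tts call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")

    # "text", "voice", and "quality" are assumed key names.
    args: dict[str, Any] = {
        "text": text,
        "voice": voice_id,
        "speed": speed,
        "output_format": fmt,
    }
    if quality is not None:
        args["quality"] = quality
    return args


print(build_convert_args("Welcome to the course.", "voice-a", speed=1.25, output_format="WAV"))
```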
What should I do if a conversion takes a long time?
For longer texts, use the get_tts_status tool with your transcription_id. This allows you to check if the audio is still processing or ready for download.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.