# Play.ht MCP

> Play.ht MCP Server turns plain text into professional audio files using a neural voice engine. It lets you discover available voices, including the languages each one supports, and then submit any block of text for conversion. You can also track long-running jobs, so your AI client knows exactly when the final MP3 or WAV file is ready to download.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** text-to-speech, ai-voices, speech-synthesis, voice-generation, neural-tts

## Description

This Play.ht MCP Server turns plain text into professional audio files using Play.ht's neural voice engine. Your AI client does the heavy lifting; you just point it at the server.

To get started, you first need to check what voices are available. You'll use the `get_voices` tool; this calls up a structured list of every single voice Play.ht has in its library. It gives you metadata for each one, including their unique IDs and what languages they support. This is how you figure out which voice fits your project.
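As a rough illustration of picking a voice from that list, here is a minimal sketch. The record shape (`id`, `name`, `language_code`, `gender`) and the sample IDs are assumptions made for the example; the real `get_voices` payload may use different field names.

```python
from typing import TypedDict


class Voice(TypedDict):
    # Hypothetical record shape; the real get_voices payload may differ.
    id: str
    name: str
    language_code: str
    gender: str


def voices_for_language(voices: list[Voice], language_code: str) -> list[Voice]:
    """Return the voices whose language code matches, ignoring case."""
    wanted = language_code.lower()
    return [v for v in voices if v["language_code"].lower() == wanted]


# Made-up sample standing in for a get_voices result.
sample_voices: list[Voice] = [
    {"id": "voice-larry", "name": "Larry", "language_code": "en-US", "gender": "male"},
    {"id": "voice-susan", "name": "Susan", "language_code": "en-US", "gender": "female"},
    {"id": "voice-hans", "name": "Hans", "language_code": "de-DE", "gender": "male"},
]

us_voices = voices_for_language(sample_voices, "en-us")
print([v["name"] for v in us_voices])
```

Matching on the language code rather than the voice name keeps the selection stable even if the voice library changes.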

Once you've got the right Voice ID, you can actually generate the audio. You call `convert_tts`, feeding it three things: the text you want spoken, that specific Voice ID, and any parameters for quality or format. The tool then starts processing, turning that written script into either an MP3 or a WAV file.

These jobs aren't instantaneous; they run in the background. Calling `convert_tts` doesn't hand you the final file. Instead, the server returns a unique request ID that tells your AI client what to watch for next. To find out whether the job finished or something went wrong, pass that request ID to `get_tts_status`. It reports whether the job is still pending, done, or failed, so your agent can wait for confirmation before trying to pull down the final audio file.
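The waiting step can be sketched as a small polling helper. This is only an illustration under assumptions: the status strings (`pending`, `complete`, `failed`) and the `audio_url` and `error` fields are hypothetical, and `get_status` stands in for whatever your client uses to call `get_tts_status`.

```python
import time
from typing import Callable


def wait_for_audio(
    get_status: Callable[[str], dict],
    request_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes or fails, or the attempts run out."""
    for _ in range(max_attempts):
        status = get_status(request_id)
        if status["status"] == "complete":
            return status
        if status["status"] == "failed":
            reason = status.get("error", "unknown")
            raise RuntimeError(f"TTS job {request_id} failed: {reason}")
        sleep(interval_s)  # still pending, wait before asking again
    raise TimeoutError(f"TTS job {request_id} still pending after {max_attempts} checks")


# Stand-in for get_tts_status: pending twice, then complete.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "complete", "audio_url": "https://example.com/audio.mp3"},
])


def fake_get_status(request_id: str) -> dict:
    return next(_responses)


result = wait_for_audio(fake_get_status, "trans_123", sleep=lambda _s: None)
print(result["audio_url"])
```

Capping the number of attempts means a stuck job surfaces as a timeout instead of leaving the agent waiting forever.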

It's a three-step loop: first, call `get_voices` for options; second, run `convert_tts` with the text and voice ID; third, poll `get_tts_status` with the returned request ID until your finished MP3 or WAV file is ready.

With this setup, you don't have to manage API calls by hand. Your AI client handles the whole sequence: it grabs the list of voices first, so it knows the available IDs and languages before sending a single character of text for conversion. When the job is submitted via `convert_tts`, your agent gets back the request ID and can feed it into `get_tts_status` repeatedly. You're never guessing whether the file is ready; you ask the server, and it tells you where the job stands.

If you need to build out video narrations, this handles everything from script text to final audio file download. Developers can integrate realistic speech synthesis directly into their own apps without hand-writing the scheduling and polling logic themselves; the agent runs it for them. Accessibility teams find it useful because they can quickly turn large documents or reports into clear, audible speech for people who rely on that format. It's a complete pipeline: discover voices, submit text, track status, and get the file.

## Tools

### convert_tts
Turns input text into an audio file format (MP3 or WAV) using a specific voice ID and quality setting.

### get_tts_status
Checks the completion status of a running TTS job, requiring only the unique request ID for tracking.

### get_voices
Retrieves a structured list of all available Play.ht voices, including their unique IDs, languages, and metadata.

## Prompt Examples

**Prompt:** 
```
List all available English voices from Play.ht.
```

**Response:** 
```
I've retrieved the voices. You have access to several options like 'Larry' (Male), 'Susan' (Female), and 'William' (Male). Which one would you like to use for your text?
```

**Prompt:** 
```
Convert 'Welcome to our AI platform' to speech using voice ID 's3://voice-id' in high quality.
```

**Response:** 
```
Starting conversion... The request has been submitted. Your transcription ID is 'trans_123'. I will monitor the status for you.
```

**Prompt:** 
```
Check the status of transcription ID 'trans_123'.
```

**Response:** 
```
The conversion for 'trans_123' is complete! You can access your audio file at the generated URL.
```

## Capabilities

### List available voices
Retrieves a structured list of every voice Play.ht offers, including metadata like language and unique IDs.

### Convert text to speech
Takes input text and converts it into an audio file format (MP3 or WAV) using a specified voice ID.

### Check conversion status
Uses a unique request ID to check if the long-running TTS job is finished, pending, or failed.

## Use Cases

### Creating an e-learning module narration
The L&D team writes new training material. Instead of manually recording it or using a basic TTS tool, the agent runs `get_voices` to find the ideal educational voice. It then calls `convert_tts` with the full script and tracks status via `get_tts_status`, delivering finished audio assets in minutes.

### Updating product documentation for accessibility
The technical writer needs to make a guide audible immediately. The agent calls `get_voices` first, then uses `convert_tts` with the text and a clear voice profile. The resulting audio file is uploaded directly to the help center.

### Building an automated podcast filler segment
The content manager needs filler audio for episodes that run long. They feed the script into `convert_tts`. Because it returns a request ID, the agent can wait and confirm completion via `get_tts_status` before packaging the final episode mix.

### Validating voice choices across multiple regions
The global marketing team needs to ensure they use an American English voice. They run `get_voices` first to filter for 'en-US' voices, preventing accidental use of incorrect language settings.

## Benefits

- Generate high-quality audio on demand: The `convert_tts` tool handles the heavy lifting. A single call submits your script, voice, and output settings and starts MP3 or WAV generation, so you don't need a separate endpoint for each option.
- Know exactly what voices are available: Use `get_voices` to list every voice option. This prevents guesswork and ensures your agent always selects a valid ID before starting conversion.
- Track long jobs without guesswork: The `get_tts_status` tool lets you check job progress using the request ID, making the entire process reliable and predictable for your client code.
- Fine-tune audio output: You control the quality (from Draft to High) and format of every file. This gives you granular control over the final asset without writing complex post-processing steps.
- Speed up content pipelines: By separating discovery (`get_voices`) from execution, your agent runs a clean, reliable three-step workflow that minimizes chances of failure.

## How It Works

In short: discover voices first, submit the conversion job, then check back with the request ID until it's done.

1. Your AI client first calls `get_voices` to select the correct voice ID and language for the project.
2. The agent then sends the text and the selected voice ID to `convert_tts`. The call kicks off audio generation and immediately returns a request ID.
3. Finally, your client polls `get_tts_status` using that request ID until the status confirms completion. You can then access the final audio file.
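
The three steps above can be sketched end to end. Treat this as an illustration, not the server's actual contract: the argument names (`text`, `voice`, `request_id`), the response fields (`voices`, `status`, `audio_url`), and the `FakeServer` stand-in are all assumptions so the example runs without network access.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def narrate(call_tool: ToolCaller, text: str, language_code: str, max_checks: int = 10) -> str:
    """Run the three steps and return the audio URL (field names are assumed)."""
    # Step 1: discover voices and pick the first match for the language.
    voices = call_tool("get_voices", {})["voices"]
    voice = next((v for v in voices if v["language_code"] == language_code), None)
    if voice is None:
        raise LookupError(f"no voice found for {language_code}")

    # Step 2: submit the job; a request ID comes back right away.
    request_id = call_tool("convert_tts", {"text": text, "voice": voice["id"]})["request_id"]

    # Step 3: check status until the job finishes (a real client would pause between checks).
    for _ in range(max_checks):
        status = call_tool("get_tts_status", {"request_id": request_id})
        if status["status"] == "complete":
            return status["audio_url"]
        if status["status"] == "failed":
            raise RuntimeError(f"job {request_id} failed")
    raise TimeoutError(f"job {request_id} did not finish in {max_checks} checks")


class FakeServer:
    """In-memory stand-in for the MCP server so the sketch is runnable."""

    def __init__(self) -> None:
        self.checks = 0
        self.calls: list[str] = []

    def __call__(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        self.calls.append(name)
        if name == "get_voices":
            return {"voices": [{"id": "voice-1", "language_code": "en-US"}]}
        if name == "convert_tts":
            return {"request_id": "req-1"}
        if name == "get_tts_status":
            self.checks += 1
            if self.checks < 3:
                return {"status": "pending"}
            return {"status": "complete", "audio_url": "https://example.com/req-1.mp3"}
        raise ValueError(f"unknown tool: {name}")


server = FakeServer()
audio_url = narrate(server, "Welcome to our AI platform", "en-US")
print(audio_url)
```

Passing the tool caller in as a plain function keeps the workflow independent of any particular MCP client library.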

## Frequently Asked Questions

**How do I find out what voices are available using get_voices?**
Call `get_voices`. This tool returns a list of all Play.ht voices, giving you details like the voice ID, language code, and gender for selection.

**What is the difference between convert_tts and get_tts_status?**
`convert_tts` starts the audio generation job and returns a request ID. `get_tts_status` uses that exact ID to check if the conversion finished or if it's still pending.

**Can I use convert_tts with an unknown voice?**
No. You must first run `get_voices` to retrieve a valid, active Voice ID. Passing an incorrect ID will cause the conversion job to fail immediately.
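
One way to catch a bad ID before submitting is a small guard like the sketch below. The `id` field name and the sample catalog are assumptions for illustration; substitute whatever shape `get_voices` actually returns.

```python
def require_known_voice(voice_id: str, voices: list[dict]) -> str:
    """Return voice_id if it appears in the get_voices result, else raise."""
    known_ids = {v["id"] for v in voices}  # "id" is an assumed field name
    if voice_id not in known_ids:
        raise ValueError(
            f"unknown voice ID {voice_id!r}; call get_voices and pick one of {sorted(known_ids)}"
        )
    return voice_id


# Made-up catalog standing in for a get_voices result.
catalog = [{"id": "voice-a"}, {"id": "voice-b"}]
checked = require_known_voice("voice-a", catalog)
print(checked)
```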

**Does Play.ht (AI Voice Generation & TTS) MCP Server support WAV files?**
Yes. When using `convert_tts`, you can specify your desired output format, including MP3 and WAV, giving you control over the final asset type.

**How do I authenticate my connection before using `convert_tts`?**
You must supply your Play.ht API Key and User ID when setting up this server. Your agent uses these credentials to authorize every call, ensuring you have permission to generate audio assets.
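
If you keep those credentials in environment variables, a pre-flight check along these lines can fail fast when one is missing. The variable names `PLAYHT_API_KEY` and `PLAYHT_USER_ID` are hypothetical; use whatever names your server setup expects.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names; check your server's setup instructions.
REQUIRED_VARS = ("PLAYHT_API_KEY", "PLAYHT_USER_ID")


def load_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read both credentials or raise with the names that are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing Play.ht credentials: " + ", ".join(missing))
    return {"api_key": env["PLAYHT_API_KEY"], "user_id": env["PLAYHT_USER_ID"]}


# Demo with an explicit mapping instead of the real environment.
creds = load_credentials({"PLAYHT_API_KEY": "key-123", "PLAYHT_USER_ID": "user-456"})
print(sorted(creds))
```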

**If a conversion fails, how do I debug the issue using `get_tts_status`?**
`get_tts_status` tracks progress, and if an error occurs, the returned status object will contain specific failure codes. Check these details to pinpoint why the job tied to your request ID isn't completing.
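
A small helper can turn that status object into a readable message for logs. The field names used here (`status`, `error_code`, `error_message`) are assumptions for the sketch, since the exact shape of the status object isn't specified above.

```python
def summarize_status(status: dict) -> str:
    """Build a one-line summary from a get_tts_status result (assumed fields)."""
    state = status.get("status", "unknown")
    if state == "failed":
        code = status.get("error_code", "n/a")
        message = status.get("error_message", "no details provided")
        return f"failed ({code}): {message}"
    return state


# Made-up status objects for illustration.
print(summarize_status({"status": "pending"}))
print(summarize_status({"status": "failed", "error_code": "E42", "error_message": "voice not found"}))
```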

**What parameters can I pass to `convert_tts` for fine-tuning the audio output?**
You control quality levels (Draft through High) and speaking speed directly within the tool call. This lets you adjust the pacing and fidelity of the audio to suit your text.

**Does `convert_tts` handle massive amounts of text, or is there a limit?**
For short bursts of copy, `convert_tts` jobs typically finish quickly. If you're processing large documents or high volumes, the system may queue requests. Check on their progress by passing the unique ID returned by `convert_tts` to `get_tts_status`.

**How can I find the right voice ID for my language?**
Use the `get_voices` tool. It returns a complete list of available voices, allowing you to filter by name, language, and gender to find the perfect match for your project.

**Can I control the speed and format of the generated audio?**
Yes! When using `convert_tts`, you can specify the `speed` (from 0.5 to 2.0), the `output_format` (like mp3 or wav), and the `quality` level to suit your needs.
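
To show how those options might be assembled, here is a hedged sketch that validates the speed range before building the arguments. The `speed`, `output_format`, and `quality` names come from the answer above; the `text` and `voice` keys, the default values, and the lowercase quality string are assumptions.

```python
def build_convert_args(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    output_format: str = "mp3",
    quality: str = "high",
) -> dict:
    """Assemble convert_tts arguments, rejecting an out-of-range speed early."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    return {
        "text": text,        # assumed argument name
        "voice": voice_id,   # assumed argument name
        "speed": speed,
        "output_format": output_format,
        "quality": quality,
    }


args = build_convert_args("Welcome to our AI platform", "voice-a", speed=1.25, output_format="wav")
print(args)
```

Checking the range client-side saves a round trip when the agent picks a value the server would reject anyway.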

**What should I do if a conversion takes a long time?**
For longer texts, use the `get_tts_status` tool with your `transcription_id` (the request ID returned by `convert_tts`). This allows you to check if the audio is still processing or ready for download.