# LMNT MCP

> LMNT provides ultra-low-latency speech synthesis, letting your AI client generate high-fidelity audio in milliseconds. Use it to clone voices instantly from samples and manage your custom voice assets through dedicated tools like `create_voice` and `list_voices`. It's built for real-time applications where speed matters.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** text-to-speech, voice-cloning, ai-audio, low-latency, speech-synthesis

## Description

You're building something real-time, such as a conversational agent or live content localization, and latency is everything. This server gives your AI client ultra-low-latency speech synthesis and voice cloning, generating high-fidelity audio in milliseconds, along with the tools to manage those voice assets quickly.

To turn text into sound, the `generate_speech` tool takes written input, lets you select a language, and converts the text into a base64-encoded audio stream that's ready to play or download right away.

When you need a new voice, cloning is quick. You call `create_voice` with your sample audio, and the server instantly returns a unique ID for the new cloned voice asset. This lets you replicate voices on the fly without recording long sessions.

Managing your voice library is straightforward. `list_voices` pulls up a full inventory, with IDs and basic attributes for every custom and system voice in the account. When you only care about one specific asset, `get_voice` fetches its detailed metadata.

You control the lifecycle of those assets too. Call `update_voice` to change a voice's metadata without touching the underlying audio samples or model. When a voice is no longer needed, remove it with `delete_voice`, passing the ID of the voice you want gone.

Beyond asset management, `get_account` tracks your usage: it retrieves metrics such as how many characters you've used and your remaining plan limits, so you stay on top of billing from the start. Together, these tools let your agent synthesize speech, clone voices, list and inspect them, update and delete them, and monitor usage, all through direct function calls.

## Tools

### create_voice
Takes a name and an audio sample, creates a new cloned voice, and returns its unique ID.

### delete_voice
Removes a specific, existing voice from the account using its unique identifier.

### generate_speech
Converts input text into an audio stream and returns it encoded in base64 format for playback or download.

### get_account
Retrieves the current account usage metrics, including character counts used and remaining plan limits.

### get_voice
Fetches detailed metadata for a single voice ID, showing its properties and status.

### list_voices
Returns an array of all available voices in the account, allowing you to inspect their IDs and basic attributes.

### update_voice
Modifies metadata for an existing voice ID without changing the underlying audio samples or model.

## Prompt Examples

**Prompt:** 
```
List all the voices I have access to.
```

**Response:** 
```
I've retrieved the available voices. You have access to system voices like 'Aura' (ID: aura) and 'Lily' (ID: lily), as well as your custom clones. Would you like to hear a sample of one?
```

**Prompt:** 
```
Synthesize 'Welcome to the future of speech' using the voice ID 'lily' in MP3 format.
```

**Response:** 
```
Generating audio... Done! I've synthesized the text using the 'Lily' voice. The MP3 audio is ready for playback or download.
```

**Prompt:** 
```
Show me my current LMNT account usage and plan info.
```

**Response:** 
```
Your account is currently on the 'Pro' plan. You have used 45,000 characters out of your 1,000,000 monthly limit. Your next billing cycle resets in 12 days.
```

## Capabilities

### Synthesize Speech from Text
The agent calls `generate_speech` to convert written text into a base64-encoded audio stream, supporting multiple languages.

### Create Voice Clones
The agent executes `create_voice` by uploading an audio sample and instantly generating a new, usable voice ID.

### List All Available Voices
The agent runs `list_voices` to retrieve a full inventory of all custom and system voices associated with the account.

### Retrieve Voice Details
The agent uses `get_voice` or `get_account` to pull specific metadata about an existing voice ID or check usage limits.

### Update and Delete Assets
The agent manages the asset lifecycle by calling `update_voice` for modifications or `delete_voice` to remove unused voices.

## Use Cases

### Building a Real-Time Chatbot
A developer needs their chatbot to respond audibly, mimicking a human voice. Instead of relying on slower speech services, they configure the agent to use `generate_speech`, which is built for ultra-low latency. The result is immediate audio output that feels conversational and natural.

### Localizing a Corporate Training Module
A content creator has a video script in English but needs it localized to Mandarin for a global audience. They use `create_voice` with samples of native speakers, then call `generate_speech` repeatedly, specifying the target language and voice ID for every segment.
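A minimal sketch of the segmenting step, assuming a naive sentence split is good enough for the script. The argument keys (`text`, `voice`, `language`), the voice ID, and the `zh` language code are hypothetical placeholders; use the names and codes the server's tool schema actually accepts.

```python
import re

# Split after sentence-ending punctuation, including the full-width marks
# used in Chinese text, and swallow any whitespace that follows.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def segment_requests(script: str, voice_id: str, language: str) -> list[dict]:
    """Build one hypothetical generate_speech argument dict per sentence."""
    sentences = [s.strip() for s in _SENTENCE_END.split(script) if s.strip()]
    return [
        {"text": sentence, "voice": voice_id, "language": language}
        for sentence in sentences
    ]


for args in segment_requests("欢迎加入。我们开始吧！", "my-mandarin-clone", "zh"):
    print(args)
```

The split is deliberately simple: it will also break on decimal points such as "3.5", so a production script would want a proper sentence segmenter.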

### Auditing Voice Assets
An operations team needs to know which voices are active but haven't been used in months. They run `list_voices`, inspect the full list, and then use `get_voice` on suspicious IDs before deciding if they need to clean up by running `delete_voice`.
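A minimal sketch of the filtering step, assuming the voice records are plain dicts carrying hypothetical keys `id`, `is_custom`, and `last_used` (an ISO 8601 timestamp with a UTC offset, or `None`). Whether `list_voices` or `get_voice` exposes a last-used time at all is an assumption to verify.

```python
from datetime import datetime, timedelta, timezone


def stale_custom_voices(voices: list[dict], now: datetime,
                        max_idle_days: int = 90) -> list[str]:
    """Return IDs of custom voices unused for more than `max_idle_days`.

    Keys "id", "is_custom", and "last_used" are hypothetical field names.
    Timestamps must include a UTC offset so they compare against `now`.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for voice in voices:
        if not voice.get("is_custom"):
            continue  # skip system voices; only custom clones are cleanup candidates
        last_used = voice.get("last_used")
        if last_used is None or datetime.fromisoformat(last_used) < cutoff:
            stale.append(voice["id"])
    return stale


sample = [
    {"id": "old-clone", "is_custom": True, "last_used": "2024-01-01T00:00:00+00:00"},
    {"id": "lily", "is_custom": False, "last_used": None},
]
print(stale_custom_voices(sample, datetime(2024, 6, 1, tzinfo=timezone.utc)))  # ['old-clone']
```

Treat the result as a review list rather than a delete list: inspect each candidate with `get_voice` before calling `delete_voice`.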

### Scaling Up Production Capacity
A startup is preparing for a major marketing push. They first call `get_account` to verify their remaining monthly character limit, then use the confirmed capacity to run high-volume speech synthesis jobs using `generate_speech`, ensuring they don't exceed their plan.

## Benefits

- Speed is the key benefit. By using `generate_speech`, your agent delivers audio in milliseconds, making it suitable for live conversational AI where latency kills the experience.
- You maintain full control over your voice library. Tools like `list_voices` and `get_voice` let you audit every asset before running a job, so you can confirm you're using the right voice ID.
- Voice cloning is instant. Running `create_voice` means you upload samples and immediately get a functional, reusable voice ID for generating speech, bypassing weeks of recording studio time.
- Usage tracking is built in. Before massive campaigns, check your limits with `get_account`. This prevents billing surprises when running high-volume jobs.
- Asset management is clean. You can delete old or unused assets using `delete_voice`, keeping your voice inventory streamlined and reducing clutter.

## How It Works

In short, your AI client treats audio generation and voice management like any other function call: each is just another tool in the conversation.

1. Subscribe to the LMNT MCP Server and provide your API key to your AI client.
2. Instruct your agent to perform a specific action (e.g., 'Generate speech for X text using voice Y').
3. The agent calls the appropriate tool (`generate_speech`, `create_voice`, etc.) and receives the resulting audio data or asset metadata.
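Your AI client normally builds these calls for you. For reference, here is a minimal sketch of what step 3 looks like on the wire, assuming a client that speaks MCP's JSON-RPC 2.0 `tools/call` method. The argument names (`text`, `voice`, `format`) are hypothetical; the server's real input schema is whatever it reports from `tools/list`.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request IDs for this client session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument names: confirm them against the schema from `tools/list`.
payload = build_tool_call(
    "generate_speech",
    {"text": "Welcome to the future of speech", "voice": "lily", "format": "mp3"},
)
print(payload)
```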

## Frequently Asked Questions

**How do I check if my account has enough credits for high-volume speech synthesis using generate_speech?**
Call the `get_account` tool. This returns your current plan details, showing exactly how many characters you have consumed and what your remaining monthly limit is.
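Once `get_account` reports your usage, the budget check is simple arithmetic. A minimal sketch, assuming you have already pulled the used-character count and the plan limit out of the tool's response (their field names aren't documented here) and assuming billing counts one character per Unicode code point, as Python's `len()` does; confirm that against your plan's billing rules.

```python
def remaining_characters(used: int, limit: int) -> int:
    """Characters left in the current plan period, never negative."""
    return max(limit - used, 0)


def batch_fits(texts: list[str], used: int, limit: int) -> bool:
    """True if synthesizing every text in `texts` stays within the plan limit.

    Assumes one billed character per Unicode code point (what len() counts).
    """
    needed = sum(len(text) for text in texts)
    return needed <= remaining_characters(used, limit)


# Figures from the sample usage response earlier on this page.
print(remaining_characters(45_000, 1_000_000))  # 955000
```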

**What do I need to use the create_voice tool for cloning a new voice?**
You must provide an audio sample file. The `create_voice` tool takes this input, processes it, and returns a unique voice ID that you can then use with `generate_speech`.

**I have too many old voices; how do I clean up my asset list?**
First, check your full inventory using `list_voices`. Once you confirm a voice ID is unused, call `delete_voice` to remove it from your account.

**Can I modify a voice's metadata without changing its core sound?**
Yes. Call `get_voice` to review the voice's current metadata, then use `update_voice` to adjust its labels or other metadata for that ID. The audio itself is not affected.
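The review-then-update flow boils down to diffing the metadata you fetched against the metadata you want. A minimal sketch, assuming both are plain dicts and that `name` and `description` are among the editable fields (hypothetical; the real field names come from the tool schema). Sending only the changed fields avoids accidentally overwriting values you didn't mean to touch.

```python
# Hypothetical names of metadata fields update_voice accepts; check the tool schema.
EDITABLE_FIELDS = ("name", "description")


def metadata_changes(current: dict, desired: dict,
                     editable: tuple[str, ...] = EDITABLE_FIELDS) -> dict:
    """Return only the editable fields whose desired value differs from current."""
    return {
        key: value
        for key, value in desired.items()
        if key in editable and current.get(key) != value
    }


current = {"id": "v1", "name": "Narrator draft", "description": "Warm tone"}
desired = {"name": "Narrator", "description": "Warm tone"}
print(metadata_changes(current, desired))  # {'name': 'Narrator'}
```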

**What should I do if a call to `generate_speech` fails?**
The API returns specific error codes and structured messages. Check the documentation for common failure reasons, such as unsupported text characters or invalid voice IDs, and refine your input parameters.

**How do I handle the audio data returned by `generate_speech`?**
The tool returns a base64-encoded audio stream. Your AI client must decode this string back into raw binary data before you can play or save the resulting audio file (e.g., MP3).
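A minimal sketch of that decoding step using only the Python standard library. It assumes you already have the base64 string from the tool result (how your client exposes it varies). As a precaution it also strips whitespace and a `data:...;base64,` prefix; whether the server ever emits either is an assumption.

```python
import base64
import binascii
from pathlib import Path


def save_base64_audio(encoded: str, path: str) -> int:
    """Decode base64 audio and write the raw bytes to `path`; return the byte count."""
    # Defensive: drop a data-URI prefix such as "data:audio/mpeg;base64," if present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # Remove line breaks or spaces some encoders insert into long base64 strings.
    encoded = "".join(encoded.split())
    try:
        # validate=True rejects non-base64 characters instead of silently skipping them.
        audio = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"tool result is not valid base64: {exc}") from exc
    Path(path).write_bytes(audio)
    return len(audio)
```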

**Are there rate limits when I use the `list_voices` tool?**
Yes, standard API rate limits apply to all endpoints. If your agent sends too many requests in a short period, it will receive a 429 error; retry with exponential backoff rather than immediately resending.
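A minimal retry sketch with exponential backoff and full jitter. `RateLimitError` is a hypothetical exception standing in for however your client surfaces an HTTP 429, and the delay values are illustrative defaults, not published LMNT limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by your client when the server answers HTTP 429."""


def with_backoff(call: Callable[[], T], max_retries: int = 5,
                 base_delay: float = 0.5, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # out of retries: surface the 429 to the caller
            # The cap doubles per attempt (0.5 s, 1 s, 2 s, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads out retries.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
            attempt += 1
```

The injectable `sleep` parameter exists so the helper can be tested without actually waiting.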

**What details does the `get_voice` tool provide for a specific ID?**
It returns detailed metadata about that voice. This includes its unique ID, supported languages, and usage parameters, letting you confirm compatibility before making a large generation call.

**Can I choose different audio formats like MP3 or WAV?**
Yes. The `generate_speech` tool allows you to specify formats like mp3, wav, or mulaw, along with custom sample rates to fit your application's needs.
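When you switch formats, it can help to sanity-check what actually came back before saving it under a given extension. A minimal heuristic sketch based on well-known file signatures: WAV files start with a `RIFF`...`WAVE` header, and MP3 files usually start with an `ID3` tag or an MPEG audio frame sync. A headerless stream, such as raw mu-law audio, can't be identified this way and falls through to `"unknown"`.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container from leading bytes: "wav", "mp3", or "unknown"."""
    # WAV: "RIFF" chunk ID, 4-byte size, then the "WAVE" form type.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 metadata tag at the front.
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set (0xFF, then top 3 bits of next byte).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"  # headerless streams (e.g. raw mu-law) land here
```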

**How do I create a new voice clone?**
Use the `create_voice` tool by providing a name and a base64-encoded audio sample. The system will process the file and return a new Voice ID for immediate use.
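Preparing that input is mostly a base64 step. A minimal sketch, assuming the tool accepts the name and sample under the keys `name` and `audio` (hypothetical; check the tool's input schema) and that a single local file is your sample. Sample length and quality requirements aren't covered here.

```python
import base64
from pathlib import Path


def create_voice_arguments(name: str, sample_path: str) -> dict:
    """Build hypothetical create_voice arguments from a local audio sample."""
    audio = Path(sample_path).read_bytes()
    if not audio:
        raise ValueError(f"{sample_path} is empty")
    return {
        "name": name,
        # base64 turns the binary sample into text that fits in a JSON tool call.
        "audio": base64.b64encode(audio).decode("ascii"),
    }
```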

**How can I check how many characters I have left in my plan?**
Run the `get_account` tool. It returns your current usage metrics and plan details directly from the LMNT API.