# ElevenLabs MCP

> ElevenLabs gives your AI agent complete control over high-quality speech and audio dubbing. You can generate lifelike voiceovers, clone voices, or translate video into multiple languages without ever leaving your chat interface. It's the full suite for professional audio content creation.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** text-to-speech, voice-synthesis, audio-generation, dubbing, speech-api, natural-language

## Description

This MCP lets you manage every part of high-fidelity AI audio generation through natural conversation. Instead of jumping between dedicated audio platforms and your agent workspace, you keep everything in one place. You can initiate complex dubbing jobs, synthesize speech with natural conversational intonation, or check which voices are available to your account. You don't have to track API keys manually: Vinkius handles all credentials through a zero-trust proxy, so your keys never sit on a disk. That leaves you free to focus on the creative output, whether that is generating an entire library of voice samples or monitoring how much character quota you have left for the month.

## Tools

### get_history_item
Retrieves the specific details for a single historical audio generation job.

### get_subscription
Checks your current billing cycle information and remaining character usage quota.

### get_user_info
Fetches basic profile details for the connected ElevenLabs account.

### get_voice
Gets detailed information about a specific voice model available on the platform.

### list_history
Shows an overview of all previous audio generation jobs and their status.

### list_models
Retrieves a list of available AI speech models you can use for generation.

### list_projects
Lists your current or past audio dubbing and voice projects.

### list_pronunciation_dictionaries
Shows available phonetic dictionaries to ensure specific words are pronounced correctly in the generated speech.

### list_voices
Provides a comprehensive list of all voices, both standard and cloned, accessible to your account.

### text_to_speech
Converts a block of text into speech using the chosen voice and its supported settings, and returns metadata for the generated audio.

## Prompt Examples

**Prompt:** 
```
Generate audio for: 'Hello, this is a lifelike AI voice.' using voice 'abc-123'
```

**Response:** 
```
Inference pipeline triggered! I've sent your script to ElevenLabs using voice 'abc-123'. The audio is being generated with perfect intonation. Would you like to check the history for the download link?
```

**Prompt:** 
```
Show me my remaining character quota
```

**Response:** 
```
Retrieving subscription info... You have used 15,000 characters out of your 100,000 monthly limit. You have 85,000 characters remaining for this billing cycle.
```
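The arithmetic behind that reply is a simple subtraction. Here is a minimal sketch, assuming the subscription data exposes a used-character count and a character limit; the parameter names `character_count` and `character_limit` are assumptions, so check what `get_subscription` actually returns for your account.

```python
def remaining_characters(character_count: int, character_limit: int) -> int:
    """Characters left in the current billing cycle (never negative)."""
    return max(character_limit - character_count, 0)


# Figures taken from the example response above.
used, limit = 15_000, 100_000
print(f"{remaining_characters(used, limit):,} characters remaining")
# prints "85,000 characters remaining"
```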

**Prompt:** 
```
Dub this video into Spanish: https://example.com/video.mp4
```

**Response:** 
```
Dubbing job initiated! I've started the translation queue for your video into Spanish. I'll provide the tracking ID (dub_abc) so you can monitor the rendering status.
```

## Capabilities

### Generate Speech Audio
Converts raw text into high-fidelity audio files, supporting dozens of languages and voices.

### Manage Voice Profiles
List, identify, and tune voice settings to maximize human likeness for a specific project.

### Automate Dubbing Projects
Takes existing video or audio content, then translates and dubs the voices into different languages automatically.

### Monitor Usage Limits
Checks your account's character quotas and subscription status to prevent overspending.

### Audit Content History
Retrieves a full, structured log of all past generations for easy review and troubleshooting.

## Use Cases

### Launching a Global Campaign
The marketing team needs to launch an ad campaign in five languages. Instead of hiring voice actors for every region, they ask their agent to use `list_projects` first, then initiate the translation queue via audio dubbing tools, ensuring consistent branding across all language versions.

### Debugging Audio Failures
A developer notices an audio file sounds choppy. They run a check using `get_history_item` to pull up the exact generation parameters and then use `list_models` to verify if they used the correct speech model.
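That comparison could look something like the sketch below. It assumes you already have the output of `get_history_item` and `list_models` as plain dictionaries, and that both carry a `model_id` field; the field name and the helper `check_model` are illustrative assumptions, not part of this MCP.

```python
def check_model(history_item: dict, models: list[dict]) -> str:
    """Report whether the model recorded on a history item is still listed."""
    used = history_item.get("model_id")  # assumed field name
    if used is None:
        return "history item does not record a model"
    known = {m.get("model_id") for m in models}  # assumed field name
    if used not in known:
        return f"model {used!r} is not in the current model list"
    return f"model {used!r} is available"


# Hypothetical data standing in for real tool output.
models = [{"model_id": "model-a"}, {"model_id": "model-b"}]
print(check_model({"model_id": "model-x"}, models))
```

A mismatch here only tells you which model was used; whether that model explains the choppy audio is still a judgment call.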

### Updating Voice Assets
A content creator needs to add a new voice for their character. They first run `list_voices` to see available options, then use `get_voice` to check specific tuning parameters like stability before generating the final sample via text-to-speech.

### Monitoring API Costs
The product owner needs assurance that the AI agent isn't running wild. They prompt the system to run `get_subscription` to confirm current usage and available character quotas before initiating a large batch of audio generation.
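A pre-flight check like this can be sketched in a few lines. The example assumes one billed character per character of input text, which may not match how ElevenLabs actually meters usage, and the helper name `batch_fits_quota` is hypothetical.

```python
def batch_fits_quota(scripts: list[str], used: int, limit: int) -> tuple[bool, int]:
    """Return (fits, characters_needed) for a batch of scripts.

    Assumes one billed character per character of input text.
    """
    needed = sum(len(script) for script in scripts)
    remaining = max(limit - used, 0)
    return needed <= remaining, needed


# Hypothetical batch checked against the quota figures used earlier in this page.
fits, needed = batch_fits_quota(["Welcome back.", "See you next week."], used=15_000, limit=100_000)
print(f"needs {needed} characters, fits: {fits}")
```

Running the check before the batch, rather than after, is the point: the agent can stop and ask instead of burning through the remaining quota.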

## Benefits

- Avoid running out of capacity mid-project. Use `get_subscription` to check your remaining character quota and keep your content pipeline moving.
- Need to ensure a specific name or technical term sounds right? Before generating audio, use `list_pronunciation_dictionaries` to see which phonetic dictionaries are available for that word.
- Don't just generate audio; track it. Use `list_history` to view an overview of past projects, and then use `get_history_item` for deep dives into specific job details.
- The system handles complex credential management using a zero-trust proxy, so connecting your API keys is secure and simple, letting you focus on the content, not the security protocols.
- Want to build an automated global campaign? You can use `list_projects` to manage all your dubbing jobs before triggering new ones via the core audio generation tools.

## How It Works

The bottom line is: you tell your agent what kind of voice or audio job you need, and it handles the rest via the ElevenLabs API.

1. Subscribe to this MCP and enter your ElevenLabs API Key.
2. Ask your AI agent to perform an action, like synthesizing speech or checking your remaining character quota.
3. The service executes the task and returns either the generated audio metadata or the requested usage information.
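Under the hood, step 2 becomes an MCP tool call. MCP uses JSON-RPC 2.0 with a `tools/call` method, so the request your agent sends might look like the sketch below. The argument names `text` and `voice_id` are assumptions; the real input schema comes from the server's tool listing.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "text_to_speech",
    # Argument names are assumed for illustration.
    {"text": "Hello, this is a lifelike AI voice.", "voice_id": "abc-123"},
)
print(json.dumps(request, indent=2))
```

In practice your agent or MCP client builds this message for you; the sketch only shows what travels over the wire.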

## Frequently Asked Questions

**How do I check my audio generation budget using `get_subscription`?**
Ask your agent to run `get_subscription`. It returns your current billing details, showing how many characters you've used this month versus your total limit. Checking it before a large job helps you avoid overspending.

**Does `list_voices` show me my own cloned voice?**
Yes. `list_voices` returns every voice accessible to your account, including standard library voices and any custom or cloned voices.

**What if I need a word pronounced differently? Do I use `list_pronunciation_dictionaries`?**
It's the place to start. `list_pronunciation_dictionaries` shows which phonetic dictionaries are available, so you can confirm that one exists for your specialized or foreign words before generating speech. The tool lists dictionaries; it does not create or edit them.

**Can I list all my past video projects using `list_projects`?**
Yes. The `list_projects` tool lists your current and past dubbing and voice projects, which makes it the single source of truth for tracking large-scale content translation efforts.

**When I use `list_history`, what details do I get about my past generation attempts?**
It provides a comprehensive log of every attempt, not just completed projects. You'll see the timestamp, the input text used, and whether the run succeeded or failed, which is essential for debugging.

**Does running `get_user_info` confirm that my API credentials are correctly authenticated?**
Yes, calling `get_user_info` validates your connection to ElevenLabs. It returns key account metadata and confirms the status of your user ID, helping you isolate if a problem is with the client setup or the service itself.

**After I run `text_to_speech`, how do I actually download the resulting audio file?**
The tool returns an audio metadata object containing a unique job ID and status. You must use this job ID to check the progress or initiate the final retrieval of the completed audio asset.
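One way to follow up on that job ID is a small polling loop. This is a sketch under stated assumptions: `fetch_item` stands in for whatever call retrieves the job (for example, a wrapper around `get_history_item`), and the `status` values `"completed"` and `"failed"` are hypothetical.

```python
import time
from typing import Callable


def wait_for_audio(
    job_id: str,
    fetch_item: Callable[[str], dict],
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a final status, then return its metadata."""
    for attempt in range(attempts):
        item = fetch_item(job_id)
        # Status values are assumed; adjust to what the tool really returns.
        if item.get("status") in ("completed", "failed"):
            return item
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next check
    raise TimeoutError(f"job {job_id} did not finish after {attempts} checks")
```

Passing `fetch_item` and `sleep` in as parameters keeps the loop independent of any particular client and makes it easy to test with a fake.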

**When using `list_models`, what information helps me choose the right AI speech model?**
The list provides details about each available model, including its primary purpose and any specific constraints. This lets you compare specialized voices against general-purpose ones before running a large generation batch.