# Deepgram MCP

> Deepgram provides high-speed audio processing for your AI client. It transcribes speech from audio or video URLs and returns transcripts with speaker diarization. You can also convert text into natural-sounding audio using the Aura engine. This MCP lets you list models, check project usage, and list API keys, all through conversation.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** speech-to-text, text-to-speech, transcription, diarization, audio-processing, neural-networks

## Description

Does your agent need to read audio files or generate voiceovers? Deepgram handles both sides of speech processing: transcribing audio into usable text and turning plain text back into natural-sounding speech. There is no need for manual uploads or juggling multiple services. Your AI client calls this MCP, and it manages the whole workflow for you. You can pass a public audio or video link and get a clean transcript back, complete with who spoke when (diarization). If you need voiceovers for videos, send the text and the MCP generates the audio file. If you suspect your usage is spiking, you can check current consumption with a single request. All this functionality lives in Vinkius, so your AI agent can access everything from model selection to API key listing using simple natural language commands.

## Tools

### get_project_usage
Checks the current API usage, including minute consumption and request counts for your Deepgram project.

### list_api_keys
Retrieves the identifiers of all active API keys associated with your Deepgram projects.

### list_available_models
Lists the names and details of high-performance STT and TTS models you can use for a job.

### list_deepgram_projects
Retrieves a list of all Deepgram projects linked to your account.

### convert_text_to_speech
Generates a natural-sounding audio file when you provide it with plain text.

### transcribe_audio_url
Converts speech from an audio or video file provided via URL into structured text.
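
Your AI client normally builds tool calls for you, but it can help to see what one looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `transcribe_audio_url`. The argument name `url` is an assumption, not something this page documents; check the tool's input schema for the real parameter names.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumption: the tool accepts the audio location under a `url` argument.
request = build_tool_call(
    "transcribe_audio_url",
    {"url": "https://static.deepgram.com/examples/interview_segments_nuwav.wav"},
)
print(json.dumps(request, indent=2))
```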

## Prompt Examples

**Prompt:** 
```
Transcribe the audio from this URL: 'https://static.deepgram.com/examples/interview_segments_nuwav.wav'.
```

**Response:** 
```
Transcription triggered! I'm converting the speech from your URL into high-fidelity text using Nova-3. I'll provide the formatted transcript and speaker diarization details shortly.
```

**Prompt:** 
```
Convert this text to speech: 'Deepgram is the fastest way to add voice to your AI'.
```

**Response:** 
```
Audio synthesis complete! I've rendered your text into natural-sounding speech using the Aura engine. You can now access the media file and metadata for playback.
```

**Prompt:** 
```
List all active API keys for project 'proj_123'.
```

**Response:** 
```
I've retrieved the identifiers for your active API keys in project proj_123. There are 2 keys currently authorized for transcription and synthesis. Shall I check the usage stats for this project?
```

## Capabilities

### Transcribe Audio from a Link
Feed an audio or video URL into the MCP and receive a structured text transcript.

### Generate Speech from Text
Pass plain text to the MCP, which returns a high-quality media file of spoken audio.

### Check Usage Limits
Ask the MCP for current API usage, including minutes consumed and request counts, across your projects.

### Manage Access Credentials
Retrieve active API key identifiers or list available Deepgram projects.

## Use Cases

### Processing a massive archive of interviews
The research team has 50 recorded video interviews hosted at public URLs. Instead of processing them by hand, they give their agent the list of links and ask it to run `transcribe_audio_url` on each one. The MCP returns structured text for every session.
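
As a rough sketch of what that batch amounts to, the snippet below builds one MCP `tools/call` request per interview link. The `example.com` URLs are placeholders and the `url` argument name is an assumption. In practice the agent issues these calls itself.

```python
def build_transcription_batch(urls):
    """Return one `tools/call` request per URL, each with a unique id."""
    return [
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {
                "name": "transcribe_audio_url",
                "arguments": {"url": url},  # assumed argument name
            },
        }
        for request_id, url in enumerate(urls, start=1)
    ]


# Placeholder links standing in for the team's 50 hosted recordings.
interview_urls = [
    f"https://example.com/interviews/session_{n:02d}.mp4" for n in range(1, 51)
]
batch = build_transcription_batch(interview_urls)
print(len(batch))
```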

### Creating an automated tutorial video
The content team writes a script for a new product feature. They pass the final text to `convert_text_to_speech` to generate the voiceover audio, then use that audio file as the narration track for the tutorial video.

### Debugging an API workflow
The engineer notices a transcription job failing and suspects a credentials problem. They run `list_api_keys` to verify the active credentials and `get_project_usage` to check the project's consumption before restarting the process.

### Building a voice chatbot backend
The developer needs low-latency speech input for an agent. They first use `list_available_models` to pick a low-latency STT model, then specify that model when transcribing recorded audio from a URL with `transcribe_audio_url`.

## Benefits

- Transcribe complex audio: Use `transcribe_audio_url` to process recordings from public URLs, getting full transcripts with speaker separation (diarization).
- Automate voice generation: Convert text into speech using `convert_text_to_speech`, eliminating the need for manual recording or studio time.
- Control your credentials: Quickly list active keys with `list_api_keys` and check project status by running `list_deepgram_projects` through simple queries.
- Stay within budget: Use `get_project_usage` to check minute and request consumption before a large batch of transcriptions, so you don't run into unexpected limits.
- Select the right model: Before processing anything, run `list_available_models` to ensure your task uses the optimal high-performance AI engine.

## How It Works

Your AI client acts as a coordinator for your speech synthesis and transcription tasks.

1. Subscribe to this MCP and get your API key from the Deepgram Console.
2. Tell your AI client what you want, for example 'transcribe this URL' or 'make audio of this text.'
3. The MCP executes the task, returning either a formatted transcript or a finished audio media file directly to your agent.
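
For step 3, it may help to see how a client could unpack what comes back. An MCP tool result carries a `content` list and an optional `isError` flag. The sketch below pulls the text items out of such a result. The sample payload is invented for illustration and does not show Deepgram's real transcript format.

```python
def extract_text(result):
    """Join the text items of an MCP tool result, raising if it is an error."""
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    joined = "\n".join(texts)
    if result.get("isError", False):
        raise RuntimeError(joined or "tool call failed")
    return joined


# Hypothetical result payload, for illustration only.
sample_result = {
    "content": [{"type": "text", "text": "Speaker 0: Hello.\nSpeaker 1: Hi."}],
    "isError": False,
}
transcript = extract_text(sample_result)
print(transcript)
```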

## Frequently Asked Questions

**How do I use `transcribe_audio_url`?**
You provide a public URL pointing to the audio or video. The MCP then fetches that content and converts the speech into formatted text, giving you diarization details.

**What is the difference between listing models with `list_available_models` and using one?**
`list_available_models` only shows which models exist. To use one, you run a conversion job (such as a transcription) and specify the model name you want for that task.

**Does `convert_text_to_speech` require me to upload files?**
No, it just needs the text. You pass the plain string of characters directly to your agent, and the MCP handles generating the audio media file for you.

**How do I check my API quotas using `get_project_usage`?**
You simply ask your agent to run `get_project_usage`. It returns a simple breakdown of how many minutes and requests you've already used in the current cycle.
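
One way to act on that breakdown before a large batch is a simple headroom check. The sketch below is plain arithmetic under stated assumptions: the `minutes_used` field, the 1000-minute budget, and the 12-minute average recording length are all hypothetical values, not part of the tool's documented response.

```python
def fits_budget(used_minutes, planned_minutes, budget_minutes):
    """Return True if the planned batch stays within the minute budget."""
    return used_minutes + planned_minutes <= budget_minutes


# Hypothetical usage figure, standing in for what the tool might report.
usage = {"minutes_used": 420.0}

# Assumed batch: 50 recordings averaging 12 minutes each.
planned = 50 * 12.0

ok = fits_budget(usage["minutes_used"], planned, budget_minutes=1000.0)
print(ok)  # False: 420 + 600 exceeds the assumed 1000-minute budget
```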

**How do I use `list_api_keys` to check my active Deepgram credentials?**
Ask your agent to run `list_api_keys`. It retrieves a list of the current API keys tied to your projects, which lets you verify which identifiers are authorized and avoid running jobs with deprecated or inactive keys.

**What information does `list_deepgram_projects` provide?**
This function lists every project associated with your Deepgram account. You need this list to correctly reference a specific project ID when running complex operations, such as checking usage or transcribing audio for that defined scope.

**When listing models using `list_available_models`, what criteria should I use?**
The tool returns model names and capabilities. You must check the descriptions to select a model optimized for your specific content—for instance, picking one that handles speaker diarization or particular accents will maximize accuracy.

**Can `convert_text_to_speech` handle generating multiple audio files from different inputs?**
Yes. You provide the text and specify output parameters like voice type and format. By looping this call through your agent, you can efficiently generate large batches of synthetic speech assets for various use cases.

**How do I get a Deepgram API Key?**
Log in to the Deepgram Console, navigate to the **API Keys** section, and create a new key with the necessary permissions.

**What is the Nova-3 model?**
Nova-3 is a Deepgram speech-to-text model built for fast, accurate transcription of real-world audio.

**Can I synthesize speech in different voices?**
Yes! The `convert_text_to_speech` tool allows you to specify models like `aura-asteria-en` or `aura-orion-en` for different vocal profiles.
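
A minimal sketch of requesting the same line in both voices is shown below, again as MCP `tools/call` requests. The model names come from the answer above, while the argument names `text` and `model` are assumptions to verify against the tool's input schema.

```python
def build_tts_requests(text, voices):
    """Return one `convert_text_to_speech` request per voice model."""
    return [
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {
                "name": "convert_text_to_speech",
                # Assumed argument names: `text` and `model`.
                "arguments": {"text": text, "model": voice},
            },
        }
        for request_id, voice in enumerate(voices, start=1)
    ]


tts_requests = build_tts_requests(
    "Deepgram is the fastest way to add voice to your AI",
    ["aura-asteria-en", "aura-orion-en"],
)
for req in tts_requests:
    print(req["params"]["arguments"]["model"])
```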