Deepgram MCP for AI. Convert Audio to Text and Vice Versa.
Works with every AI agent you already use
…and any MCP-compatible client








Connect to your AI in seconds.
Deepgram provides high-speed audio processing for your AI client. It handles speech-to-text transcription from URLs, generating accurate transcripts with speaker diarization.
You can also convert text back into natural-sounding audio using the Aura engine. This MCP lets you manage models, check project usage, and control API keys all through conversation.
What your AI can do
Get project usage
Checks the current API usage, including minute consumption and request counts for your Deepgram project.
List API keys
Retrieves all currently active API key identifiers associated with your Deepgram projects.
List available models
Lists the names and details of high-performance STT and TTS models you can use for a job.
Feed an audio or video URL into the MCP and receive a structured text transcript.
Pass plain text to the MCP, which returns a high-quality media file of spoken audio.
Ask the MCP for current API usage and remaining minute consumption across your projects.
Retrieve active API key identifiers or list available Deepgram projects.
Deepgram: 6 Tools for Audio Processing
These tools let you manage projects, check usage limits, list models, and execute both transcription and speech synthesis tasks via your agent.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Deepgram on Vinkius
Get Project Usage
Checks the current API usage, including minute consumption and request counts for your Deepgram project.
List API Keys
Retrieves all currently active API key identifiers associated with your Deepgram projects.
List Available Models
Lists the names and details of high-performance STT and TTS models you can use for a job.
List Deepgram Projects
Retrieves a list of all Deepgram projects linked to your account.
Convert Text To Speech
Generates a natural-sounding audio file when you provide it with plain text.
Transcribe Audio URL
Converts speech from an audio or video file provided via URL into structured text.
Security and governance baked right in.
Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing to configure yourself.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Deepgram, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Deepgram. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 6 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
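As a rough illustration of what a client sees: an MCP server advertises its tools in response to a `tools/list` request, and each entry carries a `name`. The sketch below checks such a result against the six tool names used on this page. The sample result is made up for illustration, and `missing_tools` is a hypothetical helper, not part of any SDK.

```python
# Tool names as they appear on this page.
EXPECTED_TOOLS = {
    "get_project_usage",
    "list_api_keys",
    "list_available_models",
    "list_deepgram_projects",
    "convert_text_to_speech",
    "transcribe_audio_url",
}


def missing_tools(tools_list_result: dict) -> set:
    """Return expected tool names that are absent from an MCP `tools/list` result."""
    advertised = {tool["name"] for tool in tools_list_result.get("tools", [])}
    return EXPECTED_TOOLS - advertised


# Made-up sample result that advertises only two tools (illustration only).
sample_result = {
    "tools": [
        {"name": "transcribe_audio_url"},
        {"name": "convert_text_to_speech"},
    ]
}

print(sorted(missing_tools(sample_result)))
```

An empty set means every tool named on this page is available to the client.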
Handling Voice Data Manually Is a Time Sink.
Right now, processing recorded conversations means exporting the file, uploading it to a separate transcription service, waiting for the job to finish, then downloading the resulting text file. Then you have to copy that data into your application and maybe run another script just to clean up timestamps.
With this MCP, your agent handles the whole chain. You give the URL, and the system automatically transcribes it with speaker diarization. The result hits your workflow as ready-to-use, structured text.
Generate Speech on Demand With Deepgram
Before this MCP, generating a voiceover meant writing the script, then exporting it to an expensive third-party TTS platform, paying per character, and downloading a ZIP of audio files. If you needed multiple versions, you repeated the whole cycle.
Now, your agent handles the synthesis job entirely. You pass the text, get the high-quality audio file back, and repeat that process instantly—no logins, no manual exports.
What your AI can actually do with this
Does your agent need to read audio files or generate voiceovers? Deepgram handles both sides of speech processing: transcribing audio into usable text and turning plain text back into natural-sounding speech. Forget manual uploads or juggling multiple services. Your AI client calls this MCP, and it manages the whole workflow for you.
You can take public video links and get a clean transcript back, complete with who spoke when (diarization). If you need voiceovers for videos, just send the text, and we generate the audio file. Need to know if your usage is spiking? Check the limits instantly. All this functionality lives in Vinkius, allowing your AI agent to access everything from model selection to project key retrieval using simple natural language commands.
Here's how it actually works
The bottom line is that your AI acts as an automated media production coordinator for all your speech and transcription needs.
First, you subscribe to this MCP and grab your specific API Key from the Deepgram console.
Second, you tell your AI client what you want—maybe 'transcribe this URL' or 'make audio of this text.'
Third, the MCP executes the task and returns either a formatted transcript or a finished audio file directly to your agent.
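Under the hood, the second step becomes an MCP `tools/call` request, which is a JSON-RPC 2.0 message carrying the tool name and its arguments. Here is a minimal sketch of that message. The argument name `url` and the example link are assumptions; the real input schema is whatever the server advertises for transcribe_audio_url.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name ("url") and placeholder link.
request = build_tool_call(
    1,
    "transcribe_audio_url",
    {"url": "https://example.com/interview.mp3"},
)

print(json.dumps(request, indent=2))
```

In normal use your AI client builds and sends this message for you; the sketch only shows what 'transcribe this URL' turns into.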
Who is this actually for?
Engineers building agentic workflows, content teams needing scalable video assets, or research staff processing large amounts of interview data. If you deal with voice and text conversion regularly, this is for you.
Engineers need to set up automated pipelines that pull speech data from external sources, check usage limits via get_project_usage, and verify which keys are active using list_api_keys.
Content teams must automate the creation of subtitles or voiceovers for global video campaigns by transcribing raw footage from URLs and feeding text to convert_text_to_speech to generate audio.
Research staff process hours of interview recordings, using transcribe_audio_url to get detailed transcripts, list_deepgram_projects to find the right project, and get_project_usage to monitor consumption.
What Changes When You Connect
Transcribe complex audio: Use transcribe_audio_url to process recordings from public URLs, getting full transcripts with speaker separation (diarization).
Automate voice generation: Convert text into speech using convert_text_to_speech, eliminating the need for manual recording or studio time.
Control your credentials: Quickly list active keys with list_api_keys and check project status by running list_deepgram_projects through simple queries.
Stay within budget: Use get_project_usage to check consumption before a large batch of transcriptions, so you don't run into usage limits partway through.
Select the right model: Before processing anything, run list_available_models to see which STT or TTS model suits the task.
See it in action
Processing a massive archive of interviews
The research team has 50 recorded interviews, each hosted at its own URL. Instead of submitting them one by one, they give their agent the list of links and ask it to run transcribe_audio_url on each. The MCP processes all 50 links and returns structured text for every session.
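A batch like this is one tool call per link. The sketch below, under stated assumptions, gives each call its own JSON-RPC `id` and uses that `id` to pair responses back to the URL they belong to, since responses can arrive in any order. The `url` argument name, the helper names, and the sample responses are all hypothetical.

```python
from itertools import count


def batch_transcription_requests(urls):
    """Build one `tools/call` request per URL, keyed by its JSON-RPC id."""
    ids = count(1)
    requests = {}
    for url in urls:
        request_id = next(ids)
        requests[request_id] = {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            # "url" is an assumed argument name; check the tool's input schema.
            "params": {"name": "transcribe_audio_url", "arguments": {"url": url}},
        }
    return requests


def match_results(requests, responses):
    """Pair each response with the URL that produced it, using the shared id."""
    by_url = {}
    for response in responses:
        url = requests[response["id"]]["params"]["arguments"]["url"]
        # A JSON-RPC response carries either "result" or "error".
        by_url[url] = response.get("result", response.get("error"))
    return by_url


links = ["https://example.com/session-01.mp4", "https://example.com/session-02.mp4"]
pending = batch_transcription_requests(links)
print(len(pending), "requests prepared")
```

Keeping the mapping by `id` also makes it easy to see which sessions failed and need a retry.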
Creating an automated tutorial video
The content team writes a script for a new product feature. They pass the final text to convert_text_to_speech, generating the voiceover audio. Then, they use that audio file as input for their deployment.
Debugging an API workflow
An engineer notices that a transcription job fails and suspects a credentials problem. They run list_api_keys to verify the active keys, then use get_project_usage to rule out an exhausted quota before restarting the process.
Building a voice chatbot backend
A developer is building speech input for an agent. They first use list_available_models to pick a low-latency STT model, then try it on recorded samples with transcribe_audio_url. Note that this tool works on audio at a URL, not on a live stream.
The honest tradeoffs
Treating Deepgram like a simple file uploader
A user tries to upload an MP4 directly into their agent, assuming the MCP will handle it just because they see 'audio' in the name.
The transcribe_audio_url tool requires a public URL pointing to the audio/video file. You must provide that link; you can't pass a local file path directly.
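A cheap guard against this mistake is to check the input before the agent ever calls the tool. The sketch below uses Python's standard `urllib.parse` to accept only `http` and `https` links with a host. `is_remote_media_url` is a hypothetical helper, and it only checks the form of the string; it cannot tell whether the link is actually reachable from the public internet.

```python
from urllib.parse import urlparse


def is_remote_media_url(value: str) -> bool:
    """Return True only for http(s) URLs that include a host name."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


candidates = [
    "https://example.com/media/interview.mp4",  # looks like a remote link
    "/home/user/recordings/interview.mp4",      # local path
    "file:///tmp/interview.mp4",                # local file URL
]

for candidate in candidates:
    verdict = "ok" if is_remote_media_url(candidate) else "needs hosting first"
    print(f"{candidate}: {verdict}")
```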
Ignoring account limits
Running dozens of large transcriptions without checking resource constraints, leading to sudden API failures.
Always check the current status and remaining capacity first. Use get_project_usage before initiating any high-volume transcription jobs.
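The arithmetic behind that check is simple enough to sketch. The example below assumes you know your plan's minute allowance and can read minutes used from get_project_usage; the exact fields that tool returns are not documented on this page, so the numbers are passed in by hand. `can_run_batch` and the safety margin are hypothetical.

```python
def can_run_batch(minutes_used, minutes_allowed, batch_durations, safety_margin=0.1):
    """Return True if the planned batch, plus a margin, fits in the remaining minutes.

    All durations are in minutes. The margin pads the estimate because
    recording lengths are often only roughly known in advance.
    """
    remaining = minutes_allowed - minutes_used
    needed = sum(batch_durations) * (1 + safety_margin)
    return needed <= remaining


# Ten interviews of roughly 30 minutes each (made-up figures).
planned = [30] * 10

print(can_run_batch(minutes_used=400, minutes_allowed=1000, batch_durations=planned))
print(can_run_batch(minutes_used=900, minutes_allowed=1000, batch_durations=planned))
```

If the check fails, split the batch or wait for the next billing cycle instead of letting jobs fail halfway.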
Assuming one model works for all tasks
Running a transcript job using default settings when the audio is heavily accented or noisy, resulting in poor accuracy.
First, call list_available_models to see which specialized models are available. Select the highest-accuracy option for your specific content type.
When It Fits, When It Doesn't
Use this MCP if your primary job involves converting between spoken word and written text—either transcribing audio from links or generating voiceovers from scripts. If you only need to write a script, use a basic text generation tool; don't worry about the transcription tools. If you have transcribed text but need to validate its structure against a schema, use a Pydantic-style validation MCP instead of using transcribe_audio_url. Remember: this is for audio and speech only.
Questions you might have
How do I use `transcribe_audio_url`?
You provide a public URL pointing to the audio or video. The MCP then fetches that content and converts the speech into formatted text with speaker diarization.
What is the difference between listing models with `list_available_models` and using one?
list_available_models only shows which models exist. To use one, you run a job (such as a transcription) and specify the model name for that task.
Does `convert_text_to_speech` require me to upload files?
No, it just needs the text. You pass the plain text to your agent, and the MCP generates the audio file for you.
How do I check my API quotas using `get_project_usage`?
Ask your agent to run get_project_usage. It returns a breakdown of how many minutes and requests you've used in the current cycle.
How do I use `list_api_keys` to check my active Deepgram credentials?
It retrieves a list of all current API keys tied to your account. This lets you verify which identifiers are authorized, so you don't accidentally run jobs with deprecated or inactive keys.
What information does `list_deepgram_projects` provide?
This function lists every project associated with your Deepgram account. You need this list to correctly reference a specific project ID when running complex operations, such as checking usage or transcribing audio for that defined scope.
When listing models using `list_available_models`, what criteria should I use?
The tool returns model names and capabilities. Check the descriptions to select a model suited to your content. For instance, picking one that handles speaker diarization or particular accents should improve accuracy.
Can `convert_text_to_speech` generate multiple audio files from different inputs?
Yes, one call per input. You provide the text and specify output parameters such as voice and format. By having your agent repeat the call for each input, you can generate large batches of synthetic speech assets.
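To make the batch idea concrete, here is a minimal sketch that expands a set of scripts and voices into one convert_text_to_speech job per combination. The voice names come from this page's FAQ; the argument names `text` and `model`, the `tts_jobs` helper, and the sample scripts are assumptions for illustration.

```python
def tts_jobs(scripts, voices):
    """Yield one convert_text_to_speech job per (script, voice) pair."""
    for name, text in scripts.items():
        for voice in voices:
            yield {
                "label": f"{name}-{voice}",
                "tool": "convert_text_to_speech",
                # "text" and "model" are assumed argument names.
                "arguments": {"text": text, "model": voice},
            }


scripts = {
    "intro": "Welcome to the product tour.",
    "outro": "Thanks for watching.",
}
voices = ["aura-asteria-en", "aura-orion-en"]

jobs = list(tts_jobs(scripts, voices))
for job in jobs:
    print(job["label"])
```

Two scripts and two voices produce four jobs; the label gives each resulting audio file a predictable name.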
How do I get a Deepgram API key?
Log in to the Deepgram Console, navigate to the API Keys section, and create a new key with the necessary permissions.
What is the Nova-3 model?
Nova-3 is a Deepgram speech-to-text model built for fast, accurate transcription of real-world audio.
Can I synthesize speech in different voices?
Yes. The convert_text_to_speech tool lets you specify models such as aura-asteria-en or aura-orion-en for different vocal profiles.
We've already built the connector for Deepgram. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 6 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.