Speechmatics MCP. Transcribe Audio and Generate Voiceovers, Automatically.
Speechmatics provides high-accuracy audio processing right in your agent. Transcribe large volumes of audio, whether podcasts or meeting recordings, into structured text. You can also convert any written script into natural, human-sounding speech using a choice of voices (like Sarah, Theo, and Megan). It handles everything from batch transcription to job management, giving you full control over your audio pipelines.
Give Claude and any AI agent real-world access
Submit large audio recordings and receive highly accurate written transcripts.
Turn plain text into high-quality, natural-sounding voice audio using multiple character voices.
Keep track of every processing task, listing recent activity and checking the status of ongoing jobs.
Pull finished transcriptions in various formats like JSON or plain text for immediate use.
What AI agents can do with the 8 Speechmatics tools
These eight tools let you manage every step of advanced audio processing: submitting jobs, tracking status, generating voiceovers, and retrieving clean transcripts.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Speechmatics MCP
Create Job
Starts a new process to transcribe an audio file from a provided source.
Create Temp Key
Generates secure, temporary API keys for client-side access management.
Delete Job
Removes a transcription job from the system if it was started accidentally or is no...
Generate TTS
Converts specified text into an audio file using high-quality, natural speech voices.
Get Job
Retrieves the current status and specific details for a single transcription job ID.
Get Transcript
Pulls the final, completed text or subtitle file associated with a finished job.
Get Usage
Checks your current billing consumption and usage statistics for the service.
List Jobs
Shows a list of all recent transcription jobs you have submitted to the system.
Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing for you to configure.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on each call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Speechmatics, then connect any of our 5,200+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,200+ others, all in one place
- Add new capabilities to your AI anytime you want
- Connections are secured and governed automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog weekly
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Speechmatics. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS CLOUD
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on each call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Sifting Through Hours of Raw Audio Is a Full-Time Job
Today, getting usable text from audio means clicking into a separate transcription service. You upload the file, wait an unknown time for processing, then you log back in to download a messy export. This cycle involves copying data from one system, pasting it into another for cleanup, and hoping all the timestamps didn't break your script.
With this MCP, audio processing happens inside your agent’s conversation flow. You just submit the source file, and the system handles the entire wait period in the background. The finished text or subtitles are delivered directly back to you through a simple command.
Get Natural Speech with generate_tts
Writing scripts and then having a voice artist record them is slow, expensive, and requires scheduling. You also have to manually edit the audio file and sync it perfectly to your video timeline.
Now you input your text and tell your agent which professional voice you want using generate_tts. It delivers the finished, ready-to-use audio file in seconds. The friction point—the wait time and manual labor—is gone.
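If you'd rather see what the agent is likely to assemble for a voiceover request, here is a minimal sketch. It only prepares and validates the arguments for a generate_tts call. The argument keys `text` and `voice` are assumptions for illustration, not the tool's confirmed schema, and the voice list contains only the names mentioned on this page, so the real catalog may be larger.

```python
# Voices named on this page; the real catalog may differ (assumption).
VOICES_NAMED_HERE = ("sarah", "theo", "megan", "jack")


def build_tts_arguments(text, voice="sarah"):
    """Return arguments for a generate_tts call.

    The keys `text` and `voice` are hypothetical; check the tool's schema.
    """
    script = text.strip()
    if not script:
        raise ValueError("text must not be empty")
    voice_id = voice.strip().lower()
    if voice_id not in VOICES_NAMED_HERE:
        raise ValueError(
            f"unknown voice {voice!r}; expected one of {VOICES_NAMED_HERE}"
        )
    return {"text": script, "voice": voice_id}


if __name__ == "__main__":
    print(build_tts_arguments("Welcome to module one.", voice="Megan"))
```

Validating before the call keeps a typo in the voice name from costing you a round trip to the service.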
What Speechmatics MCP does for your AI
Dealing with raw audio is a massive headache for any workflow. Before this MCP, turning hours of recorded conversation or video content into usable text required specialized software and tedious manual exports. Now, your agent connects directly to Speechmatics through Vinkius, letting you handle advanced audio processing as part of a natural conversation.
You can feed it an audio file—via URL or base64—and quickly start a batch transcription job. Need voiceovers for training videos? Just give it the text and tell it which high-quality voice to use. The system manages all the background work, monitoring your jobs until the transcript is ready for you to pull out in JSON or SRT format.
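To make the "URL or base64" choice concrete, here is a rough sketch of the request an MCP client might send for create_job. The JSON-RPC `tools/call` envelope follows the MCP specification, but the argument names `url` and `audio_base64` are assumptions for illustration; the tool's real input schema may use different names.

```python
import base64
import json


def build_create_job_call(request_id, *, url=None, audio_bytes=None):
    """Build an MCP `tools/call` request for the create_job tool.

    The argument names `url` and `audio_base64` are hypothetical.
    """
    if (url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of url or audio_bytes")
    if url is not None:
        arguments = {"url": url}
    else:
        # Base64 turns raw bytes into ASCII text that fits in a JSON string.
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        arguments = {"audio_base64": encoded}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "create_job", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_create_job_call(1, url="https://example.com/episode.mp3")
    print(json.dumps(request, indent=2))
```

A URL is usually the lighter option for long recordings, since base64 makes the payload roughly a third larger than the original file.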
How to set up Speechmatics MCP
The bottom line is: you tell your agent what audio needs processing, and it handles the entire lifecycle from submission to retrieval.
First, your agent initiates a request by submitting the audio file (via URL or base64) to create a new job.
Next, you monitor the task status using list_jobs and get_job until the transcription is marked as complete.
Finally, you retrieve the finished text or subtitles using get_transcript to integrate it into your workflow.
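The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not the integration's actual code: `call_tool(name, arguments)` is a hypothetical helper that invokes one MCP tool and returns a dict, and the argument names (`url`, `job_id`, `format`), the `status` and `text` fields, and the status values are all assumed. A small in-memory fake stands in for the real client so the sketch runs offline.

```python
import time


def transcribe(call_tool, audio_url, fmt="txt", poll_seconds=5.0, sleep=time.sleep):
    """Run submit -> poll -> retrieve using three of the MCP tools.

    Field names and status values here are assumptions.
    """
    job_id = call_tool("create_job", {"url": audio_url})["id"]
    while True:
        status = call_tool("get_job", {"job_id": job_id})["status"]
        if status == "done":
            break
        if status != "running":
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        sleep(poll_seconds)
    result = call_tool("get_transcript", {"job_id": job_id, "format": fmt})
    return result["text"]


def make_fake_tools(polls_until_done=2):
    """In-memory stand-in for a real MCP client so the sketch runs offline."""
    state = {"polls": 0}

    def call_tool(name, arguments):
        if name == "create_job":
            return {"id": "job-1"}
        if name == "get_job":
            state["polls"] += 1
            done = state["polls"] > polls_until_done
            return {"status": "done" if done else "running"}
        if name == "get_transcript":
            return {"text": "hello world"}
        raise KeyError(name)

    return call_tool


if __name__ == "__main__":
    fake = make_fake_tools()
    print(transcribe(fake, "https://example.com/call.mp3", sleep=lambda s: None))
```

In practice your agent runs this loop for you; the sketch is only meant to show the order of the calls.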
Who uses Speechmatics MCP
This MCP is built for content teams, data analysts, and developers who treat recorded speech as a core asset. If your job involves turning talks, calls, or podcasts into actionable text, this tool saves you from endless copy-pasting and manual QA.
Uses the MCP to transcribe raw audio episodes immediately, then uses generate_tts to create voiceover segments for promotional clips.
Integrates speech recognition into a prototype app, using tools like create_job and get_transcript to handle backend data processing without managing servers.
Processes hours of recorded client meetings via batch transcription and then uses the output to populate structured databases for reporting.
Benefits of connecting Speechmatics MCP
Batch processing large files is simple. Use create_job to submit multiple hours of audio at once and handle the entire workload without complex scripting.
You get professional voice quality without booking a voice artist. The generate_tts tool lets you turn any script into natural speech using voices like Sarah or Theo, perfect for e-learning modules.
Monitoring is built in. You never have to worry if a job failed; list_jobs and get_job let your agent track every single step of the process.
Output flexibility means less cleanup time. When you pull results with get_transcript, you can choose JSON, SRT subtitles, or plain text.
It’s secure and auditable. The create_temp_key tool lets your team manage access credentials without exposing permanent API keys.
Speechmatics MCP use cases
Indexing internal knowledge bases from calls
A Customer Success Manager has a pile of recorded support calls. They ask their agent to use create_job on all the audio files. The system transcribes everything, and then they pull the clean text using get_transcript, immediately feeding it into an indexed search database.
Creating multilingual training materials
An e-learning developer needs to update voiceovers for a new module. They input the script and tell their agent to use generate_tts with Megan's voice, receiving a ready-to-use audio file instantly.
Automating video subtitling
A content creator finishes recording a podcast episode. Instead of manually transcribing it, they ask their agent to use create_job on the MP3 URL and then pull the output using get_transcript in SRT format for immediate upload.
Auditing usage costs
A team lead wants to know how much audio processing has occurred this month. They ask their agent to run get_usage, getting an instant report on account consumption without having to check a separate dashboard.
Speechmatics MCP tradeoffs
What to watch out for, and the recommended way to handle each one.
Trying to use simple HTTP requests for jobs
Manually making multiple calls just to check if the job finished, leading to complex error handling and brittle code that fails when status codes change.
Let your agent manage the state. Use list_jobs to see recent activity, then rely on get_job to poll for status updates until it's complete before calling get_transcript.
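For a sense of what "polling until complete" involves, here is a minimal sketch of a wait loop with capped exponential backoff and a deadline. `get_status` is a hypothetical zero-argument callable that wraps a get_job call and returns a status string; the value "running" is an assumption about what an unfinished job reports.

```python
import time


def wait_for_job(get_status, *, timeout=600.0, initial_delay=2.0, max_delay=30.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until the job leaves the "running" state.

    The delay doubles after each poll, capped at `max_delay`. Raises
    TimeoutError if the next wait would pass the deadline.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status != "running":
            return status
        if clock() + delay > deadline:
            raise TimeoutError("job still running at deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    statuses = iter(["running", "running", "done"])
    print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))
```

Backing off keeps long jobs from generating a burst of status calls, and the deadline turns a stuck job into an explicit error instead of an endless wait.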
Assuming raw audio is always usable text
A developer gets a transcript but finds it contains timestamps and formatting junk that requires hours of manual cleaning in Excel.
Use the dedicated tools. Transcribe first with create_job, then ask your agent to process the output using get_transcript to ensure you only receive clean, usable text formats like JSON.
Using a general purpose AI for voice generation
Asking an LLM to 'make me sound like a robot reading this,' which results in low-quality, monotonous audio that doesn't match brand standards.
Use the generate_tts tool. This provides access to professional voices (like Jack or Theo) and ensures the output is high quality, ready for production use.
When to use Speechmatics MCP
You should use this MCP if your core problem revolves around turning audio into text, or text into speech. Specifically, use it if you need reliable batch transcription of large files (create_job), or if your workflow requires professional voiceovers for content creation (generate_tts). Don't use it if you just need to send a simple message or read data from a structured spreadsheet; those are better handled by dedicated connectors. If you only need basic file uploading without job management, this MCP is overkill: tools like list_jobs and get_job exist for state tracking you won't need. It is built for entire content pipelines.
Frequently asked questions about Speechmatics MCP
How do I transcribe a large podcast episode with Speechmatics MCP?
You start by using create_job, providing the audio URL or base64. Your agent monitors its status until it's complete, then you use get_transcript to pull the final text.
Can I generate subtitles with Speechmatics MCP?
Yes. After a job started with create_job finishes, you can retrieve the transcript using get_transcript and specify SRT format for subtitle files.
Is there a way to track my spending on Speechmatics MCP?
Absolutely. You use the get_usage tool anytime to check your account consumption statistics without leaving your current workflow.
What is the difference between list_jobs and get_job in Speechmatics MCP?
list_jobs shows a summary of all recent jobs you've run. Use get_job when you know the specific ID of one job and need detailed status updates on it.
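As a rough illustration of how the two tools fit together, the sketch below uses list_jobs for the overview and then calls get_job only for jobs that are not finished. `call_tool(name, arguments)` is a hypothetical helper, and the `jobs`, `id`, and `status` fields, the `job_id` argument, and the "done" value are assumptions about the response shape. Canned data stands in for the real service.

```python
def unfinished_jobs(call_tool):
    """Use list_jobs for the overview, then get_job for per-job detail.

    Response field names and the "done" status value are assumptions.
    """
    summary = call_tool("list_jobs", {})
    details = []
    for job in summary["jobs"]:
        if job["status"] != "done":
            details.append(call_tool("get_job", {"job_id": job["id"]}))
    return details


def fake_call_tool(name, arguments):
    """Offline stand-in returning canned data so the sketch runs as-is."""
    jobs = {
        "a1": {"id": "a1", "status": "done", "duration": 120},
        "b2": {"id": "b2", "status": "running", "duration": 3600},
    }
    if name == "list_jobs":
        listing = [{"id": j["id"], "status": j["status"]} for j in jobs.values()]
        return {"jobs": listing}
    if name == "get_job":
        return jobs[arguments["job_id"]]
    raise KeyError(name)


if __name__ == "__main__":
    print(unfinished_jobs(fake_call_tool))
```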
Do I need to manage API keys for Speechmatics MCP?
Yes, but it’s easy. You can use create_temp_key to generate temporary credentials, keeping your main key secure while allowing controlled access for testing or specific integrations.