AssemblyAI MCP. Transcribe audio and structure the text output.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
AssemblyAI. Transcribe and audit audio data using speech-to-text models. This server lets your AI client run transcription jobs from any audio or video URL, retrieving clean text with speaker labels.
You can also get transcripts segmented by sentences or paragraphs, check confidence scores for accuracy, and manage the job history.
What your AI agents can do
Transcribe audio
Send an audio or video URL to start a job and get a job ID.
Get transcript
Fetch the complete text result for a specific job ID.
Get transcript paragraphs
Retrieve the full transcript, structured and broken down by paragraphs.
Get transcript sentences
Retrieve the full transcript, structured and broken down by individual sentences.
List transcripts
Fetch a list of all past and active transcription jobs.
Delete transcript
Permanently remove a specific transcription job record.
AssemblyAI MCP Server: 6 Tools for Audio Processing
Use these tools to process audio files, manage job queues, and extract structured text data from transcripts.
delete_transcript
Removes a specific transcription job record.
get_transcript
Retrieves the complete text result for a given job ID.
get_transcript_paragraphs
Gets the transcript broken down by paragraphs.
get_transcript_sentences
Gets the transcript broken down by sentences.
list_transcripts
Lists all past and active transcription jobs.
transcribe_audio
Starts a transcription job using an audio or video URL.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with AssemblyAI, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
You're running an audio workflow, and this server's got your back. It lets your AI client run transcription jobs from any audio or video URL and returns clean text with speaker labels. Grab the full text for a specific job ID with get_transcript, or get it broken down by paragraphs with get_transcript_paragraphs or by individual sentences with get_transcript_sentences.
You start everything off by sending an audio or video URL to transcribe_audio, which kicks off the job and gives you a job ID. You can check the accuracy of the results by getting confidence scores right off the job. To keep track of everything, you can list all past and active jobs with list_transcripts, and if you need to clean house, you can permanently remove a specific job record using delete_transcript.
How AssemblyAI MCP Works
- 1 Subscribe to the server and enter your AssemblyAI API Key.
- 2 Tell your AI client to start a job by providing an audio or video URL.
- 3 Your agent monitors the job status. Once complete, you retrieve the transcript using get_transcript or segment it using specific tools.
The bottom line is, your agent manages the entire audio pipeline, from initial job submission to final structured data retrieval.
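The three steps above can be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not the server's documented interface: `call_tool` is a hypothetical callable standing in for however your MCP client invokes a tool, and the argument names (`audio_url`, `transcript_id`) and the `id`, `status`, and `transcripts` fields are assumed. The status values `completed` and `error` follow AssemblyAI's REST API; the server may surface them differently.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe_and_wait(call_tool: ToolCaller, audio_url: str,
                        poll_seconds: float = 5.0,
                        max_polls: int = 120) -> Dict[str, Any]:
    """Submit a job, poll the job list until it finishes, then fetch the text."""
    # Start the job. "audio_url" and "id" are assumed names.
    job_id = call_tool("transcribe_audio", {"audio_url": audio_url})["id"]

    for _ in range(max_polls):
        # Look the job up in the job list and read its status.
        jobs = call_tool("list_transcripts", {}).get("transcripts", [])
        status = next((j.get("status") for j in jobs if j.get("id") == job_id), None)
        if status == "completed":
            return call_tool("get_transcript", {"transcript_id": job_id})
        if status == "error":
            raise RuntimeError(f"Transcription job {job_id} failed")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the agent indefinitely; tune both parameters to the length of your audio.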
Who Is AssemblyAI MCP For?
This server suits data analysts who need to verify transcript accuracy, content creators who need speaker metadata, and operations leads who must quickly audit meeting records. If you work with large volumes of spoken-word content, this server is for you.
Data analyst
Verifies transcript accuracy by checking confidence scores across multiple files and auditing linguistic trends.
Content creator
Monitors podcast transcriptions and extracts speaker metadata directly into their content workflow.
Operations lead
Performs rapid audits of meeting recordings and pulls key summaries using natural language prompts.
What Changes When You Connect
- Get clean text and speaker labels immediately. When you run transcribe_audio, your agent gets structured output, not raw data. This lets you know who said what, which is key for podcast metadata.
- Structure text for analysis. Instead of one big block of text, use get_transcript_sentences or get_transcript_paragraphs to break the transcript down. This makes it easy to query specific segments.
- Verify data accuracy. Every transcript has a confidence score. Use get_transcript to check this score and verify the reliability of the linguistic data before publishing anything.
- Manage everything in one place. Use list_transcripts to see every job, active or complete. This gives you strict control over your entire audio asset history.
- Monitor job status. For long audio files, your agent doesn't just wait. It monitors the job progress, so you know exactly when the data is ready.
Real-World Use Cases
Analyzing a full podcast series
A content creator has 20 podcast episodes. Instead of manually listening to every minute, they ask their agent to run transcribe_audio on all the URLs. The agent processes the jobs, then uses list_transcripts to confirm completion. Finally, it can pull speaker labels to populate their CMS.
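A batch like this could be driven by two small helpers: one to submit every URL, one to report which episodes are still outstanding. This is a sketch only. `call_tool` is a hypothetical stand-in for your MCP client's tool-invocation call, and the `audio_url` argument and the `id`, `status`, and `transcripts` fields are assumed names that the server may not use.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_batch(call_tool: ToolCaller, urls: List[str]) -> Dict[str, str]:
    """Start one job per URL and return a mapping of job ID -> source URL."""
    jobs: Dict[str, str] = {}
    for url in urls:
        # "audio_url" and "id" are assumed names.
        job_id = call_tool("transcribe_audio", {"audio_url": url})["id"]
        jobs[job_id] = url
    return jobs


def unfinished(call_tool: ToolCaller, jobs: Dict[str, str]) -> List[str]:
    """Return the URLs whose jobs are not yet listed as completed."""
    listing = call_tool("list_transcripts", {}).get("transcripts", [])
    done = {j["id"] for j in listing if j.get("status") == "completed"}
    return [url for job_id, url in jobs.items() if job_id not in done]
```

When `unfinished` returns an empty list, every episode in the batch is ready to be fetched.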
Auditing a meeting transcript for key points
An operations lead needs to find every mention of 'Q3 budget' from a 90-minute meeting recording. They use transcribe_audio to process the recording. Once the transcript is ready, they use get_transcript_sentences to break the text down, allowing the agent to filter for specific keywords within structured segments.
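Once the sentences are in hand, the keyword filter itself is simple. The sketch below assumes each sentence is a dict with a `text` field and a `start` offset in milliseconds, which mirrors AssemblyAI's REST response; the shape the server actually returns is not confirmed here, and the helper names are illustrative.

```python
from typing import Any, Dict, List


def find_mentions(sentences: List[Dict[str, Any]], phrase: str) -> List[Dict[str, Any]]:
    """Return the sentences whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [s for s in sentences if needle in s.get("text", "").casefold()]


def format_hit(sentence: Dict[str, Any]) -> str:
    """Render a hit as 'mm:ss  text' from the assumed millisecond start offset."""
    minutes, seconds = divmod(sentence.get("start", 0) // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}  {sentence['text']}"
```

For example, `[format_hit(s) for s in find_mentions(sentences, "Q3 budget")]` gives a timestamped list of every mention in the meeting.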
Developing a knowledge base from lectures
A data analyst records a university lecture and feeds the audio to the agent. The agent runs transcribe_audio. The analyst then uses get_transcript_paragraphs to pull structured chunks of text. This organized output allows them to feed the content into a knowledge base for later retrieval.
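Turning paragraphs into knowledge-base entries might look like the sketch below. It assumes each paragraph is a dict with `text`, `start`, and `end` fields (as in AssemblyAI's REST response); the output record layout (`doc_id`, `start_ms`, `end_ms`) is purely illustrative and should be adapted to whatever your knowledge base expects.

```python
from typing import Any, Dict, List


def paragraphs_to_records(transcript_id: str,
                          paragraphs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert transcript paragraphs into flat records ready for indexing."""
    records = []
    for index, para in enumerate(paragraphs):
        text = para.get("text", "").strip()
        if not text:
            continue  # skip empty paragraphs instead of indexing blanks
        records.append({
            # Stable ID so a chunk can be traced back to its source transcript.
            "doc_id": f"{transcript_id}:{index}",
            "text": text,
            "start_ms": para.get("start"),
            "end_ms": para.get("end"),
        })
    return records
```

Keeping the transcript ID and paragraph index in each record preserves the link between a retrieved chunk and the original lecture.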
Cleaning up large batches of recordings
A team needs to process 50 hours of historical call recordings. They use transcribe_audio for the batch. They then use list_transcripts to manage the queue, and get_transcript to pull the clean text, confirming the job's status and accuracy for every file.
The Tradeoffs
Treating the transcript as one file
Downloading the full output from get_transcript and searching it with standard text editors. You lose the context of who said what or where the text came from.
→
Always use the specific segmentation tools. To keep things structured, run get_transcript_sentences or get_transcript_paragraphs. This forces the AI client to treat the data as segmented objects, not just a wall of text.
Ignoring job status
Calling get_transcript immediately after running transcribe_audio because you think it's instant. The job hasn't finished, and you get an error or incomplete data.
→
Always check the job status first. Use list_transcripts to track the job ID and confirm the status before attempting to retrieve the data using get_transcript.
Manually deleting records
Running delete_transcript because you think the data is junk. You lose all the history and the audit trail for that specific audio asset.
→
Use list_transcripts first. Review the metadata to ensure the record you want to delete is genuinely unnecessary. The server provides a full history for necessary auditing.
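One way to enforce the review step is a guard that refuses to delete until the record has been looked up and explicitly approved. This is a sketch, not part of the server: `call_tool` and `confirm` are hypothetical callables, and the `transcript_id` argument and the `id` and `transcripts` fields are assumed names.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def delete_if_approved(call_tool: ToolCaller, job_id: str,
                       confirm: Callable[[Dict[str, Any]], bool]) -> bool:
    """Delete a job only if it appears in the job list and confirm() approves it."""
    listing = call_tool("list_transcripts", {}).get("transcripts", [])
    record = next((j for j in listing if j.get("id") == job_id), None)
    if record is None:
        return False  # unknown job ID: nothing to delete
    if not confirm(record):
        return False  # reviewer decided to keep the record
    call_tool("delete_transcript", {"transcript_id": job_id})
    return True
```

The `confirm` callback receives the listed metadata, so the decision is made against the actual record and not just an ID.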
When It Fits, When It Doesn't
Use this server if your primary task is converting raw, multi-source audio (podcasts, meetings, lectures) into structured, searchable text. You need to know who spoke, when they spoke, and the confidence score for the text. Don't use this if you just need to upload a file to a storage bucket. If your goal is simple file transfer, use a dedicated file transfer API. If you need to process audio that isn't speech (e.g., pure music or sound effects), this server won't help. Stick to the flow: transcribe_audio -> list_transcripts -> get_transcript_sentences.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by AssemblyAI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Sifting through hours of raw meeting audio is a massive time sink.
Right now, you record a meeting, download a huge audio file, and then you're stuck. You copy the audio into a transcription service, wait an hour, and then you get one massive text document. You spend the next hour manually reading through the whole thing, looking for the one sentence about the budget or the specific decision point.
With AssemblyAI, your agent handles the whole thing. You send the URL, and the agent gets back clean text, broken down by speaker and sentence. You stop reading and start querying. The output is ready to be indexed and analyzed immediately.
AssemblyAI MCP Server: Get structured text output.
Before, you had to wait for a single, monolithic file download, and you had no way to verify if the transcription was right. You just assumed it was good. You had to manually organize the text into usable chunks.
Now, the agent uses dedicated tools like get_transcript_sentences to give you granular control. You get the data segmented by paragraph or sentence, plus confidence scores. The difference is structure. You're done guessing.
Common Questions About AssemblyAI MCP
How do I use the AssemblyAI MCP Server to transcribe an audio file?
You start by calling transcribe_audio, passing the URL of the audio or video. The agent returns a Job ID, and you then use list_transcripts to monitor the status until the job is complete.
Can I get the transcript broken down by paragraphs using AssemblyAI MCP Server?
Yes. Use the get_transcript_paragraphs tool. This provides the full transcript content, cleanly separated and structured by paragraph breaks.
What is the best way to check the accuracy of the transcript using AssemblyAI MCP Server?
Check the confidence scores. The tools provide confidence scores for the transcript. Use this score to verify the data's reliability before relying on it for critical decisions.
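A confidence check can be as simple as flagging segments below a cutoff. The sketch below assumes each sentence dict carries a `confidence` value between 0 and 1, as AssemblyAI's REST API does; the 0.8 threshold is an arbitrary illustration, not a recommended value.

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def low_confidence(sentences: List[Dict[str, Any]],
                   threshold: float = 0.8) -> List[Dict[str, Any]]:
    """Return sentences scoring below the threshold; a missing score counts as 0."""
    return [s for s in sentences if s.get("confidence", 0.0) < threshold]


def average_confidence(sentences: List[Dict[str, Any]]) -> Optional[float]:
    """Mean confidence over sentences that report a score, or None if none do."""
    scores = [s["confidence"] for s in sentences if "confidence" in s]
    return mean(scores) if scores else None
```

Reviewing only the flagged sentences is usually faster than re-listening to the whole recording.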
How do I manage my past transcription jobs with AssemblyAI MCP Server?
Use list_transcripts. This tool gives you a complete history of every job, active or completed, allowing you to track and manage your audio assets.
How do I use the list_transcripts tool to check the status of my audio jobs?
You use list_transcripts to see all active and past jobs. This tool gives you an overview of job IDs and their current status, letting you monitor progress without needing a specific job ID.
What is the purpose of the get_transcript tool with AssemblyAI MCP Server?
The get_transcript tool retrieves the final, full text result for a specific job ID. You input the ID, and the tool returns the complete, cleaned transcript content.
Can I delete a transcription record using the delete_transcript tool?
Yes, delete_transcript removes a specific job record entirely. You just need to provide the job ID you want to remove from your history.
Does AssemblyAI MCP Server support multiple types of audio files?
The server handles various audio and video URLs. You simply pass the URL to transcribe_audio, and the server manages the transcription process regardless of the source media type.
How do I find my AssemblyAI API Key?
Log in to your AssemblyAI dashboard, and you will find your API Key on the main home page. Copy it and paste it into the API key field when you subscribe to the server.
What audio formats are supported?
AssemblyAI supports most common audio and video formats, including MP3, WAV, AAC, MP4, and others. Simply provide a public URL to the file.
Can the agent identify different speakers?
Yes. When starting a job via transcribe_audio, set the speaker_labels parameter to true. Your agent will return the text categorized by speaker ID.
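With speaker_labels enabled, the speaker-tagged output can be folded into a readable dialogue. The sketch below assumes the result contains a list of utterance dicts with `speaker` and `text` fields, which is how AssemblyAI's REST API reports diarization; the server's actual output shape may differ, and `to_dialogue` is an illustrative helper.

```python
from typing import Any, Dict, List, Tuple


def to_dialogue(utterances: List[Dict[str, Any]]) -> List[str]:
    """Merge consecutive utterances by the same speaker into labeled lines."""
    turns: List[Tuple[str, str]] = []
    for utterance in utterances:
        speaker = utterance.get("speaker", "?")
        text = utterance.get("text", "")
        if turns and turns[-1][0] == speaker:
            # Same speaker kept talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return [f"Speaker {speaker}: {text}" for speaker, text in turns]
```

Each returned line is one uninterrupted turn, which maps cleanly onto show notes or meeting minutes.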
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.