AssemblyAI MCP: Transcribe any audio or video file into structured data.

Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

AssemblyAI connects audio and video files directly to your agent for high-fidelity transcription. It goes far beyond basic captions by automatically detecting who said what, generating chapter markers, summarizing key points, and running deep sentiment analysis on the spoken content.

You get structured data and actionable insights without lifting a finger.

What your AI agents can do

Transcribe Media Files

Processes public audio or video URLs, converting spoken content into accurate written transcripts.

Identify Speakers

Separates the transcript by speaker label, so meeting minutes and interview records attribute every statement to the right person.

Extract Core Insights

Generates automated summaries of long recordings and detects major topics discussed in the audio.

Analyze Emotion

Runs sentiment analysis on the content, showing whether the conversation was generally positive, negative, or neutral.

Create Timed Chapters

Generates automated chapters and high-fidelity video recaps with timestamps for easy content navigation.

Supported MCP clients (OAuth 2.0 compatible): Claude, ChatGPT, Cursor, Gemini, VS Code, JetBrains, Vercel, Zendesk, and other MCP clients.

Free for Subscribers


AssemblyAI: 9 Tools for Speech Intelligence

These tools let you manage the entire lifecycle of spoken content, from initial transcription to deep analytical insight extraction.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.


delete_transcript: Removes a specific transcript record from your account history.
get_chapters: Retrieves automated chapter markers and timestamps for the content.
get_sentiments: Generates an analysis showing the overall emotional tone of the recorded speech.
get_speakers: Retrieves detailed labels that show exactly which person spoke each segment of the audio.
get_summary: Pulls together a concise, automated summary of the entire transcript content.
get_topics: Detects and lists the major subjects or topics that were discussed in the audio.
get_transcript: Checks the current status of a transcription job or retrieves the final result once processing is done.
list_transcripts: Shows you a list of your most recently processed and saved transcripts.
transcribe_audio_url: Starts the process of converting an audio or video URL into a full transcript.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private


Make Your AI Do More

Start with AssemblyAI, then connect any of our 4,800+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,800+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by AssemblyAI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.


The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 9 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
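For readers wiring this up by hand: MCP clients invoke a server tool with a JSON-RPC 2.0 `tools/call` request, and that request's `params` carry the tool `name` and its `arguments`. The sketch below only builds that message. The argument key `audio_url` is a hypothetical placeholder, because the server publishes its real input schema through `tools/list`.

```python
import json
from itertools import count

# JSON-RPC 2.0 matches each response to its request through the "id" field.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "audio_url" is an assumed argument name; check the schema returned by tools/list.
print(tools_call_request("transcribe_audio_url", {"audio_url": "https://example.com/call.mp3"}))
```

In practice an MCP client SDK builds and sends these messages for you. The sketch simply shows what crosses the wire.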

Manually synthesizing meeting notes and call logs is brutally tedious.

Right now, if you're analyzing a batch of customer calls or a long internal sync, you have to download the files, upload them somewhere, wait for multiple services to run, then copy-paste the raw text into a spreadsheet. You spend hours just cleaning up formatting, identifying who spoke when, and trying to pull out the core decision points from hundreds of pages of transcript.

With this MCP, you simply hand the source media URL to your agent. The agent handles all the messy work behind the scenes (transcribing, labeling speakers, summarizing, and finding topics) and presents the results as clean, structured data ready for immediate use.
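The hand-off described above can be sketched as a short orchestration function. This is a minimal sketch, not the server's implementation. Several names in it are assumptions:

- `call_tool` stands in for whatever your MCP client exposes for tool calls.
- The `audio_url` and `transcript_id` argument keys and the `id`/`status` response fields are assumed.
- The `completed`/`error` status values are borrowed from AssemblyAI's REST API.

```python
import time
from typing import Any, Callable

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def analyze_recording(call_tool: ToolCaller, media_url: str, poll_seconds: float = 5.0) -> dict[str, Any]:
    """Transcribe a media URL, wait for the job, then fan out to the analysis tools."""
    job = call_tool("transcribe_audio_url", {"audio_url": media_url})
    args = {"transcript_id": job["id"]}  # assumed field and argument names

    # Derived analysis only makes sense once the transcript has finished.
    while (status := call_tool("get_transcript", args)["status"]) not in ("completed", "error"):
        time.sleep(poll_seconds)
    if status == "error":
        raise RuntimeError(f"transcription {job['id']} failed")

    return {
        "transcript_id": job["id"],
        "speakers": call_tool("get_speakers", args),
        "summary": call_tool("get_summary", args),
        "topics": call_tool("get_topics", args),
    }


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Canned responses standing in for a real MCP connection."""
    canned = {
        "transcribe_audio_url": {"id": "t1", "status": "queued"},
        "get_transcript": {"id": "t1", "status": "completed"},
        "get_speakers": {"utterances": [{"speaker": "A", "text": "Ship it Friday."}]},
        "get_summary": {"summary": "Team agreed to ship Friday."},
        "get_topics": {"topics": ["release planning"]},
    }
    return canned[name]


report = analyze_recording(fake_call_tool, "https://example.com/sync.mp4", poll_seconds=0)
print(report["summary"]["summary"])
```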

Get Speakers: Know Exactly Who Said What

Without this capability, every transcript is a single narrative stream. You lose the context of who agreed to what, or which speaker raised which concern. If you're reviewing interview data, that lack of speaker differentiation makes minutes useless.

The MCP fixes this by returning utterances separated by speaker label. It's not just text; it's an attributed record of how the conversation flowed, which changes how you structure meeting follow-ups.
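Turning labeled utterances into readable minutes takes only a few lines. This sketch assumes that get_speakers returns a list of utterances, each with a `speaker` label, a `start` offset in milliseconds, and the `text`. That mirrors AssemblyAI's utterance objects, but it is an assumption about this tool's output. Consecutive utterances from the same speaker are merged so each turn reads as one line.

```python
def format_minutes(utterances: list[dict]) -> str:
    """Render speaker-labeled utterances as timestamped lines, one per speaker turn."""
    turns: list[list] = []  # each entry: [speaker, start_ms, [texts...]]
    for u in utterances:
        if turns and turns[-1][0] == u["speaker"]:
            turns[-1][2].append(u["text"])  # same speaker kept talking: extend the turn
        else:
            turns.append([u["speaker"], int(u["start"]), [u["text"]]])

    lines = []
    for speaker, start_ms, texts in turns:
        minutes, seconds = divmod(start_ms // 1000, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] Speaker {speaker}: {' '.join(texts)}")
    return "\n".join(lines)


utterances = [
    {"speaker": "A", "start": 0, "text": "Let's review the launch."},
    {"speaker": "A", "start": 2100, "text": "Where is QA?"},
    {"speaker": "B", "start": 4200, "text": "QA signed off yesterday."},
    {"speaker": "A", "start": 65000, "text": "Then we ship Friday."},
]
print(format_minutes(utterances))
```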

Support: 24/7, support@vinkius.com
Security: Vinkius Trust Center
SLA: Service Level Agreement
Report Listing: Send Report

What you can do with this MCP connector

Imagine dropping a two-hour meeting recording into your agent's workflow. Instead of getting a massive block of raw text you have to read through, your AI pulls out everything you need right away. This MCP handles the whole process: first, it transcribes the audio with high accuracy; second, it identifies every speaker so you know who said what; and third, it runs analysis on that speech.

You get automated summaries detailing the core topics discussed, a sentiment breakdown showing when things got tense or positive, and even timestamps marking natural chapter breaks. The whole process happens through your agent's conversation flow. This level of detailed audio intelligence makes gathering context effortless, whether you’re analyzing customer calls or editing podcast material.

Because the server runs on Vinkius, you can also combine it with the rest of the Vinkius MCP catalog to run more complex data pipelines from the tools you already work in.

Built, hosted, and managed by Vinkius. AssemblyAI MCP: Transcribe Audio & Video Data. Server ID: 019dd0bd-7422-70bf-9913-8537412614bd

Vinkius Inspector: Compliance Grade A+ (Score 100/100)

What Changes When You Connect

You get clear speaker separation. Instead of a single block of text, the system tags every utterance with who said it, making meeting minutes immediately useful.
Stop manually reading through hours of recordings. The get_summary tool pulls out the key takeaways and main action items in seconds.
Analyze customer calls for mood shifts. Running get_sentiments gives you a clear picture of whether customers are happy or frustrated at specific moments, not just overall.
Organize huge media libraries instantly. The get_chapters tool automatically marks natural breaks and sections, turning raw video into navigable content.
You save time managing jobs. Use list_transcripts to see your recent work history and get_transcript to check if a large file is ready without guessing.
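The per-moment sentiment view can be approximated by filtering sentence-level results. This sketch assumes each segment has these fields:

- `text`
- `start`, in milliseconds
- `sentiment`, one of POSITIVE, NEUTRAL, or NEGATIVE
- `confidence`

That shape follows AssemblyAI's sentiment analysis output, but what get_sentiments actually returns is an assumption here. The 0.7 confidence cutoff is an arbitrary example value.

```python
from collections import Counter


def sentiment_report(segments: list[dict], min_confidence: float = 0.7) -> dict:
    """Summarize overall tone and list confidently negative moments with timestamps."""
    overall = Counter(s["sentiment"] for s in segments)
    negative_moments = [
        (s["start"] // 1000, s["text"])  # (seconds from start, sentence)
        for s in segments
        if s["sentiment"] == "NEGATIVE" and s["confidence"] >= min_confidence
    ]
    return {"overall": dict(overall), "negative_moments": negative_moments}


segments = [
    {"text": "Thanks for joining.", "start": 0, "sentiment": "POSITIVE", "confidence": 0.91},
    {"text": "The invoice was wrong again.", "start": 42000, "sentiment": "NEGATIVE", "confidence": 0.88},
    {"text": "I guess that could work.", "start": 97000, "sentiment": "NEGATIVE", "confidence": 0.55},
    {"text": "Let's schedule a follow-up.", "start": 130000, "sentiment": "NEUTRAL", "confidence": 0.80},
]
print(sentiment_report(segments))
```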

Real-World Use Cases

Post-Meeting Minutes

A project manager needs minutes from a 90-minute sync call. They feed the audio URL into their agent, which transcribes it and then calls get_speakers and get_summary. The result is a document detailing who said what, followed by bulleted key decisions, with no manual note-taking needed.

Podcast Repurposing

A content creator records a long interview. They use the MCP to get the raw transcript via transcribe_audio_url, then run get_chapters and get_topics. This lets them quickly generate multiple short social media clips, each focused on a distinct topic.
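Chapter output maps naturally onto a clip list. This sketch assumes each chapter carries a `headline` plus `start`/`end` offsets in milliseconds. That is the shape of AssemblyAI's auto chapters, and it is assumed here for get_chapters. The 30-second minimum is an arbitrary example threshold for dropping segments too short to stand alone.

```python
def clip_plan(chapters: list[dict], min_seconds: int = 30) -> list[str]:
    """List chapters long enough to cut as standalone clips, as 'mm:ss-mm:ss headline'."""
    def mmss(ms: int) -> str:
        minutes, seconds = divmod(int(ms) // 1000, 60)
        return f"{minutes:02d}:{seconds:02d}"

    plan = []
    for chapter in chapters:
        if (chapter["end"] - chapter["start"]) / 1000 < min_seconds:
            continue  # too short to work as a clip on its own
        plan.append(f"{mmss(chapter['start'])}-{mmss(chapter['end'])} {chapter['headline']}")
    return plan


chapters = [
    {"headline": "Guest introduction", "start": 0, "end": 20000},
    {"headline": "Pricing strategy", "start": 20000, "end": 185000},
    {"headline": "Hiring lessons", "start": 185000, "end": 410000},
]
for line in clip_plan(chapters):
    print(line)
```

Pairing each clip with the get_topics output for that time range is a natural next step, but the sketch stops at timestamps.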

Sales Call Review

A sales team lead needs to audit 50 recorded calls. They use the MCP's intelligence to run get_sentiments and get_topics. The agent returns a report showing which topics correlate with negative sentiment, pinpointing training gaps.
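One way to produce that report is to pool sentence-level sentiment labels per topic across calls. In this sketch, each call is a hypothetical pre-extracted record with two fields: a `topics` list (from get_topics) and a `sentiments` list of labels (from get_sentiments). The resulting negative share is a correlation signal for prioritizing review. It does not show that a topic causes frustration.

```python
from collections import defaultdict


def negative_share_by_topic(calls: list[dict]) -> list[tuple[str, float]]:
    """Rank topics by the share of negative sentences in calls that mention them."""
    counts = defaultdict(lambda: [0, 0])  # topic -> [negative sentences, all sentences]
    for call in calls:
        labels = call["sentiments"]
        negative = sum(1 for label in labels if label == "NEGATIVE")
        for topic in set(call["topics"]):  # count each call once per topic
            counts[topic][0] += negative
            counts[topic][1] += len(labels)
    shares = [(topic, neg / total) for topic, (neg, total) in counts.items() if total]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


calls = [
    {"topics": ["pricing", "onboarding"], "sentiments": ["NEGATIVE", "NEGATIVE", "NEUTRAL", "POSITIVE"]},
    {"topics": ["onboarding"], "sentiments": ["POSITIVE", "POSITIVE", "NEUTRAL", "NEUTRAL"]},
    {"topics": ["pricing"], "sentiments": ["NEGATIVE", "NEUTRAL"]},
]
print(negative_share_by_topic(calls))
```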

Research Data Compilation

A researcher has multiple interviews. They use the MCP to transcribe them all, then run get_speakers on each one and list_transcripts to manage the batch. This creates a highly organized dataset ready for deep analysis.
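Managing a batch mostly means remembering which job ID belongs to which file. This sketch starts one job per URL and then tallies statuses with a single list_transcripts call. The `call_tool` hook, the `transcripts`/`id`/`status` response fields, and the `FakeServer` stand-in are all hypothetical. Because list_transcripts shows only recent transcripts, jobs that fall outside its listing would not be counted.

```python
from collections import Counter
from typing import Callable

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def submit_batch(call_tool: ToolCaller, urls: list[str]) -> dict[str, str]:
    """Start one transcription job per URL and map each URL to its job ID."""
    return {url: call_tool("transcribe_audio_url", {"audio_url": url})["id"] for url in urls}


def batch_status(call_tool: ToolCaller, job_ids: dict[str, str]) -> Counter:
    """Tally job statuses for this batch from a single list_transcripts call."""
    wanted = set(job_ids.values())
    listing = call_tool("list_transcripts", {})["transcripts"]
    return Counter(item["status"] for item in listing if item["id"] in wanted)


class FakeServer:
    """In-memory stand-in for the MCP server, for illustration only."""

    def __init__(self) -> None:
        self.jobs: dict[str, str] = {}

    def __call__(self, name: str, arguments: dict) -> dict:
        if name == "transcribe_audio_url":
            job_id = f"t{len(self.jobs) + 1}"
            self.jobs[job_id] = "processing"
            return {"id": job_id, "status": "processing"}
        if name == "list_transcripts":
            return {"transcripts": [{"id": i, "status": s} for i, s in self.jobs.items()]}
        raise KeyError(name)


server = FakeServer()
ids = submit_batch(server, ["https://example.com/interview1.mp3", "https://example.com/interview2.mp3"])
server.jobs["t1"] = "completed"  # simulate one job finishing
print(batch_status(server, ids))
```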

The Tradeoffs

Treating it like simple file conversion

Thinking that just uploading an MP3 means you get usable data.

→ Don't stop at transcription. Once the job started by transcribe_audio_url has completed, call get_speakers and get_summary. This ensures you get structured intelligence, not just text.

Ignoring job status

Sending a massive 4-hour file to your agent and forgetting to check if it actually finished processing.

→ Always follow up the initial transcription request with get_transcript. This verifies the job status before you try to pull any derived analysis like get_sentiments.
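A status check is easiest to get right as a polling loop with a deadline and growing delays. This is a sketch under several assumptions:

- `call_tool` is a hypothetical client hook.
- The `transcript_id` argument and the `status` field are assumed names.
- The `completed`/`error` values come from AssemblyAI's REST API, not from documentation of this MCP server.

`sleep` and `clock` are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def wait_for_transcript(
    call_tool: ToolCaller,
    transcript_id: str,
    timeout_s: float = 3600.0,
    first_delay_s: float = 3.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll get_transcript until the job completes, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        result = call_tool("get_transcript", {"transcript_id": transcript_id})
        status = result["status"]
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(f"transcript {transcript_id} failed: {result.get('error')}")
        if clock() + delay > deadline:
            raise TimeoutError(f"transcript {transcript_id} still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Usage with a canned sequence of statuses and a recording (non-blocking) sleep.
statuses = iter(["queued", "processing", "completed"])
waited: list[float] = []
done = wait_for_transcript(
    lambda name, args: {"id": args["transcript_id"], "status": next(statuses)},
    "t1",
    sleep=waited.append,
)
print(done["status"], waited)
```

Backoff keeps a 4-hour file from generating hundreds of status calls. The deadline makes a stuck job surface as an error instead of a silent hang.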

Over-relying on raw text

Using only the transcript output without extracting meaning.

→ The real value is in the metadata. Call get_topics, or run get_sentiments alongside the transcript. This turns words into actionable data points.

When It Fits, When It Doesn't

Use this if your primary goal is turning unstructured spoken media (audio or video) into structured, analyzed text and metadata: you need to know who said it (get_speakers), how they felt about it (get_sentiments), or what the conversation was actually about (get_topics). Don't use this if you only need simple captions; a basic captioning tool handles that. Also, don't rely on it to write content for you: it's a data-input tool, not a creative writing assistant. If your task is purely text-to-text (e.g., rewriting an essay), look at general language models instead.

Common Questions About AssemblyAI MCP

How do I transcribe an audio file using the transcribe_audio_url tool?

You call transcribe_audio_url and pass it a public URL to your MP3 or video. This starts the job, giving you a job ID that you can then use with get_transcript to track its status.

What is the difference between get_summary and list_transcripts?

list_transcripts shows you which files you've processed recently. get_summary takes a specific, completed transcript (by ID) and pulls out its concise, automated summary.

Can I get sentiment analysis on my own custom audio file?

Yes. After transcribing the file with transcribe_audio_url, pass the resulting transcript ID to get_sentiments to analyze the mood and tone of the speech.

Which tool do I use if I need to find all the major topics?

You use the get_topics tool. This runs topic detection on a finished transcript, providing a list of structured subjects that were covered in the media.

How do I check the processing status of a transcription using the get_transcript tool?

You use get_transcript to monitor job progress. It tells you whether your audio job succeeded, failed, or is still processing, which is essential for building reliable workflows.

What specific speaker labels does the get_speakers tool provide?

The tool provides detailed labels identifying who spoke when. It separates utterances and assigns a unique label to each distinct voice or speaker in the recording, making meeting minutes precise.

How do I remove old recordings using the delete_transcript tool?

You call delete_transcript to permanently remove a job's results. Use it when data privacy or compliance requirements call for the transcript to be fully removed from your account.
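For compliance-driven cleanup, delete_transcript pairs naturally with list_transcripts in a retention sweep. This sketch is hypothetical throughout. It assumes list_transcripts returns items with an `id` and a timezone-aware ISO 8601 `created` timestamp, and that delete_transcript takes a `transcript_id`. It defaults to a dry run because deletion is permanent. It also only sees the recent transcripts that list_transcripts returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def purge_old_transcripts(
    call_tool: ToolCaller,
    max_age_days: int = 30,
    now: Optional[datetime] = None,
    dry_run: bool = True,
) -> list[str]:
    """Return IDs of transcripts older than max_age_days; delete them unless dry_run."""
    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=max_age_days)
    expired = [
        item["id"]
        for item in call_tool("list_transcripts", {})["transcripts"]
        if datetime.fromisoformat(item["created"]) < cutoff
    ]
    if not dry_run:
        for transcript_id in expired:
            call_tool("delete_transcript", {"transcript_id": transcript_id})
    return expired


# Usage against an in-memory stand-in that records deletions.
deleted: list[str] = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    if name == "list_transcripts":
        return {"transcripts": [
            {"id": "old", "created": "2024-01-05T09:00:00+00:00"},
            {"id": "recent", "created": "2024-03-01T09:00:00+00:00"},
        ]}
    if name == "delete_transcript":
        deleted.append(arguments["transcript_id"])
        return {"deleted": True}
    raise KeyError(name)


as_of = datetime(2024, 3, 10, tzinfo=timezone.utc)
print(purge_old_transcripts(fake_call_tool, now=as_of))  # dry run: nothing is deleted
```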

What information does the get_chapters tool extract from an audio file?

The tool pulls out automated chapter markers and timestamps. It structures your media library by pinpointing specific sections of a long recording, which is perfect for quick content navigation.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)