# AssemblyAI MCP for AI Agents

> AssemblyAI lets your AI client transcribe audio and video files and extract more than the words. It identifies who is speaking, summarizes content, analyzes sentiment, and splits long recordings into chapters, so you can work through complex media without leaving the conversation.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** speech-to-text, transcription, audio-intelligence, natural-language-processing, speech-recognition, video-analysis

## Description

Stop manually uploading files to web portals or waiting for slow human transcription services. This MCP gives your AI agent direct access to AssemblyAI's audio intelligence from inside your workflow: you point it at a public audio or video URL, and it handles the rest.

Your agent can transcribe speech with AssemblyAI's speech recognition models. Beyond plain text, it identifies individual speakers so you can see who said what. It also extracts higher-level insights such as automated summaries, topic breakdowns, and sentiment, showing whether the discussion was positive, neutral, or negative at specific points in time. When connected via Vinkius, your AI client can act as a dedicated audio and language analyst, so you can explore recordings directly from your conversation.

## Tools

### delete_transcript
Permanently deletes a specific transcript from your account.

### get_chapters
Retrieves an automated chapter list for media content.

### get_sentiments
Analyzes the emotional tone of a transcript, identifying positive or negative moments.

### get_speakers
Retrieves speaker labels that separate and track the different speakers in a conversation.

### get_summary
Generates an automatic, concise summary of the full transcript content.

### get_topics
Detects and lists the specific themes or topics discussed throughout the audio recording.

### get_transcript
Checks the status of a transcription job or retrieves the completed transcript result.

### list_transcripts
Lists your recent transcripts so you can review them or pick one to analyze.

### transcribe_audio_url
Starts transcribing the audio or video file at a public URL and returns a job ID that you can check with `get_transcript`.
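
Your AI client normally makes these calls for you. For reference, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request containing the tool's name and arguments. The sketch below only builds the messages; the argument names `audio_url` and `transcript_id` are assumptions, so check the input schema the server advertises through `tools/list` before relying on them.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names below are hypothetical; the server's `tools/list` schema is authoritative.
start_job = build_tool_call(1, "transcribe_audio_url", {"audio_url": "https://example.com/audio.mp3"})
check_job = build_tool_call(2, "get_transcript", {"transcript_id": "tr_456"})
print(start_job)
print(check_job)
```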

## Prompt Examples

**Prompt:** 
```
Transcribe this podcast URL: 'https://example.com/audio.mp3' and identify who speaks when.
```

**Response:** 
```
**Transcription Job Started.**

The MCP has successfully queued the job. The high-fidelity transcript is being generated, including speaker labels.

**Job ID:** `tr_456`
**Status:** Processing (ETA: 2 minutes)

Ask me to check on it with `get_transcript` in a couple of minutes, and I'll share the transcript as soon as it's ready.
```

**Prompt:** 
```
Show me my recent jobs, and get the summary for 'Client Onboarding Call'.
```

**Response:** 
```
**Recent Transcripts Found:**
*   Client Onboarding Call (Completed)
*   Team Sync (Completed)
*   Product Interview (Completed)
*   Marketing Strategy (Processing...)

I found the 'Client Onboarding Call' transcript. I'm running `get_summary` now...

**Summary:** The client confirmed interest in Feature X but needs pricing details for Enterprise Tier Y. They expressed concern about implementation timeframes, which we need to address in the next call.
```

**Prompt:** 
```
What was the general mood of the conversation on this audio? And what were the main topics covered?
```

**Response:** 
```
**Audio Intelligence Report (ID: tr_456):**
*   **Overall Sentiment:** Moderately Positive. Many segments were neutral, especially during technical discussions.
*   **Top Topics Detected:** 1. Implementation Roadmaps; 2. Pricing Models; 3. Feature X Adoption.

The segment breakdown shows that sentiment dipped when the conversation turned to integration costs.
```

## Capabilities

### Transcribe Audio/Video URLs
Submits a public audio or video URL to AssemblyAI and returns a transcript of all spoken content once the job completes.
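
The MCP hides the HTTP layer, but it can help to know what the underlying service expects. AssemblyAI's v2 REST API turns each analysis on with a boolean flag in the transcript request. The sketch below builds, but does not send, such a request with Python's standard library. `YOUR_API_KEY` is a placeholder, and the sketch reflects AssemblyAI's public API rather than anything documented about how this MCP calls it.

```python
import json
import urllib.request

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str, *, speakers: bool = False,
                             sentiment: bool = False, topics: bool = False,
                             chapters: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request asking AssemblyAI to transcribe `audio_url`."""
    body = {"audio_url": audio_url}
    # Each optional analysis is switched on by its own request flag.
    if speakers:
        body["speaker_labels"] = True
    if sentiment:
        body["sentiment_analysis"] = True
    if topics:
        body["iab_categories"] = True
    if chapters:
        body["auto_chapters"] = True
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_transcript_request("YOUR_API_KEY", "https://example.com/audio.mp3",
                               speakers=True, chapters=True)
# Sending it would be urllib.request.urlopen(req); the JSON reply includes the job's `id` and `status`.
print(req.get_method(), req.full_url, req.data.decode())
```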

### Determine Speakers and Dialogue
Separates the transcript into distinct segments, labeling which speaker was talking at each point in the recording.
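
Diarized output is easiest to read as a timestamped script. The sketch below assumes `get_speakers` returns AssemblyAI-style utterances, each with a `speaker` label, `text`, and `start`/`end` times in milliseconds; the sample data is invented. It also totals talk time per speaker, which is handy for meeting minutes.

```python
from collections import defaultdict


def format_dialogue(utterances: list[dict]) -> list[str]:
    """Render diarized utterances as '[mm:ss] Speaker X: text' lines."""
    lines = []
    for u in utterances:
        minutes, seconds = divmod(u["start"] // 1000, 60)  # start is in milliseconds
        lines.append(f"[{minutes:02d}:{seconds:02d}] Speaker {u['speaker']}: {u['text']}")
    return lines


def talk_time_ms(utterances: list[dict]) -> dict[str, int]:
    """Sum each speaker's speaking time in milliseconds."""
    totals: dict[str, int] = defaultdict(int)
    for u in utterances:
        totals[u["speaker"]] += u["end"] - u["start"]
    return dict(totals)


# Invented sample shaped like AssemblyAI utterances (the real tool output may differ).
sample = [
    {"speaker": "A", "start": 0, "end": 4200, "text": "Thanks for joining today."},
    {"speaker": "B", "start": 4500, "end": 9100, "text": "Happy to be here."},
    {"speaker": "A", "start": 65000, "end": 70000, "text": "Let's talk pricing."},
]
for line in format_dialogue(sample):
    print(line)
print(talk_time_ms(sample))
```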

### Generate Automated Summaries
Creates concise summaries of long recordings, giving you the key takeaways without reading through every word.

### Analyze Sentiment and Topics
Pulls out high-level insights by detecting overall mood (sentiment) or specific themes (topics) within the speech.
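
Sentiment and topic results are easier to act on once they are aggregated. This sketch assumes `get_sentiments` returns AssemblyAI-style segments whose `sentiment` is `POSITIVE`, `NEUTRAL`, or `NEGATIVE`, and that `get_topics` returns a mapping from topic label to relevance score, similar to the summary in AssemblyAI's topic detection. Both input shapes and all sample values are assumptions.

```python
from collections import Counter


def sentiment_breakdown(segments: list[dict]) -> tuple[Counter, list[str]]:
    """Count segments per sentiment label and collect the text of negative ones."""
    counts = Counter(s["sentiment"] for s in segments)
    negatives = [s["text"] for s in segments if s["sentiment"] == "NEGATIVE"]
    return counts, negatives


def top_topics(relevance: dict[str, float], n: int = 3) -> list[str]:
    """Return the n topic labels with the highest relevance scores."""
    return sorted(relevance, key=relevance.get, reverse=True)[:n]


# Invented sample data.
segments = [
    {"text": "The demo looked great.", "sentiment": "POSITIVE"},
    {"text": "The API uses REST.", "sentiment": "NEUTRAL"},
    {"text": "Integration costs worry us.", "sentiment": "NEGATIVE"},
    {"text": "We can revisit the timeline.", "sentiment": "POSITIVE"},
]
topics = {"Pricing Models": 0.67, "Implementation Roadmaps": 0.91, "Feature X Adoption": 0.42}

counts, negatives = sentiment_breakdown(segments)
print(counts.most_common())
print(negatives)
print(top_topics(topics, n=2))
```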

### Map Content Chapters
Creates an automated chapter breakdown of media, helping you navigate long videos or podcasts quickly.
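
A chapter list maps naturally onto podcast show notes or video timestamps. The sketch assumes `get_chapters` returns AssemblyAI-style chapters with a short `gist` and a `start` time in milliseconds; the sample chapters are invented.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as mm:ss, or h:mm:ss for recordings over an hour."""
    hours, remainder = divmod(ms // 1000, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def show_notes(chapters: list[dict]) -> str:
    """Build one 'timestamp gist' line per chapter."""
    return "\n".join(f"{format_timestamp(ch['start'])} {ch['gist']}" for ch in chapters)


# Invented sample chapters.
chapters = [
    {"start": 0, "gist": "Introductions"},
    {"start": 252000, "gist": "Pricing discussion"},
    {"start": 3725000, "gist": "Wrap-up"},
]
print(show_notes(chapters))
```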

## Use Cases

### Analyzing Customer Support Calls
A support manager asks their agent to process a week's worth of call recordings. The agent uses the MCP to generate summaries with `get_summary`, flag calls with negative sentiment using `get_sentiments`, and pull out key topics via `get_topics` to identify training needs.

### Creating Podcast Show Notes
A content creator provides the URL of a raw interview video. The agent first runs the transcription (`transcribe_audio_url`), then uses `get_speakers` and `get_chapters` to build detailed, multi-section show notes ready for publishing.

### Indexing Legal Interviews
A paralegal needs quick insights from dozens of recorded interviews. They run the MCP to get speaker labels (`get_speakers`) and use `get_summary` on each transcript, so the key points of every interview are easy to find for litigation support.

### Monitoring Webinars
A sales team needs a quick recap of a webinar. The agent runs the MCP to detect topics (`get_topics`) and uses `get_sentiments` to report on the mood of the Q&A session, providing immediate follow-up talking points.

## Benefits

- Automate content discovery: Instead of reading full transcripts or meeting notes, you can use the `get_summary` tool to get an immediate executive summary.
- Identify speaker contributions: Labeling speakers via `get_speakers` makes each person's contribution easy to attribute in meeting minutes and interviews.
- Mine emotional intelligence: Use `get_sentiments` to analyze customer feedback or sales calls, pinpointing exactly when the mood shifted from positive to concerned.
- Structure your media library: The automated chapters provided by `get_chapters` let you navigate long video files like indexed articles, improving content accessibility.
- Full control over history: You can use `list_transcripts` and then manage specific jobs with `delete_transcript`, keeping your records clean and organized.
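
`list_transcripts` and `delete_transcript` combine naturally into a simple retention policy. The sketch below is a dry run by default: it selects finished transcripts older than a cutoff and only calls `delete_transcript` when `dry_run=False`. It assumes each listed transcript has AssemblyAI-style `id`, `status`, and ISO-8601 `created` fields, and that `call_tool(name, arguments)` is a hypothetical helper provided by your MCP client; the `transcript_id` argument name is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Callable


def select_for_deletion(transcripts: list[dict], now: datetime, max_age_days: int = 30) -> list[str]:
    """Return IDs of finished transcripts created before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        t["id"]
        for t in transcripts
        # Leave queued or processing jobs alone; only finished ones are candidates.
        if t["status"] in ("completed", "error")
        and datetime.fromisoformat(t["created"]) < cutoff
    ]


def clean_up(call_tool: Callable[[str, dict], dict], transcripts: list[dict], now: datetime,
             max_age_days: int = 30, dry_run: bool = True) -> list[str]:
    """Delete old transcripts, or only report which ones would be deleted when dry_run is True."""
    ids = select_for_deletion(transcripts, now, max_age_days)
    if not dry_run:
        for transcript_id in ids:
            call_tool("delete_transcript", {"transcript_id": transcript_id})
    return ids


# Invented listing; a real one would come from list_transcripts.
listing = [
    {"id": "tr_101", "status": "completed", "created": "2024-01-05T10:00:00"},
    {"id": "tr_102", "status": "processing", "created": "2024-01-06T09:30:00"},
    {"id": "tr_103", "status": "completed", "created": "2024-03-01T15:45:00"},
]
print(clean_up(lambda name, args: {}, listing, now=datetime(2024, 3, 10)))
```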

## How It Works

In short, you pass an audio or video link to your AI client, and the results come back formatted for immediate use.

1. You subscribe to this MCP on Vinkius and retrieve your API key.
2. Your AI client sends the external audio or video URL and specifies what insight you need (e.g., 'Summarize and detect sentiment').
3. The MCP submits the transcription job. Once `get_transcript` reports the job as completed, the MCP returns the requested insights, such as a transcript, summary, or speaker list, directly to your agent.
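
Because transcription is asynchronous, anything that drives these tools has to poll: `transcribe_audio_url` queues a job and `get_transcript` reports its status. A minimal polling sketch follows. It assumes a hypothetical `call_tool(name, arguments)` function from your MCP client, a `transcript_id` argument, and a result dict whose `status` uses AssemblyAI's job statuses (`queued`, `processing`, `completed`, `error`).

```python
import time
from typing import Callable

FINISHED = {"completed", "error"}  # AssemblyAI's terminal job statuses


def wait_for_transcript(call_tool: Callable[[str, dict], dict], transcript_id: str,
                        interval: float = 3.0, timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_transcript until the job finishes; the caller checks for 'error'."""
    deadline = time.monotonic() + timeout
    while True:
        result = call_tool("get_transcript", {"transcript_id": transcript_id})
        if result["status"] in FINISHED:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{transcript_id} still {result['status']} after {timeout}s")
        sleep(interval)


# Usage with a stand-in for a real MCP client: the job is queued, then processing, then done.
responses = iter([{"status": "queued"}, {"status": "processing"},
                  {"status": "completed", "text": "Hello and welcome."}])
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return next(responses)


result = wait_for_transcript(fake_call_tool, "tr_456", sleep=lambda seconds: None)
print(result["status"], len(calls))
```

Injecting `sleep` keeps the function testable without real delays; in practice you would leave the default.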

## Frequently Asked Questions

**How do I use AssemblyAI MCP to transcribe video files from YouTube or Vimeo?**
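Pass a publicly accessible URL to your AI agent; the MCP submits it for transcription and returns a full transcript that you can then analyze for summaries or topics. The URL generally needs to point directly at the media file, and a YouTube or Vimeo page link usually does not, so you may first need a direct link to the underlying audio or video file.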

**Can AssemblyAI MCP tell me who said what in an interview recording?**
Yes, it uses speaker diarization to label every utterance. You get detailed segments showing exactly which person spoke when, making meeting minutes accurate and easy to write up.

**What if I need the transcript for multiple recordings? Is there a way to process them all?**
Yes. Your agent can submit each recording with `transcribe_audio_url`, then use `list_transcripts` to see everything you've processed and run analysis tools such as `get_summary` or `get_topics` on several jobs in sequence.
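
For a batch, it is usually faster to submit every recording first and then poll the whole set, rather than waiting for each job to finish before submitting the next. The sketch assumes the same hypothetical `call_tool(name, arguments)` helper, that `transcribe_audio_url` returns a dict containing the new job's `id`, and the argument names shown; a small in-memory stand-in replaces the real server.

```python
import time
from typing import Callable


def transcribe_batch(call_tool: Callable[[str, dict], dict], urls: list[str],
                     interval: float = 5.0, max_rounds: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> dict[str, dict]:
    """Submit all URLs, then poll until every job is completed or has failed."""
    pending = [call_tool("transcribe_audio_url", {"audio_url": url})["id"] for url in urls]
    finished: dict[str, dict] = {}
    for _ in range(max_rounds):
        for transcript_id in list(pending):  # iterate over a copy so we can remove items
            result = call_tool("get_transcript", {"transcript_id": transcript_id})
            if result["status"] in ("completed", "error"):
                finished[transcript_id] = result
                pending.remove(transcript_id)
        if not pending:
            return finished
        sleep(interval)
    raise TimeoutError(f"Still pending after {max_rounds} rounds: {pending}")


# Stand-in server: each job reports "processing" once, then "completed".
checks: dict[str, int] = {}


def fake_call_tool(name: str, arguments: dict) -> dict:
    if name == "transcribe_audio_url":
        transcript_id = f"tr_{len(checks) + 1}"
        checks[transcript_id] = 0
        return {"id": transcript_id, "status": "queued"}
    transcript_id = arguments["transcript_id"]
    checks[transcript_id] += 1
    status = "completed" if checks[transcript_id] > 1 else "processing"
    return {"id": transcript_id, "status": status}


done = transcribe_batch(fake_call_tool, ["https://example.com/a.mp3", "https://example.com/b.mp3"],
                        sleep=lambda seconds: None)
print(sorted(done))
```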

**Is AssemblyAI MCP better than a basic speech-to-text service?**
It goes further, because it doesn't just transcribe; it also analyzes the content. It pulls out insights like speaker labels, sentiment, and topics, giving you context that basic transcription services don't provide.

**Can I use AssemblyAI MCP to organize my media library with chapters?**
Absolutely. The tool can automatically detect natural breaks in the audio or video and generate chapter markers (`get_chapters`), so you never lose your place when reviewing long content.