# AssemblyAI MCP

> AssemblyAI connects audio and video files directly to your agent for high-fidelity transcription. It goes beyond basic captions by detecting who said what, generating chapter markers, summarizing key points, and running sentiment analysis on the spoken content. You get structured data and actionable insights without manual review.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** speech-to-text, transcription, audio-intelligence, natural-language-processing, speech-recognition, video-analysis

## Description

Imagine dropping a two-hour meeting recording into your agent's workflow. Instead of a massive block of raw text to read through, your AI pulls out what you need right away. This MCP handles the whole process: first, it transcribes the audio with high accuracy; second, it identifies every speaker so you know who said what; and third, it runs analysis on that speech. You get automated summaries of the core topics discussed, a sentiment breakdown showing when things got tense or positive, and timestamps marking natural chapter breaks. Everything happens through your agent's conversation flow. This level of audio intelligence makes gathering context much easier, whether you’re analyzing customer calls or editing podcast material. Connecting it also gives you access to the entire Vinkius MCP catalog, so you can run complex data pipelines from where you already work.

## Tools

### delete_transcript
Removes a specific transcript record from your account history.

### get_chapters
Retrieves automated chapter markers and timestamps for the content.

### get_sentiments
Generates an analysis showing the overall emotional tone of the recorded speech.

### get_speakers
Retrieves detailed labels that show exactly which person spoke each segment of the audio.

### get_summary
Pulls together a concise, automated summary of the entire transcript content.

### get_topics
Detects and lists the major subjects or topics that were discussed in the audio.

### get_transcript
Checks the current status of a transcription job or retrieves the final result once processing is done.

### list_transcripts
Shows you a list of your most recently processed and saved transcripts.

### transcribe_audio_url
Starts the process of converting an audio or video URL into a full transcript.

## Prompt Examples

**Prompt:** 
```
Transcribe this podcast URL: 'https://example.com/audio.mp3' and enable speaker diarization.
```

**Response:** 
```
Transcription started! I've submitted the URL to AssemblyAI with speaker labels enabled. The job ID is 'tr_123'. I'll monitor the processing status and notify you as soon as the high-fidelity transcript is ready.
```

**Prompt:** 
```
Show my 5 most recent transcription jobs.
```

**Response:** 
```
I've retrieved your recent jobs. You have 5 completed transcripts, including 'Team Weekly Sync' and 'Product Interview'. Would you like to extract the high-fidelity summary for the sync meeting?
```

**Prompt:** 
```
Get the sentiment analysis for transcript 'tr_123'.
```

**Response:** 
```
Accessing audio intelligence... Transcript tr_123 shows a primarily 'Positive' sentiment with peaks of 'Neutral' during the technical breakdown. I've retrieved the high-fidelity timestamped segments for you. Need more insights?
```

## Capabilities

### Transcribe Media Files
Processes public audio or video URLs, converting spoken content into accurate, written transcripts.

### Identify Speakers
Separates the transcript by speaker labels, so you can attribute each statement correctly in meeting minutes and interview records.

### Extract Core Insights
Generates automated summaries of long recordings and detects major topics discussed in the audio.

### Analyze Emotion
Runs sentiment analysis on the content, showing whether the conversation was generally positive, negative, or neutral.

### Create Timed Chapters
Generates automated chapters and high-fidelity video recaps with timestamps for easy content navigation.

## Use Cases

### Post-Meeting Minutes
A project manager needs minutes from a 90-minute sync call. They feed the audio URL into their agent, which transcribes it with `transcribe_audio_url` and then calls `get_speakers` and `get_summary`. The output is a document detailing who said what, followed by bulleted key decisions, with no manual note-taking needed.

### Podcast Repurposing
A content creator records a long interview. They use the MCP to get the raw transcript via `transcribe_audio_url`, then run `get_chapters` and `get_topics`. This lets them quickly generate multiple short social media clips, each focused on a distinct topic.
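As a rough illustration of the clip-planning step, the sketch below turns chapter data into a list of candidate clip ranges. The field names `start`, `end` (both in milliseconds), and `headline` are assumptions about what `get_chapters` returns through this MCP; check the actual tool output before relying on them. `format_ms` and `clip_list` are hypothetical helpers, not part of the MCP.

```python
from typing import Dict, List


def format_ms(ms: int) -> str:
    """Render a millisecond offset as M:SS, or H:MM:SS for long recordings."""
    total_seconds = ms // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def clip_list(chapters: List[Dict], min_seconds: int = 30) -> List[str]:
    """Build one line per chapter, skipping chapters too short to clip.

    Assumed chapter shape: {"start": int, "end": int, "headline": str}.
    """
    lines = []
    for chapter in chapters:
        duration_seconds = (chapter["end"] - chapter["start"]) / 1000
        if duration_seconds < min_seconds:
            continue  # too short to stand alone as a social clip
        lines.append(
            f"{format_ms(chapter['start'])}-{format_ms(chapter['end'])}  {chapter['headline']}"
        )
    return lines
```

The `min_seconds` threshold is an arbitrary example value; tune it to the clip lengths your platform favors.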

### Sales Call Review
A sales team lead needs to audit 50 recorded calls. They have the agent run `get_sentiments` and `get_topics` on each transcript. The agent returns a report showing which topics correlate with negative sentiment, pinpointing training gaps.
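One simple way such a report could be assembled is sketched below. It assumes each call has already been reduced to a list of topic labels and a list of per-segment sentiment labels; the `CallReport` shape and the `NEGATIVE` label are assumptions for illustration, not the documented output of `get_topics` or `get_sentiments`. It measures co-occurrence (how negative the calls mentioning a topic were on average), which is a coarse proxy rather than a statistical correlation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, TypedDict


class CallReport(TypedDict):
    topics: List[str]      # assumed: topic labels extracted from get_topics
    sentiments: List[str]  # assumed: one label per segment from get_sentiments


def negative_share_by_topic(calls: Iterable[CallReport]) -> Dict[str, float]:
    """Average share of NEGATIVE segments across the calls that mention each topic."""
    shares: Dict[str, List[float]] = defaultdict(list)
    for call in calls:
        segments = call["sentiments"]
        if not segments:
            continue  # nothing to score for this call
        negative = sum(1 for s in segments if s.upper() == "NEGATIVE") / len(segments)
        # Count each topic once per call, even if it was listed twice.
        for topic in set(call["topics"]):
            shares[topic].append(negative)
    return {topic: sum(values) / len(values) for topic, values in shares.items()}
```

Sorting the result by value in descending order puts the topics most associated with negative calls at the top of the report.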

### Research Data Compilation
A researcher has multiple interviews. They use the MCP to transcribe them all, then run `get_speakers` on each one and `list_transcripts` to manage the batch. This creates a highly organized dataset ready for deep analysis.

## Benefits

- You get clear speaker separation. Instead of a single block of text, the system tags every utterance with who said it, making meeting minutes immediately useful.
- Stop manually reading through hours of recordings. The `get_summary` tool pulls out the key takeaways and main action items in seconds.
- Analyze customer calls for mood shifts. Running `get_sentiments` gives you a clear picture of whether customers are happy or frustrated at specific moments, not just overall.
- Organize huge media libraries instantly. The `get_chapters` tool automatically marks natural breaks and sections, turning raw video into navigable content.
- You save time managing jobs. Use `list_transcripts` to see your recent work history and `get_transcript` to check if a large file is ready without guessing.

## How It Works

You hand over a URL, and your AI client returns organized, analyzed intelligence about the speech inside.

1. First, you connect your API key from the AssemblyAI dashboard to your preferred AI client.
2. Next, you tell your agent which audio or video URL needs processing and what kind of analysis you need (e.g., 'Summarize this meeting and find all negative mentions').
3. Finally, your agent sends the job request and waits for the results, providing structured data like summaries, topic lists, and sentiment scores back to your conversation.
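Behind step 3, the agent is effectively running a submit-then-poll loop. The sketch below shows that pattern under stated assumptions: `call_tool` is a hypothetical function standing in for however your MCP client invokes a tool, the argument names `audio_url` and `transcript_id` are guesses, and the status values (`completed`, `error`) follow AssemblyAI's REST API and may be surfaced differently by this MCP.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe_and_wait(
    call_tool: ToolCaller,
    audio_url: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Submit a URL, then poll get_transcript until the job finishes."""
    # Argument and field names here are assumptions, not documented parameters.
    job = call_tool("transcribe_audio_url", {"audio_url": audio_url})
    transcript_id = job["id"]

    for _ in range(max_polls):
        result = call_tool("get_transcript", {"transcript_id": transcript_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(
                f"Transcription {transcript_id} failed: {result.get('error')}"
            )
        time.sleep(poll_seconds)  # still queued or processing

    raise TimeoutError(f"Transcription {transcript_id} did not finish in time")
```

Capping the loop with `max_polls` keeps a stuck job from blocking the workflow indefinitely; once the function returns, the transcript ID can be passed to tools such as `get_summary` or `get_sentiments`.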

## Frequently Asked Questions

**How do I transcribe an audio file using the transcribe_audio_url tool?**
You call `transcribe_audio_url` and pass it a public URL to your MP3 or video. This starts the job, giving you a job ID that you can then use with `get_transcript` to track its status.

**What is the difference between get_summary and list_transcripts?**
`list_transcripts` shows you which files you've processed recently. `get_summary` takes a specific, completed transcript (by ID) and pulls out its concise, automated summary.

**Can I get sentiment analysis on my own custom audio file?**
Yes, after transcribing the file using `transcribe_audio_url`, you can pass the resulting transcript ID to `get_sentiments` to analyze the overall mood and tone of the speech.

**Which tool do I use if I need to find all the major topics?**
You use the `get_topics` tool. This runs topic detection on a finished transcript, providing a list of structured subjects that were covered in the media.

**How do I check the processing status of a transcription using the get_transcript tool?**
You use `get_transcript` to monitor job progress. This confirms if your audio job succeeded, failed, or is still actively processing. It's essential for building reliable workflows.

**What specific speaker labels does the get_speakers tool provide?**
The tool provides detailed labels identifying who spoke when. It separates utterances and assigns a unique label to each distinct voice or speaker in the recording, making meeting minutes precise.

**How do I remove old transcripts using the delete_transcript tool?**
You call `delete_transcript` with the ID of the transcript you want to permanently remove. This helps you meet data privacy or compliance requirements by ensuring the transcript is fully removed from your account.

**What information does the get_chapters tool extract from an audio file?**
The tool pulls out automated chapter markers and timestamps. It structures your media library by pinpointing specific sections of a long recording, which is perfect for quick content navigation.