# AssemblyAI MCP for AI Agents

> AssemblyAI provides a complete audio intelligence workflow for AI agents. It transcribes spoken content from a publicly accessible audio or video URL and returns structured text with speaker labels and confidence scores. You can manage job history, audit transcripts by sentence or paragraph, and keep your audio data searchable and ready for analysis.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** speech-to-text, transcription, audio-processing, natural-language-processing, ai-models

## Description

Connecting AssemblyAI to your agent turns audio processing into a natural conversation. Instead of manually uploading files and waiting on web consoles, your agent handles the entire transcription process. It starts jobs from an audio or video URL, retrieves clean text with speaker separation, and lets you audit what was said. You can get transcripts broken down by sentences or paragraphs for structured data modeling, and check confidence scores to verify accuracy. This level of audio intelligence management is available through Vinkius, the leading catalog of MCPs, so your agent can handle media processing tasks without you having to call the API yourself.

Whether you're monitoring a series of podcast episodes or transcribing lengthy meeting recordings, your agent acts as a real-time linguistic assistant. It monitors job status and maintains a full history of transcripts, keeping your audio assets organized and instantly searchable.

## Tools

### delete_transcript
Removes a specified transcription record from the system's history.

### get_transcript_paragraphs
Retrieves the full transcript text broken down into logical paragraphs.

### get_transcript_sentences
Gets the transcribed content segmented and formatted by individual sentences.

### get_transcript
Retrieves the final, processed text result of a completed transcription job.

### list_transcripts
Lists all past and currently active transcription jobs in your account history.

### transcribe_audio
Starts a new transcription job using any provided audio or video URL.
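
An MCP client invokes a tool such as `transcribe_audio` by sending a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what that message could look like. The argument names `audio_url` and `speaker_labels` are assumptions made for illustration; the tool's published input schema is the authority on what it actually accepts.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names and URL; check the tool's input schema.
request = build_tool_call(
    1,
    "transcribe_audio",
    {"audio_url": "https://example.com/episode.mp3", "speaker_labels": True},
)
print(request)
```

In practice the MCP client builds and sends this message for you; the sketch only makes the shape of the call visible.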

## Prompt Examples

**Prompt:** 
```
Can you transcribe this podcast episode from YouTube and give me key quotes with who said them?
```

**Response:** 
```
**Podcast Transcript Audit**

*Job ID: transcript-923*
*Status: Complete (Confidence Score: 98%)*

**Key Insights:**

1. **Speaker A:** "The biggest shift this year is multimodal AI." (Confidence: 95%)
2. **Speaker B:** "It's not about the model; it's about the prompt engineering."
3. **Speaker A:** "Exactly. That’s where we need to focus our resources next quarter." (Confidence: 97%)
```

**Prompt:** 
```
List my last five transcript jobs and tell me which ones I need to clean up.
```

**Response:** 
```
**Transcript History Report**

| Job ID | Source Type | Status | Date Completed |
| :---: | :---: | :---: | :---: |
| `t-102` | Meeting Video | Complete | 2024-06-18 |
| `t-923` | Podcast Audio | Complete | 2024-06-17 |
| `t-55a` | Webinar Recording | Failed | N/A |

The 'Webinar Recording' job is marked as failed. I recommend deleting it with the `delete_transcript` tool to keep your list clean.
```

**Prompt:** 
```
I need the transcript from that meeting broken down by paragraphs, please. I want to analyze the flow.
```

**Response:** 
```
**Paragraph Breakdown for Meeting ID 501**

*The project scope needs immediate review.*

The initial requirements are solid, but we missed a critical detail regarding international compliance.

*Next steps involve drafting three separate proposals.* We need one focused on the EU market and two others covering APAC regions. This requires more detailed input from legal.
```

## Capabilities

### Start Transcription Jobs
The MCP takes an audio or video URL and starts a new transcription job.

### Retrieve Full Transcript Results
It fetches the complete written text, including speaker labels and confidence scores for every segment of speech.

### Structure Text Data
The agent can break down the raw transcript into discrete paragraphs or individual sentences for precise data handling.

### Monitor Job Status and History
You can list all past and active jobs and check their progress, so you know when each transcript is ready.

### Delete Records
The MCP allows you to delete specific transcript records when they are no longer needed.

## Use Cases

### Analyzing Podcast Content for Show Notes
A content manager needs to turn a 45-minute podcast episode into searchable show notes. The agent uses `transcribe_audio` on the URL, then calls `get_transcript_paragraphs` and uses the speaker labels to draft detailed, structured show notes.

### Auditing Legal Meeting Minutes
An operations analyst receives a raw audio recording of a board meeting. They ask their agent to process it and then use the confidence scores from `get_transcript` to quickly flag any segments where transcription accuracy was questionable.
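
A minimal sketch of that flagging step, assuming the transcript result can be reduced to a list of segments that each carry `text` and a `confidence` value between 0 and 1. The field names, the 0.85 threshold, and the sample data are illustrative assumptions, not the tool's documented output.

```python
def flag_low_confidence(segments, threshold=0.85):
    """Return the segments whose confidence is below the threshold.

    Segments without a confidence value are treated as 0.0 so they
    are always flagged for review.
    """
    return [s for s in segments if s.get("confidence", 0.0) < threshold]


# Made-up sample data for illustration only.
segments = [
    {"text": "The motion carries.", "confidence": 0.97},
    {"text": "Item four is deferred.", "confidence": 0.62},
]

flagged = flag_low_confidence(segments)
for seg in flagged:
    print(f"{seg['confidence']:.2f}  {seg['text']}")
```

The threshold is a judgment call: a lower value flags fewer segments, a higher value sends more of the recording to a human reviewer.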

### Indexing Academic Lectures
A data scientist wants to index a series of lectures for later retrieval. They run multiple jobs, using `list_transcripts` to manage the batch, and then use `get_transcript_sentences` to feed the structured data into an external knowledge base.
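
One way to prepare sentence output for a knowledge base is to give each sentence a stable ID and keep its timestamps. The sketch below assumes each sentence is a dictionary with `text` and optional `start` and `end` offsets in milliseconds; those field names and the record layout are assumptions for illustration.

```python
def to_index_records(transcript_id, sentences):
    """Convert sentence dictionaries into flat records for indexing."""
    records = []
    for position, sentence in enumerate(sentences):
        records.append(
            {
                # Stable ID: transcript ID plus the sentence's position.
                "id": f"{transcript_id}:{position}",
                "transcript_id": transcript_id,
                "text": sentence["text"].strip(),
                "start_ms": sentence.get("start"),
                "end_ms": sentence.get("end"),
            }
        )
    return records


# Made-up sample data for illustration only.
records = to_index_records(
    "lecture-01",
    [
        {"text": " Welcome to the course. ", "start": 0, "end": 1800},
        {"text": "Today we cover sampling.", "start": 1800, "end": 4200},
    ],
)
print(records[1]["id"], records[1]["text"])
```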

### Cleaning Up Old Archives
A user realizes they have unnecessary old recordings. They ask their agent to run a cleanup command that calls `list_transcripts` and then systematically uses `delete_transcript` on expired jobs, keeping the system tidy.
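
The selection step of such a cleanup could look like the sketch below. It assumes each job returned by `list_transcripts` has an `id`, a `status`, and an ISO 8601 `created` timestamp with a UTC offset; the field names, the status strings, and the 90-day retention window are all assumptions. The function only picks IDs; deleting them is left to `delete_transcript`.

```python
from datetime import datetime, timedelta, timezone


def select_for_cleanup(jobs, now, max_age_days=90,
                       failed_statuses=("error", "failed")):
    """Return IDs of jobs that failed or are older than the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for job in jobs:
        if job["status"] in failed_statuses:
            stale.append(job["id"])
            continue
        # Timestamps must include an offset so they compare with `now`.
        created = datetime.fromisoformat(job["created"])
        if created < cutoff:
            stale.append(job["id"])
    return stale


# Made-up sample data for illustration only.
jobs = [
    {"id": "a", "status": "completed", "created": "2024-01-05T10:00:00+00:00"},
    {"id": "b", "status": "error", "created": "2024-06-17T09:00:00+00:00"},
    {"id": "c", "status": "completed", "created": "2024-06-18T09:00:00+00:00"},
]
now = datetime(2024, 6, 20, tzinfo=timezone.utc)
to_delete = select_for_cleanup(jobs, now)
print(to_delete)
```

Keeping selection separate from deletion lets you review the list before anything is removed.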

## Benefits

- Get structured text output as soon as a job completes. Use the `get_transcript_paragraphs` tool to break long transcripts into manageable blocks, ready for database entry.
- Verify data quality immediately. The results include confidence scores, letting you audit every piece of transcribed speech and flag potential errors before publishing.
- Manage your media workflow without clicks. Your agent handles starting jobs (`transcribe_audio`), monitoring progress, and retrieving the final text, all through natural conversation.
- Maintain strict control over assets. Use `list_transcripts` to view a clean history of all job IDs and use `delete_transcript` when cleanup is necessary.
- Extract contextually rich data. By using `get_transcript_sentences`, you can build specific queries, asking your agent about details contained within precise conversational turns.

## How It Works

Your agent manages the entire lifecycle, from starting the transcription job to delivering structured text with confidence scores, without manual intervention.

1. Connect your agent via any compatible client and provide your AssemblyAI API key.
2. Tell your agent which audio or video URL needs processing. The MCP starts a transcription job and monitors its progress.
3. Once complete, your agent retrieves the full text data, allowing you to structure it by paragraphs or sentences for immediate use.
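
The steps above amount to a start, poll, and fetch loop. The sketch below illustrates that flow under stated assumptions: `start_job`, `get_status`, and `get_paragraphs` are hypothetical callables standing in for the `transcribe_audio`, `get_transcript`, and `get_transcript_paragraphs` tools, and the status strings `completed` and `error` are assumed. It is not the MCP's actual implementation.

```python
import time


def run_transcription(start_job, get_status, get_paragraphs, audio_url,
                      poll_seconds=3.0, timeout_seconds=600.0,
                      sleep=time.sleep):
    """Start a job, poll until it finishes, then return its paragraphs."""
    job_id = start_job(audio_url)
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status == "completed":
            return get_paragraphs(job_id)
        if status == "error":
            raise RuntimeError(f"transcription {job_id} failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"transcription {job_id} timed out")
        sleep(poll_seconds)
        waited += poll_seconds


# Stub callables so the sketch runs without network access.
statuses = iter(["queued", "processing", "completed"])
result = run_transcription(
    start_job=lambda url: "job-1",
    get_status=lambda job_id: next(statuses),
    get_paragraphs=lambda job_id: ["First paragraph.", "Second paragraph."],
    audio_url="https://example.com/meeting.mp3",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Passing the callables in as parameters keeps the loop testable and independent of how the tools are actually reached.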

## Frequently Asked Questions

**How do I find my AssemblyAI API Key?**
Log in to your [**AssemblyAI dashboard**](https://www.assemblyai.com/app), and you will find your API Key on the main home page. Copy it and supply it when you connect the MCP.

**What audio formats are supported?**
AssemblyAI supports most common audio and video formats, including MP3, WAV, AAC, MP4, and others. Simply provide a public URL to the file.

**Can the agent identify different speakers?**
Yes. When starting a job via `transcribe_audio`, set the `speaker_labels` parameter to true. Your agent will return the text categorized by speaker ID.
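
Once speaker-labelled text comes back, consecutive segments from the same speaker can be merged into turns for easier reading. This is a minimal sketch that assumes each segment is a dictionary with `speaker` and `text` keys; the field names and the sample data are assumptions, not the tool's documented output.

```python
def merge_turns(utterances):
    """Merge consecutive utterances from the same speaker into one turn."""
    turns = []
    for utterance in utterances:
        speaker = utterance.get("speaker") or "Unknown"
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + utterance["text"]
        else:
            turns.append({"speaker": speaker, "text": utterance["text"]})
    return turns


# Made-up sample data for illustration only.
utterances = [
    {"speaker": "A", "text": "Let's begin."},
    {"speaker": "A", "text": "First item is the budget."},
    {"speaker": "B", "text": "I have a question."},
]

turns = merge_turns(utterances)
for turn in turns:
    print(f"Speaker {turn['speaker']}: {turn['text']}")
```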