# Gladia Speech AI MCP

> Gladia Speech AI provides enterprise-grade speech recognition and analysis, turning any audio or video stream into actionable data. This MCP handles everything from basic transcription to complex tasks like speaker diarization, multi-language translation across 100+ languages, and applying custom large language model prompts directly to the spoken content. It supports processing pre-recorded files via uploads and managing secure WebSocket connections for real-time live streaming.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** speech-to-text, transcription, audio-analysis, speaker-diarization, translation, natural-language-processing

## Description

You can feed any audio source, whether a podcast recording, a lengthy meeting call, or a live broadcast, into this MCP and get structured text back. Forget listening to hours of raw audio just to find three action items; your agent handles the heavy lifting. It doesn't just transcribe what was said; it identifies who spoke each line, translates segments into dozens of languages, and can summarize the entire discussion based on the prompts you give it. When you connect this MCP through Vinkius, your AI client treats it like a natural extension of the conversation. Instead of juggling separate services for file uploads, job status checks, and final analysis, you ask one question, and the system executes the entire workflow and delivers clean, ready-to-use text.

## Tools

### delete_transcription
Removes a specific transcription job from your Gladia account.

### upload_audio_file
Transfers an audio file to the platform so you can begin processing it.

### get_transcription
Checks the current status and retrieves the final text results for a known job ID.

### list_transcriptions
Retrieves a list of all previously run, pre-recorded transcription jobs.

### init_live_session
Starts and maintains a secure link for real-time transcription during live broadcasts or meetings.

### init_transcription
Begins the processing job for an uploaded audio file to generate a transcript.

## Prompt Examples

**Prompt:** 
```
List my 5 most recent transcription jobs.
```

**Response:** 
```
I've retrieved your recent jobs. You have 5 tasks: 'Meeting_Notes.mp3' (Done), 'Interview_01.wav' (Done), and 3 others. Would you like the results for any of these?
```

**Prompt:** 
```
Start a transcription for this audio URL with summarization enabled: https://example.com/audio.mp3
```

**Response:** 
```
Transcription job initiated! The Job ID is `job_12345`. I've enabled summarization as requested. I'll monitor the status for you.
```

**Prompt:** 
```
I need a WebSocket URL to start a live transcription session in 16000Hz.
```

**Response:** 
```
I've generated a live session. Here is your secure WebSocket URL: `wss://api.gladia.io/v2/live/...`. The sample rate is set to 16000Hz.
```
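The sample rate in the request above determines how much raw audio each streamed chunk carries. The following is a minimal sketch of that arithmetic, assuming 16-bit mono PCM and 100 ms chunks; the helper names `chunk_size_bytes` and `split_pcm` are hypothetical, and the audio format your session actually expects may differ.

```python
from typing import List


def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = 100,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM audio covering `chunk_ms` milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


def split_pcm(audio: bytes, size: int) -> List[bytes]:
    """Split a PCM buffer into chunks of at most `size` bytes."""
    return [audio[i:i + size] for i in range(0, len(audio), size)]


if __name__ == "__main__":
    size = chunk_size_bytes(16000)      # 16000 Hz, 16-bit, mono (assumed format)
    one_second = bytes(16000 * 2)       # one second of silence
    chunks = split_pcm(one_second, size)
    print(size, len(chunks))
```

Each chunk would then be sent over the WebSocket URL returned by the live session, in order.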

## Capabilities

### Process uploaded audio files
Upload an audio file to start a secure job that transcribes and analyzes the spoken content.

### Manage live streaming sessions
Initialize continuous, real-time transcription streams for ongoing meetings or broadcasts over WebSocket connections.

### Extract specific data from audio
Apply custom prompts to the transcribed text to pull out structured insights, like names, dates, and action items.

### Handle job status tracking
Check the progress or retrieve the final results of any transcription job you've started.

## Use Cases

### Cleaning up a recorded client interview
A marketing manager needs to analyze an hour-long Zoom call. Instead of manually listening for key quotes, they ask their agent to use `upload_audio_file` and run the transcription with diarization. The resulting text immediately tells them which speaker said what, making follow-up action items easy.

### Covering a live panel discussion
A journalist needs real-time notes from a conference panel. They connect their agent to the MCP using `init_live_session`. The transcript streams in as the panel speaks, allowing them to capture quotes and speaker changes as they happen.

### Processing international podcast archives
A global content team has recorded interviews in six different languages. They upload the files through the MCP and enable translation and summarization on each job, producing text for all regional markets.

### Debugging a failed audio job
An engineer uploaded an audio file but isn't sure whether the job finished correctly. They use `list_transcriptions` first to find the Job ID, then call `get_transcription` to confirm the status and see any error details the job reports.
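
The lookup in this scenario can be sketched as two small helpers that work on the results of `list_transcriptions` and `get_transcription`. The field names (`id`, `filename`, `status`, `error`) and the status values are assumptions for illustration, not the documented response shape.

```python
from typing import Any, Dict, List, Optional


def find_job_id(jobs: List[Dict[str, Any]], filename: str) -> Optional[str]:
    """Return the ID of the first job whose file name matches, else None."""
    for job in jobs:
        if job.get("filename") == filename:
            return job.get("id")
    return None


def describe_job(job: Dict[str, Any]) -> str:
    """Summarize a job result in one line (assumed status values)."""
    job_id = job.get("id", "unknown job")
    status = job.get("status", "unknown")
    if status == "done":
        return f"{job_id}: finished"
    if status == "error":
        return f"{job_id}: failed ({job.get('error', 'no details reported')})"
    return f"{job_id}: still {status}"


if __name__ == "__main__":
    # Hypothetical data standing in for real tool results.
    listed = [
        {"id": "job_1", "filename": "Meeting_Notes.mp3"},
        {"id": "job_2", "filename": "Interview_01.wav"},
    ]
    job_id = find_job_id(listed, "Interview_01.wav")
    fetched = {"id": job_id, "status": "error", "error": "unsupported format"}
    print(describe_job(fetched))
```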

## Benefits

- Instead of manually transcribing hours of video, upload the file with `upload_audio_file`, then start the job with `init_transcription` to get a full text transcript ready for editing. Speaker diarization is handled as part of the job.
- For real-time work, you can initialize secure WebSocket connections using `init_live_session`. This means your agent transcribes meetings as they happen, so you don't wait for a recording to be processed afterward.
- The analysis goes well beyond writing down the words. You can apply custom LLM prompts to the transcribed content to extract specific insights or turn unstructured notes into JSON.
- If you need to know what jobs are running or finished, use `list_transcriptions` to pull up a history of all your work in one query. Then, check the results with `get_transcription`.
- The multi-language support is massive; you can initiate transcription and translation across over 100 languages, making global content creation straightforward.

## How It Works

The MCP converts raw, unstructured audio into clean, structured text without requiring you to manage the underlying API calls yourself.

1. Subscribe to this MCP and enter your Gladia API key.
2. Instruct your AI client to either upload a file for batch processing or initialize a live session.
3. The system runs the job, and you retrieve the status and final text results directly through conversation.
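
For the batch path, the steps above roughly correspond to the tool sequence `upload_audio_file`, `init_transcription`, then repeated `get_transcription` calls. The sketch below shows that loop under stated assumptions: `call_tool` is a hypothetical stand-in for whatever your MCP client exposes, and the argument names, result keys, and status values are illustrative rather than the documented schema.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe_file(call_tool: ToolCaller, path: str,
                    poll_seconds: float = 2.0, max_polls: int = 30) -> Dict[str, Any]:
    """Upload a file, start a job, and poll until it finishes or fails."""
    # Argument and result keys below are assumptions for illustration.
    upload = call_tool("upload_audio_file", {"file_path": path})
    job = call_tool("init_transcription", {"audio_url": upload["audio_url"]})
    for _ in range(max_polls):
        result = call_tool("get_transcription", {"id": job["id"]})
        if result["status"] in ("done", "error"):
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job['id']} did not finish after {max_polls} polls")


def make_fake_caller() -> ToolCaller:
    """In-memory stand-in for an MCP client; the job is done on the second poll."""
    polls = {"count": 0}

    def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name == "upload_audio_file":
            return {"audio_url": "https://example.com/uploaded/" + args["file_path"]}
        if name == "init_transcription":
            return {"id": "job_12345"}
        if name == "get_transcription":
            polls["count"] += 1
            if polls["count"] < 2:
                return {"id": args["id"], "status": "processing"}
            return {"id": args["id"], "status": "done", "text": "hello world"}
        raise ValueError(f"unknown tool: {name}")

    return call_tool


if __name__ == "__main__":
    print(transcribe_file(make_fake_caller(), "meeting.mp3", poll_seconds=0.0))
```

In normal use your AI client runs this sequence for you; the sketch only makes the order of calls explicit.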

## Frequently Asked Questions

**How do I transcribe a live meeting with Gladia Speech AI MCP?**
You initiate a real-time session by calling `init_live_session`. This creates a secure WebSocket link that streams the transcription output to your agent as the meeting happens.

**Can I translate audio using Gladia Speech AI MCP?**
Yes. The MCP supports multi-language translation. You can run a job and specify both the source language and the target language for the output text.

**What is speaker diarization with Gladia Speech AI MCP?**
Speaker diarization identifies who said what during the audio session. The resulting transcript tags each line with its speaker, making it easy to track contributions in a meeting.
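
As a rough illustration of working with a diarized result, the sketch below merges consecutive utterances from the same speaker into readable lines. It assumes each utterance is a dict with `speaker` and `text` keys; that shape and the helper name `format_by_speaker` are hypothetical, so adjust them to the result you actually receive.

```python
from typing import Any, Dict, List


def format_by_speaker(utterances: List[Dict[str, Any]]) -> List[str]:
    """Merge consecutive utterances from the same speaker into one line each."""
    lines: List[str] = []
    last_speaker: Any = object()  # sentinel that matches no real speaker label
    for utterance in utterances:
        speaker = utterance["speaker"]
        if lines and speaker == last_speaker:
            lines[-1] += " " + utterance["text"]
        else:
            lines.append(f"Speaker {speaker}: {utterance['text']}")
            last_speaker = speaker
    return lines


if __name__ == "__main__":
    # Hypothetical diarized output.
    sample = [
        {"speaker": 0, "text": "Hi everyone."},
        {"speaker": 0, "text": "Welcome to the call."},
        {"speaker": 1, "text": "Thanks for having me."},
    ]
    for line in format_by_speaker(sample):
        print(line)
```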

**How do I check if my transcription job finished using Gladia Speech AI MCP?**
After starting a job with `init_transcription`, you use the `get_transcription` tool, providing the Job ID. This will tell you the status and provide the final results when ready.

**Does Gladia Speech AI MCP support video files?**
The MCP processes audio content. For a video file, extract the audio stream first, then upload the resulting audio file for transcription and analysis.