# Rev.ai MCP

> Rev.ai provides high-accuracy speech-to-text and full media transcription. Submit an audio or video file URL through your AI client to start a job. The server returns the transcript along with structured data: captions (SRT/VTT), topic scores, sentiment analysis, and concise summaries. It covers the whole process, from file submission through analysis.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** transcription, speech-to-text, audio-processing, ai-summaries, captions

## Description

You submit a media file URL (audio or video) to **start transcription jobs** using `submit_stt_job`. This kicks off an asynchronous process and returns a Job ID. You can track progress and get detailed status for that ID by calling `get_stt_job`, and review your recent work history with `list_stt_jobs`.

When the processing is done, you've got several ways to pull out usable data. You can get the full written content using `get_transcript`, which returns the text in either JSON or plain format. If you need synchronized captions for a video, `get_captions` pulls those files as standard SRT or VTT formats.
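SRT and VTT differ mainly in the millisecond separator of their timestamps and in whether each cue carries a numeric index. The sketch below formats one cue both ways to show the difference. It is illustrative only: the function names are hypothetical, and it does not reproduce the exact output of `get_captions`.

```python
def format_timestamp(seconds: float, fmt: str) -> str:
    """Format a time offset for an SRT or WebVTT cue."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    # SRT separates milliseconds with a comma, WebVTT with a period.
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def format_cue(index: int, start: float, end: float, text: str, fmt: str) -> str:
    """Render a single caption cue; SRT cues start with a numeric index line."""
    timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
    if fmt == "srt":
        return f"{index}\n{timing}\n{text}\n"
    return f"{timing}\n{text}\n"


print(format_cue(1, 0.0, 2.25, "Welcome to the show.", "srt"))
print(format_cue(1, 0.0, 2.25, "Welcome to the show.", "vtt"))
```

A complete WebVTT file also begins with a `WEBVTT` header line, which this single-cue sketch leaves out.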

For deeper analysis, your agent can run several NLP tasks. To see what people were talking about, you can pull a list of key themes and their importance scores using `get_topic_extraction_result`. You can also get an emotional read on the speech by calling `get_sentiment_analysis_result`, which returns a score showing whether the tone was positive, negative, or neutral. If the transcript is long, you don't have to read it all; use `get_transcript_summary` to condense the main points into a quick overview.

You can also get highly technical data. To perform forced alignment and get precise timings for every word spoken, submit both the audio and the transcript text to `submit_alignment_job`, then retrieve the timestamps with `get_alignment_result`. If you're unsure what language was recorded, process the file URL using `submit_language_id_job` and check the result with `get_language_id_result` to identify the primary spoken language.

Before you start a job, you might want better accuracy. You can improve recognition of specific industry jargon by submitting custom phrases via `submit_vocabulary`. You can review the vocabularies you've submitted with `list_vocabularies`, and check the processing status of a submission with `get_vocabulary`.
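The vocabulary-first ordering can be sketched as a small gate in front of job submission. This is a sketch under stated assumptions, not the server's documented interface: `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool, and the argument names (`phrases`, `vocabulary_id`, `media_url`), the response fields (`id`, `status`), and the `complete` status value are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Iterable

# Hypothetical stand-in for your MCP client's tool-invocation function.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_with_vocabulary(call_tool: ToolCaller, media_url: str,
                           phrases: Iterable[str],
                           ready_status: str = "complete",
                           poll_seconds: float = 2.0,
                           max_polls: int = 60) -> Dict[str, Any]:
    """Register custom phrases, wait until they are ready, then start transcription."""
    # Drop blanks and duplicates while preserving order.
    cleaned = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    # Assumed argument and field names throughout; adjust to the real tool schema.
    vocab_id = call_tool("submit_vocabulary", {"phrases": cleaned})["id"]

    for _ in range(max_polls):
        status = call_tool("get_vocabulary", {"vocabulary_id": vocab_id})["status"]
        if status == ready_status:
            break
        if status == "failed":
            raise RuntimeError(f"Vocabulary {vocab_id} failed to process")
        time.sleep(poll_seconds)
    else:
        # The loop finished without ever seeing the ready status.
        raise TimeoutError(f"Vocabulary {vocab_id} not ready after {max_polls} polls")

    # Only now start the job, passing the vocabulary reference along.
    return call_tool("submit_stt_job",
                     {"media_url": media_url, "vocabulary_id": vocab_id})
```

In an agent-driven session you would normally just ask for these steps in plain language; the sketch only makes the ordering explicit.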

For extra control, you can submit transcript text directly: use `submit_sentiment_analysis_job` to run sentiment analysis on that text, or `submit_topic_extraction_job` to identify its topics. To run the full pipeline from a raw media URL instead, start with `submit_stt_job`.

When you're finished with the data, you can clean up your account. Use `delete_stt_job` to permanently remove a transcription job's associated data, or use `delete_vocabulary` to clear out a custom vocabulary set. This server handles every step: from submitting media files and getting basic text, to pulling structured captions, generating summaries, detecting sentiment shifts, extracting key topics, and aligning word timings.
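Since `delete_stt_job` is described as applying only to jobs that have finished or failed, a cleanup script can check the status first. The following is a minimal sketch, assuming a hypothetical `call_tool` function for invoking tools, an assumed `job_id` argument, and an assumed `status` field in the response.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for your MCP client's tool-invocation function.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Status values as they appear in this document's examples.
DELETABLE_STATUSES = {"transcribed", "failed"}


def delete_if_finished(call_tool: ToolCaller, job_id: str) -> bool:
    """Delete a job only when it is no longer running. Returns True if deleted."""
    # Assumed argument and field names: "job_id" and "status".
    status = call_tool("get_stt_job", {"job_id": job_id})["status"]
    if status not in DELETABLE_STATUSES:
        return False  # still in progress, so leave it alone
    call_tool("delete_stt_job", {"job_id": job_id})
    return True
```

Returning a boolean rather than raising lets a batch cleanup skip running jobs and report them afterwards.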

## Tools

### delete_stt_job
Permanently removes the data associated with a transcription job that is already complete or failed.

### delete_vocabulary
Removes a previously submitted custom vocabulary set from your profile.

### get_alignment_result
Returns precise timestamps for every word in the audio, useful for forced alignment tasks.

### get_captions
Pulls synchronized caption files (SRT or VTT) from a finished job's media content.

### get_language_id_result
Identifies the primary language spoken in an audio file and provides confidence scores for top languages.

### get_sentiment_analysis_result
Returns a score detailing the emotional tone (positive, negative, or neutral) of the speech within the transcript.

### get_stt_job
Checks and retrieves the current status and detailed information for any submitted transcription job ID.

### get_topic_extraction_result
Pulls a list of key topics identified in a transcript, along with their relative importance scores.

### get_transcript_summary
Generates and returns a condensed summary of the main points from a finished transcript job.

### get_transcript
Retrieves the full written text for a completed job; you can request JSON or plain text format.

### get_vocabulary
Checks the processing status of custom vocabulary phrases you recently submitted.

### list_stt_jobs
Retrieves a list of all your transcription jobs that occurred within the last 30 days.

### list_vocabularies
Shows you a history of custom vocabularies you've submitted to improve accuracy.

### submit_alignment_job
Submits both audio and the transcript text to perform forced alignment, adding word-level timings.

### submit_language_id_job
Processes an audio URL or file to determine what language is being spoken.

### submit_sentiment_analysis_job
Submits a transcript text specifically for emotional tone analysis and scoring.

### submit_stt_job
The main action: submits an audio or video file URL to start the asynchronous transcription process.

### submit_topic_extraction_job
Submits a transcript text specifically for identifying and scoring its key discussion topics.

### submit_vocabulary
Processes new custom phrases or jargon you want the engine to recognize during transcription.

## Prompt Examples

**Prompt:** 
```
Transcribe this podcast episode: https://example.com/audio.mp3
```

**Response:** 
```
I've submitted the transcription job for your audio file. The Job ID is `abc-123`. I'll monitor the status for you.
```

**Prompt:** 
```
Give me a summary of the transcript for job ID abc-123.
```

**Response:** 
```
Here is the summary for job `abc-123`: The discussion focused on the new API deployment strategy and the timeline for the Q3 release.
```

**Prompt:** 
```
List my transcription jobs from the last month.
```

**Response:** 
```
I found 3 recent jobs: `abc-123` (transcribed), `def-456` (in_progress), and `ghi-789` (failed). Which one would you like to check?
```

## Capabilities

### Start transcription jobs
Submit a media file URL to begin asynchronous speech-to-text processing.

### Get the raw transcript text
Retrieve the full written content of a completed job, formatted as JSON or plain text.

### Generate video captions
Extract synchronized caption files (SRT/VTT) for visual media based on a finished transcription job.

### Analyze sentiment
Run the transcript through NLP to get scores indicating positive, negative, or neutral tone shifts in the speech.

### Extract key topics and themes
Identify the main subjects discussed in a transcript, returning topic names along with their statistical relevance/score.

### Create summaries of long audio
Condense lengthy transcripts into short, focused summaries, saving manual reading time.

## Use Cases

### Analyzing multiple interview transcripts
A researcher records 15 hours of interviews. Instead of manual coding, they run all files through `submit_stt_job`. Once transcribed, they use `get_topic_extraction_result` and `get_sentiment_analysis_result` across the entire dataset to quantify patterns in mood and discussion points.

### Creating blog posts from podcasts
A podcast editor submits the URL of a raw audio file using `submit_stt_job`. After completion, they retrieve the text with `get_transcript` and then call `get_captions` to grab clean SRT files for YouTube. Finally, they use `get_topic_extraction_result` to build out SEO keywords.

### Quickly assessing meeting takeaways
A project manager records a long status call and submits it via `submit_stt_job`. Instead of reading the transcript, they immediately use `get_transcript_summary` to get the core decisions and action items for stakeholders.

### Transcribing audio files with jargon
A developer needs to process recordings containing proprietary medical terms. They first submit those terms with `submit_vocabulary`, wait for processing to finish by checking `get_vocabulary`, and only then call `submit_stt_job` so the engine recognizes the terms more reliably.

## Benefits

- Stop wasting time reading raw text. Use `get_transcript_summary` to get a condensed summary of a long meeting instead of writing one up manually.
- Improve accuracy for niche terms. Before submitting any media, use `submit_vocabulary` so the engine is more likely to recognize client names or technical jargon that standard models miss.
- Get more than just text. After transcription, call `get_topic_extraction_result` to get a structured list of key themes and how important they were, which is useful for content strategy.
- Get captions without extra steps. Once a video is transcribed, use `get_captions` to pull SRT or VTT files that are ready for subtitling.
- Gauge the tone of a conversation. Call `get_sentiment_analysis_result` to see where the discussion turned positive or negative, which is useful for market research reports.

## How It Works

In short, you submit a file, wait for the job to finish, and then run specific analytical calls against the completed Job ID.

1. Submit the media URL using `submit_stt_job`. You'll get a Job ID back.
2. Wait for job completion. Check status with `get_stt_job` until it's 'transcribed' (or 'failed').
3. Call analysis tools (e.g., `get_transcript`, `get_topic_extraction_result`) using the Job ID to pull structured data.
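The three steps above can be sketched as a submit-and-poll loop. This is a minimal sketch rather than the server's documented interface: `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool, and the argument names (`media_url`, `job_id`) and response fields (`id`, `status`) are assumptions. The status strings are the ones used in this document's examples.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical stand-in for your MCP client's tool-invocation function.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe(call_tool: ToolCaller, media_url: str,
               poll_seconds: float = 5.0, max_polls: int = 120) -> Dict[str, Any]:
    """Submit a media URL, poll until the job settles, then fetch the transcript."""
    # Step 1: submit. Assumed argument name "media_url" and response field "id".
    job = call_tool("submit_stt_job", {"media_url": media_url})
    job_id = job["id"]

    # Step 2: poll the job status a bounded number of times.
    for _ in range(max_polls):
        status = call_tool("get_stt_job", {"job_id": job_id})["status"]
        if status == "transcribed":
            # Step 3: pull structured data using the same Job ID.
            return call_tool("get_transcript", {"job_id": job_id})
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Job {job_id} still in progress after {max_polls} polls")
```

Bounding the number of polls keeps a stuck job from blocking the caller indefinitely; tune `poll_seconds` and `max_polls` to the length of your media.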

## Frequently Asked Questions

**How do I improve accuracy with submit_stt_job?**
You use `submit_vocabulary` first. You feed the engine a list of your specific technical terms or names, wait for it to finish processing by checking `get_vocabulary`, and then run the main job via `submit_stt_job`. This tells the model what to expect.

**Which tool do I use to get captions?**
You use `get_captions` after a job is done. It pulls out synchronized caption files like SRT or VTT, which are perfect for video platforms and don't require you to process the raw text at all.

**Is there one tool to get summary, topics, and sentiment?**
No. You need a pipeline. After submitting the job with `submit_stt_job` and getting the Job ID, you must call `get_transcript_summary`, `get_topic_extraction_result`, *and* `get_sentiment_analysis_result` separately using that same ID.
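One way to sketch that pipeline is a helper that calls each tool in turn and gathers the results in one place. As an assumption, `call_tool` is a hypothetical function for invoking a tool through your MCP client, and `job_id` is an assumed argument name; the result shapes are not specified here.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for your MCP client's tool-invocation function.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Report label -> tool name, as listed in this document.
ANALYSIS_TOOLS = {
    "summary": "get_transcript_summary",
    "topics": "get_topic_extraction_result",
    "sentiment": "get_sentiment_analysis_result",
}


def collect_analysis(call_tool: ToolCaller, job_id: str) -> Dict[str, Any]:
    """Call each analysis tool once and gather results (or errors) in one dict."""
    report: Dict[str, Any] = {}
    for label, tool_name in ANALYSIS_TOOLS.items():
        try:
            report[label] = call_tool(tool_name, {"job_id": job_id})
        except Exception as exc:
            # Keep going if one analysis is unavailable; record why instead.
            report[label] = {"error": str(exc)}
    return report
```

Recording per-tool errors instead of aborting means a missing sentiment result does not discard a summary that was retrieved successfully.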

**What if I want to know what language was spoken?**
Use the dedicated tool, `submit_language_id_job`, to process an audio file or URL. Then call `get_language_id_result`, which identifies the primary language and gives you confidence scores for the top languages detected in the recording.

**What is the function of `delete_stt_job`?**
The `delete_stt_job` tool permanently removes job data. You use this when you need to clear records, but it only works for jobs that have already completed or failed.

**How can I review my past transcription work using the `list_stt_jobs` tool?**
The `list_stt_jobs` tool gives you a list of all your transcription jobs from the last 30 days. This lets you quickly check job IDs and statuses (like 'in_progress' or 'failed'), and pick up where you left off.
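If you post-process the job list yourself, grouping IDs by status makes it easy to see what still needs attention. The sketch below assumes each job is a dict with `id` and `status` fields, which is an assumption about the response shape rather than a documented format.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_jobs_by_status(jobs: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each status to the list of job IDs currently in that status."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for job in jobs:
        # Assumed field names: "id" and "status".
        grouped[job["status"]].append(job["id"])
    return dict(grouped)


# Sample data mirroring the prompt example earlier in this document.
jobs = [
    {"id": "abc-123", "status": "transcribed"},
    {"id": "def-456", "status": "in_progress"},
    {"id": "ghi-789", "status": "failed"},
]
print(group_jobs_by_status(jobs))
```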

**If I need time-stamped accuracy, how do I use `submit_alignment_job`?**
You submit both the audio and its transcript text to `submit_alignment_job`. The job performs forced alignment, and you then call `get_alignment_result` to get precise timestamps for every word spoken in the media file.

**After submitting custom terms with `submit_vocabulary`, how do I check its status using `get_vocabulary`?**
You call `get_vocabulary` to track if your custom vocabulary is ready for use. It checks the processing status of the phrases you submitted, confirming when they are available in the transcription models.

**How can I check if my transcription job is finished?**
Use the `get_stt_job` tool with your Job ID. It will return the current status, such as 'in_progress', 'transcribed', or 'failed'.

**Can I get subtitles for my video files?**
Yes! Once a job is 'transcribed', use the `get_captions` tool. You can specify the format as either 'srt' or 'vtt'.

**How do I improve accuracy for industry-specific jargon?**
You can use the `submit_vocabulary` tool to provide a list of custom phrases. This helps the AI recognize technical terms and unique names more accurately.