Supercharge your AI with Rev.ai: turn audio and video into structured, analyzed data.


Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Connect to your AI in seconds.

Rev.ai handles high-accuracy speech-to-text and full media transcription. Submit audio or video files via your AI client to start a job.

It then returns not just text, but also structured data: captions (SRT/VTT), topic scores, sentiment analysis, and concise summaries. This server manages the entire process, from file submission through multi-layered analysis.

What your AI can do

Delete STT job

Permanently removes the data associated with a transcription job that is already complete or failed.

Delete vocabulary

Removes a previously submitted custom vocabulary set from your profile.

Get alignment result

Returns precise timestamps for every word in the audio, useful for forced alignment tasks.

Plus 16 more capabilities, including:

Start transcription jobs

Submit a media file URL to begin asynchronous speech-to-text processing.

Get the raw transcript text

Retrieve the full written content of a completed job, formatted as JSON or plain text.

Generate video captions

Extract synchronized caption files (SRT/VTT) for visual media based on a finished transcription job.

Analyze sentiment

Run the transcript through NLP to get scores indicating positive, negative, or neutral tone shifts in the speech.

Extract key topics and themes

Identify the main subjects discussed in a transcript, returning topic names along with their relevance scores.

Create summaries of long audio

Condense lengthy transcripts into short, focused summaries, saving manual reading time.


Compatible AI apps (OAuth 2.0 compatible): Claude, ChatGPT, Cursor, Gemini, VS Code, JetBrains, Vercel, Zendesk, and any other MCP app.


Rev.ai MCP Server: 19 Tools for Media Processing

This server gives your agent tools to manage the entire media workflow: submit jobs, check status, get transcripts, and run deep analysis like sentiment scoring.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.


Delete STT Job

Permanently removes the data associated with a transcription job that is already complete or failed.

Delete Vocabulary

Removes a previously submitted custom vocabulary set from your profile.

Get Alignment Result

Returns precise timestamps for every word in the audio, useful for forced alignment...

Get Captions

Pulls synchronized caption files (SRT or VTT) from a finished job's media content.

Get Language ID Result

Identifies the primary language spoken in an audio file and provides confidence...

Get Sentiment Analysis Result

Returns scores detailing the emotional tone (positive, negative, or neutral) of the speech within the transcript.

Get STT Job

Checks and retrieves the current status and detailed information for any submitted transcription job ID.

Get Topic Extraction Result

Pulls a list of key topics identified in a transcript, along with their relative...

Get Transcript Summary

Generates and returns a condensed summary of the main points from a finished...

Get Transcript

Retrieves the full written text for a completed job; you can request JSON or plain...

Get Vocabulary

Checks the processing status of custom vocabulary phrases you recently submitted.

List STT Jobs

Retrieves a list of all your transcription jobs that occurred within the last 30 days.

List Vocabularies

Shows you a history of custom vocabularies you've submitted to improve accuracy.

Submit Alignment Job

Submits both audio and the transcript text to perform forced alignment, adding...

Submit Language ID Job

Processes an audio URL or file to determine what language is being spoken.

Submit Sentiment Analysis Job

Submits transcript text specifically for emotional tone analysis and scoring.

Submit STT Job

The main action: submits an audio or video file URL to start the asynchronous...

Submit Topic Extraction Job

Submits transcript text specifically for identifying and scoring its key discussion topics.

Submit Vocabulary

Processes new custom phrases or jargon you want the engine to recognize during transcription.

Connect to your AI in seconds. Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, and no messy routing required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Rev.ai integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "revai": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Rev.ai tools with full Vinkius guardrails applied.

VS Code Copilot


One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"revai": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

```bash
pip install langchain-mcp-adapters
```

Connect the Server

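Use `MultiServerMCPClient` from the LangChain MCP adapters to connect to the Vinkius managed endpoint over Streamable HTTP:

```python
import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius; the endpoint is served at /mcp
client = MultiServerMCPClient({"revai": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http",
}})
tools = asyncio.run(client.get_tools())  # get_tools() is a coroutine
```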

CrewAI

Define the Tool

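Load the Vinkius MCP tools into your CrewAI agents with `MCPServerAdapter` from crewai-tools (install it with `pip install "crewai-tools[mcp]"`):

```python
from crewai import Agent
from crewai_tools import MCPServerAdapter

# Connect securely to Vinkius over Streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Tools are only available while the adapter's connection is open
with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (goal and backstory are required; these are placeholders)
    researcher = Agent(
        role="Data Researcher",
        goal="Transcribe and analyze media with Rev.ai",
        backstory="Handles speech-to-text and media analysis tasks.",
        tools=vinkius_tools,
    )
    # Build your Crew and call crew.kickoff() inside this block
```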

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private


Make Your AI Do More

Start with Rev.ai, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,000+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Rev.ai. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infrastructure
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 19 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Manually turning hours of video into usable notes is a nightmare.

Right now, if you get a 90-minute meeting recording, you download the file. Then you either listen to it all, or you use a basic service that gives you messy text—text full of errors and no structure. You end up spending hours copy-pasting, cleaning up filler words, and manually creating bullet points just to make it readable.

With this MCP server, your agent handles the whole thing. Submit the media URL using `submit_stt_job`. When the job finishes, you don't have to wade through raw text: run `get_transcript_summary` and you get a clean summary of the main points.

Rev.ai MCP Server: Structured Data from Speech

Before this server, if you wanted to know if a meeting was positive or negative, you had to read every single word and manually count the complaints versus the praise. You were stuck in the text itself.

Now, after transcription, your agent calls `get_sentiment_analysis_result`. It returns structured sentiment scores across the transcript. That's it. No reading required: you just get the data.
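To make "structured scores" concrete, here is a minimal sketch that tallies sentiment labels and picks out the most negative passage. It assumes a result shaped like Rev.ai's sentiment analysis output: a `messages` list whose items carry `content`, `score`, and `sentiment`. Those field names are an assumption, so check what `get_sentiment_analysis_result` actually returns before relying on them.

```python
from collections import Counter


def summarize_sentiment(result: dict) -> dict:
    """Tally sentiment labels and find the most negative message.

    Assumed input shape (not confirmed for this server):
    {"messages": [{"content": str, "score": float,
                   "sentiment": "positive" | "negative" | "neutral"}, ...]}
    """
    messages = result.get("messages", [])
    counts = Counter(m["sentiment"] for m in messages)
    # Lowest score = most negative; default=None handles an empty result
    most_negative = min(messages, key=lambda m: m["score"], default=None)
    return {
        "counts": dict(counts),
        "most_negative": most_negative["content"] if most_negative else None,
    }


# Made-up scores for illustration
sample = {
    "messages": [
        {"content": "The onboarding was great.", "score": 0.8, "sentiment": "positive"},
        {"content": "Billing is confusing.", "score": -0.6, "sentiment": "negative"},
        {"content": "We meet on Tuesdays.", "score": 0.0, "sentiment": "neutral"},
    ]
}
print(summarize_sentiment(sample))
```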

Support: 24/7 at support@vinkius.com

Security: Vinkius Trust Center

SLA: Service Level Agreement

What your AI can actually do with this

Submit a media file URL (audio or video) with submit_stt_job to start a transcription job. This kicks off an asynchronous process and returns a Job ID. Call get_stt_job with that ID to track progress and get detailed status updates, and use list_stt_jobs to review your recent job history.
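Picking up where you left off is mostly a matter of grouping jobs by status. The sketch below assumes `list_stt_jobs` returns dicts with `id` and `status` keys using the status values mentioned in this page's FAQ (`in_progress`, `transcribed`, `failed`); the exact response shape is an assumption.

```python
def triage_jobs(jobs: list[dict]) -> dict[str, list[str]]:
    """Group job IDs by status so unfinished or failed work is easy to spot.

    Assumes each job is a dict with "id" and "status" keys (not confirmed
    for this server's list_stt_jobs output).
    """
    groups: dict[str, list[str]] = {"in_progress": [], "transcribed": [], "failed": []}
    for job in jobs:
        # Unexpected statuses get their own bucket instead of being dropped
        groups.setdefault(job["status"], []).append(job["id"])
    return groups


# Example with made-up job records
jobs = [
    {"id": "job-1", "status": "transcribed"},
    {"id": "job-2", "status": "in_progress"},
    {"id": "job-3", "status": "failed"},
]
groups = triage_jobs(jobs)
print("Still running:", groups["in_progress"])
print("Needs a retry:", groups["failed"])
```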

When processing is done, you have several ways to pull out usable data. Use get_transcript to get the full written content as JSON or plain text. If you need synchronized captions for a video, get_captions returns them as standard SRT or VTT files.
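The JSON form of a transcript is handy because it keeps speaker turns. A minimal sketch, assuming Rev.ai-style JSON with a `monologues` list whose `elements` are `text` or `punct` items (the shape `get_transcript` returns through this server is an assumption), that renders one speaker-labeled line per turn:

```python
def to_plain_text(transcript: dict) -> str:
    """Render a JSON transcript as one "Speaker N: ..." line per monologue.

    Assumed input shape (Rev.ai-style JSON, verify against get_transcript):
    {"monologues": [{"speaker": int,
                     "elements": [{"type": "text" | "punct", "value": str}]}]}
    """
    lines = []
    for mono in transcript.get("monologues", []):
        # Spaces are assumed to arrive as "punct" elements, so values join directly
        text = "".join(el["value"] for el in mono.get("elements", []))
        lines.append(f"Speaker {mono.get('speaker', '?')}: {text.strip()}")
    return "\n".join(lines)


sample = {
    "monologues": [
        {"speaker": 0, "elements": [
            {"type": "text", "value": "Welcome"},
            {"type": "punct", "value": " "},
            {"type": "text", "value": "back"},
            {"type": "punct", "value": "."},
        ]},
        {"speaker": 1, "elements": [
            {"type": "text", "value": "Thanks"},
            {"type": "punct", "value": "!"},
        ]},
    ]
}
print(to_plain_text(sample))
```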

For deeper analysis, your agent runs several NLP tasks. To see what people were talking about, you can pull a list of key themes and their importance scores using get_topic_extraction_result. You'll also get an emotional read on the speech by calling get_sentiment_analysis_result, which returns a score showing whether the tone was positive, negative, or neutral.
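Topic scores are easiest to use once they are sorted and filtered. The sketch assumes a result shaped like Rev.ai's topic extraction output, a `topics` list of objects with `topic_name` and `score`; the field names, the default limit, and the threshold are illustrative assumptions.

```python
def top_topics(result: dict, limit: int = 5, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (topic_name, score) pairs, highest score first.

    Assumed input shape (not confirmed for this server):
    {"topics": [{"topic_name": str, "score": float}, ...]}
    """
    topics = [
        (t["topic_name"], t["score"])
        for t in result.get("topics", [])
        if t["score"] >= min_score
    ]
    topics.sort(key=lambda pair: pair[1], reverse=True)
    return topics[:limit]


# Made-up scores for illustration
sample = {"topics": [
    {"topic_name": "onboarding", "score": 0.7},
    {"topic_name": "pricing", "score": 0.9},
    {"topic_name": "weather", "score": 0.2},
]}
print(top_topics(sample, limit=2))
```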

If the transcript is long, don't waste time reading it all; use get_transcript_summary to condense the main points into a quick overview.

You can also get highly technical data. To perform forced alignment and get precise timings for every word spoken, submit both the audio and the text with submit_alignment_job, then retrieve the timestamps with get_alignment_result. If you're unsure what language was recorded, process the file URL with submit_language_id_job and check the result with get_language_id_result to identify the primary spoken language.
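Forced-alignment output is usually consumed as a list of timed words. This sketch assumes the alignment result uses the same Rev.ai-style JSON layout as a transcript, with `ts` and `end_ts` (in seconds) on each `text` element; that shape is an assumption to verify against `get_alignment_result`. It flattens the words and finds the ones spoken in a time window, for example to cut a clip.

```python
def word_timings(alignment: dict) -> list[tuple[str, float, float]]:
    """Flatten an alignment result into (word, start_s, end_s) tuples.

    Assumed input shape (Rev.ai-style JSON): text elements carry "ts" and
    "end_ts" in seconds; punctuation elements carry no timing and are skipped.
    """
    words = []
    for mono in alignment.get("monologues", []):
        for el in mono.get("elements", []):
            if el.get("type") == "text":
                words.append((el["value"], el["ts"], el["end_ts"]))
    return words


def words_between(words: list[tuple[str, float, float]], start: float, end: float) -> list[str]:
    """Return the words whose timing overlaps the window from start to end."""
    return [w for w, ts, end_ts in words if ts < end and end_ts > start]


sample = {"monologues": [{"speaker": 0, "elements": [
    {"type": "text", "value": "hello", "ts": 0.5, "end_ts": 0.9},
    {"type": "punct", "value": " "},
    {"type": "text", "value": "world", "ts": 1.0, "end_ts": 1.4},
]}]}
words = word_timings(sample)
print(words_between(words, 0.95, 2.0))
```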

Before you start any job, you may want better accuracy. Improve recognition of industry-specific jargon by submitting custom phrases with submit_vocabulary. Use get_vocabulary to check whether a submitted set has finished processing, and list_vocabularies to review the custom vocabularies you've submitted.
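Because the engine can only favor terms you actually give it, it pays to clean the list first. A minimal sketch, assuming `submit_vocabulary` accepts a list of phrase strings (the `phrases` argument name is hypothetical): trim whitespace, drop blanks, and skip case-insensitive duplicates.

```python
def prepare_phrases(raw_terms: list[str]) -> list[str]:
    """Clean a list of jargon terms before sending them to submit_vocabulary.

    Collapses whitespace, drops empty entries, and removes case-insensitive
    duplicates while keeping the first spelling seen.
    """
    seen = set()
    phrases = []
    for term in raw_terms:
        cleaned = " ".join(term.split())  # trim and collapse internal whitespace
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            phrases.append(cleaned)
    return phrases


# Hypothetical argument shape; check the real submit_vocabulary tool schema.
vocab_args = {"phrases": prepare_phrases(["Kubernetes", " kubernetes ", "Vinkius", "", "SRT  file"])}
print(vocab_args)
```

Submit the cleaned list, wait until get_vocabulary reports the set ready, and only then start the STT job.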

For more control, you can also submit transcript text directly: submit_sentiment_analysis_job runs sentiment analysis on that text, and submit_topic_extraction_job identifies its topics.

When you no longer need the data, clean up your account. Use delete_stt_job to permanently remove a finished or failed transcription job's data, or delete_vocabulary to remove a custom vocabulary set. This server handles every step: from submitting media files and getting basic text, to pulling structured captions, generating summaries, detecting sentiment shifts, extracting key topics, and aligning word timings.

Built · Hosted · Managed by Vinkius Rev.ai MCP Server - Speech-to-Text & Analysis

Server ID 019ea603-f523-7257-b567-66388199a0cc

Vinkius Inspector: Compliance Grade A+ (score 98.33/100)

Here's how it actually works

The bottom line is you submit a file, wait for the job status, and then run specific analytical calls against that completed job ID.

Submit the media URL using submit_stt_job. You'll get a Job ID back.

Wait for job completion. Check status with get_stt_job until it reports 'transcribed' (or 'failed').

Call retrieval and analysis tools (e.g., get_transcript, get_topic_extraction_result) with the Job ID to pull structured data.
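The three steps above amount to a submit, poll, retrieve loop. Below is a minimal polling sketch. `call_tool(name, arguments)` is a hypothetical stand-in for however your MCP client invokes a tool, and the `job_id` argument name is an assumption; the `in_progress`, `transcribed`, and `failed` status values follow this page's FAQ.

```python
import time
from typing import Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict,
# standing in for however your MCP client invokes a tool.
ToolCaller = Callable[[str, dict], dict]


def wait_for_job(
    call_tool: ToolCaller,
    job_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_stt_job until the job is transcribed.

    Raises RuntimeError if the job fails and TimeoutError if it is still
    running when the timeout expires.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = call_tool("get_stt_job", {"job_id": job_id})  # "job_id" is an assumed argument name
        status = job["status"]
        if status == "transcribed":
            return job
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Demo with a fake tool caller that reports in_progress twice, then transcribed.
_statuses = iter(["in_progress", "in_progress", "transcribed"])


def fake_call_tool(name: str, args: dict) -> dict:
    return {"id": args["job_id"], "status": next(_statuses)}


job = wait_for_job(fake_call_tool, "job-123", sleep=lambda seconds: None)
print(job["status"])
```

Once the helper returns, the same Job ID goes to get_transcript or the analysis tools. Injecting `sleep` keeps the helper testable; a fixed poll interval is the simplest choice, and exponential backoff is a reasonable swap for long recordings.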

Who is this actually for?

Content creators who need to turn every podcast or video into blog-ready text. Researchers slogging through hours of interview footage. Developers building media processing tools for clients. Basically, anyone whose job involves converting spoken word into structured data.

Podcast Editor

Submits raw audio files, generates captions and summaries as soon as each job finishes, then uses get_transcript to copy clean text for show notes.

Market Researcher

Runs multiple interviews through the server and calls get_sentiment_analysis_result and get_topic_extraction_result on each one to quantify qualitative data.

Developer

Integrates the STT pipeline into a larger application, using submit_vocabulary first to ensure accuracy for specific client names or jargon before processing any media.

What Changes When You Connect

Stop wasting time reading raw text. Use get_transcript_summary to get a condensed summary of long meetings instead of writing one up manually.

Improve accuracy for niche terms. Before uploading anything, use submit_vocabulary (and confirm with get_vocabulary that it's ready) so the engine correctly identifies client names or technical jargon that standard models miss.

Get more than just text. After transcription, call get_topic_extraction_result to get a structured list of key themes and how important they were, which is way better for content strategy.

Manage your media output perfectly. Use get_captions immediately after transcribing video to pull out SRT or VTT files—perfect for subtitling without extra steps.

Know the tone of the conversation. Call get_sentiment_analysis_result on a transcript to see where the discussion turned emotional or negative, which is crucial for market research reports.

See it in action

01

Analyzing multiple interview transcripts.

A researcher records 15 hours of interviews. Instead of manual coding, they run all files through submit_stt_job. Once transcribed, they use get_topic_extraction_result and get_sentiment_analysis_result across the entire dataset to quantify patterns in mood and discussion points.

02

Creating blog posts from podcasts.

A podcast editor submits a raw audio file URL using submit_stt_job. After completion, they retrieve the text with get_transcript and then call get_captions to grab clean SRT files for YouTube. Finally, they use get_topic_extraction_result to build out SEO keywords.

03

Quickly assessing meeting takeaways.

A project manager records a long status call and submits it via submit_stt_job. Instead of reading the transcript, they immediately use get_transcript_summary to get the core decisions and action items for stakeholders.

04

Debugging audio files with jargon.

A developer needs to process recordings containing proprietary medical terms. They first submit those terms with submit_vocabulary, wait until get_vocabulary shows they're ready, and only then use submit_stt_job, giving the engine the best chance of transcribing the jargon correctly.

The honest tradeoffs

Assuming a single endpoint works

Anti-pattern

Sending the raw audio file directly to one tool expecting it to return all analysis (summary, sentiment, topics). This fails because processing needs distinct steps.

The Fix

You have to build a pipeline. First, use submit_stt_job. Once get_stt_job reports the job as 'transcribed', call get_transcript, followed by get_topic_extraction_result, in separate calls.

Forgetting custom vocabulary

Anti-pattern

Transcribing a highly technical interview without using submit_vocabulary. The engine will mishear jargon, making the final text unusable.

The Fix

Always start by identifying key industry terms. Run those through submit_vocabulary first, confirm with get_vocabulary that they're ready, then proceed with submit_stt_job for the best accuracy.

Ignoring job status checks

Anti-pattern

Calling get_transcript immediately after submitting the file, assuming it's ready. You just get an error because the transcription hasn't finished yet.

The Fix

You must check the status first. Call get_stt_job repeatedly until the returned status is 'transcribed'. Only then do you call data retrieval tools like get_transcript.

When It Fits, When It Doesn't

Use this server if your core need is converting spoken word into structured, analyzed text. It isn't just a transcription service; it's an analysis pipeline. Use it when you need to know what was said (text), how the speakers felt (sentiment), and what they talked about (topics and themes). If all you want is a quick, basic audio dump, a simpler tool will do.

If your content has specialized language or jargon, run submit_vocabulary first; otherwise, misrecognized terms carry through into every later result (get_transcript, summaries, topics). Expect some complexity: you have to track job IDs and statuses across multiple steps.

Questions you might have

How do I improve accuracy with submit_stt_job?

You use submit_vocabulary first. You feed the engine a list of your specific technical terms or names, wait until get_vocabulary reports them ready, and then run the main job via submit_stt_job. This tells the model what to expect.

Which tool do I use to get captions?

You use get_captions after a job is done. It pulls out synchronized caption files like SRT or VTT, which are perfect for video platforms and don't require you to process the raw text at all.

Is there one tool to get summary, topics, and sentiment?

No. You need a pipeline. After submitting the job with submit_stt_job and getting the Job ID, you must call get_transcript_summary, get_topic_extraction_result, and get_sentiment_analysis_result separately using that same ID.
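Because there is no single all-in-one tool, the fan-out is a loop over separate calls. This sketch follows the answer above in passing the same STT job ID to each tool; `call_tool` and the `job_id` argument name are hypothetical. If your server instead expects the separate ID returned by `submit_topic_extraction_job` or `submit_sentiment_analysis_job`, pass that ID to the matching tool.

```python
from typing import Callable

# Following this page's description that all three tools take the STT job ID.
ANALYSIS_TOOLS = (
    "get_transcript_summary",
    "get_topic_extraction_result",
    "get_sentiment_analysis_result",
)


def analyze_job(call_tool: Callable[[str, dict], dict], job_id: str) -> dict:
    """Call each analysis tool separately with the same job ID.

    call_tool(tool_name, arguments) is a hypothetical stand-in for your MCP
    client. A failure in one call is recorded instead of aborting the rest.
    """
    results = {}
    for tool in ANALYSIS_TOOLS:
        try:
            results[tool] = call_tool(tool, {"job_id": job_id})  # assumed argument name
        except Exception as exc:  # keep partial results if one tool errors
            results[tool] = {"error": str(exc)}
    return results


# Demo with a fake caller where topic extraction is not ready yet
def fake_call_tool(tool: str, args: dict) -> dict:
    if tool == "get_topic_extraction_result":
        raise RuntimeError("not ready")
    return {"tool": tool, "job_id": args["job_id"]}


print(analyze_job(fake_call_tool, "job-123"))
```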

What if I want to know what language was spoken?

Use the dedicated tool, submit_language_id_job. It processes an audio file or URL; then call get_language_id_result to get the top language identified in the recording along with a confidence score.
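A confidence score is most useful as a gate. The sketch below accepts the top language only above a threshold. It assumes a result shaped like Rev.ai's language identification output (a `language_confidences` list of `language`/`confidence` pairs), and the 0.8 default threshold is an arbitrary illustration, not a recommended value.

```python
from typing import Optional


def pick_language(result: dict, min_confidence: float = 0.8) -> Optional[str]:
    """Return the most likely language code, or None if confidence is too low.

    Assumed input shape (not confirmed for this server):
    {"language_confidences": [{"language": str, "confidence": float}, ...]}
    """
    candidates = result.get("language_confidences", [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    return best["language"] if best["confidence"] >= min_confidence else None


# Made-up confidences for illustration
sample = {"language_confidences": [
    {"language": "en", "confidence": 0.91},
    {"language": "es", "confidence": 0.06},
]}
print(pick_language(sample))
```

Returning None instead of the top guess lets the agent ask a person or fall back to a default rather than transcribing with the wrong language.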

What is the function of `delete_stt_job`?

The delete_stt_job tool permanently removes job data. You use this when you need to clear records, but it only works for jobs that are already completed or have failed status.
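Since deletion is permanent and only works on finished jobs, a guard that checks status first avoids a failed call. A minimal sketch; `call_tool` and the `job_id` argument name are hypothetical, and the deletable statuses are taken from this page's FAQ.

```python
from typing import Callable

# delete_stt_job only works on finished jobs, per the description above.
DELETABLE_STATUSES = {"transcribed", "failed"}


def safe_delete(call_tool: Callable[[str, dict], dict], job_id: str) -> bool:
    """Delete a job only if get_stt_job reports a finished status.

    Returns True if delete_stt_job was called, False if the job is still running.
    call_tool(tool_name, arguments) is a hypothetical MCP client wrapper.
    """
    job = call_tool("get_stt_job", {"job_id": job_id})
    if job["status"] not in DELETABLE_STATUSES:
        return False
    call_tool("delete_stt_job", {"job_id": job_id})
    return True


# Demo with a fake caller that records every tool call
calls = []


def fake_call_tool(tool: str, args: dict) -> dict:
    calls.append(tool)
    return {"status": "in_progress"} if args["job_id"] == "busy" else {"status": "transcribed"}


print(safe_delete(fake_call_tool, "busy"), safe_delete(fake_call_tool, "done"))
```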

How can I review my past transcription work using the `list_stt_jobs` tool?

The list_stt_jobs tool gives you a list of all your transcription jobs from the last 30 days. This lets you quickly check job IDs and statuses (like 'in_progress' or 'failed') and pick up where you left off.

If I need time-stamped accuracy, how do I use `submit_alignment_job`?

Submit the audio and its transcript to submit_alignment_job. It performs forced alignment, and get_alignment_result then returns precise timestamps for every word spoken in the media file.

After submitting custom terms with `submit_vocabulary`, how do I check its status using `get_vocabulary`?

You call get_vocabulary to track if your custom vocabulary is ready for use. It checks the processing status of the phrases you submitted, confirming when they are available in the transcription models.

How can I check if my transcription job is finished?

Use the get_stt_job tool with your Job ID. It will return the current status, such as 'in_progress', 'transcribed', or 'failed'.

Can I get subtitles for my video files?

Yes! Once a job is 'transcribed', use the get_captions tool. You can specify the format as either 'srt' or 'vtt'.
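Requesting captions mostly comes down to choosing the format. The sketch validates the format before calling `get_captions` and saves the result with a matching file extension. The `job_id` and `format` argument names, the `call_tool` wrapper, and the assumption that captions come back as plain text are all hypothetical.

```python
from pathlib import Path
from typing import Callable

CAPTION_FORMATS = ("srt", "vtt")


def caption_args(job_id: str, fmt: str) -> dict:
    """Build get_captions arguments, rejecting anything other than SRT or VTT.

    The "job_id" and "format" argument names are assumptions.
    """
    fmt = fmt.lower()
    if fmt not in CAPTION_FORMATS:
        raise ValueError(f"Unsupported caption format {fmt!r}; use 'srt' or 'vtt'")
    return {"job_id": job_id, "format": fmt}


def save_captions(call_tool: Callable[[str, dict], str], job_id: str,
                  fmt: str, out_dir: Path) -> Path:
    """Fetch captions and write them to <out_dir>/<job_id>.<fmt>.

    Assumes the hypothetical call_tool returns the caption file as text.
    """
    args = caption_args(job_id, fmt)
    path = out_dir / f"{job_id}.{args['format']}"
    path.write_text(call_tool("get_captions", args), encoding="utf-8")
    return path


print(caption_args("job-123", "VTT"))
```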

How do I improve accuracy for industry-specific jargon?

You can use the submit_vocabulary tool to provide a list of custom phrases. This helps the AI recognize technical terms and unique names more accurately.
