Supercharge your AI with Rev.ai. Turns audio and video into structured, analyzed data.
Works with every AI agent you already use
…and any MCP-compatible client
Connect to your AI in seconds.
Rev.ai handles high-accuracy speech-to-text and full media transcription. Submit audio or video files via your AI client to start a job.
It then returns not just text, but also structured data: captions (SRT/VTT), topic scores, sentiment analysis, and concise summaries. This tool manages the entire process from file submission through deep, multi-layered analysis.
What your AI can do
Delete stt job
Permanently removes the data associated with a transcription job that is already complete or failed.
Delete vocabulary
Removes a previously submitted custom vocabulary set from your profile.
Get alignment result
Returns precise timestamps for every word in the audio, useful for forced alignment tasks.
Submit a media file URL to begin asynchronous speech-to-text processing.
Retrieve the full written content of a completed job, formatted as JSON or plain text.
Extract synchronized caption files (SRT/VTT) for visual media based on a finished transcription job.
Run the transcript through NLP to get scores indicating positive, negative, or neutral tone shifts in the speech.
Identify the main subjects discussed in a transcript, returning topic names along with their statistical relevance/score.
Condense lengthy transcripts into short, focused summaries, saving manual reading time.
Compatible AI Apps
OAuth 2.0 Compatible
Rev.ai MCP Server: 19 Tools for Media Processing
This server gives your agent tools to manage the entire media workflow: submit jobs, check status, get transcripts, and run deep analysis like sentiment scoring.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Rev.ai on Vinkius
Delete Stt Job
Permanently removes the data associated with a transcription job that is already complete or failed.
Delete Vocabulary
Removes a previously submitted custom vocabulary set from your profile.
Get Alignment Result
Returns precise timestamps for every word in the audio, useful for forced alignment...
Get Captions
Pulls synchronized caption files (SRT or VTT) from a finished job's media content.
Get Language Id Result
Identifies the primary language spoken in an audio file and provides confidence...
Get Sentiment Analysis Result
Returns a score detailing the emotional tone (positive, negative) of the speech within the transcript.
Get Stt Job
Checks and retrieves the current status and detailed information for any submitted transcription job ID.
Get Topic Extraction Result
Pulls a list of key topics identified in a transcript, along with their relative...
Get Transcript Summary
Generates and returns a condensed summary of the main points from a finished...
Get Transcript
Retrieves the full written text for a completed job; you can request JSON or plain...
Get Vocabulary
Checks the processing status of custom vocabulary phrases you recently submitted.
List Stt Jobs
Retrieves a list of all your transcription jobs that occurred within the last 30 days.
List Vocabularies
Shows you a history of custom vocabularies you've submitted to improve accuracy.
Submit Alignment Job
Submits both audio and the transcript text to perform forced alignment, adding...
Submit Language Id Job
Processes an audio URL or file to determine what language is being spoken.
Submit Sentiment Analysis Job
Submits a transcript text specifically for emotional tone analysis and scoring.
Submit Stt Job
The main action: submits an audio or video file URL to start the asynchronous...
Submit Topic Extraction Job
Submits a transcript text specifically for identifying and scoring its key discussion topics.
Submit Vocabulary
Processes new custom phrases or jargon you want the engine to recognize during transcription.
Connect to your AI in seconds. Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for Streamable HTTP, SSE, and OAuth2, with zero messy routing required.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Rev.ai, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,000+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Rev.ai. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 19 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
Manually turning hours of video into usable notes is a nightmare.
Right now, if you get a 90-minute meeting recording, you download the file. Then you either listen to it all, or you use a basic service that gives you messy text—text full of errors and no structure. You end up spending hours copy-pasting, cleaning up filler words, and manually creating bullet points just to make it readable.
With this MCP server, your agent handles the whole thing. Submit the media URL using `submit_stt_job`. When the job finishes, you don't get raw text; you run `get_transcript_summary` right away, and boom—you have a clean summary of the main points instantly.
Rev.ai MCP Server: Structured Data from Speech
Before this server, if you wanted to know if a meeting was positive or negative, you had to read every single word and manually count the complaints versus the praise. You were stuck in the text itself.
Now, after transcription, your agent calls `get_sentiment_analysis_result`. It returns structured scores for the whole job. That's it. No reading required—you just get the data.
What your AI can actually do with this
You submit a media file URL (audio or video) to start a transcription job using submit_stt_job. This kicks off an asynchronous process and gives you a Job ID. With that ID you can track progress and get detailed status updates by calling get_stt_job, and you can review your recent work history through list_stt_jobs.
When the processing is done, you've got several ways to pull out usable data. You can get the full written content using get_transcript, which returns the text in either JSON or plain format. If you need synchronized captions for a video, get_captions pulls those files as standard SRT or VTT formats.
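Here's a rough sketch of pulling captions for a finished job and saving them to disk. It assumes a hypothetical `call_tool(name, arguments)` function that your MCP client provides, and the argument names (`id`, `format`) and the `content` field in the result are guesses for illustration, not the server's documented schema.

```python
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def save_captions(call_tool: ToolCaller, job_id: str, fmt: str, out_dir: str) -> Path:
    """Fetch captions for a finished job and write them to <out_dir>/<job_id>.<fmt>."""
    # The page names exactly two caption formats, so reject anything else early.
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported caption format: {fmt!r}")

    # "id", "format" and "content" are assumed names, not confirmed by the server.
    result = call_tool("get_captions", {"id": job_id, "format": fmt})

    path = Path(out_dir) / f"{job_id}.{fmt}"
    path.write_text(result["content"], encoding="utf-8")
    return path
```

Validating the format before the call keeps a typo like "str" from costing you a round trip.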
For deeper analysis, your agent runs several NLP tasks. To see what people were talking about, you can pull a list of key themes and their importance scores using get_topic_extraction_result. You'll also get an emotional read on the speech by calling get_sentiment_analysis_result, which returns a score showing whether the tone was positive, negative, or neutral.
If the transcript is long, don't waste time reading it all; use get_transcript_summary to condense the main points into a quick overview.
You can also get highly technical data. To perform forced alignment and get precise timings for every word spoken, submit both the audio and the transcript text to submit_alignment_job, then grab those timestamps with get_alignment_result. If you're unsure what language was recorded, process the file URL using submit_language_id_job and check the result with get_language_id_result to identify the primary spoken language.
Before you start any job, you might need better accuracy. You can improve recognition for specific industry jargon by submitting custom phrases via submit_vocabulary. You'll manage your custom terms history using list_vocabularies, and check if a phrase was accepted with get_vocabulary.
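The vocabulary-first ordering can be sketched like this. Everything about the payloads is an assumption: the `call_tool` helper, the `phrases` and `custom_vocabulary_id` arguments, the `id` and `status` fields, and the "complete" and "failed" status strings are placeholders you'd swap for whatever the server actually reports.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_job_with_vocabulary(call_tool: ToolCaller, phrases: List[str], media_url: str,
                               ready_status: str = "complete",
                               wait_seconds: float = 2.0, max_checks: int = 30) -> str:
    """Register custom phrases, wait until they are ready, then start transcription."""
    # Argument and field names below are assumed for illustration.
    vocab_id = call_tool("submit_vocabulary", {"phrases": phrases})["id"]

    # Check the vocabulary status before any media is submitted.
    for _ in range(max_checks):
        status = call_tool("get_vocabulary", {"id": vocab_id})["status"]
        if status == ready_status:
            break
        if status == "failed":
            raise RuntimeError(f"vocabulary {vocab_id} was rejected")
        time.sleep(wait_seconds)
    else:
        raise TimeoutError(f"vocabulary {vocab_id} not ready after {max_checks} checks")

    # Only now start the transcription job, pointing it at the vocabulary.
    job = call_tool("submit_stt_job", {"media_url": media_url,
                                       "custom_vocabulary_id": vocab_id})
    return job["id"]
```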
For extra control over the process, you can submit transcript text directly: use submit_sentiment_analysis_job to run sentiment analysis on that text, or submit_topic_extraction_job to identify its topics.
When everything's done, and you're done with the data, you can clean up your account. Use delete_stt_job to permanently wipe out any transcription job's associated data, or if you need to clear out a custom vocabulary set, use delete_vocabulary. This server handles every step: from submitting media files and getting basic text, to pulling structured captions, generating summaries, detecting sentiment shifts, extracting key topics, and even aligning word timings.
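Since deletion only applies to jobs that have finished or failed, a cleanup pass might filter on status first. This is a sketch under assumptions: `call_tool` is a hypothetical helper from your MCP client, and the `jobs` list, the `id` and `status` fields, and the exact status strings are illustrative.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def delete_finished_jobs(call_tool: ToolCaller,
                         finished: Sequence[str] = ("transcribed", "completed", "failed"),
                         ) -> List[str]:
    """Delete every listed job whose status is terminal; return the deleted IDs."""
    # The {"jobs": [...]} result shape is an assumption.
    jobs = call_tool("list_stt_jobs", {})["jobs"]

    deleted = []
    for job in jobs:
        # Jobs still in progress are skipped, since deletion targets finished work only.
        if job["status"] in finished:
            call_tool("delete_stt_job", {"id": job["id"]})
            deleted.append(job["id"])
    return deleted
```

Deletion is permanent, so you may want to print the IDs and confirm before running this for real.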
Here's how it actually works
The bottom line is you submit a file, wait for the job status, and then run specific analytical calls against that completed job ID.
Submit the media URL using submit_stt_job. You'll get a Job ID back.
Wait for job completion. Check status with get_stt_job until it's 'transcribed'.
Call analysis tools (e.g., get_transcript, get_topic_extraction_result) using the Job ID to pull structured data.
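The three steps above can be sketched in a few lines of Python. Treat it as a sketch, not the server's actual interface: `call_tool` stands in for whatever your MCP client exposes, the `media_url` and `id` names are assumptions, and the finished status is assumed to be 'transcribed' (with 'completed' accepted as a fallback, since the exact string depends on the server).

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed status strings for a finished job.
DONE_STATUSES = ("transcribed", "completed")


def transcribe(call_tool: ToolCaller, media_url: str,
               poll_seconds: float = 5.0, max_polls: int = 120) -> Dict[str, Any]:
    """Submit a media URL, wait for the job to finish, then fetch the transcript."""
    # Step 1: submit. "media_url" and "id" are assumed names.
    job_id = call_tool("submit_stt_job", {"media_url": media_url})["id"]

    # Step 2: poll until the job finishes, fails, or we give up.
    for _ in range(max_polls):
        status = call_tool("get_stt_job", {"id": job_id})["status"]
        if status in DONE_STATUSES:
            break
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} still running after {max_polls} polls")

    # Step 3: retrieve data with the same job ID.
    return call_tool("get_transcript", {"id": job_id})
```

The `max_polls` cap is there so a stuck job can't keep your agent waiting forever.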
Who is this actually for?
Content creators who need to turn every podcast or video into blog-ready text. Researchers slogging through hours of interview footage. Developers building media processing tools for clients. Basically, anyone whose job involves converting spoken word into structured data.
Content creators: submit raw audio files to generate captions and summaries immediately, then use get_transcript to copy clean text for show notes.
Researchers: run multiple interviews through the server, calling get_sentiment_analysis_result and get_topic_extraction_result on each one to quantify qualitative data.
Developers: integrate the STT pipeline into a larger application, using submit_vocabulary first to ensure accuracy for specific client names or jargon before processing any media.
What Changes When You Connect
Stop wasting time reading raw text. Use get_transcript_summary to get a bullet-point summary of long meetings in seconds, instead of writing them up manually.
Improve accuracy for niche terms instantly. Before uploading anything, use submit_vocabulary so the engine correctly identifies client names or technical jargon that standard models miss.
Get more than just text. After transcription, call get_topic_extraction_result to get a structured list of key themes and how important they were, which is way better for content strategy.
Manage your media output perfectly. Use get_captions immediately after transcribing video to pull out SRT or VTT files—perfect for subtitling without extra steps.
Know the tone of every word. Run a transcript through get_sentiment_analysis_result. You instantly see where the discussion got emotional or negative, which is crucial for market research reports.
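To turn sentiment results into numbers you can put in a report, you might tally the labels across several transcripts. The result shape used here (a `messages` list whose entries carry a `sentiment` label) is an assumption for illustration, so check what get_sentiment_analysis_result actually returns before relying on it.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def sentiment_breakdown(results: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Return the share of each sentiment label across all given results."""
    totals: Counter = Counter()
    for result in results:
        # "messages" and "sentiment" are assumed field names.
        for message in result.get("messages", []):
            totals[message["sentiment"]] += 1

    count = sum(totals.values())
    if count == 0:
        return {}
    return {label: n / count for label, n in totals.items()}
```

For example, feeding it the results from two interviews gives one combined breakdown:

```python
interviews = [
    {"messages": [{"sentiment": "positive"}, {"sentiment": "negative"}]},
    {"messages": [{"sentiment": "positive"}, {"sentiment": "neutral"}]},
]
shares = sentiment_breakdown(interviews)
# shares["positive"] is 0.5; "negative" and "neutral" are 0.25 each
```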
See it in action
Analyzing multiple interview transcripts.
A researcher records 15 hours of interviews. Instead of manual coding, they run all files through submit_stt_job. Once transcribed, they use get_topic_extraction_result and get_sentiment_analysis_result across the entire dataset to quantify patterns in mood and discussion points.
Creating blog posts from podcasts.
A podcast editor uploads a raw audio file using submit_stt_job. After completion, they retrieve the text with get_transcript and then call get_captions to grab clean SRT files for YouTube. Finally, they use get_topic_extraction_result to build out SEO keywords.
Quickly assessing meeting takeaways.
A project manager records a long status call and submits it via submit_stt_job. Instead of reading the transcript, they immediately use get_transcript_summary to get the core decisions and action items for stakeholders.
Debugging audio files with jargon.
A developer needs to process recordings containing proprietary medical terms. They first submit those terms with submit_vocabulary, wait for the status via get_vocabulary, and only then use submit_stt_job, giving the engine the best chance at high-fidelity results.
The honest tradeoffs
Assuming a single endpoint works
Sending the raw audio file directly to one tool expecting it to return all analysis (summary, sentiment, topics). This fails because processing needs distinct steps.
You gotta build a pipeline. First, use submit_stt_job. Once that job is marked 'transcribed', call get_transcript and then get_topic_extraction_result as separate calls.
Forgetting custom vocabulary
Transcribing a highly technical interview without using submit_vocabulary. The engine will mishear jargon, making the final text unusable.
Always start by identifying key industry terms. Run those through submit_vocabulary first. Then proceed with submit_stt_job to ensure maximum accuracy.
Ignoring job status checks
Calling get_transcript immediately after submitting the file, assuming it's ready. You just get an error because the transcription hasn't finished yet.
You must check the status first. Use get_stt_job repeatedly until the returned status is 'transcribed'. Only then do you call data retrieval tools like get_transcript.
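Two of the pitfalls above come down to the same habit: check the job first, then make one call per kind of result. A minimal sketch, assuming a hypothetical `call_tool` helper, an `id` argument, and that all three analysis tools accept the transcription job ID (as the FAQ on this page suggests):

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_analysis(call_tool: ToolCaller, job_id: str,
                     done_statuses: Sequence[str] = ("transcribed", "completed"),
                     ) -> Dict[str, Dict[str, Any]]:
    """Refuse to fetch results for an unfinished job, then call each tool separately."""
    # Guard: a single status check before any retrieval call.
    status = call_tool("get_stt_job", {"id": job_id})["status"]
    if status not in done_statuses:
        raise RuntimeError(f"job {job_id} is '{status}'; fetch results after it finishes")

    # There is no single "give me everything" tool, so each result is its own call.
    tools = {
        "summary": "get_transcript_summary",
        "topics": "get_topic_extraction_result",
        "sentiment": "get_sentiment_analysis_result",
    }
    return {label: call_tool(tool, {"id": job_id}) for label, tool in tools.items()}
```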
When It Fits, When It Doesn't
Use this server if your core need is converting spoken word into highly structured, analyzed text. This isn't just a simple transcription service; it’s an analysis pipeline. You use it when you need to know what was said (text), how the speakers felt (sentiment), and what they were talking about (topics/themes). Don't use this if all you want is a quick, basic audio dump—use a simpler tool instead. If your content has specialized language or jargon, you absolutely must run submit_vocabulary first, because otherwise, the accuracy of every subsequent call (get_transcript, etc.) will be shot. It’s complex because it requires state management across multiple job IDs and steps.
Questions you might have
How do I improve accuracy with submit_stt_job?
You use submit_vocabulary first. You feed the engine a list of your specific technical terms or names, wait for its status using get_vocabulary, and then you run the main job via submit_stt_job. This tells the model what to expect.
Which tool do I use to get captions?
You use get_captions after a job is done. It pulls out synchronized caption files like SRT or VTT, which are perfect for video platforms and don't require you to process the raw text at all.
Is there one tool to get summary, topics, and sentiment?
No. You need a pipeline. After submitting the job with submit_stt_job and getting the Job ID, you must call get_transcript_summary, get_topic_extraction_result, and get_sentiment_analysis_result separately using that same ID.
What if I want to know what language was spoken?
Use the dedicated tool, submit_language_id_job, to process an audio file or URL, then call get_language_id_result. It returns the top language identified in the recording along with a confidence score.
What is the function of `delete_stt_job`?
The delete_stt_job tool permanently removes job data. You use this when you need to clear records, but it only works for jobs that are already completed or have failed status.
How can I review my past transcription work using the `list_stt_jobs` tool?
The list_stt_jobs tool gives you a list of all your transcription jobs from the last 30 days. This lets you quickly check job IDs, statuses (like 'in_progress' or 'failed'), and pick up where you left off.
If I need time-stamped accuracy, how do I use `submit_alignment_job`?
You submit audio and a transcript to the alignment job. This process forces alignment, letting you get precise timestamps for every word spoken in the media file.
After submitting custom terms with `submit_vocabulary`, how do I check its status using `get_vocabulary`?
You call get_vocabulary to track if your custom vocabulary is ready for use. It checks the processing status of the phrases you submitted, confirming when they are available in the transcription models.
How can I check if my transcription job is finished?
Use the get_stt_job tool with your Job ID. It will return the current status, such as 'in_progress', 'transcribed', or 'failed'.
Can I get subtitles for my video files?
Yes! Once a job is 'transcribed', use the get_captions tool. You can specify the format as either 'srt' or 'vtt'.
How do I improve accuracy for industry-specific jargon?
You can use the submit_vocabulary tool to provide a list of custom phrases. This helps the AI recognize technical terms and unique names more accurately.
We've already built the connector for Rev.ai. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 19 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.