Turn any audio or video into structured text.
Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel
Works with every AI agent you already use
…and any MCP-compatible client
Connect to your AI in seconds.
Verbit handles professional transcription and captioning for media files. Just give your agent a link, and it manages the entire job: uploading the audio/video, tracking its progress in real time, and finally downloading the complete transcript or captions in formats like SRT, VTT, or DOCX.
What your AI can do
Create job
Starts a new job by submitting a media file URL that needs to be transcribed.
Get job
Checks the current status and progress of an existing transcription job, identified by its job ID.
Get transcript
Retrieves the final, completed transcript for a given job ID in your preferred format.
You give the agent a media URL, and it starts processing the file for text output.
The agent checks the system to tell you if your transcript is pending, in progress, or finished.
Once a job finishes, the agent fetches and delivers the text file in various structured formats.
You link your Verbit transcript job to an ID you already use in your internal project management system.
Compatible AI Apps
OAuth 2.0 Compatible
Verbit: Media Processing Tools (3)
These three tools let you manage the entire process of converting media to text: starting a job, checking its status, and downloading the final output.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Verbit on Vinkius
Connect to your AI in seconds. Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there's no routing for you to configure.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Verbit, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,000+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Verbit. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 3 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
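To make "standardized connection" concrete, here is a rough sketch of what a single tool call looks like on the wire. MCP clients send JSON-RPC 2.0 requests with the method `tools/call`; your AI client builds these for you, so you never write them by hand. The `file_url` argument name comes from the FAQ further down, the media URL is a placeholder, and the helper function is purely illustrative.

```python
import json
from typing import Any


def tools_call_request(request_id: int, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder media URL; swap in a real, publicly reachable file.
request = tools_call_request(1, "create_job", {"file_url": "https://example.com/interview.mp4"})
print(json.dumps(request, indent=2))
```

The same envelope carries `get_job` and `get_transcript`; only the tool name and arguments change.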
Transcriptions are a massive manual bottleneck.
Right now, getting captions or transcripts means logging into a service, uploading the video file, and then staring at a dashboard until it's done. Then you download the raw file, open a second program to convert it to SRT, and maybe copy-paste parts of it somewhere else for your project tracker.
With this MCP, that whole sequence vanishes. You just tell your agent where the video is. It handles the entire upload, wait time, and conversion process until you get the clean output file, ready to drop into your workflow.
Get Structured Transcripts with Verbit.
You stop manually checking status. Instead of waiting for confirmation across different web pages and dashboards, you simply ask the agent to check the job progress using `get_job` until it confirms completion. Then, a single call to `get_transcript` pulls everything in one go.
It's not just about text; it's about workflow certainty. The process is fully managed, secure, and reliable.
What your AI can actually do with this
Verbit lets you turn any piece of recorded media into usable text. Instead of manually managing separate transcription services, your agent handles the whole pipeline from start to finish. You feed it a public URL for an audio or video file; it starts the job and monitors progress until it's done. Once complete, you get the full transcript or captions in the format you need, such as JSON or DOCX.
The platform runs these complex workflows safely inside Vinkius’s isolated sandbox, guaranteeing that your API keys pass through a zero-trust proxy and never sit on disk. This means you can focus solely on getting high-quality text output without worrying about secure credential management.
Here's how it actually works
The bottom line is: your AI client manages the entire waiting period and data transfer between Verbit's service and your final output.
First, tell the agent which media file needs transcribing by providing a public URL.
Next, the agent uses that information to start the job and then periodically checks its status until it reports completion.
Finally, you ask the agent for the transcript, specifying the required format (e.g., SRT or DOCX), and it retrieves the finished file.
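The three steps above can be sketched as one small function. This is an illustration under assumptions, not Verbit's or Vinkius's actual client code: `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool, and the response fields (`job_id`, `status`, `content`) and the `"completed"` status string are guesses. `FakeVerbit` exists only so the sketch runs without a network.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def transcribe(call_tool: ToolCaller, file_url: str, fmt: str, max_checks: int = 20) -> str:
    """Run create_job, then get_job until done, then get_transcript."""
    # Step 1: start the job from a public media URL.
    job_id = call_tool("create_job", {"file_url": file_url})["job_id"]

    # Step 2: check status until completion (a real loop would pause between checks).
    for _ in range(max_checks):
        if call_tool("get_job", {"job_id": job_id})["status"] == "completed":
            break
    else:
        raise TimeoutError(f"job {job_id} not finished after {max_checks} checks")

    # Step 3: fetch the finished transcript in the requested format.
    return call_tool("get_transcript", {"job_id": job_id, "format": fmt})["content"]


class FakeVerbit:
    """In-memory stand-in that finishes a job after a fixed number of checks."""

    def __init__(self, checks_until_done: int = 2) -> None:
        self.remaining = checks_until_done
        self.calls: list[str] = []

    def __call__(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        self.calls.append(name)
        if name == "create_job":
            return {"job_id": "job-1"}
        if name == "get_job":
            self.remaining -= 1
            return {"status": "completed" if self.remaining <= 0 else "in_progress"}
        if name == "get_transcript":
            return {"content": f"[{arguments['format']}] transcript for {arguments['job_id']}"}
        raise ValueError(f"unknown tool: {name}")


fake = FakeVerbit()
text = transcribe(fake, "https://example.com/talk.mp3", "srt")
print(text)
```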
Who is this actually for?
Content producers, legal firms, and media managers who are tired of juggling multiple web portals to convert audio/video into usable text.
Needs captions (SRT/VTT) generated instantly for YouTube or social media without manually downloading and reformatting files.
Has recordings of interviews or depositions that must be converted into structured, searchable text documents quickly.
Needs to automate the process of checking transcription status across dozens of user-generated video assets for project reports.
What Changes When You Connect
You get immediate captions (SRT/VTT) for videos. No more manually exporting and timing caption files; just ask the agent to create them.
The workflow is fully traceable. You can use external IDs when creating a job, linking the transcript directly back to your internal project tracking system.
You don't get stuck on single formats. The agent pulls transcripts in multiple file types—JSON, TXT, DOCX, and more—so you never have to re-save anything.
It manages status checks automatically. You can use the get_job tool repeatedly until your workflow confirms that the transcript is ready for download.
The entire process is secure. All API calls run through Vinkius's zero-trust proxy, so your Verbit keys are used only in transit and are never written to disk.
See it in action
Post-Interview Documentation
A legal team uploads multiple interview recordings. The agent first uses create_job for all files, then loops through them using get_job until they are all complete, finally calling get_transcript to pull all required DOCX files into a single folder.
YouTube Captioning Pipeline
A content creator shares a link to a new video. The agent calls create_job, receives a job ID, and checks get_job until the job is complete. It then uses get_transcript to pull the SRT file directly into their asset management system.
Project Status Reporting
A product manager needs a report on 50 different videos. They call create_job for all of them, and then use get_job in a batch loop to confirm which ones are ready before pulling the final JSON data with get_transcript.
Historical Archiving
An academic needs to archive historical audio. They run create_job and tag it with an internal archival ID, ensuring that when they use get_transcript, the output is perfectly linked back to their source records.
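The batch scenarios above (many recordings, one report) boil down to a loop like the one below. Treat it as a sketch under assumptions: `call_tool` is a hypothetical way of invoking the three tools, the response fields and status strings are guesses, and the fake backend simply makes each job take one more check than the previous one.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_batch(call_tool: ToolCaller, file_urls: list[str], fmt: str,
              max_rounds: int = 10) -> tuple[dict[str, str], dict[str, str]]:
    """Start one job per URL, poll them together, and collect finished transcripts.

    Returns (transcripts keyed by URL, still-pending URLs keyed by job ID).
    """
    pending = {call_tool("create_job", {"file_url": url})["job_id"]: url for url in file_urls}
    transcripts: dict[str, str] = {}
    for _ in range(max_rounds):
        if not pending:
            break
        for job_id in list(pending):  # copy the keys so we can pop while iterating
            if call_tool("get_job", {"job_id": job_id})["status"] == "completed":
                url = pending.pop(job_id)
                result = call_tool("get_transcript", {"job_id": job_id, "format": fmt})
                transcripts[url] = result["content"]
    return transcripts, pending


def make_fake() -> ToolCaller:
    """In-memory backend: job N reports completion on its Nth status check."""
    jobs: dict[str, dict[str, Any]] = {}

    def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if name == "create_job":
            job_id = f"job-{len(jobs) + 1}"
            jobs[job_id] = {"url": arguments["file_url"], "checks_left": len(jobs) + 1}
            return {"job_id": job_id}
        job = jobs[arguments["job_id"]]
        if name == "get_job":
            job["checks_left"] -= 1
            return {"status": "completed" if job["checks_left"] <= 0 else "in_progress"}
        if name == "get_transcript":
            return {"content": f"{arguments['format']} transcript of {job['url']}"}
        raise ValueError(f"unknown tool: {name}")

    return call_tool


urls = ["https://example.com/a.mp4", "https://example.com/b.mp4"]
done, still_pending = run_batch(make_fake(), urls, "json")
print(done)
```

Returning the still-pending jobs separately lets a report say which videos are ready and which are not, rather than blocking on the slowest file.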
The honest tradeoffs
Doing manual status polling
Manually opening a browser tab, waiting 5 minutes, refreshing, checking if the job ID is 'complete', and then logging in again to download.
Let your agent handle the loop. Use create_job first, then let the workflow call get_job repeatedly until the job is complete, and only then make the final call to get_transcript.
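If you script that loop yourself, it helps to cap the total wait and space out the checks. The sketch below is one way to do it, not the connector's built-in behavior: `call_tool` is a hypothetical tool invoker, the status strings (`pending`, `in_progress`, `completed`, `failed`) are assumptions, and the delays are arbitrary defaults. The sleep and clock functions are injectable so the example runs instantly.

```python
import time
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_job(call_tool: ToolCaller, job_id: str, *, timeout_s: float = 600.0,
                 first_delay_s: float = 2.0, max_delay_s: float = 30.0,
                 sleep: Callable[[float], Any] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict[str, Any]:
    """Poll get_job with a doubling delay until the job finishes, fails, or times out."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        job = call_tool("get_job", {"job_id": job_id})
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but never beyond the cap


# Demo with scripted statuses, a recording sleep, and a frozen clock.
statuses = iter(["pending", "in_progress", "completed"])
slept: list[float] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    return {"job_id": arguments["job_id"], "status": next(statuses)}


result = wait_for_job(fake_call_tool, "job-1", sleep=slept.append, clock=lambda: 0.0)
print(result["status"], slept)
```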
Ignoring output formats
Assuming a service will always give you plain text, but getting raw JSON instead, which breaks your downstream database import.
Always specify the required format (DOCX, VTT, etc.) when calling get_transcript. The agent handles mapping that requirement to the correct output structure.
Forgetting external context
Running a job without telling your database what project it belongs to, resulting in orphaned transcript records.
When you call create_job, ensure you provide an external ID tag. This links the automated job directly into your existing data structures.
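As a rough illustration of that tagging step: the page does not name the parameter that carries the external ID, so `external_id` below is an assumed name, as are the `call_tool` invoker and the `job_id` response field. The idea is simply to send your own record ID with the job and keep a local mapping back to it.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def create_tagged_job(call_tool: ToolCaller, file_url: str, external_id: str,
                      index: dict[str, str]) -> str:
    """Start a job tagged with an internal record ID and remember job_id -> external_id."""
    # "external_id" is an assumed argument name; check the tool's schema for the real one.
    job = call_tool("create_job", {"file_url": file_url, "external_id": external_id})
    job_id = job["job_id"]
    index[job_id] = external_id
    return job_id


# In-memory stand-in that records what was sent.
sent: list[dict[str, Any]] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    sent.append({"tool": name, **arguments})
    return {"job_id": f"job-{len(sent)}"}


index: dict[str, str] = {}
job_id = create_tagged_job(fake_call_tool, "https://example.com/tape-07.wav", "ARCHIVE-0007", index)
print(job_id, "->", index[job_id])
```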
When It Fits, When It Doesn't
Use this MCP if your primary need is converting media (audio/video) into usable, structured text documents and managing that entire lifecycle automatically. You're dealing with multiple stages: uploading, waiting for processing, and then downloading different formats. Don't use it if you just need simple file conversion or basic audio trimming; those are better suited for dedicated media utility tools. If your process requires complex coordination—say, transcribing a video, then summarizing the transcript using an LLM, and then posting it to Slack—you should look into chaining Verbit with other MCPs in the Vinkius catalog. That's where the real power is.
Questions you might have
How do I start a transcription job with Verbit?
You initiate the process by calling create_job and providing the public URL of the media file you want transcribed. This kicks off the entire background process.
Can I check the progress using the Verbit MCP?
Yep. You use the get_job tool, giving it your unique job ID. It returns the job's current status and its completion percentage.
What file formats can I get from Verbit?
You can download transcripts in many formats. The get_transcript tool supports JSON, TXT, SRT, VTT, DOCX, and more, so you only get what your downstream system needs.
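If you build the `get_transcript` arguments in your own code, a small guard can catch a bad format before the call goes out. This is a sketch: the allowed set contains only the formats named on this page (the real list may be longer, so loosen the check if needed), and the `job_id` and `format` argument names are taken from the SRT question further down.

```python
# Formats named on this page; the connector may accept others.
KNOWN_FORMATS = {"json", "txt", "srt", "vtt", "docx"}


def transcript_arguments(job_id: str, fmt: str) -> dict[str, str]:
    """Normalize a requested format and build the arguments for get_transcript."""
    normalized = fmt.strip().lower().lstrip(".")  # accept "SRT", ".srt", " srt "
    if normalized not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(KNOWN_FORMATS)}")
    return {"job_id": job_id, "format": normalized}


args = transcript_arguments("job-1", ".SRT")
print(args)
```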
Does the Verbit MCP handle large files?
Yes. It's designed for professional media processing, allowing you to manage high-quality speech-to-text tasks even with long or complex audio inputs.
What credentials do I need to successfully run `create_job` with Verbit?
You must provide your Verbit API Key. This key connects your agent securely to the platform and is handled by Vinkius's zero-trust proxy. Your keys are never stored on disk, ensuring secure access for every job you create.
Can I use `create_job` to tag my media files with external IDs?
Yes, you can assign custom external IDs when using create_job. This feature is key for maintaining consistency and linking the transcription job back to records in your internal database or project management tools.
If I run `get_transcript`, what should I do if the file download fails?
First, check the status using get_job. A failed retrieval often means the job hasn't hit 100% completion yet. Wait until the status shows 'Completed' before attempting to download the transcript again.
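That check-first habit is easy to encode. A minimal sketch, assuming a hypothetical `call_tool` invoker, a `status` field on the `get_job` result, and a `content` field on the `get_transcript` result; the status is compared case-insensitively because the exact spelling the tool returns is not confirmed here.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def fetch_if_ready(call_tool: ToolCaller, job_id: str, fmt: str) -> Optional[str]:
    """Return the transcript if the job is complete, else None so the caller can retry later."""
    job = call_tool("get_job", {"job_id": job_id})
    if job["status"].lower() != "completed":
        return None  # not ready yet: skip the download instead of failing
    return call_tool("get_transcript", {"job_id": job_id, "format": fmt})["content"]


# In-memory stand-in whose status we can flip by hand.
state = {"status": "In Progress"}


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    if name == "get_job":
        return {"status": state["status"]}
    if name == "get_transcript":
        return {"content": "Hello and welcome."}
    raise ValueError(f"unknown tool: {name}")


first = fetch_if_ready(fake_call_tool, "job-1", "txt")
state["status"] = "Completed"
second = fetch_if_ready(fake_call_tool, "job-1", "txt")
print(first, second)
```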
Does the Verbit MCP support chaining all three tools (`create_job`, `get_job`, and `get_transcript`)?
Absolutely. You can chain these calls together within your agent workflow. For example, the agent passes the job ID returned by create_job to get_job and then to get_transcript.
How do I start a new transcription job using a video link?
Use the create_job tool. Simply provide the file_url of your media and optionally specify the language and a title to identify the job later.
Can I download subtitles for my video in SRT format?
Yes! Once the job is finished, use the get_transcript tool with your job_id and set the format parameter to 'srt'.
How can I check if my transcription is already finished?
You can use the get_job tool. By providing the job_id, the agent will return the current status and progress percentage of your transcription.
We've already built the connector for Verbit. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 3 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.