Verbit MCP. Automate captioning and transcribe media from any URL.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Verbit automates professional transcription and captioning for media files. Upload any public video or audio URL, and your AI client manages the entire pipeline: it creates a job using `create_job`, tracks status with `get_job`, and retrieves completed transcripts in JSON, SRT, TXT, DOCX, or VTT formats via `get_transcript`.
Stop manual captioning. Start processing media instantly.
What your AI agents can do
Create job
Starts the process by uploading a public media URL and defining the language for transcription.
Get job
Checks the status of an existing job using its unique Job ID, letting you know if it's 'In Progress' or finished.
Get transcript
Downloads the finalized transcript content for a completed job in various formats (TXT, SRT, DOCX, etc.).
Send a public media URL and language specification to initiate a new transcription job using create_job.
Use the unique Job ID with get_job to check the current status of any pending or active transcription request.
Call get_transcript with a completed Job ID and specify the required output format (e.g., DOCX, SRT) to download the final text file.
Tag jobs with external IDs when creating them, keeping your Verbit workflow linked to your internal project database.
Verbit: 3 Tools for Media Processing
Manage the entire transcription workflow by calling these three tools sequentially through your AI agent.
create_job
Starts the process by uploading a public media URL and defining the language for transcription.
get_job
Checks the status of an existing job using its unique Job ID, letting you know if it's 'In Progress' or finished.
get_transcript
Downloads the finalized transcript content for a completed job in various formats (TXT, SRT, DOCX, etc.).
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Verbit, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
You've got media (a video, an audio clip) and you need a clean transcript. Forget typing out captions by hand; your AI client handles the whole pipeline for you through the Verbit MCP Server. You submit speech-to-text jobs straight from your chat and skip the manual steps of media post-production.
To get started, you'll use create_job. You just send your agent a public URL pointing to any video or audio source and tell it what language you need for the transcription. The tool takes that info and kicks off a new job, immediately spitting out a unique Job ID. You can even keep track of your projects by tagging those jobs with external IDs when you create them; that keeps everything linked back to your own internal database.
Once the job starts, you'll use get_job to check its status. Just feed it that Job ID, and your agent tells you if the process is 'In Progress' or if it’s done. This lets you monitor real-time progress without having to wait around guessing what's going on in the background.
When get_job confirms the job is finished, you call get_transcript. You pass that completed Job ID and tell the tool exactly which format you need the final text in. This isn't just a simple download; it lets you specify formats like JSON, SRT, VTT, DOCX, or TXT, so you get the file ready for immediate use, whether you're dropping it into an editor or archiving it.
The process is straightforward: You send your agent the URL and language via create_job. When you need a status check, you ask about the Job ID using get_job. Once that confirms completion, you hit get_transcript to download the perfectly formatted output. This workflow eliminates the need for manual captioning entirely.
When your agent uses create_job, it's not just starting a transcription; it's initiating a structured process defined by the URL and the language specification you provide, giving you that critical Job ID right out of the gate. The ability to tag jobs with external IDs means Verbit integrates into your existing project management structure—you never lose track of which transcript belongs where.
If you run into issues or just need an update, get_job is what you use. It takes that specific Job ID and returns the current status, whether it's still cooking ('In Progress') or ready to be pulled down. You don't have to wait on a timer; you just check the job status until it reports 'Complete'.
To finish up, get_transcript is your final stop. It accepts that Job ID and lets you select from a ton of output formats: DOCX for Word docs, SRT for subtitling, VTT for web videos, JSON for machine reading, or simple TXT files. You call this tool when the job is complete to pull down the finished content in whatever format your downstream process needs.
It's about getting you the file right now, without conversion headaches.
This server handles the entire lifecycle: From the initial media upload and language definition using create_job, through the status monitoring with get_job, right up to downloading a perfectly formatted, usable transcript via get_transcript. It's a complete system for transforming audio and video into structured text files.
How Verbit MCP Works
1. You subscribe to the server and pass your Verbit API Key. Your AI agent accesses the toolset.
2. To start, you call create_job by providing a public media URL and language code. This starts processing and returns a Job ID.
3. After waiting an appropriate time, you check progress using get_job with that Job ID. Once the status is 'Complete,' you use get_transcript to pull the file in your desired format.
The bottom line is: Verbit handles the three-step process—uploading, waiting, and downloading—so you never have to copy/paste URLs or manually track job IDs across different systems.
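The three steps above can be sketched as one function. This is a minimal illustration, not the server's actual client code: `call_tool` is a hypothetical adapter standing in for however your MCP client invokes a tool, and the result fields `job_id`, `status`, and `content` are assumed names. The fake caller exists only so the sketch runs offline.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical adapter: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe(call_tool: ToolCaller, file_url: str, language: str,
               fmt: str = "srt", poll_seconds: float = 30.0,
               max_polls: int = 120) -> str:
    """Run create_job -> get_job -> get_transcript and return the transcript text."""
    job = call_tool("create_job", {"file_url": file_url, "language": language})
    job_id = job["job_id"]  # assumed result field

    for _ in range(max_polls):
        status = call_tool("get_job", {"job_id": job_id})["status"]  # assumed field
        if status == "Complete":
            break
        time.sleep(poll_seconds)
    else:
        # The loop ran out of checks without ever seeing 'Complete'.
        raise TimeoutError(f"job {job_id} did not complete after {max_polls} checks")

    result = call_tool("get_transcript", {"job_id": job_id, "format": fmt})
    return result["content"]  # assumed result field


def make_fake_caller(polls_until_done: int = 2) -> ToolCaller:
    """In-memory stand-in for the server so the sketch runs without a network."""
    remaining = {"n": polls_until_done}

    def fake(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if tool == "create_job":
            return {"job_id": "vbt_123"}
        if tool == "get_job":
            remaining["n"] -= 1
            done = remaining["n"] <= 0
            return {"status": "Complete" if done else "In Progress"}
        if tool == "get_transcript":
            return {"content": f"transcript for {args['job_id']} as {args['format']}"}
        raise ValueError(f"unknown tool: {tool}")

    return fake


if __name__ == "__main__":
    text = transcribe(make_fake_caller(), "https://example.com/talk.mp4", "en",
                      fmt="srt", poll_seconds=0)
    print(text)
```

In normal use your agent makes these calls for you; the sketch only shows the order they happen in and where the Job ID flows.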
Who Is Verbit MCP For?
This server is for content teams who hate manual captioning workflows. It's for legal professionals dealing with hours of audio recordings, and media managers who need to integrate transcription status into project reports. You use this when the source material is always video or audio.
Needs captions (SRT/VTT) for every piece of content without exporting files to a separate, clunky service.
Must transcribe hours-long recorded interviews or depositions into structured text documents like DOCX for immediate review and filing.
Integrates transcription status checks directly into automated project reporting, knowing exactly when a media asset is ready.
What Changes When You Connect
- Get Captions Instantly: Instead of uploading a video to a separate tool just for captions, you use create_job to process the media in one flow. You get SRT/VTT files directly when they're ready.
- Format Control: Don't get stuck with plain text. Use get_transcript and specify DOCX or JSON formats. This means the output structure matches what your internal database needs right out of the gate.
- Track Jobs Programmatically: Forget checking a dashboard manually. Pass the Job ID to get_job, and your agent knows exactly if it's 75% done or finished, allowing for automated reporting loops.
- Handle Large Projects: You can tag jobs with external IDs during creation. This keeps Verbit records tied directly to your project management tools, making compliance auditing simple.
- Zero Manual Copy-Pasting: The three-step process—create, check, retrieve—is handled by the agent. Your AI client manages the state changes and tool calls for you.
Real-World Use Cases
Legal Deposition Transcription
A paralegal receives a 3-hour audio recording. They ask their agent to run create_job with the URL. After an hour, they use get_job to confirm completion. Finally, they call get_transcript to download the result as a clean DOCX file for filing.
YouTube Video Captioning
A content creator provides a video URL and asks for English captions in SRT format. The agent runs create_job, then checks get_job until the job is finished. Once it is, the agent calls get_transcript to fetch the captions file, ready for upload.
Batch Project Reporting
A PM has 20 video assets needing status updates. Instead of checking 20 URLs, they run a loop: calling create_job for all 20, then repeatedly hitting get_job until all are marked 'Complete,' streamlining the entire report generation.
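A batch loop like the one in this use case might look roughly as follows. It is a sketch under assumptions: `call_tool` is a hypothetical adapter for your MCP client, the `job_id` and `status` result fields are assumed names, and the fake caller only simulates jobs that finish after two status checks.

```python
from typing import Any, Callable, Dict, List

# Hypothetical adapter: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_all(call_tool: ToolCaller, urls: List[str], language: str) -> Dict[str, str]:
    """Create one job per URL and return {url: job_id}."""
    return {
        url: call_tool("create_job", {"file_url": url, "language": language})["job_id"]
        for url in urls
    }


def status_report(call_tool: ToolCaller, jobs: Dict[str, str]) -> Dict[str, str]:
    """Call get_job once per job and return {url: status}."""
    return {
        url: call_tool("get_job", {"job_id": job_id})["status"]
        for url, job_id in jobs.items()
    }


def all_complete(report: Dict[str, str]) -> bool:
    return all(status == "Complete" for status in report.values())


def make_fake_caller(checks_needed: int = 2) -> ToolCaller:
    """Offline stand-in: each job reports 'Complete' on its Nth status check."""
    checks: Dict[str, int] = {}

    def fake(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if tool == "create_job":
            job_id = f"job-{len(checks) + 1}"
            checks[job_id] = 0
            return {"job_id": job_id}
        if tool == "get_job":
            checks[args["job_id"]] += 1
            done = checks[args["job_id"]] >= checks_needed
            return {"status": "Complete" if done else "In Progress"}
        raise ValueError(f"unknown tool: {tool}")

    return fake


if __name__ == "__main__":
    caller = make_fake_caller()
    urls = [f"https://example.com/asset-{i}.mp4" for i in range(1, 21)]
    jobs = submit_all(caller, urls, "en")
    report = status_report(caller, jobs)
    while not all_complete(report):
        # A real loop would sleep between rounds to stay within rate limits.
        report = status_report(caller, jobs)
    print(f"{len(report)} jobs complete")
```

Keying the report by URL rather than Job ID keeps the output readable for a status report, while the `jobs` mapping preserves the link between each asset and its job.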
Interview Data Structuring
An academic records an interview and needs structured data. The agent processes it via create_job. Using a custom external ID tag, they ensure the resulting JSON from get_transcript links back perfectly to their research database.
The Tradeoffs
Assuming the transcript is ready
Calling get_transcript immediately after running create_job, expecting a file download. This will fail because the job hasn't finished processing.
→
Always check status first. After calling create_job, you must call get_job repeatedly until the status explicitly reports 'Complete.' Only then can you reliably use get_transcript.
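One way to "call get_job repeatedly" without hammering the server is to back off between checks and give up after a deadline. The helper below is a generic sketch, not part of the Verbit toolset: `get_status` is a hypothetical callable that wraps your get_job call and returns the status string, and the delay values are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_complete(get_status: Callable[[], str],
                        timeout: float = 3600.0,
                        first_delay: float = 5.0,
                        max_delay: float = 120.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> int:
    """Poll get_status() until it returns 'Complete'; return the number of checks.

    The delay doubles after each unfinished check, capped at max_delay.
    Raises TimeoutError if the next wait would run past the deadline.
    """
    deadline = clock() + timeout
    delay = first_delay
    checks = 0
    while True:
        checks += 1
        if get_status() == "Complete":
            return checks
        if clock() + delay > deadline:
            raise TimeoutError(f"not complete after {checks} checks")
        sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Simulated statuses; sleep is replaced so the demo finishes immediately.
    statuses = iter(["In Progress", "In Progress", "Complete"])
    waits = []
    n = wait_until_complete(lambda: next(statuses), sleep=waits.append)
    print(n, waits)
```

Only call get_transcript after this helper returns. Passing `sleep` and `clock` as parameters is a testing convenience; in real use the defaults apply.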
Ignoring required formats
Asking for a transcript without specifying if you need SRT, TXT, or DOCX. The agent won't know which format to pull.
→
When calling get_transcript, always specify the desired output type (e.g., 'in VTT format') so the system knows exactly what file structure to return.
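If you script around the agent, a small guard can catch a missing or misspelled format before the call goes out. This is an illustrative helper, not server behavior: the format list comes from the five formats named on this page, and the lowercase spelling follows the 'srt' example in the FAQ.

```python
from typing import Optional

# The five output formats this page lists for get_transcript.
SUPPORTED_FORMATS = ("json", "srt", "txt", "docx", "vtt")


def normalize_format(requested: Optional[str]) -> str:
    """Return a lowercase format name, or raise ValueError if missing or unsupported."""
    choices = ", ".join(SUPPORTED_FORMATS)
    if requested is None or not requested.strip():
        raise ValueError(f"specify an output format: {choices}")
    # Accept inputs like "SRT", " vtt ", or ".docx".
    fmt = requested.strip().lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {requested!r}; choose one of: {choices}")
    return fmt


if __name__ == "__main__":
    print(normalize_format(" .SRT "))
```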
Mixing job IDs
Using a Job ID from an old project or a different media source when calling get_job. This results in a 'Job Not Found' error.
→
Keep track of the unique Job ID returned by your initial create_job call. Use that specific ID for all subsequent calls to both get_job and get_transcript.
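A tiny registry is one way to keep Job IDs from getting mixed up when you run several jobs. The class below is a hypothetical local helper and makes no calls to the server; `external_id` stands for whatever identifier your own project database uses.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JobRegistry:
    """Maps your own project identifiers to the Job IDs returned by create_job."""
    _by_external_id: Dict[str, str] = field(default_factory=dict)

    def record(self, external_id: str, job_id: str) -> None:
        """Store a mapping; refuse to silently overwrite an existing one."""
        if external_id in self._by_external_id:
            existing = self._by_external_id[external_id]
            raise KeyError(f"{external_id!r} is already mapped to {existing!r}")
        self._by_external_id[external_id] = job_id

    def job_id_for(self, external_id: str) -> str:
        """Look up the Job ID to pass to get_job or get_transcript."""
        try:
            return self._by_external_id[external_id]
        except KeyError:
            raise KeyError(f"no job recorded for {external_id!r}") from None


if __name__ == "__main__":
    registry = JobRegistry()
    registry.record("deposition-2024-07", "vbt_123")
    print(registry.job_id_for("deposition-2024-07"))
```

Refusing to overwrite an existing mapping is a deliberate choice here: reusing an external ID for a second job is exactly the mix-up this section warns about.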
When It Fits, When It Doesn't
Use Verbit if your core problem is transforming audio or video into structured text, captions, or data records. You need the full workflow—from source URL to final file type (SRT/DOCX)—managed in one place. If you only have a plain text transcript and just want to summarize it, don't use Verbit; run that text through your general LLM client directly.
Don't use this if you are simply trying to store raw audio files for later retrieval (use dedicated storage buckets). You must use the combination of create_job -> get_job -> get_transcript when dealing with media assets. If any step is missing, your workflow breaks.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Verbit. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 3 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Available Capabilities
Copying and pasting captions from video players sucks.
Today's process is a mess of tabs: you download the transcript as plain text, then copy that into a subtitle editor. You manually set timestamps or upload it to another service just to get the proper SRT/VTT file for YouTube. It’s tedious and prone to formatting errors.
With Verbit, you send your video URL and ask your agent to generate captions. The agent runs `create_job`, monitors progress with `get_job`, and pulls the final output using `get_transcript`. You get a clean SRT file—no copy-pasting required.
Verbit MCP Server: Manage media processing in your chat.
Manual workflows involve creating a job, waiting for email notifications, and then logging into another platform to download the file. These steps require switching contexts and tracking IDs across multiple services.
Now, you keep everything contained within your AI agent. The agent handles the full pipeline—from `create_job` to final `get_transcript` retrieval—without you lifting a finger beyond asking the right question.
Common Questions About Verbit MCP
How do I use Verbit with multiple media files?
You run create_job for each URL individually, or pass a list of URLs if your client supports it. Each file gets its own unique Job ID, which you must track to check status using get_job.
Can Verbit transcribe video files into Word documents?
Yes. After the job is complete, call get_transcript and specify 'DOCX' as the desired format. This delivers a structured Word document containing the transcript text.
What if my transcription fails? How do I know?
Use get_job. The status will report an error state or provide specific feedback indicating why the job could not be completed. You'll need to fix the source URL or media file.
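In a script, it helps to sort each status into "keep waiting", "ready", or "failed" before deciding what to do next. The sketch below assumes status strings: 'In Progress' and 'Complete' appear on this page, while the failure names are placeholders because the actual error states are not documented here.

```python
from enum import Enum


class Outcome(Enum):
    PENDING = "pending"  # keep polling get_job
    READY = "ready"      # safe to call get_transcript
    FAILED = "failed"    # fix the source URL or media file, then resubmit


# Placeholder names only; the server's real error states may differ.
ASSUMED_FAILURE_STATUSES = {"failed", "error"}


def classify(status: str) -> Outcome:
    """Map a get_job status string to an Outcome; raise on anything unrecognized."""
    normalized = status.strip().lower()
    if normalized == "complete":
        return Outcome.READY
    if normalized == "in progress":
        return Outcome.PENDING
    if normalized in ASSUMED_FAILURE_STATUSES:
        return Outcome.FAILED
    raise ValueError(f"unrecognized job status: {status!r}")


if __name__ == "__main__":
    for s in ("In Progress", "Complete", "error"):
        print(s, "->", classify(s).name)
```

Raising on unknown statuses, rather than treating them as pending, avoids polling forever on a state the script does not understand.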
Do I need a special API key for Verbit?
Yes, you must enter your Verbit API Key when subscribing to the server. The agent uses this key to authenticate and manage all tool calls on your behalf.
How do I use the unique Job ID with the Verbit `get_job` tool?
You pass the specific job identifier (e.g., 'vbt_123') directly to the function call. This tells your AI agent exactly which transcription request you want status updates on, preventing confusion between different jobs.
Does Verbit require a publicly accessible URL when running `create_job`?
Yes, the media source must be publicly accessible for the server to begin processing. If your audio or video file is behind a private login, you'll need an alternate method of feeding the data into the agent.
What is the best way to get structured text using `get_transcript`?
Specify JSON as the output format when calling get_transcript. This delivers raw, machine-readable data that's perfect for parsing into databases or feeding into other automated pipelines.
Are there usage limits or rate limits with the Verbit MCP Server?
Usage quotas are tied to your Verbit API plan, so check your dashboard for current limits. If you call create_job too frequently, you may need to pause before submitting more jobs.
How do I start a new transcription job using a video link?
Use the create_job tool. Simply provide the file_url of your media and optionally specify the language and a title to identify the job later.
Can I download subtitles for my video in SRT format?
Yes! Once the job is finished, use the get_transcript tool with your job_id and set the format parameter to 'srt'.
How can I check if my transcription is already finished?
You can use the get_job tool. By providing the job_id, the agent will return the current status and progress percentage of your transcription.
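Under the hood, an MCP client sends each tool call as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such requests for the three tools using the argument names from the answers above (`file_url`, `language`, `title`, `job_id`, `format`). The URL, title, and Job ID values are made up, and your client normally constructs these messages for you.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def build_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Illustrative values only.
    print(build_tool_call("create_job", {
        "file_url": "https://example.com/talk.mp4",
        "language": "en",
        "title": "Quarterly town hall",
    }))
    print(build_tool_call("get_job", {"job_id": "vbt_123"}))
    print(build_tool_call("get_transcript", {"job_id": "vbt_123", "format": "srt"}))
```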
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.