Verbit MCP. Automate captioning and transcribe media from any public URL.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Verbit automates professional transcription and captioning for media files. Provide any public video or audio URL, and your AI client manages the entire pipeline: it creates a job using `create_job`, tracks status with `get_job`, and retrieves completed transcripts in JSON, SRT, TXT, DOCX, or VTT format via `get_transcript`.

Stop manual captioning. Start processing media instantly.

What your AI agents can do

Create job

Starts a transcription job from a public media URL and the language you specify.

Get job

Checks the status of an existing job using its unique Job ID, letting you know if it's 'In Progress' or finished.

Get transcript

Downloads the finalized transcript content for a completed job in various formats (TXT, SRT, DOCX, etc.).

Process Media for Transcription

Send a public media URL and language specification to initiate a new transcription job using create_job.

Track Job Status in Real-Time

Use the unique Job ID with get_job to check the current status of any pending or active transcription request.

Download Formatted Transcripts

Call get_transcript with a completed Job ID and specify the required output format (e.g., DOCX, SRT) to download the final text file.

Maintain Project Consistency

Tag jobs with external IDs when creating them, keeping your Verbit workflow linked to your internal project database.

Free for Subscribers

Verbit: 3 Tools for Media Processing

Manage the entire transcription workflow by calling these three tools sequentially through your AI agent.

create_job

Starts a transcription job from a public media URL and the language you specify.

get_job

Checks the status of an existing job using its unique Job ID, letting you know if it's 'In Progress' or finished.

get_transcript

Downloads the finalized transcript content for a completed job in various formats (TXT, SRT, DOCX, etc.).

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Verbit, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You've got media (a video, some audio clip) and you need it transcribed clean. Forget captioning by hand or typing out transcripts yourself; your AI client handles the whole damn pipeline for you using the Verbit MCP Server. It submits high-quality speech-to-text jobs on your behalf and takes the headaches out of media post-production.

To get started, you'll use create_job. You just send your agent a public URL pointing to any video or audio source and tell it what language you need for the transcription. The tool takes that info and kicks off a new job, immediately spitting out a unique Job ID. You can even keep track of your projects by tagging those jobs with external IDs when you create them; that keeps everything linked back to your own internal database.
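
As a rough sketch of what your agent sends under the hood, here's the create_job call expressed as an MCP `tools/call` request. The argument names `file_url`, `language`, and `title` are the ones mentioned in the FAQ on this page; the `en-US` language code and the example URL are assumptions, and the external ID argument is left out because its parameter name isn't documented here.

```python
import json
from typing import Any, Dict, Optional


def build_create_job_call(
    file_url: str,
    language: str = "en-US",  # assumed language-code style
    title: Optional[str] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC request for the create_job tool."""
    if not file_url.startswith(("http://", "https://")):
        raise ValueError("create_job needs a public http(s) media URL")

    arguments: Dict[str, Any] = {"file_url": file_url, "language": language}
    if title is not None:
        arguments["title"] = title  # optional label to identify the job later

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "create_job", "arguments": arguments},
    }


request = build_create_job_call(
    "https://example.com/media/interview.mp4",
    language="en-US",
    title="Interview 01",
)
print(json.dumps(request, indent=2))
```

In practice your MCP client builds this envelope for you; the sketch only shows which pieces of information the tool needs before it can return a Job ID.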

Once the job starts, you'll use get_job to check its status. Just feed it that Job ID, and your agent tells you if the process is 'In Progress' or if it’s done. This lets you monitor real-time progress without having to wait around guessing what's going on in the background.
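
If you script the status check yourself, a polling loop along these lines is one way to do it. This is a minimal sketch: `get_job` is passed in as a plain callable, the response is assumed to be a dict with a `status` key, and the interval and timeout values are arbitrary. Only the 'In Progress' and 'Complete' status strings come from this page.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job until the status is no longer 'In Progress'."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job.get("status") != "In Progress":
            return job  # finished or failed: the caller inspects the status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still in progress after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Stand-in for the real tool: reports 'In Progress' twice, then 'Complete'.
_statuses = iter(["In Progress", "In Progress", "Complete"])


def fake_get_job(job_id: str) -> Dict[str, Any]:
    return {"job_id": job_id, "status": next(_statuses)}


# The sleep function is replaced with a no-op so the demo finishes immediately.
result = wait_for_job(fake_get_job, "vbt_123", sleep=lambda seconds: None)
print(result)
```

Injecting `sleep` keeps the loop testable; with the real tool you would leave the default and pick an interval that suits the length of your media.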

When get_job confirms the job is finished, you call get_transcript. You pass that completed Job ID and tell the tool exactly which format you need the final text in. This isn't just a simple download; it lets you specify formats like JSON, SRT, VTT, DOCX, or TXT, so you get the file ready for immediate use, whether you're dropping it into an editor or archiving it.
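
A small helper can catch a mistyped format before the call goes out. The sketch below assumes get_transcript takes `job_id` and `format` arguments with lowercase format values (as in the 'srt' example in the FAQ); the helper names and the file-naming scheme are made up for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict

# Formats named on this page, spelled lowercase like the FAQ's 'srt' example.
SUPPORTED_FORMATS = ("json", "srt", "vtt", "docx", "txt")


def transcript_request(job_id: str, fmt: str) -> Dict[str, str]:
    """Normalize the requested format and build get_transcript arguments."""
    normalized = fmt.strip().lower().lstrip(".")
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; choose one of {SUPPORTED_FORMATS}"
        )
    return {"job_id": job_id, "format": normalized}


def output_name(media_url: str, fmt: str) -> str:
    """Derive a local file name such as 'interview.srt' from the media URL."""
    path = media_url.split("?", 1)[0]  # drop any query string
    stem = PurePosixPath(path).stem or "transcript"
    return f"{stem}.{fmt}"


args = transcript_request("vbt_123", " .SRT ")
print(args)
print(output_name("https://example.com/media/interview.mp4?token=abc", args["format"]))
```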

The process is straightforward: You send your agent the URL and language via create_job. When you need a status check, you ask about the Job ID using get_job. Once that confirms completion, you hit get_transcript to download the perfectly formatted output. This workflow eliminates the need for manual captioning entirely.

When your agent uses create_job, it's not just starting a transcription; it's initiating a structured process defined by the URL and the language specification you provide, giving you that critical Job ID right out of the gate. The ability to tag jobs with external IDs means Verbit integrates into your existing project management structure—you never lose track of which transcript belongs where.

If you run into issues or just need an update, get_job is what you use. It takes that specific Job ID and returns the current status, whether it's still cooking ('In Progress') or ready to be pulled down. You don't have to wait on a timer; you just check the job status until it reports 'Complete'.

To finish up, get_transcript is your final stop. It accepts that Job ID and lets you select from a ton of output formats: DOCX for Word docs, SRT for subtitling, VTT for web videos, JSON for machine reading, or simple TXT files. You call this tool when the job is complete to pull down the finished content in whatever format your downstream process needs.

It's about getting you the file right now, without conversion headaches.

This server handles the entire lifecycle: From the initial media upload and language definition using create_job, through the status monitoring with get_job, right up to downloading a perfectly formatted, usable transcript via get_transcript. It's a complete system for transforming audio and video into structured text files.

How Verbit MCP Works

1. You subscribe to the server and pass your Verbit API Key. Your AI agent can then access the toolset.
2. To start, you call create_job by providing a public media URL and language code. This starts processing and returns a Job ID.
3. After giving the job time to process, you check progress using get_job with that Job ID. Once the status is 'Complete,' you use get_transcript to pull the file in your desired format.

The bottom line is: your agent handles the three-step process (submitting, waiting, and downloading) and carries the Job ID from one call to the next, so you don't have to copy/paste URLs or job IDs across different systems.

Who Is Verbit MCP For?

This server is for content teams who hate manual captioning workflows. It's for legal professionals dealing with hours of audio recordings, and media managers who need to integrate transcription status into project reports. Use it when your source material is video or audio.

Video Editor / Content Creator

Needs captions (SRT/VTT) for every piece of content without exporting files to a separate, clunky service.

Paralegal / Research Analyst

Must transcribe hours-long recorded interviews or depositions into structured text documents like DOCX for immediate review and filing.

Product Manager (Technical)

Integrates transcription status checks directly into automated project reporting, knowing exactly when a media asset is ready.

What Changes When You Connect

Get Captions in One Flow: Instead of uploading a video to a separate tool just for captions, you use create_job to process the media in one flow. You get SRT/VTT files directly when they're ready.
Format Control: Don't get stuck with plain text. Use get_transcript and specify DOCX or JSON formats. This means the output structure matches what your internal database needs right out of the gate.
Track Jobs Programmatically: Forget checking a dashboard manually. Pass the Job ID to get_job, and your agent knows exactly if it's 75% done or finished, allowing for automated reporting loops.
Handle Large Projects: You can tag jobs with external IDs during creation. This keeps Verbit records tied directly to your project management tools, making compliance auditing simple.
Zero Manual Copy-Pasting: The three-step process—create, check, retrieve—is handled by the agent. Your AI client manages the state changes and tool calls for you.

Real-World Use Cases

Legal Deposition Transcription

A paralegal receives a 3-hour audio recording. They ask their agent to run create_job with the URL. After an hour, they use get_job to confirm completion. Finally, they call get_transcript to download the result as a clean DOCX file for filing.

YouTube Video Captioning

A content creator provides a video URL and asks for English captions in SRT format. The agent runs create_job, checks get_job until the job is finished, then calls get_transcript to fetch the captions file, ready for upload.

Batch Project Reporting

A PM has 20 video assets needing status updates. Instead of checking 20 URLs, they run a loop: calling create_job for all 20, then repeatedly hitting get_job until all are marked 'Complete,' streamlining the entire report generation.
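
The batch loop described here could look roughly like this. `FakeVerbit` is an in-memory stand-in so the sketch runs without a network; its method signatures and the `job_id` and `status` response keys are assumptions, not the server's documented contract.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


class FakeVerbit:
    """In-memory stand-in: each job completes after `polls_needed` status checks."""

    def __init__(self, polls_needed: int = 2) -> None:
        self.polls_needed = polls_needed
        self.polls: Dict[str, int] = {}

    def create_job(self, file_url: str, language: str) -> Dict[str, Any]:
        job_id = f"vbt_{len(self.polls) + 1}"
        self.polls[job_id] = 0
        return {"job_id": job_id}

    def get_job(self, job_id: str) -> Dict[str, Any]:
        self.polls[job_id] += 1
        done = self.polls[job_id] >= self.polls_needed
        return {"job_id": job_id, "status": "Complete" if done else "In Progress"}


def submit_all(
    create_job: Callable[..., Dict[str, Any]], urls: Iterable[str], language: str
) -> Dict[str, str]:
    """Create one job per URL and return {job_id: url}."""
    return {create_job(file_url=url, language=language)["job_id"]: url for url in urls}


def track_batch(
    get_job: Callable[[str], Dict[str, Any]],
    jobs: Dict[str, str],
    max_rounds: int = 100,
    wait: Callable[[], None] = lambda: None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Poll every pending job each round; return (finished statuses, still pending)."""
    pending = dict(jobs)
    finished: Dict[str, str] = {}
    for _ in range(max_rounds):
        for job_id in list(pending):
            status = get_job(job_id)["status"]
            if status != "In Progress":
                finished[job_id] = status
                del pending[job_id]
        if not pending:
            break
        wait()  # with the real tool, sleep between rounds here
    return finished, pending


server = FakeVerbit()
urls = [f"https://example.com/video{i}.mp4" for i in range(1, 4)]
jobs = submit_all(server.create_job, urls, "en-US")
finished, pending = track_batch(server.get_job, jobs)
print(finished, pending)
```

Keeping the `{job_id: url}` mapping around is what lets the final report say which asset each status belongs to.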

Interview Data Structuring

An academic records an interview and needs structured data. The agent processes it via create_job. Using a custom external ID tag, they ensure the resulting JSON from get_transcript links back perfectly to their research database.

The Tradeoffs

Assuming the transcript is ready

Calling get_transcript immediately after running create_job, expecting a file download. This will fail because the job hasn't finished processing.

→ Always check status first. After calling create_job, you must call get_job repeatedly until the status explicitly reports 'Complete.' Only then can you reliably use get_transcript.

Ignoring required formats

Asking for a transcript without specifying if you need SRT, TXT, or DOCX. The agent won't know which format to pull.

→ When calling get_transcript, always specify the desired output type (e.g., 'in VTT format') so the system knows exactly what file structure to return.

Mixing job IDs

Using a Job ID from an old project or a different media source when calling get_job. This results in a 'Job Not Found' error.

→ Keep track of the unique Job ID returned by your initial create_job call. Use that specific ID for all subsequent calls to both get_job and get_transcript.
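
One way to avoid mixing IDs in your own scripts is a small lookup that ties each Job ID to a key from your project database. `JobRegistry` and the key format below are hypothetical; nothing here is part of the Verbit toolset.

```python
from typing import Dict


class JobRegistry:
    """Maps your own project keys to the Job IDs returned by create_job."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    def record(self, project_key: str, job_id: str) -> None:
        """Link a project key to a Job ID, refusing to silently overwrite a link."""
        existing = self._by_key.get(project_key)
        if existing is not None and existing != job_id:
            raise ValueError(f"{project_key!r} is already linked to {existing!r}")
        self._by_key[project_key] = job_id

    def job_id_for(self, project_key: str) -> str:
        """Return the Job ID to pass to get_job and get_transcript."""
        if project_key not in self._by_key:
            raise KeyError(f"no job recorded for {project_key!r}")
        return self._by_key[project_key]


registry = JobRegistry()
registry.record("deposition-smith", "vbt_123")
job_id = registry.job_id_for("deposition-smith")
print(job_id)
```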

When It Fits, When It Doesn't

Use Verbit if your core problem is transforming audio or video into structured text, captions, or data records. You need the full workflow—from source URL to final file type (SRT/DOCX)—managed in one place. If you only have a plain text transcript and just want to summarize it, don't use Verbit; run that text through your general LLM client directly.

Don't use this if you are simply trying to store raw audio files for later retrieval (use dedicated storage buckets). You must use the combination of create_job -> get_job -> get_transcript when dealing with media assets. If any step is missing, your workflow breaks.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Verbit. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 3 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_job, get_job, get_transcript

Copying and pasting captions from video players sucks.

Today's process is a mess of tabs: you download the transcript as plain text, then copy that into a subtitle editor. You manually set timestamps or upload it to another service just to get the proper SRT/VTT file for YouTube. It’s tedious and prone to formatting errors.

With Verbit, you send your video URL and ask your agent to generate captions. The agent runs `create_job`, monitors progress with `get_job`, and pulls the final output using `get_transcript`. You get a clean SRT file—no copy-pasting required.

Verbit MCP Server: Manage media processing in your chat.

Manual workflows involve creating a job, waiting for email notifications, and then logging into another platform to download the file. These steps require switching contexts and tracking IDs across multiple services.

Now, you keep everything contained within your AI agent. The agent handles the full pipeline—from `create_job` to final `get_transcript` retrieval—without you lifting a finger beyond asking the right question.

Common Questions About Verbit MCP

How do I use Verbit with multiple media files? +

You run create_job for each URL individually, or pass a list of URLs if your client supports it. Each file gets its own unique Job ID, which you must track to check status using get_job.

Can Verbit transcribe video files into Word documents? +

Yes. After the job is complete, call get_transcript and specify 'DOCX' as the desired format. This delivers a structured Word document containing the transcript text.

What if my transcription fails? How do I know? +

Use get_job. The status will report an error state or provide specific feedback indicating why the job could not be completed. You'll need to fix the source URL or media file.
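
In a script, you might turn that status into an explicit success, pending, or failure result. The sketch below treats anything other than 'Complete' or 'In Progress' as a failure; the 'Failed' status string and the `error` field are hypothetical placeholders, since this page doesn't spell out the error payload.

```python
from typing import Any, Dict


class TranscriptionFailed(RuntimeError):
    """Raised when get_job reports something other than progress or completion."""


def check_job(job: Dict[str, Any]) -> bool:
    """Return True if complete, False if still in progress, raise otherwise."""
    status = job.get("status")
    if status == "Complete":
        return True
    if status == "In Progress":
        return False
    detail = job.get("error") or "no detail provided"  # hypothetical field
    raise TranscriptionFailed(
        f"job {job.get('job_id')} ended with status {status!r}: {detail}"
    )


ready = check_job({"job_id": "vbt_123", "status": "Complete"})
try:
    check_job({"job_id": "vbt_124", "status": "Failed", "error": "media URL returned 404"})
    message = ""
except TranscriptionFailed as exc:
    message = str(exc)
print(ready, message)
```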

Do I need a special API key for Verbit? +

Yes, you must enter your Verbit API Key when subscribing to the server. The agent uses this key to authenticate and manage all tool calls on your behalf.

How do I use the unique Job ID with the Verbit `get_job` tool? +

You pass the specific job identifier (e.g., 'vbt_123') directly to the function call. This tells your AI agent exactly which transcription request you want status updates on, preventing confusion between different jobs.

Does Verbit require a publicly accessible URL when running `create_job`? +

Yes, the media source must be publicly accessible for the server to begin processing. If your audio or video file is behind a private login, you'll need an alternate method of feeding the data into the agent.

What is the best way to get structured text using `get_transcript`? +

Specify JSON as the output format when calling get_transcript. This delivers raw, machine-readable data that's perfect for parsing into databases or feeding into other automated pipelines.
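
To give a feel for the "parse into a database" step, here's a sketch that loads a JSON transcript into SQLite. The JSON layout (a `segments` list with `start`, `end`, `speaker`, and `text`) is invented for the example; check the actual get_transcript output before relying on any of these field names.

```python
import json
import sqlite3

# Hypothetical transcript payload; the real schema may differ.
sample = json.dumps(
    {
        "job_id": "vbt_123",
        "segments": [
            {"start": 0.0, "end": 2.5, "speaker": "A", "text": "Good morning."},
            {"start": 2.5, "end": 5.0, "speaker": "B", "text": "Thanks for joining."},
        ],
    }
)


def load_segments(conn: sqlite3.Connection, transcript_json: str) -> int:
    """Insert one row per transcript segment and return the number of rows added."""
    data = json.loads(transcript_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS segments "
        "(job_id TEXT, start_s REAL, end_s REAL, speaker TEXT, text TEXT)"
    )
    rows = [
        (data["job_id"], seg["start"], seg["end"], seg.get("speaker"), seg["text"])
        for seg in data["segments"]
    ]
    conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_segments(conn, sample)
texts = conn.execute("SELECT text FROM segments ORDER BY start_s").fetchall()
print(inserted, texts)
```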

Are there usage limits or rate limits with the Verbit MCP Server? +

Usage quotas are tied to your Verbit API plan, so check your Verbit dashboard for current limits. If you submit create_job calls faster than your plan allows, pause before sending more.
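
If you script many create_job calls, a retry with exponential backoff is a common way to handle that pause. How a rate limit actually surfaces through the server isn't documented here, so `RateLimited` below is a hypothetical exception and the delays are arbitrary.

```python
import time
from typing import Any, Callable, Dict, List


class RateLimited(Exception):
    """Hypothetical signal that the quota was hit; the real error may look different."""


def call_with_backoff(
    call: Callable[[], Any],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry `call` after RateLimited, doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay_s * 2 ** attempt)


# Demo: a stand-in create_job that is rate limited twice before succeeding.
calls = {"count": 0}
delays: List[float] = []


def flaky_create_job() -> Dict[str, str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return {"job_id": "vbt_9"}


# Delays are recorded instead of slept so the demo finishes immediately.
job = call_with_backoff(flaky_create_job, sleep=delays.append)
print(job, delays)
```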

How do I start a new transcription job using a video link? +

Use the create_job tool. Simply provide the file_url of your media and optionally specify the language and a title to identify the job later.

Can I download subtitles for my video in SRT format? +

Yes! Once the job is finished, use the get_transcript tool with your job_id and set the format parameter to 'srt'.

How can I check if my transcription is already finished? +

You can use the get_job tool. By providing the job_id, the agent will return the current status and progress percentage of your transcription.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.