Maestra MCP: Automate Transcription, Translation, and Voiceovers.

Works with every AI agent you already use: Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

Maestra MCP Server automates your entire media lifecycle: uploading videos, transcribing speech into text, translating that text into over 125 languages, and generating professional synthetic voiceovers.

Your agent handles everything from listing content assets to exporting the final SRT or VTT file.

What your AI agents can do

Upload Media for Transcription

Your agent sends a public URL to the server, starting the process of converting video or audio into text.

Translate Existing Transcripts

The agent takes a finished transcript and converts it into an entirely new language using translate_transcription.

Generate Synthetic Voiceovers

Using the generated text, your agent calls generate_ai_voiceover to create high-quality audio dubs.

List and Manage Assets

The server lets your agent list every file (list_maestra_files) or folder (list_account_folders) in your Maestra account so you can check what's there.

Export Final Results

Your agent calls export_transcription_results to request a download link for the processed data as an SRT, VTT, PDF, or JSON file.

Free for Subscribers

Maestra MCP Server: 8 Tools for Media Pipelines

These tools allow your agent to manage every step of the media lifecycle, from uploading raw files to exporting final translated and dubbed content.

export_transcription_results

Generates a temporary download link for the final processed media data, such as an SRT or VTT file.

generate_ai_voiceover

Creates a professional-sounding synthetic voiceover by reading text aloud.

get_file_details

Retrieves the current status, size, and metadata for any file ID in your account.

list_account_folders

Returns the names of all folders you've set up to organize your media assets.

list_available_ai_voices

Lists every synthetic AI voice available for selection when creating a voiceover.

list_maestra_files

Returns an index of all audio and video files currently stored in your Maestra account.

translate_transcription

Converts a completed transcript from its source language into a target language.

upload_media_for_transcription

Uploads an external media file via URL and specifies its source language to begin transcription.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Maestra, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You connect your agent to Maestra MCP Server to handle your whole media workflow, from raw footage to polished, dubbed content. You don't lift a finger; your agent does the heavy lifting.

To start, you can manage your assets by running list_account_folders, which gives you every folder name you've set up for organization, or list_maestra_files, which returns an index of every audio and video file in your Maestra account. If you need to check the status, size, or metadata of a specific piece of content, run get_file_details with its file ID.

When you're ready for transcription, your agent sends a public URL to the server using upload_media_for_transcription, along with the source language. The server then converts that video or audio into a text transcript. Once the transcript is complete, you can run translate_transcription to convert it into a target language chosen from more than 125 options.

If you need high-quality dubbing, you first check out the available voices by running list_available_ai_voices. Then, using the translated or original text, your agent calls generate_ai_voiceover to create professional synthetic audio. This gives you a full voice track for your content.

When the transcription, translation, and dubbing are done, your final step is requesting an export link. Run export_transcription_results, which returns a temporary download link for the processed data in formats such as SRT, VTT, PDF, or JSON.

How Maestra MCP Works

1. First, your agent calls upload_media_for_transcription with a public URL and the source language. The server starts processing.
2. Next, you can use translate_transcription to convert the resulting transcript (for example, an English one) into Spanish or Japanese.
3. Finally, your agent calls generate_ai_voiceover with the translated text to create the final dubbed audio file.

The bottom line: Maestra takes a source media file and runs it through transcription, translation, and voice generation in one flow.
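The three steps above can be sketched in Python. This is a minimal sketch, not Maestra's documented interface: `call_tool(name, arguments)` is a hypothetical synchronous wrapper around your MCP client's tool call (in the MCP Python SDK the underlying call is the async `ClientSession.call_tool`), and every argument or result key except `file_url` (`language`, `file_id`, `target_language`, `voice_id`) is an assumed name to check against the real tool schemas.

```python
from typing import Any, Callable, Dict

# Hypothetical: a synchronous wrapper around your MCP client's tool call
# that returns the tool result as a plain dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def localize(call_tool: ToolCaller, file_url: str, source_lang: str,
             target_lang: str, voice_id: str) -> Dict[str, Any]:
    """Upload -> translate -> dub, one tool call per step, strictly in order.

    Keys other than file_url ("language", "file_id", "target_language",
    "voice_id") are assumed names, not documented Maestra parameters.
    """
    # Step 1: start transcription from a public URL plus the source language.
    upload = call_tool("upload_media_for_transcription",
                       {"file_url": file_url, "language": source_lang})
    file_id = upload["file_id"]

    # A real pipeline should wait here until get_file_details reports that
    # transcription has finished; translation needs the finished transcript.

    # Step 2: translate the completed transcript into the target language.
    translated = call_tool("translate_transcription",
                           {"file_id": file_id, "target_language": target_lang})

    # Step 3: dub the translated text with the chosen synthetic voice.
    return call_tool("generate_ai_voiceover",
                     {"file_id": translated["file_id"], "voice_id": voice_id})
```

Passing the tool caller in as a plain function keeps the ordering testable with a fake, without a live Maestra account.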

Who Is Maestra MCP For?

Content creators who need to quickly localize videos. Localization teams managing global marketing campaigns. Developers building applications around multi-lingual media processing. Anyone whose job involves taking a single piece of recorded content and needing it in dozens of languages.

Video Content Creator

You upload the master video file, let your agent handle the transcription, then use translate_transcription to get subtitles for 10 different markets.

Localization Manager

You need a full campaign rollout. You list all assets with list_maestra_files, translate them in batches using translate_transcription, and then manage the voiceover process via generate_ai_voiceover.

DevOps Engineer

You build a microservice that must ingest media, transcribe it for metadata, and save the results. You'll rely heavily on file status checks using get_file_details.

What Changes When You Connect

It saves time by automating the whole chain. Instead of manually transcribing a video, then copy-pasting that text into a translator, your agent handles upload_media_for_transcription -> translate_transcription -> generate_ai_voiceover in one sequence.
You always know what you're dealing with. Use list_maestra_files and then check the status for any file ID using get_file_details. No guesswork about whether a job finished or failed.
The output is structured, not messy. When the process finishes, your agent requests an export link via export_transcription_results, giving you clean SRT, VTT, or JSON files ready to use.
You're never stuck on voice choice. Before generating a voiceover, check every option with list_available_ai_voices so you can pick the right tone and gender for your project.
It keeps everything organized. You don't lose track of content: use list_account_folders to see where all your regional assets are stored within Maestra.
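To see at a glance which jobs are finished, an agent can combine list_maestra_files and get_file_details in a short loop. A minimal sketch, assuming a hypothetical synchronous `call_tool(name, arguments)` wrapper, a listing shaped like `{"files": [{"id": ...}]}`, and a `status` field in each get_file_details result; all of these shapes are assumptions, not documented responses.

```python
from collections import Counter
from typing import Any, Callable, Dict

# Hypothetical synchronous wrapper around your MCP client's tool call.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def status_report(call_tool: ToolCaller) -> Counter:
    """Count files in the account by their current processing status."""
    listing = call_tool("list_maestra_files", {})
    counts: Counter = Counter()
    for entry in listing.get("files", []):  # assumed result shape
        details = call_tool("get_file_details", {"file_id": entry["id"]})
        # Normalize case so 'Complete' and 'complete' land in one bucket.
        counts[str(details.get("status", "unknown")).lower()] += 1
    return counts
```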

Real-World Use Cases

The Global Marketing Campaign

A client needs a single product video localized for 12 different countries. They use their agent to first run upload_media_for_transcription on the master file, then iterate through all target languages using translate_transcription. Finally, they loop through the results calling generate_ai_voiceover until every asset has a perfectly dubbed version.
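The campaign loop might look like this in Python. A minimal sketch under stated assumptions: `call_tool` is a hypothetical synchronous wrapper around your MCP client, the argument keys are assumed names, and the transcript behind `file_id` is already complete. Recording failures per language, rather than stopping at the first error, means one problem market does not block the rest.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical synchronous wrapper around your MCP client's tool call.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def localize_all(call_tool: ToolCaller, file_id: str, targets: Iterable[str],
                 voice_by_lang: Dict[str, str]) -> Dict[str, Dict[str, Any]]:
    """Translate and dub one finished transcript per target language.

    Argument keys ("file_id", "target_language", "voice_id") are assumed.
    """
    done: Dict[str, Any] = {}
    failed: Dict[str, str] = {}
    for lang in targets:
        voice_id = voice_by_lang.get(lang)
        if voice_id is None:
            failed[lang] = "no voice selected for this language"
            continue
        try:
            translated = call_tool("translate_transcription",
                                   {"file_id": file_id, "target_language": lang})
            done[lang] = call_tool("generate_ai_voiceover",
                                   {"file_id": translated["file_id"],
                                    "voice_id": voice_id})
        except Exception as exc:  # keep going; report per-language errors
            failed[lang] = str(exc)
    return {"done": done, "failed": failed}
```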

The Conference Booth Demo

A developer needs to show how their system handles multi-lingual data. They use their agent to run list_maestra_files to confirm all source assets are present, then pick a file and check its status with get_file_details before triggering the full translation cycle.

The Podcast Archive Cleanup

A podcast producer has 50 hours of raw audio. They use their agent to process each hour via upload_media_for_transcription. Once all are transcribed, they then call list_available_ai_voices and generate a consistent voiceover for the entire archive using generate_ai_voiceover.

The Developer Sandbox Test

A developer wants to test their new file-download feature. They upload a dummy video, wait for it to process, and then use export_transcription_results immediately to confirm they get the expected SRT format link.
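Once the file behind the export link has been downloaded, checking that it really is SRT is easy to automate. This sketch validates SubRip cue structure (a numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text) and assumes you already have the file contents as a string; downloading from the temporary link is left out.

```python
import re
from typing import List, Tuple

# "HH:MM:SS,mmm --> HH:MM:SS,mmm", the SubRip cue timing line.
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _ms(h: int, m: int, s: int, ms: int) -> int:
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text: str) -> List[Tuple[int, int, int, str]]:
    """Parse SRT text into (index, start_ms, end_ms, text) cues, or raise."""
    cues = []
    # Drop a UTF-8 byte-order mark if present, then split on blank lines.
    for block in re.split(r"\r?\n\s*\r?\n", text.lstrip("\ufeff").strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            raise ValueError(f"bad cue header in block: {block[:40]!r}")
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timing line: {lines[1]!r}")
        parts = [int(g) for g in match.groups()]
        start, end = _ms(*parts[:4]), _ms(*parts[4:])
        if end < start:
            raise ValueError(f"cue {lines[0].strip()} ends before it starts")
        cues.append((int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues
```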

The Tradeoffs

Trying to translate without transcribing first

The user assumes they can pass a raw video URL straight to translate_transcription. The call fails because translation requires source text, not media.

→ You must run the content through upload_media_for_transcription first. That step creates the necessary text data that translate_transcription needs as input.

Assuming a single tool handles everything

The user tries to combine transcription, translation, and voiceover in one call. This fails because these are three distinct, sequential processes.

→ You need a multi-step sequence: Start with upload_media_for_transcription, then use translate_transcription on the result, and finally call generate_ai_voiceover.

Forgetting to check file status

The user calls a tool expecting an immediate output but gets an error because the previous media processing job is still running in the background.

→ Always use get_file_details after starting a large job. It reports the file's current state, so you know whether it is still processing, ready, or has failed.
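A polling loop for get_file_details could look like this. It is a sketch, not the tool's documented behavior: `call_tool` is a hypothetical synchronous wrapper, the `file_id` argument and `status` result field are assumed names, and the status strings are the ones this page mentions, which may not match the live API. The `sleep` and `clock` parameters exist so the loop can be tested without real waiting.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical synchronous wrapper around your MCP client's tool call.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed status strings, compared case-insensitively. This page mentions
# 'Pending', 'Processing', 'Complete', 'Ready', and 'Failed'; confirm the
# values get_file_details actually returns.
DONE = {"complete", "ready"}
FAILED = {"failed"}


def wait_for_file(call_tool: ToolCaller, file_id: str, *,
                  timeout_s: float = 600.0, interval_s: float = 5.0,
                  max_interval_s: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll get_file_details until the job finishes, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        details = call_tool("get_file_details", {"file_id": file_id})
        status = str(details.get("status", "")).lower()
        if status in DONE:
            return details
        if status in FAILED:
            # Surface whatever error detail the tool returned.
            raise RuntimeError(f"job {file_id} failed: {details}")
        if clock() >= deadline:
            raise TimeoutError(f"job {file_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
        # Exponential backoff, capped, so long jobs are not polled too often.
        interval_s = min(interval_s * 2, max_interval_s)
```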

When It Fits, When It Doesn't

Use this Maestra server if your primary need is transforming recorded media (video/audio) into structured, multi-lingual text and then back into dubbing audio. Specifically, if you require a workflow that moves from raw asset -> transcript -> translation -> voiceover, this is built for it. Don't use it if your problem is simply editing video footage or adding visual effects; Maestra handles the data pipeline. Also, don't rely on it for real-time live captioning—it processes uploaded files. If you only need to manage assets and check status without doing any processing, list_maestra_files alone might be enough, but using the full server gives you all the options.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Maestra. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infra
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 8 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

export_transcription_results
generate_ai_voiceover
get_file_details
list_account_folders
list_available_ai_voices
list_maestra_files
translate_transcription
upload_media_for_transcription

Dealing with global content is a mess of spreadsheets and manual copy-pasting.

Before Maestra, localizing video was painful. You'd record a master file, then manually download the transcript (often messy or incomplete). You’d paste that text into Google Translate, downloading separate files for each language. Then, you had to find a voiceover service and re-record every single segment—a process that took days and cost thousands.

Now your agent handles it all. Start by calling `upload_media_for_transcription` with the master file. Once it's transcribed, use `translate_transcription` to produce versions in any of 125+ languages. Finally, call `generate_ai_voiceover` to create a dubbing track, with no more manual transfers or messy files.

Maestra MCP Server: Get full media automation from chat.

The steps that disappear are the manual file downloads, the language-code lookups, and the hand-written API calls for each stage. You don't need to write complex orchestration logic; your agent just needs three simple commands: upload, translate, dub.

What’s different now is the speed and consistency. Instead of treating translation and voiceover as separate services that require manual handoffs, Maestra keeps them in one reliable pipeline. It takes a single media asset and turns it into a finished, multi-lingual product.

Common Questions About Maestra MCP

How do I start the process using upload_media_for_transcription?

You need to provide a public URL for your video or audio file and specify the source language. The server then begins processing it, generating an initial transcript.

What is the difference between upload_media_for_transcription and translate_transcription?

Transcription converts media (audio/video) into text. Translation takes existing text and converts its language (e.g., English to French). You must transcribe first.

Can I see all the available voices for generate_ai_voiceover?

Yes, call list_available_ai_voices. This tool returns a full list of synthetic voices and their characteristics so you can pick one that matches your brand.

If I translate the text, how do I get the final file?

After running translation or voiceover generation, use export_transcription_results. This gives you a temporary download link for the finished content in formats like SRT or VTT.

How do I check if my large media job is still working?

Use get_file_details with the file ID. It tells you the current processing status (e.g., 'Pending', 'Processing', or 'Complete').

What file formats and codecs does the Maestra server support for uploads?

The server supports common media types including MP4, MOV, WAV, and FLAC files. While it handles most standard audio/video codecs, complex or highly compressed proprietary formats may require pre-conversion to ensure processing success.
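Before uploading, an agent can flag files whose extension isn't one of the formats named above. A minimal sketch that only inspects the URL path, so a file served without an extension is also flagged; a flag means "check or convert first," not that Maestra will refuse the file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The formats this FAQ names explicitly; the list is not exhaustive.
NAMED_FORMATS = {".mp4", ".mov", ".wav", ".flac"}


def may_need_conversion(file_url: str) -> bool:
    """True if the URL path's extension is not one of the named formats.

    A True result is a prompt to double-check or pre-convert, not proof
    that Maestra will reject the file.
    """
    path = urlparse(file_url).path  # ignores ?query parts such as signed-URL tokens
    return PurePosixPath(path).suffix.lower() not in NAMED_FORMATS
```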

If my job fails during transcription, how do I use `get_file_details` to diagnose the specific error?

You must pass the unique file ID into get_file_details. This tool returns a detailed status object that includes failure codes and descriptive messages. These codes tell you if the issue was due to format incompatibility or API permission errors.

How can I use `list_account_folders` to keep my translated media assets organized?

Run list_account_folders to see your current folder structure, for example a per-project folder such as 'Global Campaign Q3'. None of this server's eight tools creates folders, so set up new ones in your Maestra account; your agent can then use list_account_folders to find where related transcription and voiceover results are grouped.

How do I start a transcription for a public video?

Use the upload_media_for_transcription tool and provide the public file_url along with the source language code (e.g., 'en' for English).
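A small helper can catch malformed input before the upload call. This sketch only checks shape: it cannot confirm that the URL is actually publicly reachable, the language-code pattern is a loose assumption, and the `language` argument key is hypothetical (only `file_url` is named in this answer).

```python
import re
from urllib.parse import urlparse

# Loose check for codes like 'en', 'es', or 'pt-BR'. Which codes Maestra
# accepts, and in what form, is an assumption to verify.
LANG_CODE = re.compile(r"[a-z]{2,3}(-[A-Za-z0-9]{2,8})?")


def upload_arguments(file_url: str, language: str) -> dict:
    """Build arguments for upload_media_for_transcription, failing early."""
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"file_url must be an http(s) URL: {file_url!r}")
    if not LANG_CODE.fullmatch(language):
        raise ValueError(f"unexpected language code: {language!r}")
    # "file_url" comes from this FAQ; the "language" key name is hypothetical.
    return {"file_url": file_url, "language": language}
```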

Can I choose which AI voice to use for dubbing?

Yes, first use the list_available_ai_voices tool to find a voice ID that matches your target language and gender, then provide it to the generate_ai_voiceover tool.
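Matching voices by language and gender is a simple filter over the list_available_ai_voices result. A minimal sketch, assuming each voice is a dict with hypothetical `voice_id`, `language`, and `gender` keys; the real result shape may differ.

```python
from typing import Any, Dict, List, Optional


def pick_voice(voices: List[Dict[str, Any]], language: str,
               gender: Optional[str] = None) -> Optional[str]:
    """Return the first voice ID matching the language (and gender, if given).

    Compares only the primary language subtag, so 'es' matches 'es-MX'.
    The 'voice_id', 'language', and 'gender' keys are assumed field names.
    """
    wanted = language.lower().split("-")[0]
    for voice in voices:
        if str(voice.get("language", "")).lower().split("-")[0] != wanted:
            continue
        if gender and str(voice.get("gender", "")).lower() != gender.lower():
            continue
        return voice.get("voice_id")
    return None
```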

What formats can I export my transcripts in?

The export_transcription_results tool supports formats such as 'srt', 'vtt', 'txt', 'pdf', and 'json'.
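Normalizing the requested format before calling export_transcription_results avoids a wasted round trip. A minimal sketch; the `file_id` and `format` argument keys are hypothetical, and the format list is the one given in this answer.

```python
# Formats listed in this FAQ.
EXPORT_FORMATS = ("srt", "vtt", "txt", "pdf", "json")


def export_arguments(file_id: str, fmt: str) -> dict:
    """Normalize and validate an export format before calling the tool."""
    # Accept inputs like ".SRT" or " vtt " by trimming, lowercasing, and
    # dropping a leading dot.
    normalized = fmt.strip().lower().lstrip(".")
    if normalized not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; choose one of {EXPORT_FORMATS}")
    # The "file_id" and "format" key names are hypothetical.
    return {"file_id": file_id, "format": normalized}
```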

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more, or use it from Python agent frameworks such as the OpenAI Agents SDK and Google ADK.