Speech Synthesis MCP for AI. Generate broadcast-quality voiceovers on demand.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Connect to your AI in seconds.

Volcengine Speech Synthesis handles high-fidelity, multi-lingual text-to-speech conversion. Use this MCP to generate natural narration, including signature TikTok voice styles, from simple text or complex markup languages like SSML.

It’s built for content creators and developers needing professional audio output across English, Chinese, Japanese, and more.

What your AI can do

Get audio formats

Lists the available output formats for the generated audio (like MP3 or WAV).

List voices

Retrieves every available TTS voice model to help you select the right sound for your project.

Synthesize long text

Generates audio from texts that are too long for standard synthesis calls, like full articles or reports.

Generate standard speech

Convert any block of text into natural, spoken audio using general voice styles.

Create unique voices

Train a custom voice model from your own high-quality audio recordings to give the AI a personalized sound.

Synthesize massive documents

Convert entire articles or long manuals into speech without hitting character limits.

Control tone and pacing

Use markup language to dictate precise timing, pauses, and emphasis in the generated audio.

Manage voice selection

List all available voice models—including specialized styles—before beginning any synthesis job.

Volcengine Speech Synthesis: 7 Tools

Use these seven specific functions to manage everything from training a new voice model to synthesizing massive documents with precise audio controls.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Volcengine Speech Synthesis on Vinkius

Get Audio Formats

Lists the available output formats for the generated audio (like MP3 or WAV).

List Voices

Retrieves every available TTS voice model to help you select the right sound for...

Synthesize Long Text

Generates audio from texts that are too long for standard synthesis calls, like full...

Synthesize SSML

Uses specialized tags to control the exact timing, pauses, and emotional delivery of...

Synthesize Speech

Converts text into speech using various voice styles and supports multiple languages...

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing for you to configure.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Speech Synthesis integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "volcengine-speech-synthesis": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Speech Synthesis tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"volcengine-speech-synthesis": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

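Use MultiServerMCPClient from the LangChain MCP adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius (the /mcp endpoint is reached over Streamable HTTP)
client = MultiServerMCPClient({
    "volcengine-speech-synthesis": {
        "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
        "transport": "streamable_http",
    }
})
tools = asyncio.run(client.get_tools())  # get_tools() is a coroutine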

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

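from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius over Streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# The adapter exposes the server's tools; assign them to an Agent
with MCPServerAdapter(server_params) as vinkius_tools:
    narrator = Agent(
        role="Voiceover Producer",  # example values; adapt to your crew
        goal="Turn written scripts into narrated audio",
        backstory="An audio producer who works with text-to-speech tools.",
        tools=vinkius_tools,
    )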

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Volcengine Speech Synthesis, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Volcengine Speech. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 7 tools that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
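Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message; it does not send it. The tool name comes from this page, while the assumption that `list_voices` takes no arguments is ours.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumption: list_voices needs no arguments.
print(tool_call_request("list_voices", {}))
```

In practice your AI client builds and sends these messages for you; the shape is shown only to make clear what "calling a tool" means.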

The headache of manual voiceover production today

Right now, turning a written document into professional audio means clicking through dozens of tabs. You copy text from your draft, paste it into an external TTS site, select a voice, and download the MP3. If you need to change the pacing or add emphasis, you have to go back and manually edit the source text, repeat the process, and re-download.

With this MCP, you simply send the core content once. The platform handles all the complex synthesis steps—from choosing the right voice model to managing multi-language requirements—and returns structured audio ready for your project.

Synthesize Speech with `synthesize_speech`

You skip the manual copy, paste, and download loop. Specify the text, the voice, and the language (Chinese, English, Japanese, and others are supported), and the agent handles the rest.

It's immediate control. Instead of booking a recording session, you run a single command that returns finished audio.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What your AI can actually do with this

This connector lets you take any written text and turn it into broadcast-quality audio. You can generate speech using ByteDance's advanced voice models—the ones behind TikTok's viral effects—for everything from quick social media clips to entire audiobooks. It supports multi-language synthesis across English, Chinese, Japanese, and more, letting you create global content without ever touching a recording studio.

Need precise timing? You can use SSML tags to dictate exactly where the speaker pauses or when they put emphasis. For massive documents, there’s a dedicated process for synthesizing long text that standard tools choke on. Because this MCP deals with sensitive keys and high-volume audio generation, your credentials pass through Vinkius's zero-trust proxy; your keys never sit on disk.

This means you can trust the connection while building complex automations across multiple platforms.

Built · Hosted · Managed by Vinkius Volcengine Speech Synthesis - High Fidelity TTS Generator

Server ID 019d8499-f3f3-72a9-9e4e-b5441719ab4c

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

What Changes When You Connect

Need a specific sound? You can use list_voices to check every available model, from general narrators to the famous TikTok voices, before you write a single line of code.

Dealing with massive documents? Forget manual splitting. Use synthesize_long_text for articles and manuals that exceed standard character limits, keeping your workflow continuous.

Want maximum control over the narrative flow? Instead of just sending text, use synthesize_ssml to programmatically insert pauses, changes in tone, or specific emphasis points.

Need a unique brand sound? Train a custom voice using create_custom_voice. You feed it 10-50 recordings, and you get a proprietary voice model for your product.

Speed is key. The ability to adjust speech rate and volume with synthesize_speech means you can tweak the delivery of any piece without re-recording anything.

See it in action

01

Building a multi-lingual training module

An e-learning developer needs to create course material in English and Japanese. They use list_voices to confirm language support, then call synthesize_speech multiple times with different voice IDs for each language.

02

Creating an automated podcast chapter

A podcaster writes a 5,000-word transcript. Instead of manually segmenting it, they use synthesize_long_text, which handles the chunking and synthesis process automatically, giving them full narration.

03

Improving accessibility in an app

A developer is building a medical guide app. They need complex terms to be read with clear emphasis and pacing, so they call synthesize_ssml with SSML tags wrapped around the critical phrases.

04

Launching branded content

A marketing team wants their video ads to feature a specific brand voice. They first run through create_custom_voice, and once approved, they use that custom voice in all subsequent calls to synthesize_speech.

The honest tradeoffs

Calling TTS for short bursts repeatedly

Anti-pattern

Sending 50 small blocks of text via synthesize_speech one after the other. This is slow, inefficient, and increases API call overhead.

The Fix

If you're doing a large amount of content, always check if your text exceeds the standard limit; if it does, use synthesize_long_text. If you are creating many small clips, remember to first run get_audio_formats so you know exactly what output format is best for batch processing.

Ignoring voice availability

Anti-pattern

Trying to synthesize speech using a specific TikTok style or language without knowing the exact model ID. The call will fail with an unrecognized voice error.

The Fix

Before calling any synthesis tool, always run list_voices. This shows you all available models and confirms if your target voice—whether it's English or Chinese—is active in the catalog.
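A minimal sketch of that pre-flight check. The response shape of list_voices is not documented on this page, so the record keys `voice_id` and `language` and the sample IDs are hypothetical.

```python
def find_voices(voices: list[dict], language: str) -> list[str]:
    """Return the IDs of voices whose language matches (hypothetical record shape)."""
    return [v["voice_id"] for v in voices if v.get("language") == language]


def require_voice(voices: list[dict], voice_id: str) -> str:
    """Raise early if voice_id is not in the listed catalog."""
    known = {v["voice_id"] for v in voices}
    if voice_id not in known:
        raise ValueError(f"voice {voice_id!r} not found; run list_voices again")
    return voice_id


# Stand-in for a list_voices result; real field names and IDs may differ.
catalog = [
    {"voice_id": "en_narrator_1", "language": "en"},
    {"voice_id": "zh_narrator_1", "language": "zh"},
]
print(find_voices(catalog, "en"))
```

Failing locally with a clear message is cheaper than discovering an unrecognized voice after the synthesis call has gone out.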

Not checking job status

Anti-pattern

Calling a long-form synthesis tool like synthesize_long_text and assuming the audio is immediately available, leading to failed calls because the task isn't done.

The Fix

After initiating a large synthesis job, use get_task_status. This lets you check if the background process is still 'processing,' or if it has successfully 'completed'.
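The polling loop could look roughly like this. The status strings 'processing' and 'completed' are quoted above; treating 'failed' as a terminal state, and the `get_status` callable that stands in for a real get_task_status call, are assumptions.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # 'failed' is an assumed status name


def wait_for_task(
    get_status: Callable[[str], str],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll get_status(task_id) until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(task_id)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        time.sleep(interval_s)


# Fake status source standing in for get_task_status.
_replies = iter(["processing", "processing", "completed"])
print(wait_for_task(lambda _id: next(_replies), "task-123", interval_s=0.0))
```

Passing the status function in as a parameter keeps the loop testable without a live server.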

Questions you might have

Does `synthesize_speech` support TikTok voices?

Yes, the core synthesis function supports specific voice styles, including the famous TikTok models. You can select these via the available voice IDs to add trending flair to your content.

How do I make my own brand voice? Use `create_custom_voice`.

You need 10-50 high-quality recordings of a single speaker. The tool trains the model over 1 to 3 days, giving you an exclusive voice for your brand.

`synthesize_long_text` vs `synthesize_speech`, which should I use?

If your text is short (under 1024 characters), use synthesize_speech. If you're working with full articles, reports, or documentation, always use the dedicated synthesize_long_text tool.

What if I need to control pauses in my audio? Use `synthesize_ssml`.

The specialized SSML function lets you embed tags like <break> and <emphasis>. This gives granular control over the timing, pitch, and intonation that basic text synthesis can't manage.
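As an illustration, the helpers below assemble an SSML string from `<break>` and `<emphasis>` elements. The `time` and `level` attributes follow the W3C SSML specification; which attributes this service honors is not stated here, so treat them as assumptions.

```python
from xml.sax.saxutils import escape


def pause(ms: int) -> str:
    """SSML break of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def stress(text: str, level: str = "strong") -> str:
    """Wrap text in an SSML emphasis element, escaping XML special characters."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def speak(*parts: str) -> str:
    """Join already-built SSML fragments under a single <speak> root."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Take two tablets "),
    stress("twice daily"),
    pause(500),
    escape("with food."),
)
print(ssml)
```

Escaping plain text before embedding it matters: a stray `&` or `<` in the script would otherwise produce malformed markup.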

Can I see what voices are available first? Use `list_voices`.

Running list_voices is essential. It pulls all current voice models—male, female, child, and style-specific options—so you can build your script around known capabilities.

When I use `get_audio_formats`, what's the difference between MP3 and WAV for my project?

MP3 is best for delivery. It compresses audio, making it small enough for web streaming or apps without losing too much quality. If you need to edit the file later, stick with WAV; it keeps the raw, uncompressed data.
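To put rough numbers on that difference, here is a back-of-the-envelope size estimate. The 24 kHz, 16-bit mono WAV settings and the 128 kbps MP3 bitrate are illustrative assumptions, not values documented for this service.

```python
def wav_bytes(seconds: float, sample_rate: int = 24_000,
              bit_depth: int = 16, channels: int = 1) -> int:
    """Size of uncompressed PCM WAV audio, including a 44-byte canonical header."""
    return 44 + int(seconds * sample_rate * (bit_depth // 8) * channels)


def mp3_bytes(seconds: float, bitrate_kbps: int = 128) -> int:
    """Approximate size of constant-bitrate MP3 audio (ignores tags and framing)."""
    return int(seconds * bitrate_kbps * 1000 / 8)


# One minute of narration under the assumed settings.
print(wav_bytes(60), mp3_bytes(60))
```

Under these assumptions a minute of WAV is about three times the size of the MP3, and the gap widens at higher sample rates or in stereo.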

If I run a long synthesis job using `synthesize_long_text`, how do I check its progress with `get_task_status`?

You must pass the unique task ID returned by the initial request to get_task_status. This tool lets you poll the system to see whether the process is pending, running, completed, or failed.

Does `synthesize_speech` let me control the reading speed or volume of the generated audio?

Yes, you can adjust both. The synthesis call accepts parameters for rate and volume. This lets your agent dynamically modify how fast or loud the final narration sounds.
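A sketch of assembling the call arguments. The names `rate`, `volume`, and `language` come from this page; `text` and `voice_id` are hypothetical field names, and the valid numeric ranges are not documented here, so no clamping is attempted.

```python
from typing import Optional

STANDARD_LIMIT = 1024  # characters per standard request, per this page


def speech_arguments(
    text: str,
    voice_id: str,
    language: str,
    rate: Optional[float] = None,
    volume: Optional[float] = None,
) -> dict:
    """Build an arguments dict for synthesize_speech, omitting unset options."""
    if len(text) > STANDARD_LIMIT:
        raise ValueError("text exceeds the standard limit; use synthesize_long_text")
    args = {"text": text, "voice_id": voice_id, "language": language}
    if rate is not None:
        args["rate"] = rate
    if volume is not None:
        args["volume"] = volume
    return args


print(speech_arguments("Welcome to the course.", "en_narrator_1", "en", rate=1.2))
```

Leaving unset options out of the dict lets the server apply its own defaults instead of guessing at them.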

What makes Volcengine TTS different from other TTS services?

Volcengine powers the iconic TikTok TTS effects used in billions of videos. It offers industry-leading Chinese speech quality, trendy social media voices, and ByteDance's proprietary neural voice technology.

Which languages are supported?

Chinese (Mandarin), English, Japanese, and more. Set the language parameter to 'zh' for Chinese, 'en' for English, or 'ja' for Japanese. Each language has multiple voice styles.

What's the max text length?

Standard synthesis supports up to 1024 characters per request. For longer texts, use the synthesize_long_text tool which automatically handles chunking and combining results for articles and audiobooks.
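For a sense of the work the long-text tool takes off your hands, here is one way to split text at sentence boundaries so each piece fits the 1024-character limit. This is an illustrative sketch, not the tool's actual chunking algorithm, which is not described on this page.

```python
import re

STANDARD_LIMIT = 1024  # characters per standard request, per this page

# A sentence: the shortest run ending in terminal punctuation
# (plus trailing whitespace) or at the end of the text.
_SENTENCE = re.compile(r".+?(?:[.!?。！？]+\s*|\Z)", re.S)


def chunk_text(text: str, limit: int = STANDARD_LIMIT) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE.findall(text):
        if current and len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current += sentence
    if current:
        chunks.append(current)
    return chunks


print([len(c) for c in chunk_text("One sentence here. " * 200)])
```

The split is lossless: joining the chunks reproduces the input exactly, so nothing is dropped between requests.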
