Inworld AI MCP for AI. Build characters that genuinely sound alive.
Works with every AI agent you already use
…and any MCP-compatible client








How this MCP server connects to your AI agent
Inworld AI connects advanced voice synthesis, character routing, and cloning capabilities directly to your agent. Generate high-fidelity speech from text, clone voices using audio samples, or build complex conversational logic with LLM routers.
It's designed for creating lifelike NPCs and sophisticated multimodal agents.
What AI agents can do with Inworld AI Automation
Chat completions
Generates chat completions by running the request through a defined LLM Router.
Clone voice
Creates a new voice profile by analyzing and replicating an existing audio sample.
Create realtime call
Sets up a WebRTC connection to enable real-time, bidirectional voice communication with the agent.
Synthesize high-quality audio streams synchronously or in real time using advanced text-to-speech models.
Clone a voice from an existing audio sample, or create a brand new voice by providing descriptive text prompts.
Build complex conversation flows using LLM routers to manage how the agent processes inputs and decides its next action.
Convert audio files into plain text, making spoken user input immediately available for your agent's processing.
Ask an AI about this
Waiting for input…
What AI agents can do with Inworld AI: 19 Tools
These tools cover the entire spectrum of voice processing, from cloning and synthesis to advanced conversation routing.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Inworld AI on VinkiusChat Completions
Generates chat completions by running the request through a defined LLM Router.
Clone Voice
Creates a new voice profile by analyzing and replicating an existing audio sample.
Create Realtime Call
Sets up a WebRTC connection to enable real-time, bidirectional voice communication...
Create Router
Builds and initializes a new LLM Router that manages how the agent processes...
Delete Router
Removes an existing LLM Router from your workspace.
Delete Voice
Permanently deletes a voice profile you have created or cloned.
Design Voice
Generates a unique, temporary voice preview based solely on a written text description.
Get Router
Retrieves the specific details and configuration of an LLM Router by its ID.
Get Voice
Fetches all metadata for a single voice profile using its unique identifier.
List Models
Shows you all the available Large Language Models the agent can use for processing.
List Routers
Lists every LLM Router currently set up in your workspace.
List Tts Voices
A deprecated function to list Text-to-Speech voices; use 'list_voices' instead.
List Voices
Retrieves a full catalog of all voice assets currently available in your workspace.
Publish Voice
Takes a draft or preview voice and makes it a permanent, usable asset within your...
Synthesize Speech Stream
Generates speech audio in real-time chunks for streaming playback to the user.
Synthesize Speech Sync
Creates a complete, finished speech file from text that can be played back instantly.
Transcribe Audio
Converts an uploaded audio file into plain text format in a single synchronous call.
Update Router
Modifies the logic and parameters of an existing LLM Router to change its behavior.
Update Voice
Makes changes to an existing voice profile, such as updating metadata or publication status.
Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for HTTPS Streamable, SSE, and OAuth2—zero messy routing required.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Inworld AI, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Inworld AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 19 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
Getting the conversation to sound natural is hard., Solved with Vinkius AI Gateway
Today, if you build an agent that needs a unique voice for each character, you're stuck in painful cycles. You manually record lines, upload them, and then write complex conditional logic to ensure Character A speaks at the right time, with the correct tone, and never exceeds its allowed dialogue length.
With this MCP, your agent manages all that complexity internally. You simply define the character voice—whether by cloning or describing it—and let the routing system handle the rest of the choreography. What you get is a living dialogue, not a script.
The `chat_completions` tool controls conversation flow.
Without this MCP's routing tools, your agent runs on one single logic path. If the user veers off-topic—saying something about history instead of combat—the basic model struggles to pivot gracefully or know which specialized knowledge base to reference.
By implementing `chat_completions` through an LLM Router, you teach the system how to think in stages. It checks for intent first, then routes the request to a specific module, ensuring the response is always context-aware and hits the right narrative beat.
What your AI can actually do with this
This MCP lets your AI client generate dynamic characters that speak and react in real time. You can create unique digital personas by cloning existing voices from simple audio files, or you can design entirely new voices just by describing them with text prompts. Beyond voice, the system manages complex character logic using routers; this lets you build conversation paths where the agent knows exactly how to behave based on context.
Need to process an incoming voice message? You can transcribe it directly into usable text for your agents. All of this is accessible through your preferred AI client on the Vinkius Marketplace. It’s built for scenarios that require more than just simple Q&A—it handles the entire spectrum, from initial speech synthesis to complex conversational branching.
019e5d27-3aed-7225-b886-3b43275d3091 Here's how it actually works
The bottom line is that your agent can access advanced voice features and complex character logic without needing local setup or custom API calls.
Subscribe to this MCP and provide your Inworld API credentials.
Your AI client gains access to all voice and character tools, letting you manage voices and routers directly from the chat interface.
You send a request—for example, asking the agent to synthesize text or listing available characters—and receive structured data back for immediate use.
Who is this actually for?
Game Developers building dynamic worlds. Content Creators automating large-scale voiceovers. AI Engineers designing sophisticated, context-aware agents.
Needs to program NPCs that don't just speak pre-recorded lines; they need unique voices and react realistically in conversation.
Requires generating massive amounts of voice content, like audiobook narration or character dialogue, using custom cloned voices.
Designs agentic workflows that require multi-step decision making and dynamic routing based on user input type (e.g., text vs. audio).
What Changes When You Connect
Real-time voice communication: Use create_realtime_call to enable immediate, two-way audio conversations with NPCs, making interactions feel natural.
Flexible character logic: Instead of hardcoding rules, use tools like create_router and chat_completions to let the agent dynamically decide how to respond based on complex context.
Voice cloning capability: You can clone a voice from any audio sample using clone_voice, eliminating the need for pre-recorded dialogue files. This is huge for content volume.
Advanced text input handling: Send an audio file and let your agent process it instantly by running the transcribe_audio tool, keeping the workflow seamless.
Full asset control: Manage every part of your character's voice library—from listing voices with list_voices to publishing them via publish_voice.
See it in action
The RPG Quest Giver
A game developer needs a quest giver NPC who speaks with a specific, deep-voiced accent. They use the design_voice tool to create a unique voice preview based on text prompts and then save it using publish_voice. The agent’s response is managed by setting up an LLM Router via create_router, ensuring that if the player asks about lore (a specific topic), the correct character logic executes.
The Live Streamer
A content creator needs to automate a video where characters interact. They send an audio clip of dialogue, which is processed by transcribe_audio and then fed into the system's chat functions. The agent responds using a cloned voice via synthesize_speech_sync, allowing them to generate hours of unique character interaction automatically.
The Customer Support Bot
An engineer needs an internal bot that can handle calls and transcribe speech for logging. They use the create_realtime_call tool to connect the agent via WebRTC, capturing user audio in real-time. The resulting text is then passed into a router configured with specific support protocols.
The Collaborative Agent Team
A team wants multiple agents to debate a topic and report their findings. They use the list_models tool to select the best LLM, set up dedicated roles using create_router, and then pass text inputs through chat_completions to simulate multi-agent deliberation.
The honest tradeoffs
Using only basic synthesis.
The agent sounds flat, robotic, and doesn't fit the character's personality. It just spits out text that reads like a Wikipedia article.
Don't stick to simple calls; use clone_voice first to establish a specific voice identity. Then, manage conversational flow with create_router so the agent knows how to speak, not just what to say.
Ignoring audio input.
The user speaks into their microphone, but the agent only replies based on text prompts because the system never processed the spoken word.
Always run incoming audio through transcribe_audio first. This converts speech to a format your router can read and act upon.
Over-relying on one model.
When the conversation gets complex, the single LLM fails because it doesn't know which specialized sub-routine (e.g., 'Lore Check' vs. 'Combat Response') to activate.
Build a routing layer using create_router. This structure allows you to check for specific keywords or topics and route the request to the appropriate, dedicated character logic.
When It Fits, When It Doesn't
Use this MCP if your project requires genuine voice fidelity (cloning, streaming) and complex conversational branching. Specifically, if your agents need to handle audio input (transcribe_audio) or require dynamic role-switching between different logical pathways (create_router), this is the toolset you need. Don't use it if you just need a simple text summary (use a basic chat endpoint) or if all your characters speak in one consistent, non-variable voice (basic TTS might suffice). The power here is combining multimodal input with structured output routing.
Questions you might have
How do I make an agent respond using a voice that sounds like me? (using clone_voice) +
You use clone_voice by providing it with an audio sample of your speaking. The tool analyzes the phonetics and generates a new, unique voice ID that you can then reference when synthesizing speech.
What is the best way to handle user audio input? (using transcribe_audio) +
You send the raw audio file directly to transcribe_audio. The tool handles the conversion synchronously, delivering clean, text-based data that your agent can immediately process.
How do I manage multiple character personalities? (using create_router) +
You set up a dedicated LLM Router using create_router. This router acts as the central decision point, checking user input and directing the request to the appropriate logic path or personality module.
Can I change my character's voice mid-conversation? (using update_voice) +
Yes. If you need to adjust a published asset, use update_voice. This allows you to modify metadata or the underlying audio profile without having to delete and recreate the entire voice.
Before building agent logic, how do I know which LLM models are available by using the `list_models` tool? +
Just call list_models. This returns a current list of all foundational models that your agents can use. It's smart to run this first so you confirm exactly which model names will work with the router before building complex flows.
If I need real-time audio feedback, should I use `synthesize_speech_stream` or `synthesize_speech_sync`? +
You must use synthesize_speech_stream. This method delivers the audio data in chunks as it's generated. That’s essential for keeping interactions responsive and eliminating noticeable delays in your agent's speech output.
How do I clean up old or unused character logic using `delete_router`? +
You use the delete_router tool, providing only the router ID. This completely removes the defined conversational path and its associated context from your workspace. It's a critical cleanup step to keep your account tidy.
When my agent needs to generate text after routing through multiple steps, how does `chat_completions` finalize the reply? +
The chat_completions tool executes the final logic defined by the LLM Router. It takes the entire conversational context and generates the ultimate text response. Think of it as the final step in a complex workflow, giving you the agent's intended words.
How can I create a custom voice using only a text description? +
You can use the design_voice tool. Simply provide a prompt like 'Warm, friendly male voice' and a preview text. The tool will generate voice options that you can later publish to your library.
What is the difference between synchronous and streaming speech synthesis? +
Use synthesize_speech_sync to receive the full audio file once processing is complete. Use synthesize_speech_stream for real-time applications where you want to receive audio chunks as they are generated for lower latency.
Can I manage multiple AI characters or models through this server? +
Yes. You can use list_routers and get_router to manage your orchestration layers, and list_models to see available AI models in your Inworld workspace.
We've already built the connector for Inworld AI. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 19 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.