# Inworld AI MCP

> Inworld AI connects advanced voice synthesis, character routing, and cloning capabilities directly to your agent. Generate high-fidelity speech from text, clone voices using audio samples, or build complex conversational logic with LLM routers. It's designed for creating lifelike NPCs and sophisticated multimodal agents.

## Overview
- **Category:** artificial-intelligence
- **Price:** Free
- **Tags:** text-to-speech, voice-cloning, ai-characters, conversational-ai, speech-synthesis

## Description

This MCP lets your AI client create characters that speak and react in real time. You can build digital personas by cloning existing voices from audio files, or design entirely new voices by describing them in a text prompt. Beyond voice, the system manages character logic with LLM Routers, which let you build conversation paths where the agent's behavior depends on context. To process an incoming voice message, transcribe it into text your agents can use. All of this is accessible through your preferred AI client on the Vinkius Marketplace. It is built for scenarios that go beyond simple Q&A, covering everything from initial speech synthesis to complex conversational branching.

## Tools

### chat_completions
Generates chat completions by running the request through a defined LLM Router.

### clone_voice
Creates a new voice profile by analyzing and replicating an existing audio sample.

### create_realtime_call
Sets up a WebRTC connection to enable real-time, bidirectional voice communication with the agent.

### create_router
Builds and initializes a new LLM Router that manages how the agent processes different types of input.

### delete_router
Removes an existing LLM Router from your workspace.

### delete_voice
Permanently deletes a voice profile you have created or cloned.

### design_voice
Generates a unique, temporary voice preview based solely on a written text description.

### get_router
Retrieves the specific details and configuration of an LLM Router by its ID.

### get_voice
Fetches all metadata for a single voice profile using its unique identifier.

### list_models
Shows you all the available Large Language Models the agent can use for processing.

### list_routers
Lists every LLM Router currently set up in your workspace.

### list_tts_voices
Deprecated. Lists Text-to-Speech voices; use `list_voices` instead.

### list_voices
Retrieves a full catalog of all voice assets currently available in your workspace.

### publish_voice
Takes a draft or preview voice and makes it a permanent, usable asset within your published models.

### synthesize_speech_stream
Generates speech audio in real-time chunks for streaming playback to the user.

### synthesize_speech_sync
Creates a complete speech file from text and returns it in a single response once synthesis has finished.

### transcribe_audio
Converts an uploaded audio file into plain text format in a single synchronous call.

### update_router
Modifies the logic and parameters of an existing LLM Router to change its behavior.

### update_voice
Makes changes to an existing voice profile, such as updating metadata or publication status.
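
MCP clients invoke tools like the ones above through JSON-RPC 2.0 `tools/call` requests. The following is a minimal sketch of what such a request could look like for `list_voices` and `get_voice`. The `tool_call` helper and the `voice_id` argument key are assumptions made for illustration; the server's actual input schemas are not documented here, so check the schema your client reports before relying on any argument name.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

# JSON-RPC request ids should be unique within a session.
_request_ids = count(1)


def tool_call(name: str, arguments: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Tool names come from the list above; "voice_id" is a hypothetical key.
list_request = tool_call("list_voices")
get_request = tool_call("get_voice", {"voice_id": "v-123"})

print(json.dumps(get_request, indent=2))
```

In practice your AI client builds these requests for you; the sketch only shows the shape of what travels over the wire.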

## Prompt Examples

**Prompt:** 
```
List all the voices available in my Inworld workspace.
```

**Response:** 
```
I've retrieved your voice library. You have 12 voices available, including 'Narrator Alpha' (ID: v-123) and 'Heroic Guide' (ID: v-456). Would you like to see details for a specific one?
```

**Prompt:** 
```
Synthesize the text 'Welcome to the future of AI' using voice ID 'v-123' and the expressive delivery mode.
```

**Response:** 
```
Processing synthesis with `synthesize_speech_sync`... The audio has been generated successfully using the `EXPRESSIVE` delivery mode. You can now play or download the result.
```

**Prompt:** 
```
Design a new voice that sounds like a mysterious, deep-voiced wizard for my game.
```

**Response:** 
```
I'm using `design_voice` with your description. I've generated a preview for a 'Mysterious, deep-voiced wizard'. Would you like to hear the preview or publish this voice to your workspace?
```

## Capabilities

### Generate Speech from Text
Synthesize high-quality audio streams synchronously or in real time using advanced text-to-speech models.

### Create and Manage Digital Voices
Clone a voice from an existing audio sample, or create a brand new voice by providing descriptive text prompts.

### Orchestrate Agent Behavior
Build complex conversation flows using LLM routers to manage how the agent processes inputs and decides its next action.

### Transcribe Audio Inputs
Convert audio files into plain text, making spoken user input immediately available for your agent's processing.

## Use Cases

### The RPG Quest Giver
A game developer needs a quest giver NPC who speaks with a specific, deep-voiced accent. They use the `design_voice` tool to create a unique voice preview based on text prompts and then save it using `publish_voice`. The agent’s response is managed by setting up an LLM Router via `create_router`, ensuring that if the player asks about lore (a specific topic), the correct character logic executes.
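
The quest giver flow above chains three tools in order. Here is a rough sketch of that sequence, written against a hypothetical `call_tool(name, arguments)` function standing in for whatever your MCP client exposes. Every argument key and response field (`description`, `preview_text`, `voice_id`, `router_id`, and so on) is an assumption, and the stub below returns canned values instead of contacting Inworld.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature for "invoke one MCP tool and return its result".
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_quest_giver(call_tool: ToolCaller) -> Dict[str, str]:
    """Run design_voice -> publish_voice -> create_router in order."""
    preview = call_tool("design_voice", {
        "description": "Deep-voiced, weathered quest giver",  # assumed key
        "preview_text": "Traveler, I have a task for you.",   # assumed key
    })
    # Promote the temporary preview to a permanent voice.
    voice = call_tool("publish_voice", {"voice_id": preview["voice_id"]})
    # Create the router that will handle lore questions.
    router = call_tool("create_router", {
        "name": "quest-giver",                                 # assumed key
        "instructions": "Answer lore questions in character.", # assumed key
    })
    return {"voice_id": voice["voice_id"], "router_id": router["router_id"]}


calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real MCP client: records the call, returns canned ids."""
    calls.append(name)
    return {"voice_id": "v-demo", "router_id": "r-demo"}


result = build_quest_giver(fake_call_tool)
print(calls)
print(result)
```

Passing the caller in as a parameter keeps the workflow testable without credentials; swapping `fake_call_tool` for a real client call is the only change needed to run it for real.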

### The Live Streamer
A content creator needs to automate a video where characters interact. They send an audio clip of dialogue, which is processed by `transcribe_audio` and then fed into the system's chat functions. The agent responds using a cloned voice via `synthesize_speech_sync`, allowing them to generate hours of unique character interaction automatically.

### The Customer Support Bot
An engineer needs an internal bot that can handle calls and transcribe speech for logging. They use the `create_realtime_call` tool to connect the agent via WebRTC, capturing user audio in real-time. The resulting text is then passed into a router configured with specific support protocols.

### The Collaborative Agent Team
A team wants multiple agents to debate a topic and report their findings. They use the `list_models` tool to select the best LLM, set up dedicated roles using `create_router`, and then pass text inputs through `chat_completions` to simulate multi-agent deliberation.

## Benefits

- Real-time voice communication: Use `create_realtime_call` to enable immediate, two-way audio conversations with NPCs, making interactions feel natural.
- Flexible character logic: Instead of hardcoding rules, use tools like `create_router` and `chat_completions` to let the agent dynamically decide how to respond based on complex context.
- Voice cloning capability: Clone a voice from an audio sample using `clone_voice`, then synthesize new lines in that voice instead of recording each one. This helps when you need a large volume of dialogue.
- Audio input handling: Send an audio file to the `transcribe_audio` tool and your agent receives the text in the same call, ready for processing.
- Full asset control: Manage every part of your character's voice library—from listing voices with `list_voices` to publishing them via `publish_voice`.

## How It Works

Your agent can access advanced voice features and complex character logic without local setup or hand-written API calls.

1. Subscribe to this MCP and provide your Inworld API credentials.
2. Your AI client gains access to all voice and character tools, letting you manage voices and routers directly from the chat interface.
3. You send a request, for example asking the agent to synthesize text or list available voices, and receive structured data back for immediate use.
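
To make the "structured data back" step concrete, here is a small sketch of unpacking a tool result. The outer shape (`content` items with a `type`, plus an `isError` flag) follows the MCP tool result format; the JSON payload inside the text item, including the `voices`, `id`, and `name` fields, is invented for illustration and may not match what this server returns.

```python
import json
from typing import Any, Dict


def extract_payload(result: Dict[str, Any]) -> Any:
    """Return the decoded payload of an MCP tool result.

    Joins all text content items, then parses them as JSON when possible
    and falls back to the raw string otherwise.
    """
    if result.get("isError"):
        raise RuntimeError("tool call reported an error")
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    joined = "\n".join(texts)
    try:
        return json.loads(joined)
    except json.JSONDecodeError:
        return joined


# Hypothetical result for a list_voices call.
sample_result = {
    "content": [
        {"type": "text", "text": '{"voices": [{"id": "v-123", "name": "Narrator Alpha"}]}'}
    ],
    "isError": False,
}

payload = extract_payload(sample_result)
names = [voice["name"] for voice in payload["voices"]]
print(names)
```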

## Frequently Asked Questions

**How do I make an agent respond using a voice that sounds like me? (using clone_voice)**
Call `clone_voice` with an audio sample of you speaking. The tool analyzes the sample and returns a new, unique voice ID that you can then reference when synthesizing speech.
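
As a rough sketch of that two-step flow, the snippet below prepares arguments for `clone_voice` and for a later synthesis call. JSON cannot carry raw bytes, so the sample is base64-encoded here; whether this server expects base64, a file path, or a URL is an assumption, as are the key names `display_name`, `audio_base64`, `voice_id`, and `text`, and the placeholder voice ID.

```python
import base64
from typing import Any, Dict


def clone_voice_arguments(sample: bytes, display_name: str) -> Dict[str, Any]:
    """Package an audio sample for clone_voice (assumed argument names)."""
    return {
        "display_name": display_name,
        "audio_base64": base64.b64encode(sample).decode("ascii"),
    }


def synthesis_arguments(voice_id: str, text: str) -> Dict[str, Any]:
    """Reference the cloned voice by ID when synthesizing (assumed names)."""
    return {"voice_id": voice_id, "text": text}


# Placeholder bytes; a real sample would be read from an audio file.
sample = b"RIFF....WAVEfmt "
clone_args = clone_voice_arguments(sample, "My Voice")

# "v-cloned-001" stands in for the ID that clone_voice would return.
speak_args = synthesis_arguments("v-cloned-001", "Hello from my cloned voice.")
print(sorted(clone_args), speak_args)
```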

**What is the best way to handle user audio input? (using transcribe_audio)**
You send the raw audio file directly to `transcribe_audio`. The tool handles the conversion synchronously, delivering clean, text-based data that your agent can immediately process.

**How do I manage multiple character personalities? (using create_router)**
You set up a dedicated LLM Router using `create_router`. This router acts as the central decision point, checking user input and directing the request to the appropriate logic path or personality module.

**Can I change my character's voice mid-conversation? (using update_voice)**
Yes. To adjust an existing voice, use `update_voice`. It lets you change the voice's metadata or publication status without deleting and recreating the voice.

**How do I check which LLMs are available before building agent logic? (using list_models)**
Call `list_models`. It returns the current list of models that your agents can use. Run it first to confirm exactly which model names will work with the router before you build complex flows.
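
One way to act on that advice is to check your preferred model names against the returned list before creating a router. The sketch below assumes you have already reduced the `list_models` response to a plain list of names; the model names shown are placeholders, not real Inworld model identifiers.

```python
from typing import Iterable, List


def pick_model(available: Iterable[str], preferred: List[str]) -> str:
    """Return the first preferred model that the workspace actually offers."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise ValueError("none of the preferred models are available")


# Placeholder names standing in for a list_models result.
available_models = ["model-a", "model-b"]
chosen = pick_model(available_models, ["model-x", "model-b", "model-a"])
print(chosen)
```

Failing early with a clear error is cheaper than discovering an invalid model name after a router is already wired into a larger flow.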

**If I need real-time audio feedback, should I use `synthesize_speech_stream` or `synthesize_speech_sync`?**
Use `synthesize_speech_stream`. It delivers the audio in chunks as they are generated, so playback can begin before synthesis finishes. That keeps interactions responsive and avoids noticeable delays in your agent's speech output.

**How do I clean up old or unused character logic using `delete_router`?**
Call the `delete_router` tool with the router ID. This removes the router and its configuration from your workspace, which helps keep the workspace free of unused logic.

**When my agent needs to generate text after routing through multiple steps, how does `chat_completions` finalize the reply?**
The `chat_completions` tool runs the request through the logic defined by the LLM Router. It takes the conversational context and generates the final text response, making it the last step of the workflow.

**How can I create a custom voice using only a text description?**
You can use the `design_voice` tool. Simply provide a prompt like 'Warm, friendly male voice' and a preview text. The tool will generate voice options that you can later publish to your library.

**What is the difference between synchronous and streaming speech synthesis?**
Use `synthesize_speech_sync` to receive the full audio file once processing is complete. Use `synthesize_speech_stream` for real-time applications where you want to receive audio chunks as they are generated for lower latency.
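
The difference is easiest to see from the consumer's side. The following sketch simulates both modes locally with placeholder bytes; `fake_stream`, `play_streaming`, and `play_sync` are illustrative helpers and do not call the Inworld tools, whose actual chunk sizes and audio formats are not specified here.

```python
from typing import Iterable, Iterator, List


def fake_stream(audio: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Simulate a streaming response by yielding the audio in small chunks."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def play_streaming(chunks: Iterable[bytes]) -> List[bytes]:
    """Handle each chunk as soon as it arrives (streaming mode)."""
    played = []
    for chunk in chunks:
        # A real player would hand the chunk to the audio device here.
        played.append(chunk)
    return played


def play_sync(audio: bytes) -> List[bytes]:
    """Handle the whole file in one piece (synchronous mode)."""
    return [audio]


audio = b"0123456789"  # placeholder for synthesized audio bytes
streamed = play_streaming(fake_stream(audio))
whole = play_sync(audio)

print(len(streamed), "chunks vs", len(whole), "file")
```

Both paths end up with the same bytes; streaming only changes when the first piece becomes available to play.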

**Can I manage multiple AI characters or models through this server?**
Yes. You can use `list_routers` and `get_router` to manage your orchestration layers, and `list_models` to see available AI models in your Inworld workspace.