# Resemble AI MCP

> Resemble AI MCP gives you full control over synthetic speech. Generate high-quality audio clips from simple text input, clone voices from recordings, and transform existing speech into any target voice—all through a single connection point. It also includes built-in tools to detect deepfakes and apply digital watermarks, making your media production both powerful and secure.

## Overview
- **Category:** image-video
- **Price:** Free
- **Tags:** voice-cloning, text-to-speech, synthetic-media, speech-synthesis, audio-processing, deepfake-detection

## Description

Need professional audio without booking recording talent? This MCP lets you create and manage synthetic voices directly from your agent. You can turn text into audio using custom or system voices, with SSML for finer control over delivery. If you already have audio, the MCP can convert it to a different voice while keeping the original emotion and timing. Projects keep your clips and records organized in one place. Because media authenticity matters, you can also run deepfake detection or verify watermarks on a file to help establish whether content is legitimate. Connecting this MCP via Vinkius lets your AI client, whether it's Claude or Cursor, handle these audio tasks without multiple specialized services.

## Tools

### add_watermark
Applies an invisible digital watermark signature to protect an audio file's origin.

### create_clip
Generates a new audio clip from text, supporting advanced SSML formatting.

### create_project
Sets up a dedicated container for organizing related audio assets and work streams.

### create_recording
Uploads raw audio files specifically for the purpose of training a new voice model.

### create_voice
Initiates the process to build and register a brand-new, custom voice profile.

### delete_voice
Permanently removes a specific custom voice from your available library.

### detect_deepfake
Analyzes an audio file to calculate the probability of it being AI-generated or synthetic.

### get_clip
Retrieves the details and content of a specific, previously generated audio clip.

### get_voice
Fetches comprehensive metadata for a single registered voice profile.

### list_clips
Provides an overview of all the audio clips stored within a specific project container.

### list_projects
Retrieves a list of every active and archived project you have set up in the MCP.

### list_recordings
Shows all the raw audio recordings currently associated with a particular voice profile.

### list_voices
Returns a comprehensive list of every available custom and system-provided voice for use.

### speech_to_speech
Transforms an input audio file, changing its speaker's identity to the target voice while keeping the original tone.

### update_clip
Modifies or revises the content of an existing audio clip within a project.

### verify_watermark
Checks if a digital watermark is present and valid on an uploaded audio file.

## Prompt Examples

**Prompt:** 
```
List all my Resemble AI projects and their UUIDs.
```

**Response:** 
```
I've retrieved your projects. You have 'Marketing 2024' (UUID: proj_123) and 'Game Characters' (UUID: proj_456). Which one would you like to work with?
```

**Prompt:** 
```
Create a new audio clip in project proj_123 saying 'Welcome to the future of voice' using voice voice_789.
```

**Response:** 
```
Generating clip... Success! The clip has been created in project 'Marketing 2024'. You can access it via UUID clip_abc.
```

**Prompt:** 
```
Analyze this audio URL to see if it's a deepfake: https://example.com/audio.mp3
```

**Response:** 
```
Running deepfake detection... The analysis is complete. The audio shows a 98% probability of being synthetic (AI-generated).
```

## Capabilities

### Generate speech from text
You can create new, high-quality audio clips simply by providing text and selecting a voice.

### Transform existing voices
The MCP changes an input audio file into a target voice while preserving the original speaker's emotion and rhythm.

### Build custom voices
You upload raw recordings to train new, unique voices for your projects.

### Manage audio assets
The system lets you organize everything using projects and list all available voice profiles.

### Verify media authenticity
You can run checks on an audio file to see if it's synthetic or detect the presence of a digital watermark.

## Use Cases

### Localizing a Global Podcast Series
A content team needs to release a podcast in five languages. Instead of coordinating five voice actors, they clone the host's voice with `create_voice`, then run `create_clip` once per translated script, so the same voice is used in every market.
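
As a rough sketch of the per-language loop, the snippet below builds one `create_clip` call per translated script. The argument names (`project_uuid`, `voice_uuid`, `title`, `body`) and the sample scripts are assumptions for illustration; the actual tool schema may differ.

```python
from typing import Dict, List


def localized_clip_calls(project_uuid: str, voice_uuid: str,
                         scripts: Dict[str, str]) -> List[dict]:
    """Return one create_clip call per language, all using the same voice."""
    calls = []
    for language, text in sorted(scripts.items()):
        calls.append({
            "name": "create_clip",
            "arguments": {  # hypothetical argument names
                "project_uuid": project_uuid,
                "voice_uuid": voice_uuid,
                "title": f"intro-{language}",
                "body": text,
            },
        })
    return calls


# Three of the five languages, with placeholder scripts.
scripts = {
    "en": "Welcome to the show.",
    "es": "Bienvenidos al programa.",
    "fr": "Bienvenue dans l'émission.",
}
calls = localized_clip_calls("proj_123", "voice_789", scripts)
for call in calls:
    print(call["arguments"]["title"])
```

Passing the same voice identifier to every call is what keeps the host's voice constant across languages.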

### Automating E-learning Content
An instructional designer needs hundreds of audio snippets for a new course. They set up a dedicated project with `create_project`, write the scripts, and generate each clip in that project with `create_clip`. `list_clips` then gives them an overview of everything produced.

### Investigating Media Leaks
A security team receives an anonymous audio file. They run `detect_deepfake` to estimate how likely it is to be synthetic, then run `verify_watermark` to check whether it carries a valid watermark from an official source.
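
One way to combine the two results into a triage decision is sketched below. The field names, the 0.0 to 1.0 probability scale, and the 0.9 threshold are assumptions, not documented output of the tools.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    synthetic_probability: float  # hypothetical field from detect_deepfake
    watermark_valid: bool         # hypothetical field from verify_watermark


def summarize(findings: Findings, threshold: float = 0.9) -> str:
    """Turn the two tool results into a single triage label."""
    if not 0.0 <= findings.synthetic_probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    if findings.watermark_valid:
        return "traceable: valid watermark found"
    if findings.synthetic_probability >= threshold:
        return "suspect: likely synthetic, no valid watermark"
    return "inconclusive: escalate for manual review"


print(summarize(Findings(synthetic_probability=0.98, watermark_valid=False)))
```

The threshold is a policy choice; a team handling sensitive material may prefer a lower value so that more files go to human review.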

### Updating Character Voices in a Game
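A development team needs an NPC to speak new lines. They clone the original actor's voice by calling `create_voice` and uploading source audio with `create_recording`, then generate the new dialogue from text with `create_clip`. If a scratch performance of the lines already exists, `speech_to_speech` can convert it to the cloned voice instead.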

## Benefits

- Generate voiceovers instantly. Instead of hiring an actor or recording studio, you use the `create_clip` tool to turn text into professional audio using any available voice.
- Maintain vocal consistency across projects. Use `speech_to_speech` to convert new recordings into a known target voice while keeping the emotional tone and timing of the original performance, so your brand messaging sounds consistent.
- Protect content integrity from the start. Apply an imperceptible watermark with `add_watermark` and check for it later with `verify_watermark` to help establish where the audio came from.
- Stay organized while scaling up. The MCP lets you use `create_project` to group all assets related to one campaign, making it easy to locate everything via `list_projects`.
- Deepfake detection is built in. Run `detect_deepfake` on suspicious files, or on any sensitive media under review, to get the probability that the audio is AI-generated.

## How It Works

In short, voice synthesis and media-security tools become conversational commands your agent can run inside an existing workflow.

1. Subscribe to this MCP and provide your Resemble AI API Token.
2. Your agent calls the necessary tools, like creating a new project or listing voices.
3. The platform processes the request, whether that is generating text-to-speech audio or running deepfake detection, and returns the resulting file or data to your client.
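
Under the hood, step 2 is typically an MCP `tools/call` request sent as a JSON-RPC 2.0 message. The sketch below only builds those messages; the tool argument names are hypothetical, and transport and authentication are handled by your MCP client rather than by this code.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names such as "name" are assumptions, not the documented schema.
requests = [
    tool_call("create_project", {"name": "Marketing 2024"}),
    tool_call("list_voices", {}),
]
print(json.dumps(requests[0], indent=2))
```

In normal use you never write these messages yourself; the agent produces them when you ask it to create a project or list voices.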

## Frequently Asked Questions

**How do I start using Resemble AI MCP for voice cloning?**
You must first subscribe and provide your API token to the MCP. Then, you use `create_voice` and follow up with `create_recording` to upload the necessary source audio.
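
The order of calls can be sketched as follows. `call` stands in for whatever function your MCP client uses to invoke a tool, and the argument and response field names (`name`, `voice_uuid`, `audio_url`, `uuid`) are assumptions; a stub replaces the real service so the example runs on its own.

```python
from typing import Callable, List


def clone_voice(call: Callable[[str, dict], dict],
                name: str, recording_urls: List[str]) -> str:
    """Create a voice, then attach each source recording to it."""
    voice = call("create_voice", {"name": name})
    voice_uuid = voice["uuid"]  # hypothetical response field
    for url in recording_urls:
        call("create_recording", {"voice_uuid": voice_uuid, "audio_url": url})
    return voice_uuid


# Stub client that records the tools invoked instead of contacting a server.
invoked = []


def fake_call(tool: str, arguments: dict) -> dict:
    invoked.append(tool)
    return {"uuid": "voice_789"}


voice_uuid = clone_voice(
    fake_call, "Podcast Host",
    ["https://example.com/take1.wav", "https://example.com/take2.wav"],
)
print(voice_uuid, invoked)
```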

**Can Resemble AI MCP handle multiple projects?**
Yes. Call `list_projects` to see all your work areas, and use `create_project` to separate different campaigns or client accounts.

**What is the difference between creating a clip and updating a clip using Resemble AI MCP?**
Use `create_clip` when you are generating audio from scratch, usually with new text. Use `update_clip` if the content of an existing piece needs minor revisions or edits.
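
A minimal sketch of that choice, assuming hypothetical argument names (`project_uuid`, `clip_uuid`, `body`): if you already hold a clip identifier, revise it; otherwise create a new clip.

```python
from typing import Optional


def clip_tool_call(project_uuid: str, body: str,
                   clip_uuid: Optional[str] = None) -> dict:
    """Pick create_clip for new audio and update_clip for revisions."""
    arguments = {"project_uuid": project_uuid, "body": body}
    if clip_uuid is None:
        return {"name": "create_clip", "arguments": arguments}
    arguments["clip_uuid"] = clip_uuid
    return {"name": "update_clip", "arguments": arguments}


new_clip = clip_tool_call("proj_123", "Welcome to the future of voice")
revised = clip_tool_call("proj_123", "Welcome to the future of voice.",
                         clip_uuid="clip_abc")
print(new_clip["name"], revised["name"])
```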

**How do I check if an audio file is a deepfake using Resemble AI MCP?**
Simply use the `detect_deepfake` tool and provide the URL for the suspicious audio. It will return a probability score indicating how likely it is to be synthetic.

**Does Resemble AI MCP support SSML tagging?**
Yes. The `create_clip` tool accepts SSML (Speech Synthesis Markup Language), which gives you fine-grained control over pacing and pronunciation beyond plain text input.
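
For illustration, the sketch below assembles a small SSML document with Python's standard library, which escapes special characters for you. `<speak>`, `<break>`, and `<prosody>` are standard SSML elements, but which elements and attributes the service honors is an assumption to check against its documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(first: str, second: str, pause_ms: int = 400) -> str:
    """Return SSML: first sentence, a pause, then a slower second sentence."""
    speak = ET.Element("speak")
    speak.text = first
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow"})
    prosody.text = second
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Welcome.", "This is the future of voice.")
print(ssml)
```

The resulting string would be passed as the text of a `create_clip` request in place of plain text.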