# Replicate MCP

> Replicate MCP lets your AI client dynamically search, run, and manage thousands of open-source machine learning models. You can command complex tasks, such as generating images, running specialized language models, or processing audio, directly from a chat prompt using natural language instructions.

## Overview
- **Category:** superpower
- **Price:** Free
- **Tags:** machine-learning, model-inference, open-source-models, fine-tuning, api-access, generative-ai

## Description

This connector gives your agent the power to interact with a massive library of open-source ML models without needing to run them on your own hardware. Instead of dealing with complex API calls and parameter files, you simply tell your AI client what you want done in plain English. It handles finding the right model, checking its required inputs, starting the job, and even monitoring it until it's finished.

Need a specific type of image? Your agent can search for models and then execute a prediction with just a few words. If the process is long-running, you don't have to wait by the console; your AI client checks the job status for you. This replaces hand-written API calls and manual status checks. When you connect this capability through Vinkius, you get access to the entire catalog of model operations, making complex ML workflows manageable right inside your chat interface.

## Tools

### list_models
It shows you a list of all public machine learning models available on Replicate.

### get_account
This retrieves basic information about your connected Replicate account.

### list_collections
It lists curated groups of models, such as those focused on 'Image-to-Text' or 'Audio Generation'.

### list_deployments
This shows you all the active model deployments you have set up personally.

### cancel_prediction
It stops a model prediction job that is currently running and prevents further processing.

### create_prediction
You start a new model prediction by supplying the required model version ID and all necessary inputs as a JSON object.
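
As a rough illustration of what "a model version ID and inputs as a JSON object" can look like, here is a minimal sketch. The key names `version` and `input`, the version ID, and the input fields are all assumptions made for the example; the tool's actual argument names may differ.

```python
import json


def build_prediction_args(version_id: str, inputs: dict) -> dict:
    """Assemble arguments for a create_prediction call.

    The "version" and "input" keys are assumed for illustration;
    check the tool's own schema for the real argument names.
    """
    if not version_id:
        raise ValueError("a model version ID is required")
    if not isinstance(inputs, dict):
        raise TypeError("inputs must be a JSON-style object (dict)")
    # Round-trip through json to fail early on values that cannot be serialized.
    json.loads(json.dumps(inputs))
    return {"version": version_id, "input": inputs}


# Hypothetical version ID and input names, for illustration only.
args = build_prediction_args(
    "example-version-id",
    {"prompt": "a red bicycle", "num_outputs": 1},
)
print(json.dumps(args, indent=2))
```

In practice the agent assembles this object for you; the sketch only shows the shape of the data it has to produce.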

### get_collection
It retrieves details for a specific, defined group of models using its unique slug.

### get_model
This fetches detailed information about one specific model, including its exact owner and parameter requirements.

### get_prediction
It checks the current status of a prediction job and retrieves the final output if it has finished running.
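
Checking status until a job finishes usually amounts to a polling loop. The sketch below assumes a prediction is a dictionary with a `status` field and that `succeeded`, `failed`, and `canceled` are the final states; `fake_get_prediction` is a hypothetical stand-in for the real tool call.

```python
import time
from typing import Callable

# Statuses after which a prediction is assumed not to change again.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(fetch: Callable[[str], dict], prediction_id: str,
                        interval: float = 2.0, max_checks: int = 30) -> dict:
    """Poll until the prediction reaches a terminal status or checks run out."""
    for _ in range(max_checks):
        prediction = fetch(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"{prediction_id} still running after {max_checks} checks")


# Stand-in for get_prediction: replays a fixed sequence of responses.
_responses = iter([
    {"status": "starting"},
    {"status": "processing"},
    {"status": "succeeded", "output": ["https://example.com/out.png"]},
])


def fake_get_prediction(prediction_id: str) -> dict:
    return {"id": prediction_id, **next(_responses)}


result = wait_for_prediction(fake_get_prediction, "p_abc123", interval=0)
print(result["status"], result["output"])
```

Capping the number of checks keeps a stuck job from blocking the loop forever; the caller can then decide whether to keep waiting or cancel.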

### list_hardware
This lists all available GPU hardware options you can use for running your models.

### list_predictions
It retrieves a log of the recent prediction jobs that have been run by your account.

### search_models
You can search across the entire platform to find public models that match specific keywords or use cases.

## Prompt Examples

**Prompt:** 
```
List my recent predictions.
```

**Response:** 
```
`list_predictions` returned your recent requests. The most recent one, ID `p_30abc...`, has finished; it ran a prompt about dog pictures.
```

**Prompt:** 
```
Query Replicate to search for 'TTS' models.
```

**Response:** 
```
I ran `search_models` with the keyword "TTS". Replicate returned several options, including 'suno-ai/bark' and 'coqui/xtts'. Tell me the exact owner/model and I'll inspect its input schema.
```

**Prompt:** 
```
Cancel the prediction that has the ID `p_abc123`.
```

**Response:** 
```
I sent the cancellation request with `cancel_prediction` for prediction ID `p_abc123`. The job has been stopped and no further processing will occur.
```

## Capabilities

### Find and list available models
It lets your AI client search across thousands of public model definitions based on a keyword or use case.

### Execute ML predictions
You can start running specific open-source models, providing the necessary input variables to generate output like images or text.

### Manage job status and lifecycle
Your AI client tracks ongoing jobs, retrieving the results when they're ready or canceling them immediately if you change your mind.

## Use Cases

### Generating marketing assets for a new product launch
A content manager needs 20 unique concept images. Instead of writing a script that iterates through image generation APIs, they prompt their agent: 'Find five text-to-image models and generate four variations of this car design with each.' The agent uses `search_models` to find options, then runs one prediction per variation with `create_prediction`.

### Analyzing user feedback audio files
A researcher wants to compare speech-to-text models on recorded user feedback. They use their agent to execute a prediction on an audio file, and if the results are poor, they can call `list_predictions` to review earlier runs and see which model versions produced better transcripts.

### Prototyping an LLM feature for a client
A developer wants to test how different language models handle specific JSON inputs. They use the agent's ability to `get_model` metadata first, ensuring they provide the correct payload structure before calling `create_prediction`.
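
The "check the metadata first" step can be pictured as a small pre-flight check. This sketch assumes the model's input schema is a JSON Schema style object with `properties` and `required` keys; the schema and field names shown are hypothetical.

```python
def check_inputs(schema: dict, payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    properties = schema.get("properties", {})
    # Every required input must be present in the payload.
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required input: {name}")
    # Inputs the schema does not declare are likely typos.
    for name in payload:
        if name not in properties:
            problems.append(f"unknown input: {name}")
    return problems


# Hypothetical schema, shaped like a JSON Schema object.
example_schema = {
    "required": ["prompt"],
    "properties": {
        "prompt": {"type": "string"},
        "max_tokens": {"type": "integer"},
    },
}

print(check_inputs(example_schema, {"prompt": "hi", "temperature": 0.2}))
print(check_inputs(example_schema, {"max_tokens": 64}))
```

Running the check before `create_prediction` catches missing or misspelled inputs without spending a prediction on them.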

### Monitoring a large batch of scientific simulations
A scientist kicks off 50 complex climate models. Instead of checking every dashboard, they ask their agent to monitor all jobs using `list_predictions`, getting real-time status updates until the final output is retrieved via `get_prediction`.
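
Monitoring a batch comes down to grouping jobs by status. The following sketch assumes `list_predictions` yields records with `id` and `status` fields; the sample records and status names are illustrative assumptions.

```python
from collections import Counter


def summarize(predictions: list[dict]) -> dict:
    """Count predictions per status and list the ones still in progress."""
    counts = Counter(p["status"] for p in predictions)
    unfinished = [p["id"] for p in predictions
                  if p["status"] in ("starting", "processing")]
    return {"counts": dict(counts), "unfinished": unfinished}


# Hypothetical records standing in for a list_predictions result.
sample = [
    {"id": "job-1", "status": "succeeded"},
    {"id": "job-2", "status": "processing"},
    {"id": "job-3", "status": "failed"},
    {"id": "job-4", "status": "processing"},
]

summary = summarize(sample)
print(summary["counts"])
print(summary["unfinished"])
```

The unfinished IDs are the ones worth passing to `get_prediction` on the next check.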

## Benefits

- Access diverse models instantly. You don't need to hardcode API endpoints; just tell your agent what kind of image or text you want, and it handles the search using `search_models`.
- Manage long jobs without stress. If a video generation task takes minutes, use `get_prediction` to check its status later or call `cancel_prediction` if the results aren't right.
- Stop guessing parameters. Before running anything, use `get_model` to pull up the exact schema and input requirements for any model you find, preventing failed runs.
- Run models without local setup. This MCP lets your agent connect directly to powerful cloud infrastructure, bypassing the need to install Python dependencies or manage GPU drivers locally.
- Build complex chains easily. You can instruct your AI client to take the output of one specialized model and feed it as input to a second model using natural language instructions.
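
Chaining models, as in the last benefit above, means passing one model's output in as the next model's input. Here is a minimal sketch of that idea; `run_model`, the model names, and the input keys are hypothetical placeholders rather than part of this connector.

```python
from typing import Any, Callable


def run_chain(run_model: Callable[[str, dict], Any],
              steps: list[tuple[str, str]], first_input: Any) -> Any:
    """Run models in order, feeding each output to the next model.

    Each step is (model_name, input_key): the previous output is
    passed to the model under input_key.
    """
    value = first_input
    for model_name, input_key in steps:
        value = run_model(model_name, {input_key: value})
    return value


# Stand-in runner that just records which model saw which value.
def fake_run_model(model_name: str, inputs: dict) -> str:
    (_, value), = inputs.items()
    return f"{model_name}({value})"


final = run_chain(
    fake_run_model,
    [("example/transcriber", "audio"), ("example/summarizer", "text")],
    "meeting.wav",
)
print(final)
```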

## How It Works

You tell your AI client what task to complete, and it handles the backend steps: finding a model, checking its inputs, starting the job, and tracking it.

1. First, install the Replicate extension in your MCP client.
2. Next, input your personal Replicate API Token into the configuration variables.
3. Finally, prompt your agent naturally: 'Search for a video generation model, check its parameters, and generate a clip of a cat on Mars.'
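
Behind a prompt like the one in step 3, the agent would plausibly call the tools in a fixed order: search, inspect, then run. The sketch below illustrates that sequence with a stubbed `call_tool` function; the argument names and return shapes are assumptions, not the connector's documented interface.

```python
from typing import Any, Callable


def run_workflow(call_tool: Callable[[str, dict], Any],
                 query: str, inputs: dict) -> dict:
    """Search for a model, look up its details, then start a prediction."""
    models = call_tool("search_models", {"query": query})
    if not models:
        raise LookupError(f"no models found for {query!r}")
    first = models[0]
    details = call_tool("get_model",
                        {"owner": first["owner"], "name": first["name"]})
    return call_tool("create_prediction",
                     {"version": details["latest_version"], "input": inputs})


calls = []


# Stub that records the tool order and returns made-up data.
def fake_call_tool(name: str, arguments: dict) -> Any:
    calls.append(name)
    if name == "search_models":
        return [{"owner": "example", "name": "video-model"}]
    if name == "get_model":
        return {"latest_version": "example-version-id"}
    if name == "create_prediction":
        return {"id": "example-prediction-id", "status": "starting", **arguments}
    raise ValueError(f"unknown tool: {name}")


prediction = run_workflow(fake_call_tool, "video generation",
                          {"prompt": "a cat on Mars"})
print(calls)
print(prediction["status"])
```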

## Frequently Asked Questions

**Can the Replicate MCP handle image generation?**
Yes, absolutely. You can command your agent to find and run specific text-to-image models by calling `search_models` and then executing a prediction.

**What is the difference between `list_collections` and `search_models` in Replicate MCP?**
`list_collections` shows pre-curated groups of related models (like all 'Audio Generation' tools). `search_models` lets you search across every public model on the platform using keywords.

**How do I stop a job running with Replicate MCP?**
If a prediction is taking too long or isn't giving the right result, use `cancel_prediction` to halt it immediately and cleanly. This prevents unnecessary usage costs.

**Does Replicate MCP require me to run models on my own computer?**
No. The entire purpose of this MCP is that your agent connects to the cloud infrastructure, so you never have to worry about local hardware or setup conflicts.

**What if I want to see a history of my past model runs using Replicate MCP?**
You can check your recent activity by calling `list_predictions`. This tool returns a log of the recent prediction jobs run by your account.