# Replicate MCP

> Replicate MCP Server connects your AI client directly to thousands of open-source machine learning models. It lets you search for, execute, and monitor complex ML predictions (like image generation or specialized LLMs) using simple text commands—all without running the code on your local hardware.

## Overview
- **Category:** superpower
- **Price:** Free
- **Tags:** machine-learning, model-inference, open-source-models, fine-tuning, api-access, generative-ai

## Description

**Replicate MCP Server**

Connect your AI client directly to Replicate for thousands of open-source machine learning models. You don't need to set up local environments or manage GPU resources yourself; your agent handles it all on the backend. It lets you use complex ML predictions—like image generation or specialized LLMs—just by sending simple text commands.

**Running and Monitoring Predictions**

The core function is running model predictions. You call `create_prediction` when you want to start a new run; this requires you to supply the exact model version ID and all necessary input variables in JSON format. To keep tabs on what's happening, use `get_prediction` to check the current status or grab the final output of any prediction you started earlier. If a process runs wild or you change your mind, you can immediately halt it using `cancel_prediction`, which kills a running model job by its unique ID.
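
As a rough illustration of what these three calls look like on the wire, the sketch below builds MCP `tools/call` requests as JSON-RPC 2.0 messages. The envelope (`method`, `params.name`, `params.arguments`) follows the MCP specification; the argument names `version`, `input`, and `prediction_id` are assumptions made for illustration, so check the input schema each tool actually advertises before relying on them.

```python
import json


def tool_call_request(request_id, name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are hypothetical; the placeholders in angle
# brackets stand for real IDs that you would supply.
create = tool_call_request(
    1,
    "create_prediction",
    {"version": "<model-version-id>", "input": {"prompt": "a cat walking on Mars"}},
)
status = tool_call_request(2, "get_prediction", {"prediction_id": "<prediction-id>"})
cancel = tool_call_request(3, "cancel_prediction", {"prediction_id": "<prediction-id>"})

print(json.dumps(create, indent=2))
```

In practice your AI client builds these messages for you; the sketch only shows that each tool takes a small JSON argument object.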

**Finding and Inspecting Models**

If you need to find a model for a specific task, use `search_models` to scan the public catalog with keywords. If you know the general category, try `list_collections` to see curated groups of models—for example, 'Image-to-Text' or 'Audio Generation.' You can also pull a list of every available public model using `list_models`. When your client finds a promising candidate model ID, it needs its specific requirements; run `get_model` to retrieve all the details, including the exact input schema and parameter rules for that single model. To see what's currently running or deployed within your organization, you can check out `list_deployments`, which shows your active models and their status.

**Advanced Discovery and System Checks**

To dig into a single collection, call `get_collection` with its slug to fetch all the models in that group. You can also see which GPU hardware options Replicate offers for running inference using `list_hardware`. To keep track of past work, `list_predictions` lists your recent predictions, including their status and links to outputs. If you need basic verification of your access, run `get_account`; this tool retrieves essential details about your authenticated Replicate account.

**Putting It All Together**

Your AI client can build a whole workflow using these tools. You start by running `search_models` for 'text-to-image,' then use `get_model` on the best result to confirm the required JSON structure, and finally execute `create_prediction`. To confirm access before starting, run `get_account`; `list_deployments` and `list_collections` show what you already have deployed and which curated groups exist. This server gives your agent direct control over a massive library of open-source models without needing any local setup.

## Tools

### cancel_prediction
Stops a model prediction that is currently running on Replicate by its unique ID.

### create_prediction
Starts a new model run, requiring the model version ID and all necessary input variables as JSON.

### get_account
Retrieves basic details about your authenticated Replicate account for verification.

### get_collection
Fetches a specific group of models using its unique collection slug (e.g., 'text-to-image').

### get_model
Retrieves all details, including the required input schema, for one specific model.

### get_prediction
Checks and retrieves the current status or final output of a previously started prediction run.

### list_collections
Lists all curated model collections available on Replicate, like 'Image-to-Text'.

### list_deployments
Shows a list of your active, deployed models and their status within Replicate.

### list_hardware
Lists the GPU hardware options currently available for running model inferences on Replicate.

### list_models
Provides a list of all public models that are generally available on the Replicate platform.

### list_predictions
Displays a log of your recent prediction history, including status and output links.

### search_models
Searches the public model catalog using keywords to find relevant open-source algorithms.

## Prompt Examples

**Prompt:** 
```
List my recent predictions.
```

**Response:** 
```
I called `list_predictions` and found your recent requests. The most recent prediction has the ID `p_30abc...` and has finished successfully; its prompt was about dog pictures.
```

**Prompt:** 
```
Query Replicate to search for 'TTS' models.
```

**Response:** 
```
I ran `search_models` with the keyword "TTS". Replicate returned several options, including 'suno-ai/bark' and 'coqui/xtts'. Tell me which owner/model you want and I'll inspect its input schema.
```

**Prompt:** 
```
Cancel the prediction that has the ID `p_abc123`.
```

**Response:** 
```
I sent a cancellation request with `cancel_prediction` for prediction ID `p_abc123`. The prediction has been canceled and is no longer processing.
```

## Capabilities

### Run Model Predictions
Starts a new model prediction by sending the required inputs and version ID to Replicate.

### Monitor Prediction Status
Retrieves the current status, output, or final result of any given prediction run.

### Stop Running Processes
Immediately halts and cancels a prediction that is currently running on Replicate.

### Search for Models by Use Case
Scans the public catalog to find models that match a specific search query or category.

### List Available Model Groups
Retrieves curated collections of related models, like 'Image-to-Text' or 'Audio Generation'.

### Get Model Metadata
Pulls the full details and required parameter schema for a specific model ID.

## Use Cases

### Generating Video Assets from Text
A content creator needs a clip of 'a cat walking on Mars.' They tell their agent to use `search_models` for video generation. The agent finds a model, uses `get_model` to validate the required text prompt and aspect ratio, then executes the job using `create_prediction`. The creator gets the finished video link back in the chat.

### Debugging Model Inputs
A researcher finds a promising model but isn't sure what inputs it needs. Instead of wasting time, they prompt their agent to run `get_model` on that specific ID. The agent pulls the schema, showing them exactly which variables (e.g., 'seed', 'style') are mandatory before running `create_prediction`.

### Batch Testing Model Reliability
An ML engineer needs to compare three different image generation models. They use `list_collections` to find a group, then systematically call `get_model` for each one. This lets them gather the precise parameters needed before running multiple predictions.

### Stopping Stuck Jobs
A user runs a prediction that gets stuck in an infinite loop. They realize they need to stop it immediately and tell their agent: 'Cancel the job with ID p_xyz.' The agent then calls `cancel_prediction`, halting the process instantly.

## Benefits

- **Start predictions with one call:** Use `create_prediction` and your agent handles the entire process. You just provide the prompt; Replicate handles the cloud computation required for image or video generation.
- **Avoid setup hell:** Forget managing local dependencies. This server executes code remotely on Replicate's infrastructure, letting you focus purely on the ML concept, not the environment.
- **Track everything:** Keep a clean record of all jobs using `list_predictions`. You can always see if that 'cat walking on Mars' prompt actually finished and what the output was.
- **Plan your workflow:** Before running anything, use `get_model` to inspect the model schema. This prevents failed runs because you know exactly which variables are required for success.
- **Stop runaway jobs:** If a job hangs or runs longer than expected, use `cancel_prediction` to shut it down and avoid spending API credits on work you no longer need.

## How It Works

The bottom line is: you use your agent to talk to Replicate's API; the server translates that conversation into a structured ML job and runs it in the cloud.

1. First, run `search_models` to find a suitable open-source algorithm (e.g., 'video generation').
2. Next, call `get_model` with the model ID found in the search results to grab its exact input parameters and schema.
3. Finally, execute the prediction using `create_prediction`, feeding it the validated variables obtained from `get_model`.
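
The three steps above can be sketched as a single function. This is a minimal sketch, not the server's actual interface: `call_tool` is a hypothetical callable standing in for whatever your MCP client exposes, and the argument names (`query`, `owner`, `name`, `version`, `input`) and response fields (`models`, `latest_version_id`, `input_schema`) are assumptions. The stub returns canned, made-up data so the example runs without network access.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_workflow(call_tool: ToolCaller, query: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Step 1: search the catalog and take the first hit.
    results = call_tool("search_models", {"query": query})
    best = results["models"][0]

    # Step 2: fetch the model details and check the required inputs.
    model = call_tool("get_model", {"owner": best["owner"], "name": best["name"]})
    required = model["input_schema"].get("required", [])
    missing = [key for key in required if key not in inputs]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")

    # Step 3: start the prediction with the checked inputs.
    return call_tool(
        "create_prediction",
        {"version": model["latest_version_id"], "input": inputs},
    )


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real MCP client; every value here is invented."""
    if name == "search_models":
        return {"models": [{"owner": "example-owner", "name": "example-model"}]}
    if name == "get_model":
        return {
            "latest_version_id": "example-version-id",
            "input_schema": {"required": ["prompt"]},
        }
    if name == "create_prediction":
        return {"id": "example-prediction-id", "status": "starting", "input": arguments["input"]}
    raise KeyError(name)


prediction = run_workflow(fake_call_tool, "video generation", {"prompt": "a cat walking on Mars"})
print(prediction["id"], prediction["status"])
```

Checking for missing inputs before step 3 means a bad request is rejected locally instead of becoming a failed prediction.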

## Frequently Asked Questions

**How do I find out what models are available using search_models?**
You simply ask your agent to 'Search for image generation models.' The server runs `search_models` and returns a list of potential model IDs you can use later.

**Can I check the status of a running job using get_prediction?**
Yes. If you have a prediction ID, calling `get_prediction` reports its current status (Replicate uses `starting`, `processing`, `succeeded`, `failed`, and `canceled`), along with the output if it succeeded.

**What is the difference between list_models and search_models?**
`list_models` shows a general roster of all public models. `search_models` lets you filter that roster by specific keywords or use cases, which is usually more direct.

**If my prediction hangs or I no longer need it, how do I use cancel_prediction?**
Provide the unique ID of the job you want to stop. The agent runs `cancel_prediction` on that ID to halt it. A prediction that has already failed has stopped on its own and does not need to be canceled.

**Before running a model, how do I verify my API credentials using the `get_account` tool?**
The `get_account` tool returns basic details about your authenticated Replicate account. A successful response confirms that your AI client's credentials are valid and shows which account your predictions will run under before you start generating expensive predictions.

**When using `create_prediction`, what format must the input variables be in?**
You must supply model parameters as a strict JSON object. The system requires key-value pairs that exactly match the schema defined by the specific model version ID you are calling.
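
A quick local check can catch mismatched keys before a prediction is submitted. The sketch below assumes the model's input schema is a JSON Schema style object with `properties` and `required` fields; the example schema is invented for illustration, and real schemas should come from `get_model`.

```python
from typing import Any, Dict, List


def check_inputs(input_schema: Dict[str, Any], inputs: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the keys line up."""
    problems = []
    properties = input_schema.get("properties", {})
    for key in input_schema.get("required", []):
        if key not in inputs:
            problems.append(f"missing required input: {key}")
    for key in inputs:
        if key not in properties:
            problems.append(f"unknown input: {key}")
    return problems


# Hypothetical schema, not taken from any real model.
example_schema = {
    "type": "object",
    "required": ["prompt"],
    "properties": {"prompt": {"type": "string"}, "seed": {"type": "integer"}},
}

print(check_inputs(example_schema, {"prompt": "a cat walking on Mars", "seed": 42}))
# → []
print(check_inputs(example_schema, {"sed": 42}))
# → ['missing required input: prompt', 'unknown input: sed']
```

This only compares key names. It does not validate value types or ranges, which a full JSON Schema validator would cover.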

**How does `list_collections` differ from simply listing all public models using `list_models`?**
`list_collections` returns curated groups of related models (e.g., 'Audio Generation'). This helps you browse by a specific domain or use case, rather than sifting through every single model available.

**If I'm planning for high-volume processing, how can I check the available GPU resources using `list_hardware`?**
`list_hardware` lists the GPU hardware options available on Replicate. Use it to compare options and choose a suitable compute resource; it describes hardware types, not remaining capacity.

**Can the agent pass a JSON payload directly into a Replicate model?**
Yes. Call `create_prediction` and pass the payload as the input object, with keys that match the model's input schema (e.g., `prompt`, `num_inference_steps`). Because input schemas differ between models and versions, ask your assistant to fetch the schema via `get_model` first and verify the keys.

**Does the prediction command return results instantly?**
No, Replicate's API operates asynchronously. The initial call returns a prediction ID. Your assistant must then call `get_prediction` periodically with that ID until the prediction reports a completed status, at which point the response includes the output, typically URLs to generated files or generated text.
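
A polling loop for this might look like the sketch below. It assumes `get_prediction` is exposed to your code as a callable that returns a dict with a `status` field, which is a hypothetical wrapper rather than part of this server's documented interface. The status names are the ones Replicate's API reports; the stub replays a made-up sequence so the example runs offline.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which a prediction will not change again.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], Dict[str, Any]],
    prediction_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Poll until the prediction reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} still {prediction['status']}")
        time.sleep(interval_s)


# Stub that replays an invented status sequence.
_responses = iter([
    {"status": "starting"},
    {"status": "processing"},
    {"status": "succeeded", "output": ["<output-url>"]},
])


def fake_get_prediction(prediction_id: str) -> Dict[str, Any]:
    return dict(next(_responses), id=prediction_id)


result = wait_for_prediction(fake_get_prediction, "example-prediction-id", interval_s=0.0)
print(result["status"])
```

The timeout keeps a stuck job from blocking the caller forever; a caller that hits it could follow up with `cancel_prediction`.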

**Can the AI browse trending or curated model collections?**
Yes. Use the `list_collections` tool to browse curated groups of models organized by category, such as image generation, text-to-speech, or video. Each collection includes a slug and description so you can quickly identify the right set of models for your use case.