# Replicate MCP

> Replicate lets your AI agent access thousands of open-source machine learning models for generating images, text, audio, and video. Instead of jumping between web dashboards or writing API calls by hand, you talk to your agent, and it handles the full prediction workflow: finding a model, setting parameters, running the prediction, and retrieving the final result.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** machine-learning, model-inference, generative-ai, api-integration, cloud-computing

## Description

Your AI client connects to this MCP and treats open-source ML models like an internal service. You can ask your agent to find specific capabilities, such as text-to-image generators or advanced LLMs, and it handles model discovery and selection across thousands of available options. Need to run something? Just tell your agent what you want, and it executes the prediction. It tracks each run from 'starting' to 'succeeded', giving you a single conversation thread for complex ML operations. Once you have provided your Replicate API token, the rest is abstracted away: you don't write API calls or wait on status pages. The MCP is available through Vinkius and works with any MCP-compatible client.

## Tools

### cancel_prediction
Stops an ML prediction that is currently running using its unique ID.

### create_prediction
Starts a new model run by sending the required inputs and specifying the target model.

### get_account
Checks your API token status, showing your account type and usage limits.

### get_collection
Retrieves details for a specific group of models, like all audio-to-audio effects.

### get_model
Fetches detailed information about a single ML model using its full owner/name slug.

### get_model_versions
Lists all available versions for a specific model, including their IDs and required schemas.

### get_prediction
Retrieves the current status and final output data of any prediction using its ID.

### list_collections
Shows all available model collections, grouping models by type (e.g., text-to-image).

### list_hardware
Provides a list of available GPU hardware options and their pricing tiers.

### list_models
Lists the models in the catalog, including run counts and required hardware.

### list_predictions
Shows the history of your most recent runs, giving IDs, models, and status for tracking.

### search_models
Narrows down the catalog to find specific types of models using a keyword query (e.g., 'music' or 'llm').

## Prompt Examples

**Prompt:** 
```
List all text-to-image collections on Replicate.
```

**Response:** 
```
Found the 'text-to-image' collection with featured models including stability-ai/sdxl, black-forest-labs/flux-schnell, and ideogram-ai/ideogram-v2. The collection has 50+ models total.
```

**Prompt:** 
```
Search for LLM models on Replicate.
```

**Response:** 
```
Found popular LLM models: meta/meta-llama-3-70b-instruct (2M+ runs), mistralai/mistral-7b-instruct-v0.3 (1.5M+ runs), google/gemma-2-27b-it (800K+ runs). Each model shows hardware requirements and example inputs.
```

**Prompt:** 
```
Create a prediction using stability-ai/sdxl with prompt 'a sunset over mountains, photorealistic'.
```

**Response:** 
```
Created prediction pred_abc123. Status: starting. Check back with `get_prediction` to retrieve the generated image URL once it completes (usually 10-30 seconds).
```

## Capabilities

### Discovering Model Capabilities
Your agent finds and details specific ML models by name or category.

### Finding Related Models
You can list entire groups of related models, such as all text-to-image generators or all LLMs.

### Checking Account Status
The agent verifies your token status and shows you current usage information.

### Launching Predictions
You initiate a model run by providing the necessary input data to generate content.

### Tracking Results
The agent monitors running predictions, telling you when they start, process, fail, or finish.
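
The monitoring described here amounts to a polling loop. The following is a minimal Python sketch, not the MCP's actual implementation: `wait_for_prediction` and the fake `get_prediction` stand-in are hypothetical names, and the terminal status set (`succeeded`, `failed`, `canceled`) is an assumption based on the statuses this page mentions.

```python
import time
from typing import Callable, Dict

# Statuses after which a prediction will not change again (assumed names).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], Dict],
    prediction_id: str,
    interval_s: float = 2.0,
    max_polls: int = 30,
) -> Dict:
    """Poll until the prediction reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval_s)
    raise TimeoutError(f"{prediction_id} still running after {max_polls} polls")


# Stand-in for the real tool: replays a fixed sequence of statuses.
def make_fake_get_prediction(statuses):
    remaining = iter(statuses)

    def fake_get_prediction(prediction_id: str) -> Dict:
        return {"id": prediction_id, "status": next(remaining)}

    return fake_get_prediction


fake = make_fake_get_prediction(["starting", "processing", "succeeded"])
result = wait_for_prediction(fake, "pred_abc123", interval_s=0.0)
print(result["status"])
```

In a real client, the callable passed in would wrap a `get_prediction` tool call instead of the fake one.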

### Managing Resources
You can view available GPU hardware options and list your prediction history.

## Use Cases

### Creating an AI art campaign
A marketer needs 50 different fantasy images for a product launch. Instead of manually running 50 separate commands, they ask their agent to search for a suitable text-to-image model, launch the variations with `create_prediction`, and track each output with `get_prediction`.

### Testing LLM performance
A developer needs to compare how three different Large Language Models (LLMs) handle a specific set of prompts. They use `search_models` to find candidates, run the same prompts against each one, and then review the run history with `list_predictions`.

### Building an automated video pipeline
A content creator wants to turn a text description into a short video. They first check available hardware with `list_hardware`, select the right model, and run the prediction, ensuring they get all necessary status updates via `get_prediction`.

### Debugging an ML pipeline
An ML engineer runs a batch of predictions, but one fails. Instead of checking logs manually, they use `list_predictions` to find the failed prediction's ID and then call `get_prediction` on it to understand why it failed.
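
The first step of that debugging flow, picking failed runs out of the history, could look roughly like this. It is a sketch under assumptions: the `id`, `model`, and `status` key names and the sample records are hypothetical, chosen to mirror the fields `list_predictions` is described as returning.

```python
from typing import Dict, List


def failed_prediction_ids(predictions: List[Dict]) -> List[str]:
    """Return the IDs of predictions whose status is 'failed', in input order."""
    return [p["id"] for p in predictions if p.get("status") == "failed"]


# Hypothetical history; real records come from the list_predictions tool.
history = [
    {"id": "pred_001", "model": "stability-ai/sdxl", "status": "succeeded"},
    {"id": "pred_002", "model": "stability-ai/sdxl", "status": "failed"},
    {"id": "pred_003", "model": "stability-ai/sdxl", "status": "processing"},
]

print(failed_prediction_ids(history))
```

Each returned ID can then be passed to `get_prediction` to inspect the failure.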

## Benefits

- Stop managing multiple websites. Instead of navigating the Replicate site to check status, you simply ask your agent for the prediction status using `get_prediction` or review history with `list_predictions`. It's all in one conversation.
- You don't need to guess what models exist. Use `search_models` or `list_collections` to quickly discover everything available—from text-to-image generators to video processors—without leaving your chat window.
- Model setup used to be a pain, requiring you to find the right version ID. Now, use `get_model_versions` to inspect the full schema and get the correct ID before running a prediction with `create_prediction`.
- Managing costs is easier when you can check hardware options. Use `list_hardware` to see available GPU types and pricing tiers before launching any job, preventing expensive mistakes.
- The ability to cancel jobs mid-stream is huge. If you realize the prompt was wrong after a few seconds, use `cancel_prediction` immediately instead of letting it run to completion.

## How It Works

You treat complex machine learning pipelines like simple conversational commands.

1. Subscribe to this MCP and provide your Replicate API Token.
2. Tell your AI agent what you need, like 'Generate a picture of a robot reading' or 'List all video models'.
3. The agent runs the necessary tool calls in the background, providing you with the status updates and final output links directly in your conversation.
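
For readers curious what a background tool call in step 3 looks like on the wire, here is a small Python sketch that builds an MCP `tools/call` request as a JSON-RPC 2.0 message. The helper name `build_tool_call` is hypothetical, and the `query` argument name for `search_models` is an assumption; the server's published tool schema is the authority on argument names.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The `query` argument name is assumed, not confirmed by this page.
request = build_tool_call(1, "search_models", {"query": "video"})
print(request)
```

Your MCP client normally builds and sends these messages for you, so this is only useful for understanding or debugging the exchange.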

## Frequently Asked Questions

**How do I find out what models are available in Replicate using the Replicate MCP?**
Use `list_models` to browse the full catalog. For more focused results, try `search_models`, which lets you narrow down by keywords like 'llm' or 'video'.

**What if my prediction fails? How do I check the error details with Replicate MCP?**
Use `get_prediction` and provide the failed ID. This tool returns logs and status information, helping you diagnose whether the failure was due to bad input or a model issue.

**Does the Replicate MCP help me manage costs?**
Yes. Before running any job, check available options using `list_hardware` to see GPU types and associated pricing for your prediction workload.

**Can I run a model with the Replicate MCP if I don't know the exact version ID?**
Not directly, but you don't need to know it in advance. To ensure compatibility, first use `get_model_versions` to list the model's versions, then pass the correct 64-character hash ID to `create_prediction`.
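
A quick sanity check can catch a model slug passed where a version ID is expected. This is a sketch that assumes the 64-character hash is lowercase hexadecimal, which this page does not state; `looks_like_version_id` is a hypothetical helper.

```python
import re

# Assumption: the 64-character hash is lowercase hexadecimal.
VERSION_ID_PATTERN = re.compile(r"[0-9a-f]{64}")


def looks_like_version_id(value: str) -> bool:
    """Return True if value is exactly 64 lowercase hex characters."""
    return VERSION_ID_PATTERN.fullmatch(value) is not None


print(looks_like_version_id("a" * 64))
print(looks_like_version_id("stability-ai/sdxl"))
```

A passing check only confirms the format; whether the version exists still has to be confirmed with `get_model_versions`.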

**What is the difference between `list_models` and `search_models` on the Replicate MCP?**
`list_models` gives you a full directory of everything. `search_models` lets you filter that massive catalog based on specific keywords, making discovery much faster.