# Replicate Extended MCP

> Replicate Extended connects your AI agent directly to Replicate's model infrastructure. It lets you run machine learning models, such as Stable Diffusion or text generators, and manage their lifecycle through simple commands. You can search the public catalog, create deployments with custom autoscaling, monitor training jobs, and retrieve predictions from any MCP client.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** machine-learning, ai-models, gpu-computing, inference-api, stable-diffusion

## Description

This server connects your agent directly to Replicate's model infrastructure. You can run machine learning models, such as Stable Diffusion or text generators, and manage their lifecycle using simple commands through your AI client.

**Discovering Assets:**
Search the public catalog by keyword with `search_models`, and list the curated model groups with `list_collections`. For details on a specific collection or model, use `get_collection` or `get_model`. `get_account` returns your own account information, and `get_webhook_secret` returns the secret you need to verify the signatures of incoming webhooks.

**Running Predictions:**
The core job is running models. Use `create_prediction` to start a prediction with a specific model version and input arguments. If you have already set up a deployment, run the prediction through it with `create_deployment_prediction`. Call `get_prediction` to check the status and, once the run has finished, fetch the output. `list_predictions` lists recent runs so you can track activity.
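
The create-then-check flow can be sketched as a small polling loop. This is a minimal sketch, not the server's implementation: `call_tool` stands in for whatever function your MCP client exposes, the `prediction_id` and `version` argument names are assumptions, and the fake client exists only so the example runs without network access. The status names follow the ones Replicate reports, including `canceled`.

```python
import time
from typing import Callable

ToolCaller = Callable[[str, dict], dict]

# Statuses after which a prediction will not change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(call_tool: ToolCaller, prediction_id: str,
                        interval: float = 1.0, max_polls: int = 60) -> dict:
    """Poll get_prediction until the run reaches a terminal status."""
    for _ in range(max_polls):
        # "prediction_id" is an assumed argument name; check the tool schema.
        prediction = call_tool("get_prediction", {"prediction_id": prediction_id})
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"{prediction_id} did not finish after {max_polls} polls")


def make_fake_client() -> ToolCaller:
    """Stand-in for a real MCP client: two in-progress polls, then success."""
    statuses = iter(["starting", "processing", "succeeded"])

    def call_tool(name: str, arguments: dict) -> dict:
        if name == "create_prediction":
            return {"id": "pred_123abc", "status": "starting"}
        if name == "get_prediction":
            status = next(statuses)
            output = ["https://example.com/out.png"] if status == "succeeded" else None
            return {"id": arguments["prediction_id"], "status": status, "output": output}
        raise ValueError(f"unexpected tool: {name}")

    return call_tool


call_tool = make_fake_client()
created = call_tool("create_prediction", {
    "version": "<model-version-id>",  # placeholder, not a real version ID
    "input": {"prompt": "A futuristic city"},
})
final = wait_for_prediction(call_tool, created["id"], interval=0.0)
print(final["status"], final["output"])
```

In practice your agent runs this loop for you; the sketch only shows why `get_prediction` may need to be called more than once before output is available.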

**Training Models:**
To fine-tune an existing base model on custom data, start the job with `create_training`, then call `get_training` for its current status and progress metrics. You can also manage your own assets: `create_model` registers a new model in your account, and `update_model` changes its metadata, such as tags and description. Before changing how you call an existing model, use `get_model_version` to retrieve its input schema and parameters.

**Managing Deployments:**
For production models, there are three tools. `create_deployment` sets up a new private deployment with custom autoscaling rules. Once it is live, `create_deployment_prediction` runs predictions through the dedicated endpoint. `list_hardware` lists the hardware options available for deployments.
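
Before calling `create_deployment`, it can help to validate the arguments against the output of `list_hardware`. The sketch below assumes the tool accepts the same field names as Replicate's HTTP API (`name`, `model`, `version`, `hardware`, `min_instances`, `max_instances`) and that `list_hardware` returns entries with `name` and `sku` keys. Both are assumptions about this server, and the hardware entries and model name are sample values.

```python
def build_deployment_args(name: str, model: str, version: str, hardware_sku: str,
                          min_instances: int, max_instances: int,
                          available_hardware: list[dict]) -> dict:
    """Validate and assemble arguments for a create_deployment call."""
    known_skus = {item["sku"] for item in available_hardware}
    if hardware_sku not in known_skus:
        raise ValueError(f"unknown hardware SKU: {hardware_sku!r}")
    if not 0 <= min_instances <= max_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")
    return {
        "name": name,
        "model": model,
        "version": version,
        "hardware": hardware_sku,
        "min_instances": min_instances,  # lower autoscaling bound
        "max_instances": max_instances,  # upper autoscaling bound
    }


# Sample list_hardware output; real SKUs come from the tool itself.
hardware = [{"name": "CPU", "sku": "cpu"}, {"name": "Nvidia T4 GPU", "sku": "gpu-t4"}]

deployment_args = build_deployment_args(
    name="internal-image-tool",          # hypothetical deployment name
    model="your-username/your-model",    # hypothetical model identifier
    version="<model-version-id>",        # placeholder, not a real version ID
    hardware_sku="gpu-t4",
    min_instances=0,
    max_instances=3,
    available_hardware=hardware,
)
print(deployment_args)
```

Setting `min_instances` to 0 lets the deployment scale down when idle, at the cost of a cold start on the next request.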

**Model Lifecycles:**
You control a model from creation to deletion. `list_model_versions` returns every version tied to a model, and `delete_model_version` removes a specific version you no longer need. To stop a prediction that is still running, call `cancel_prediction`.

**Tracking and Viewing Info:**
Use `get_model` for general information about a model and `list_model_versions` to enumerate its versions; `get_model_version` returns the details of a single version. `list_collections` lists the curated collections, which helps when you want to browse several models at once.

This gives your agent the full toolkit: from searching for models with `search_models` to setting them live with dedicated deployments, managing their version history, and keeping tabs on every single prediction or training job.

## Tools

### cancel_prediction
Stops a prediction that is currently running on the Replicate platform.

### create_deployment_prediction
Runs a prediction using an established, dedicated deployment endpoint.

### create_deployment
Sets up a new private model deployment, allowing you to specify custom autoscaling rules for production use.

### create_model
Registers and creates a new model within your Replicate account.

### create_prediction
Initiates a new prediction job using a specified model version and input arguments.

### create_training
Starts a training job to fine-tune an existing base model on custom data.

### delete_model_version
Removes a specific version of a model.

### get_account
Retrieves basic details about the authenticated user or organization account.

### get_collection
Fetches detailed information for a specific curated model collection on Replicate.

### get_model
Retrieves general details about a specified ML model.

### get_model_version
Gets detailed information for a model version, including its necessary input schema and parameters.

### get_prediction
Checks the status and fetches the output data from an existing prediction ID.

### get_training
Gets the current status and progress metrics for a running training job.

### get_webhook_secret
Retrieves the default webhook secret key needed to verify incoming signature authenticity.

### list_collections
Lists all available curated model collections on Replicate.

### list_hardware
Shows a list of available hardware SKUs and their corresponding descriptions for deployment.

### list_model_versions
Lists all historical versions associated with a given model identifier.

### list_predictions
Fetches a list of the most recent prediction jobs run against your account.

### search_models
Searches Replicate's public catalog for models based on keywords or filters.

### update_model
Changes the metadata (like description or tags) associated with an existing model identifier.

## Prompt Examples

**Prompt:** 
```
Search for public models related to 'Stable Diffusion' on Replicate.
```

**Response:** 
```
I found several models. The most popular is 'stability-ai/stable-diffusion' (Version: 328bd9...). Would you like to see the input schema for this model?
```

**Prompt:** 
```
Run a prediction for model version 5c51d4... with the prompt 'A futuristic city'.
```

**Response:** 
```
Prediction created (ID: pred_123abc). It is currently 'starting'. I will monitor it for you or you can ask me for the status in a moment.
```

**Prompt:** 
```
List my most recent predictions.
```

**Response:** 
```
Fetching your history... You have 3 recent predictions. The latest one (pred_123abc) succeeded and generated an image URL. Would you like to see the results?
```

## Capabilities

### Execute Model Predictions
Runs a specific model version with provided inputs to generate outputs (e.g., images, text).

### Manage Model Deployments
Creates private deployments with custom autoscaling rules for production models.

### Discover ML Assets
Searches Replicate's public model catalog or lists specific collections to find usable AI tools.

### Track Job Status
Retrieves the current status and output for a prediction run or a training job.

### Manage Model Lifecycles
Creates models, updates their metadata, and lists or deletes their versions.

## Use Cases

### Batch Processing Image Assets
A creative technologist needs 50 variations of an AI-generated image. Instead of making 50 manual API calls, the agent uses `search_models` to find the right model, loops through the inputs calling `create_prediction`, and tracks the resulting runs via `list_predictions`.
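
The loop in this use case might look like the following sketch. `call_tool` again stands in for your MCP client's tool-calling function, and the fake client just hands out sequential IDs so the example runs offline. Varying a `seed` input is an assumption: many image models accept one, but you should confirm it in the model's input schema.

```python
from itertools import count
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def submit_variations(call_tool: ToolCaller, version: str,
                      prompt: str, n: int) -> list[str]:
    """Start n predictions that differ only by seed and return their IDs."""
    prediction_ids = []
    for seed in range(n):
        created = call_tool("create_prediction", {
            "version": version,
            # "seed" is a hypothetical input name; it is model-specific.
            "input": {"prompt": prompt, "seed": seed},
        })
        prediction_ids.append(created["id"])
    return prediction_ids


def make_fake_client() -> ToolCaller:
    """Stand-in for a real MCP client that returns sequential prediction IDs."""
    counter = count(1)

    def call_tool(name: str, arguments: dict) -> dict:
        if name != "create_prediction":
            raise ValueError(f"unexpected tool: {name}")
        return {"id": f"pred_{next(counter):03d}", "status": "starting"}

    return call_tool


prediction_ids = submit_variations(
    make_fake_client(), "<model-version-id>", "A futuristic city", 50
)
print(len(prediction_ids), prediction_ids[0], prediction_ids[-1])
```

Keeping the returned IDs means you can fetch each result with `get_prediction` directly instead of relying only on `list_predictions`.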

### Setting up a Production Endpoint
A DevOps engineer needs a stable endpoint for their internal tool. They use `get_model` to confirm the model identity, then call `create_deployment`, specifying the hardware and autoscaling parameters. Once the deployment is live, the endpoint is ready to serve predictions.

### Model Troubleshooting
A prediction fails without an obvious error. Instead of guessing, the developer uses `get_model_version` to pull the exact OpenAPI schema for that version, then checks `get_prediction` for the status, error message, and logs to find the failure point.

### Fine-Tuning a Niche Model
A data scientist has 10 GB of niche text data. They use `create_training` to start a fine-tuning job, then monitor progress with `get_training`. Once the job completes, they can deploy the new model with `create_deployment`.

## Benefits

- Track everything in one place. Instead of manually checking logs, use `get_prediction` to instantly check the status or retrieve output from any prediction ID.
- Build reliable endpoints. Use `create_deployment` to set up private model deployments with specific autoscaling rules, so your service handles load spikes automatically.
- Find models fast. You don't need to guess; use `search_models` to filter Replicate's massive public catalog by keywords or tags.
- Control the history. If a model version breaks, list the previous versions with `list_model_versions` and switch back to a known-good one.
- Keep your systems clean. After testing, run `delete_model_version` to remove obsolete versions and keep your account tidy.

## How It Works

The bottom line is you don't write boilerplate code; your AI agent runs the whole ML pipeline for you.

1. Subscribe to this server and supply your Replicate API Token.
2. Your AI client sends a command (e.g., 'run the image generator model').
3. The agent executes the necessary tool call, which interacts with Replicate's APIs and returns the results or status directly.
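
Under the hood, step 3 is a JSON-RPC 2.0 request using the MCP `tools/call` method. The sketch below builds such a request for `create_prediction`. The envelope follows the MCP specification, but the argument names (`version`, `input`) are assumptions about this server's tool schema, and `build_tool_call` is an illustrative helper, not part of any SDK.

```python
import json
from itertools import count

_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "create_prediction",
    {
        # Assumed argument names; check the tool's published input schema.
        "version": "<model-version-id>",
        "input": {"prompt": "A futuristic city"},
    },
)
print(json.dumps(request, indent=2))
```

Your AI client normally builds and sends this message for you, so you only see the natural-language prompt and the tool's result.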

## Frequently Asked Questions

**How do I find out what inputs a model needs before running `create_prediction`?**
Run the `get_model_version` tool for that specific model ID. The output provides the full OpenAPI schema, showing exactly which parameters and data types are required.
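
As a rough illustration, the inputs can be read out of the schema like this. The sketch assumes the tool returns the version object in the shape Replicate's API uses, with the input definition under `openapi_schema.components.schemas.Input` and an `x-order` hint on each property. The sample version below is made up for the example.

```python
def describe_inputs(version: dict) -> list[dict]:
    """Summarize a model version's inputs from its OpenAPI schema."""
    input_schema = version["openapi_schema"]["components"]["schemas"]["Input"]
    required = set(input_schema.get("required", []))
    properties = input_schema.get("properties", {})
    # Sort by the "x-order" display hint when present.
    ordered = sorted(properties.items(), key=lambda item: item[1].get("x-order", 0))
    return [
        {
            "name": name,
            "type": spec.get("type", "unknown"),
            "required": name in required,
            "default": spec.get("default"),
        }
        for name, spec in ordered
    ]


# Made-up sample in the assumed response shape.
sample_version = {
    "id": "<model-version-id>",
    "openapi_schema": {"components": {"schemas": {"Input": {
        "type": "object",
        "required": ["prompt"],
        "properties": {
            "num_outputs": {"type": "integer", "default": 1, "x-order": 1},
            "prompt": {"type": "string", "x-order": 0},
        },
    }}}},
}

inputs = describe_inputs(sample_version)
for row in inputs:
    print(row)
```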

**Can I test a new model without setting up a permanent deployment?**
Yes. You can use `create_prediction` directly to run single-shot tests on any available version. This is great for initial benchmarking, but remember to use `create_deployment` for production scale.

**What if a prediction fails and I need the logs?**
Use `get_prediction` with the ID of the failed run. The tool returns the status and often includes detailed error messages or log snippets, helping you pinpoint the failure source.

**How do I list all model versions for a single model?**
The `list_model_versions` tool takes the model identifier as input and returns every version published for that model.

**What tool do I use to check what hardware SKUs are available for my deployment?**
Run `list_hardware` first. This command pulls all available hardware specifications, letting you pick the right compute power before creating any deployments or predictions.

**How can I verify that an incoming webhook calling my endpoint was genuinely from Replicate?**
Use `get_webhook_secret` to fetch your default secret key. You must use this key to validate all incoming signatures, ensuring the request truly came from Replicate and wasn't spoofed.
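
A verification step might look like the sketch below. It assumes the signing scheme Replicate documents for its webhooks: a `whsec_`-prefixed base64 secret, `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers, and an HMAC-SHA256 over `id.timestamp.body`. Confirm those details against Replicate's current webhook docs before relying on it. The function names are illustrative, and a production check should also reject stale timestamps.

```python
import base64
import hashlib
import hmac


def compute_signature(secret: str, webhook_id: str, timestamp: str, body: bytes) -> str:
    """Return the base64 HMAC-SHA256 signature for one webhook delivery."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{webhook_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_webhook(secret: str, webhook_id: str, timestamp: str,
                   signature_header: str, body: bytes) -> bool:
    """Check the webhook-signature header against the expected signature."""
    expected = compute_signature(secret, webhook_id, timestamp, body).encode()
    # The header may carry several space-separated "v1,<signature>" entries.
    for candidate in signature_header.split():
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature.encode(), expected):
            return True
    return False


# Self-check with a made-up secret and payload.
secret = "whsec_" + base64.b64encode(b"test-key").decode()
body = b'{"id": "pred_123abc", "status": "succeeded"}'
header = "v1," + compute_signature(secret, "msg_1", "1700000000", body)

print(verify_webhook(secret, "msg_1", "1700000000", header, body))
print(verify_webhook(secret, "msg_1", "1700000000", header, body + b"x"))
```

Verify against the raw request body bytes, not a re-serialized JSON object, since any change in whitespace or key order changes the signature.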

**I started a training job; what tool tracks its progress?**
Check the job status with `get_training`. It reports the current status and progress metrics of your fine-tuning job, so you can tell whether it is still running, has finished, or has failed.

**If I just need to change a model's description without recreating it, which tool do I use?**
Use the `update_model` tool. It changes the metadata of an existing model without rebuilding or re-uploading the asset.

**How can I check if my prediction has finished and see the output?**
Use the `get_prediction` tool with your prediction ID. It returns the current status (starting, processing, succeeded, failed, or canceled) along with the output URLs or data once the run has completed.

**Can I search for specific types of models like 'image-to-text'?**
Yes! Use the `search_models` tool with your query. It will return a list of public models matching your terms, including their owners and descriptions.

**Is it possible to stop a model that is taking too long to run?**
Absolutely. Use the `cancel_prediction` tool with the target Prediction ID to immediately stop the execution and prevent further usage costs.