Replicate MCP. Run open-source ML workflows from chat.

Replicate MCP lets your AI client search, run, and manage thousands of open-source machine learning models. You can trigger tasks such as generating images, running specialized language models, or processing audio directly from a chat prompt in plain language.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Give Claude and any AI agent real-world access

Find and list available models

It lets your AI client search across thousands of public model definitions based on a keyword or use case.

Execute ML predictions

You can start running specific open-source models, providing the necessary input variables to generate output like images or text.

Manage job status and lifecycle

Your AI client tracks ongoing jobs, retrieving the results when they're ready or canceling them immediately if you change your mind.

What AI agents can do with Replicate MCP's 12 tools

Use these tools to search for models, manage deployments, track job status, and execute complex machine learning predictions.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Replicate MCP

List Models

It shows you a list of all public machine learning models available on Replicate.

Get Account

This retrieves basic information about your connected Replicate account.

List Collections

It lists curated groups of models, such as those focused on 'Image-to-Text' or...

List Deployments

This shows you all the active model deployments you have set up personally.

Cancel Prediction

It stops a model prediction job that is currently running and prevents further...

Create Prediction

You start a new model prediction by supplying the required model version ID and all necessary inputs as a JSON object.

Get Collection

It retrieves details for a specific, defined group of models using its unique slug.

Get Model

This fetches detailed information about one specific model, including its exact...

Get Prediction

It checks the current status of a prediction job and retrieves the final output if...

List Hardware

This lists all available GPU hardware options you can use for running your models.

List Predictions

It retrieves a log of the recent prediction jobs that have been run by your account.

Search Models

You can search across the entire platform to find public models that match specific keywords or use cases.
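As a rough illustration of what happens underneath, an MCP client invokes a tool such as Create Prediction by sending a JSON-RPC `tools/call` request. The sketch below builds one in Python. The tool name `create_prediction` appears elsewhere on this page; the argument names `version` and `input`, the placeholder version ID, and the `prompt` field are assumptions for illustration, not the documented schema of this server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP JSON-RPC 2.0 request that invokes one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real field names come from the tool's schema.
request = build_tool_call(
    1,
    "create_prediction",
    {"version": "<MODEL_VERSION_ID>", "input": {"prompt": "a red sports car"}},
)
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this request for you; the sketch only shows the shape of the message.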

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing to configure yourself.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Replicate integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "replicate": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Replicate tools with full Vinkius guardrails applied.
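If you prefer to generate the configuration fragment rather than edit it by hand, a small script can substitute the token and reject the unreplaced placeholder. This is a minimal sketch: the `mcp_servers` and `serverUrl` keys and the URL pattern come from the snippet above, while the function name `build_mcp_config` and the sample token are made up for illustration.

```python
import json

ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def build_mcp_config(token, server_name="replicate"):
    """Return the mcp_servers mapping with the token substituted in."""
    if not token or "[" in token or "]" in token:
        raise ValueError("replace the placeholder with a real token")
    return {
        "mcp_servers": {
            server_name: {"serverUrl": ENDPOINT_TEMPLATE.format(token=token)}
        }
    }


config = build_mcp_config("example-token")  # made-up token for illustration
print(json.dumps(config, indent=2))
```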

Replicate MCP is compatible with VS Code

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"replicate": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

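Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius over streamable HTTP
client = MultiServerMCPClient({"replicate": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http",
}})
# get_tools() is a coroutine, so run it in an event loop
tools = asyncio.run(client.get_tools())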

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

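# Requires: pip install 'crewai-tools[mcp]'
from crewai import Agent
from crewai_tools import MCPServerAdapter

# Connect securely to Vinkius
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (role, goal, and backstory are all required)
    researcher = Agent(
        role='Data Researcher',
        goal='Find and run Replicate models',
        backstory='An analyst who prototypes with hosted ML models.',
        tools=vinkius_tools,
    )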

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on each call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Replicate, then connect any of our 5,200+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,200+ others, all in one place
Add new capabilities to your AI anytime you want
Connections are secured and governed automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog weekly

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Replicate API. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS CLOUD

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on each call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

The tedious cycle of ML prototyping today

You know the drill: you need to test a new image generation model. You open the documentation, copy-paste the required JSON payload structure into your local script, run it, see an error because you missed one mandatory variable, then have to manually check the platform logs to figure out what went wrong. It's a constant cycle of copying parameters and managing different dashboards.

With this MCP, that process vanishes. You tell your agent in plain English: 'Find me a good model for sci-fi concepts.' The agent handles the search (`search_models`), checks the requirements (`get_model`), and then runs the job (`create_prediction`). You get results—not error logs.
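The search, inspect, run sequence described above can be sketched in Python. Everything here is a stand-in: `FakeReplicateTools` is an in-memory mock with no network access, and the model name, version label, and field names (`latest_version`, `required_inputs`) are hypothetical, chosen only to show the order of the three calls.

```python
class FakeReplicateTools:
    """In-memory stand-in for the MCP tools; no network calls are made."""

    def __init__(self):
        self._models = {
            "acme/scifi-painter": {
                "description": "text-to-image model for sci-fi concepts",
                "latest_version": "v1",
                "required_inputs": ["prompt"],
            }
        }

    def search_models(self, query):
        words = query.lower().split()
        return [name for name, model in self._models.items()
                if any(w in model["description"].lower() for w in words)]

    def get_model(self, name):
        return self._models[name]

    def create_prediction(self, version, inputs):
        return {"id": "pred-1", "version": version,
                "input": inputs, "status": "starting"}


def run_first_match(tools, query, inputs):
    """Search, inspect the model's requirements, then start a prediction."""
    matches = tools.search_models(query)
    if not matches:
        raise LookupError(f"no model matches {query!r}")
    model = tools.get_model(matches[0])
    missing = [k for k in model["required_inputs"] if k not in inputs]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    return tools.create_prediction(model["latest_version"], inputs)


prediction = run_first_match(
    FakeReplicateTools(), "sci-fi concepts", {"prompt": "orbital city"}
)
print(prediction["id"], prediction["status"])
```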

Replicate MCP gives you instant, delegated ML power

You no longer have to write wrapper code for every single model or manually check which parameters are required. The agent does the discovery work for you—finding collections (`list_collections`) and checking deployments (`list_deployments`) automatically.

Your workflow shifts from 'How do I call this API?' to 'What do I want to create?' It’s about delegating complex, multi-step technical tasks to your AI client. Period.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

machine-learning

model-inference

open-source-models

fine-tuning

api-access

generative-ai

What Replicate MCP does for your AI

This connector gives your agent the power to interact with a massive library of open-source ML models without needing to run them on your own hardware. Instead of dealing with complex API calls and parameter files, you simply tell your AI client what you want done in plain English. It handles finding the right model, checking its required inputs, starting the job, and even monitoring it until it's finished.

Need a specific type of image? Your agent can search for models and then execute a prediction with just a few words. If the process is long-running, you don't have to wait by the console; your AI client manages the status updates automatically. It’s a huge step up from traditional methods.

When you connect this capability through Vinkius, you get instant access to the entire catalog of model operations, making complex ML workflows manageable right inside your chat interface.

Built · Hosted · Managed by Vinkius Replicate MCP - Run ML Model Inference from Chat

Server ID 019d75fe-9426-7272-9964-c32556c42621

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

Benefits of connecting Replicate MCP

Access diverse models instantly. You don't need to hardcode API endpoints; just tell your agent what kind of image or text you want, and it handles the search using search_models.

Manage long jobs without stress. If a video generation task takes minutes, use get_prediction to check its status later or call cancel_prediction if the results aren't right.

Stop guessing parameters. Before running anything, use get_model to pull up the exact schema and input requirements for any model you find, preventing failed runs.

Run models without local setup. This MCP lets your agent connect directly to powerful cloud infrastructure, bypassing the need to install Python dependencies or manage GPU drivers locally.

Build complex chains easily. You can instruct your AI client to take the output of one specialized model and feed it as input to a second model using natural language instructions.
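To make the chaining idea concrete, here is a minimal sketch in which each step receives the previous step's output. The two functions are hypothetical stand-ins for hosted models; in a real session the agent would make one prediction call per step instead.

```python
def run_chain(steps, initial_input):
    """Run each step in order, passing the previous output as the next input."""
    value = initial_input
    for name, run_model in steps:
        value = run_model(value)
        print(f"{name} -> {value!r}")
    return value


# Hypothetical stand-ins for two hosted models.
def caption_image(image_url):
    return f"a caption for {image_url}"


def translate_text(text):
    return f"[fr] {text}"


result = run_chain(
    [("caption", caption_image), ("translate", translate_text)],
    "https://example.com/car.png",
)
```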

Replicate MCP use cases

01

Generating marketing assets for a new product launch.

A content manager needs 20 unique concept images. Instead of writing a script that iterates through image generation APIs, they prompt their agent: 'Find five text-to-image models and generate four variations of this car design with each.' The agent uses search_models to find options, then executes multiple predictions.

02

Analyzing user feedback audio files.

A researcher wants to compare speech-to-text models on recorded user feedback. They ask their agent to run a prediction on an audio file, and if the transcript is poor, they call list_predictions to review earlier runs and see which model produced better results.

03

Prototyping an LLM feature for a client.

A developer wants to test how different language models handle specific JSON inputs. They use the agent's ability to get_model metadata first, ensuring they provide the correct payload structure before calling create_prediction.

04

Monitoring a large batch of scientific simulations.

A scientist kicks off 50 complex climate models. Instead of checking every dashboard, they ask their agent to monitor all jobs using list_predictions, getting real-time status updates until the final output is retrieved via get_prediction.
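A batch like this is easier to follow as a per-status summary. The sketch below assumes each entry returned by list_predictions carries an `id` and a `status` field, and that the status values match the ones Replicate uses for predictions (starting, processing, succeeded, failed, canceled); the sample data is made up.

```python
from collections import Counter

TERMINAL = {"succeeded", "failed", "canceled"}


def summarize(predictions):
    """Count predictions per status and list the IDs still in progress."""
    counts = Counter(p["status"] for p in predictions)
    pending = [p["id"] for p in predictions if p["status"] not in TERMINAL]
    return counts, pending


# Made-up sample shaped like a list_predictions result.
sample = [
    {"id": "a1", "status": "succeeded"},
    {"id": "b2", "status": "processing"},
    {"id": "c3", "status": "failed"},
    {"id": "d4", "status": "starting"},
]
counts, pending = summarize(sample)
print(dict(counts), pending)
```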

Replicate MCP tradeoffs

What to watch out for, and the recommended way to handle each one.

Assuming a model works with basic prompts

Avoid

The user types: 'Generate an image of Mars.' and the prediction fails because they didn't specify required parameters like aspect ratio or seed.

Instead

First, use search_models to find relevant tools. Then, before running it, call get_model on that specific model ID. This step exposes the exact JSON structure needed for a successful run.
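The check the agent performs with the schema can be approximated as follows. This is a sketch under assumptions: it treats the model's input schema as a JSON Schema style object with `required` and `properties` keys, and the field names (`prompt`, `aspect_ratio`, `seed`) are hypothetical examples rather than any particular model's parameters.

```python
def check_inputs(schema, inputs):
    """Return a list of problems found when comparing inputs to the schema."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in inputs:
            problems.append(f"missing required input: {name}")
    for name in inputs:
        if name not in properties:
            problems.append(f"unknown input: {name}")
    return problems


# Hypothetical input schema, shaped like a JSON Schema object.
schema = {
    "required": ["prompt"],
    "properties": {
        "prompt": {"type": "string"},
        "aspect_ratio": {"type": "string"},
        "seed": {"type": "integer"},
    },
}
problems = check_inputs(schema, {"aspect_ratio": "16:9", "steps": 30})
print(problems)
```

An empty list means the payload at least names the right fields; it does not validate value types.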

Ignoring job status

Avoid

The user runs a long process and then forgets about it, assuming the result is instantly available or failed silently.

Instead

Always confirm the job status. Use get_prediction to reliably check if your running task has finished its cycle before attempting to access the output data.
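Checking status before reading output amounts to a polling loop. The sketch below assumes the get_prediction result exposes a `status` field with Replicate's usual values and an `output` field once finished; `fake_get_prediction` is a stand-in so the example runs without a network connection.

```python
import time

TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for_prediction(get_prediction, prediction_id, interval=0.0, max_polls=20):
    """Poll until the prediction reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL:
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"{prediction_id} still running after {max_polls} polls")


# Stand-in for the real tool: reports "processing" twice, then "succeeded".
_statuses = iter(["processing", "processing", "succeeded"])


def fake_get_prediction(prediction_id):
    status = next(_statuses)
    output = "image.png" if status == "succeeded" else None
    return {"id": prediction_id, "status": status, "output": output}


final = wait_for_prediction(fake_get_prediction, "pred-1")
print(final["status"], final["output"])
```

With a real job you would set `interval` to a few seconds so the loop does not poll continuously.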

Overloading the agent with too many actions at once

Avoid

The user tries to list collections, search models, and run a prediction all in one single prompt, confusing the agent's intent.

Instead

Break it down. Use list_collections first to narrow your options, then use search_models with a specific keyword from that collection, and finally call get_model for the precise setup.

Frequently asked questions about Replicate MCP

Can the Replicate MCP handle image generation?

Yes, absolutely. You can command your agent to find and run specific text-to-image models by calling search_models and then executing a prediction.

What is the difference between `list_collections` and `search_models` in Replicate MCP?

`list_collections` shows pre-curated groups of related models (like all 'Audio Generation' tools). `search_models` lets you search across every model on the platform using keywords.

How do I stop a job running with Replicate MCP?

If a prediction is taking too long or isn't giving the right result, use cancel_prediction to halt it immediately and cleanly. This prevents unnecessary usage costs.

Does Replicate MCP require me to run models on my own computer?

No. The entire purpose of this MCP is that your agent connects to the cloud infrastructure, so you never have to worry about local hardware or setup conflicts.

What if I want to see a history of my past model runs using Replicate MCP?

You can check your recent activity by calling list_predictions. This tool returns a log of the recent prediction jobs run by your account.

