Replicate MCP. Run open-source ML workflows from chat.
Replicate MCP lets your AI client search, run, and manage thousands of open-source machine learning models. You can hand off complex tasks, like generating images, running specialized language models, or processing audio, directly from a chat prompt in plain language.
Give Claude and any AI agent real-world access
It lets your AI client search across thousands of public model definitions based on a keyword or use case.
You can start running specific open-source models, providing the necessary input variables to generate output like images or text.
Your AI client tracks ongoing jobs, retrieving the results when they're ready or canceling them immediately if you change your mind.
What AI agents can do with Replicate MCP: 12 tools
Use these tools to search for models, list deployments, track job status, and run machine learning predictions.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
List Models
It shows you a list of all public machine learning models available on Replicate.
Get Account
This retrieves basic details about your connected Replicate account.
List Collections
It lists curated groups of models, such as those focused on 'Image-to-Text' or...
List Deployments
This shows you all the active model deployments you have set up personally.
Cancel Prediction
It stops a model prediction job that is currently running and prevents further...
Create Prediction
You start a new model prediction by supplying the required model version ID and all necessary inputs as a JSON object.
Get Collection
It retrieves details for a specific, defined group of models using its unique slug.
Get Model
This fetches detailed information about one specific model, including its exact...
Get Prediction
It checks the current status of a prediction job and retrieves the final output if...
List Hardware
This lists all available GPU hardware options you can use for running your models.
List Predictions
It retrieves a log of the recent prediction jobs that have been run by your account.
Search Models
You can search across the entire platform to find public models that match specific keywords or use cases.
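To make Create Prediction concrete: under MCP, the client sends the agent's tool call as a JSON-RPC 2.0 `tools/call` request. The sketch below builds that envelope. The tool name `create_prediction` is used elsewhere on this page, but the argument names `version` and `input` are assumptions modeled on Replicate's HTTP API, and `MODEL_VERSION_ID` is a placeholder, not a real version.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: a placeholder version ID plus the model's inputs
# as a JSON object.
payload = build_tool_call(
    1,
    "create_prediction",
    {
        "version": "MODEL_VERSION_ID",  # placeholder, not a real version
        "input": {"prompt": "a concept sketch of a sci-fi city"},
    },
)
print(payload)
```

Your AI client builds and sends messages like this for you; the point is that every tool in the list above reduces to a name plus a JSON object of arguments.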
Security and governance baked right in.
Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there's no routing to configure.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on each call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Replicate, then connect any of our 5,200+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,200+ others, all in one place
- Add new capabilities to your AI anytime you want
- Connections are secured and governed automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog weekly
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Replicate API. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS CLOUD
- Cloud Hosted: managed infra
- V8 Isolated: sandboxed per request
- Zero-Trust Proxy: no stored credentials
- DLP Enforced: policy on each call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
The tedious cycle of ML prototyping today
You know the drill: you need to test a new image generation model. You open the documentation, copy-paste the required JSON payload structure into your local script, run it, see an error because you missed one mandatory variable, then have to manually check the platform logs to figure out what went wrong. It's a constant cycle of copying parameters and managing different dashboards.
With this MCP, that loop collapses into a single request. You tell your agent in plain English: 'Find me a good model for sci-fi concepts.' The agent handles the search (`search_models`), checks the model's required inputs (`get_model`), and then runs the job (`create_prediction`). You get results, not error logs.
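The three-step flow the agent follows can be sketched as plain code. This is a minimal illustration, not the MCP's implementation: `call_tool` stands in for whatever MCP client session your AI client uses, and the argument names (`query`, `model`) and result fields (`id`, `latest_version`) are assumptions about what the tools accept and return.

```python
from typing import Any, Callable

# call_tool(name, arguments) -> result. In practice an MCP client session
# plays this role; here it is a plain callable so the flow can be tested.
ToolCaller = Callable[[str, dict], Any]


def discover_and_run(call_tool: ToolCaller, query: str, prompt: str) -> Any:
    # 1. Discover: find candidate models for the use case.
    results = call_tool("search_models", {"query": query})
    if not results:
        raise LookupError(f"no models found for {query!r}")
    model_id = results[0]["id"]  # hypothetical result field

    # 2. Check: fetch the model's metadata, including its latest version.
    model = call_tool("get_model", {"model": model_id})
    version = model["latest_version"]["id"]

    # 3. Run: start a prediction against the version we just checked.
    return call_tool(
        "create_prediction", {"version": version, "input": {"prompt": prompt}}
    )
```

Keeping the check step between discovery and execution is what lets the agent catch a missing input before it costs you a failed run.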
Replicate MCP gives you instant, delegated ML power
You no longer have to write wrapper code for every single model or manually check which parameters are required. The agent does the discovery work for you—finding collections (`list_collections`) and checking deployments (`list_deployments`) automatically.
Your workflow shifts from 'How do I call this API?' to 'What do I want to create?' It’s about delegating complex, multi-step technical tasks to your AI client. Period.
What Replicate MCP does for your AI
This connector gives your agent the power to interact with a massive library of open-source ML models without needing to run them on your own hardware. Instead of dealing with complex API calls and parameter files, you simply tell your AI client what you want done in plain English. It handles finding the right model, checking its required inputs, starting the job, and even monitoring it until it's finished.
Need a specific type of image? Your agent can search for a model and run a prediction from a one-line request. If the job is long-running, you don't have to watch the console; your AI client tracks its status and fetches the output when it's done, instead of you polling by hand.
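Tracking a long-running job comes down to a polling loop. The sketch below shows one plausible version, assuming `get_prediction` and `cancel_prediction` are passed in as callables that return dicts with a `status` field. The final statuses (`succeeded`, `failed`, `canceled`) are taken from Replicate's prediction lifecycle; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable

# Replicate predictions stop changing once they reach one of these statuses.
TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], dict],
    cancel_prediction: Callable[[str], dict],
    prediction_id: str,
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the prediction reaches a final status; cancel it on timeout."""
    deadline = clock() + timeout_s
    while True:
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL:
            return prediction
        if clock() >= deadline:
            # Give up and stop the job so it doesn't keep running.
            return cancel_prediction(prediction_id)
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; in normal use you would leave the defaults.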
When you connect this capability through Vinkius, you get instant access to the entire catalog of model operations, making complex ML workflows manageable right inside your chat interface.
How to set up Replicate MCP
The bottom line is that you tell your AI client what task to complete, and it handles all the necessary backend steps.
First, add the Replicate MCP server to your AI client's MCP configuration.
Next, enter your personal Replicate API token in the configuration variables.
Finally, prompt your agent naturally: 'Search for a video generation model, check its parameters, and generate a clip of a cat on Mars.'
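For context, the API token authorizes requests like the one below against Replicate's public HTTP API. This is a sketch under the assumption that the MCP server ultimately makes calls of this kind on your behalf; how Vinkius stores and forwards the token isn't described here. It only builds the request and never sends it. `REPLICATE_API_TOKEN` is the environment variable name Replicate's own client libraries read, and the version ID in the usage line is a placeholder.

```python
import json
import os
import urllib.request


def build_create_prediction_request(version: str, inputs: dict) -> urllib.request.Request:
    """Build (but don't send) a Replicate create-prediction HTTP request."""
    token = os.environ.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError("set REPLICATE_API_TOKEN first")
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # token-based auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (placeholder version ID):
# req = build_create_prediction_request("MODEL_VERSION_ID", {"prompt": "a cat on Mars"})
```

Writing this by hand for every model is exactly the boilerplate the MCP is meant to remove.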
Who uses Replicate MCP
This MCP is for developers, content creators, and data scientists who are tired of manually logging into ML platforms or writing boilerplate code just to test a model. If you need to quickly prototype with diverse open-source models, this connector saves hours.
You use it to rapidly prototype and integrate novel algorithms by running quick predictions without modifying local Python notebooks.
You delegate specialized tasks, like generating unique audio or high-quality visuals, directly from your chat interface instead of juggling multiple external web tools.
You test and compare the output of many different open-source models quickly, systematically checking model metadata to ensure parameter requirements are met before running a prediction.
Benefits of connecting Replicate MCP
Access diverse models instantly. You don't need to hardcode API endpoints; just tell your agent what kind of image or text you want, and it handles the search using search_models.
Manage long jobs without stress. If a video generation task takes minutes, use get_prediction to check its status later, or call cancel_prediction if you no longer need the result.
Stop guessing parameters. Before running anything, use get_model to pull up the exact schema and input requirements for any model you find, preventing failed runs.
Run models without local setup. This MCP lets your agent connect directly to powerful cloud infrastructure, bypassing the need to install Python dependencies or manage GPU drivers locally.
Build complex chains easily. You can instruct your AI client to take the output of one specialized model and feed it as input to a second model using natural language instructions.
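Chaining two models means turning the first prediction's output into the second model's input. A minimal sketch, assuming finished predictions carry `status` and `output` fields and that the output is a file URL or a list of them (common for Replicate image and audio models, but not universal). The default target field `image` is hypothetical; check the second model's schema with get_model.

```python
from typing import Any


def first_output_url(prediction: dict) -> str:
    """Pull a single file URL out of a finished prediction's output."""
    if prediction.get("status") != "succeeded":
        raise ValueError(f"prediction is {prediction.get('status')!r}, not succeeded")
    output: Any = prediction.get("output")
    # Many image models return a list of URLs; others return a single string.
    if isinstance(output, list):
        if not output:
            raise ValueError("prediction produced no output")
        output = output[0]
    if not isinstance(output, str):
        raise TypeError(f"expected a URL string, got {type(output).__name__}")
    return output


def chain_input(prediction: dict, field: str = "image") -> dict:
    """Build the second model's input dict from the first model's output."""
    return {field: first_output_url(prediction)}
```

Refusing anything but a `succeeded` prediction keeps the chain from silently feeding an empty or partial result into the next model.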
Replicate MCP use cases
Generating marketing assets for a new product launch.
A content manager needs 20 unique concept images. Instead of writing a script that iterates through image generation APIs, they prompt their agent: 'Find five text-to-image models and generate four variations each for this car design.' The agent uses search_models to find options, then runs multiple predictions.
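Fanning out a request like this means one create_prediction call per model version and variation. The sketch below only plans the argument sets; it assumes every chosen model accepts `prompt` and `seed` inputs, which is common for text-to-image models but not guaranteed, so check each schema with get_model first.

```python
from itertools import product


def plan_batch(versions: list[str], prompt: str, variations: int) -> list[dict]:
    """Build one create_prediction argument set per (model version, variation)."""
    return [
        # A different seed per variation gives distinct images for the same prompt.
        {"version": version, "input": {"prompt": prompt, "seed": seed}}
        for version, seed in product(versions, range(variations))
    ]
```

For example, five model versions with `variations=4` yields 20 argument sets, one per image.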
Analyzing user feedback audio files.
A researcher wants to compare speech-to-text models on recorded feedback. They have their agent run a prediction on an audio file and, if the transcript is poor, immediately call list_predictions to review past runs and see which model versions did better.
Prototyping an LLM feature for a client.
A developer wants to test how different language models handle specific JSON inputs. They have the agent fetch each model's metadata with get_model first, so they can provide the correct payload structure before calling create_prediction.
Monitoring a large batch of scientific simulations.
A scientist kicks off 50 long-running climate simulations. Instead of checking every dashboard, they ask their agent to monitor all the jobs with list_predictions, reporting status changes until each final output is retrieved via get_prediction.
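Monitoring a batch mostly means summarizing what list_predictions returns. A minimal sketch, assuming each listed prediction is a dict with `id` and `status` fields, as Replicate prediction objects are; the MCP tool's exact output shape is an assumption, and a real listing may be paginated.

```python
from collections import Counter


def summarize(predictions: list[dict]) -> dict:
    """Count predictions by status and list the IDs still in progress."""
    counts = Counter(p["status"] for p in predictions)
    running = [p["id"] for p in predictions if p["status"] in ("starting", "processing")]
    return {"counts": dict(counts), "running": running}
```

The `running` list is what the agent would keep checking with get_prediction until it is empty.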
Replicate MCP tradeoffs
What to watch out for, and the recommended way to handle each one.
Assuming a model works with basic prompts
The user types 'Generate an image of Mars' and the prediction fails because they didn't supply a required parameter, such as an aspect ratio.
First, use search_models to find relevant models. Then, before running one, call get_model on that model's ID. This step exposes the exact input schema needed for a successful run.
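Checking the schema before running can be as simple as comparing the model's required inputs with what you plan to send. This sketch assumes get_model returns Replicate's model object unchanged, where `latest_version.openapi_schema` holds an OpenAPI document whose `Input` component lists required fields; if the MCP reshapes that data, the lookup path would differ.

```python
def _dig(obj, *keys) -> dict:
    """Follow nested keys, treating missing or null levels as empty dicts."""
    for key in keys:
        if not isinstance(obj, dict):
            return {}
        obj = obj.get(key) or {}
    return obj if isinstance(obj, dict) else {}


def missing_inputs(model: dict, inputs: dict) -> list[str]:
    """Return the required input names that `inputs` doesn't supply."""
    schema = _dig(model, "latest_version", "openapi_schema", "components", "schemas", "Input")
    return [name for name in schema.get("required", []) if name not in inputs]
```

An empty list means the payload covers every required field; optional fields fall back to the model's defaults.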
Ignoring job status
The user runs a long process and then forgets about it, assuming the result is instantly available or failed silently.
Always confirm the job status. Use get_prediction to check whether your task has finished (succeeded, failed, or canceled) before attempting to read its output.
Overloading the agent with too many actions at once
The user tries to list collections, search models, and run a prediction all in one single prompt, confusing the agent's intent.
Break it down. Use list_collections first to narrow your options, then use search_models with a specific keyword from that collection, and finally call get_model for the precise setup.
When to use Replicate MCP
Use this MCP if your core problem is model orchestration: you need an AI agent to discover, configure, run, and monitor diverse open-source machine learning models hosted on Replicate. You're working with varied inputs (audio, images, text) and the output requires a multi-step process of discovery followed by execution.
Don't use this if: 1) Your need is simply to call one specific API endpoint repeatedly without variation; an SDK might be cleaner. 2) You are only dealing with structured data (like reading from a database); a dedicated data connector is better. This MCP excels when the process of finding and running the model is as important as the result itself.
Frequently asked questions about Replicate MCP
Can the Replicate MCP handle image generation?
Yes, absolutely. You can command your agent to find and run specific text-to-image models by calling search_models and then executing a prediction.
What is the difference between `list_collections` and `search_models` in Replicate MCP?
`list_collections` shows pre-curated groups of related models (like all 'Audio Generation' tools). `search_models` lets you search across every model on the platform using keywords.
How do I stop a job running with Replicate MCP?
If a prediction is taking too long or you no longer need it, use cancel_prediction to halt it immediately. This helps avoid unnecessary usage costs.
Does Replicate MCP require me to run models on my own computer?
No. The entire purpose of this MCP is that your agent connects to the cloud infrastructure, so you never have to worry about local hardware or setup conflicts.
What if I want to see a history of my past model runs using Replicate MCP?
You can check your recent activity by calling list_predictions. This tool returns a log of the recent prediction jobs run by your Replicate account.