Replicate Extended MCP. Orchestrate ML Inference and Model Deployment.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Replicate Extended connects your AI agent directly to Replicate's model infrastructure. It lets you run machine learning models, such as Stable Diffusion or text generation models, and manage their entire lifecycle through simple commands.

You can search for public assets, create new deployments with custom scaling, monitor training jobs, and retrieve predictions from any MCP-compatible client.

What your AI agents can do

Cancel prediction

Stops a prediction that is currently running on the Replicate platform.

Create deployment

Sets up a new private model deployment, allowing you to specify custom autoscaling rules for production use.

Create deployment prediction

Runs a prediction using an established, dedicated deployment endpoint.

+ 17 more capabilities included

Execute Model Predictions

Runs a specific model version with provided inputs to generate outputs (e.g., images, text).

Manage Model Deployments

Creates and updates private deployments, allowing you to control autoscaling rules for production models.

Discover ML Assets

Searches Replicate's public model catalog or lists specific collections to find usable AI tools.

Track Job Status

Retrieves the current status and output for a prediction run or a training job.

Manage Model Lifecycles

Allows creation, updating, and deletion of model versions and metadata.

List predictions

Fetches a list of the most recent prediction jobs run against your account.

Search models

Searches Replicate's public catalog for models based on keywords or filters.

Update model

Changes the metadata (like description or tags) associated with an existing model identifier.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Replicate Extended, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

This server connects your agent straight into Replicate's model infrastructure. You can run machine learning models, such as Stable Diffusion or text generation models, and manage their entire lifecycle using simple commands through your AI client.

Discovering Assets:
You can search the public catalog for ML tools by keyword using search_models, and list all available curated model groups with list_collections. To get the details on a specific collection or model, use get_collection or get_model. You can also view your own account profile using get_account, and if you need to verify incoming signatures for webhooks, call get_webhook_secret.

Running Predictions:
The core job is running models. Use create_prediction to kick off a new prediction run with specific input arguments and model versions. If you've already set up a dedicated deployment endpoint, you can run the task directly using create_deployment_prediction. Once that prediction is done, you check the output status and get the final data by calling get_prediction.

You also track all recent activity by listing past job runs with list_predictions.
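The create-then-poll flow described above can be sketched like this. It's a minimal illustration under stated assumptions, not the server's own implementation: `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool, and the argument names (`version`, `input`, `prediction_id`) are guesses at the tool signatures. The status values follow the prediction states Replicate documents.

```python
import time
from typing import Any, Callable, Dict

# Terminal prediction states as documented by Replicate.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_and_wait(call_tool: ToolCaller, version: str, inputs: Dict[str, Any],
                 poll_seconds: float = 2.0, max_polls: int = 150) -> Dict[str, Any]:
    """Start a prediction, then poll get_prediction until it reaches a terminal state."""
    # Argument names below are assumptions about the tool's input schema.
    prediction = call_tool("create_prediction", {"version": version, "input": inputs})
    for _ in range(max_polls):
        if prediction["status"] in TERMINAL_STATES:
            return prediction
        time.sleep(poll_seconds)
        prediction = call_tool("get_prediction", {"prediction_id": prediction["id"]})
    raise TimeoutError(f"prediction {prediction['id']} is still {prediction['status']}")
```

The `max_polls` cap is a design choice: it keeps a stuck job from blocking the agent forever, at which point cancel_prediction is the natural next call.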

Training Models:
When you need to fine-tune an existing base model on custom data, start that work using create_training. To keep tabs on how that training job is going, call get_training, which gives you the current status and progress metrics. You can also manage your own assets by calling create_model to register a new model within your account, or update its metadata like tags and descriptions with update_model.

If an existing model needs a tweak, use get_model_version to grab its detailed input schema and parameters.

Managing Deployments:
For production models, you've got three main tools. You can set up a brand new private deployment using create_deployment, letting you dictate custom autoscaling rules. Once that's live, create_deployment_prediction runs predictions through the dedicated endpoint. You can also see what hardware options are available for deployments by listing them with list_hardware.
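Here's a rough sketch of assembling the arguments for create_deployment. The field names (`name`, `model`, `version`, `hardware`, `min_instances`, `max_instances`) mirror Replicate's HTTP API for creating deployments; whether this MCP tool accepts exactly the same names is an assumption, and `DeploymentSpec` is a hypothetical helper, not part of the server.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class DeploymentSpec:
    """Hypothetical container for create_deployment arguments (names are assumptions)."""
    name: str               # deployment name
    model: str              # "owner/model-name"
    version: str            # version ID, e.g. taken from list_model_versions
    hardware: str           # SKU, e.g. taken from list_hardware
    min_instances: int = 0  # autoscaling floor
    max_instances: int = 1  # autoscaling ceiling

    def to_arguments(self) -> Dict[str, Any]:
        """Check the autoscaling bounds and return a plain dict for the tool call."""
        if self.min_instances < 0:
            raise ValueError("min_instances must be >= 0")
        if self.max_instances < self.min_instances:
            raise ValueError("max_instances must be >= min_instances")
        return asdict(self)
```

Validating the bounds locally means a mistyped autoscaling range fails before any request reaches Replicate.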

Model Lifecycles:
You control everything from creation to deletion. After you create a model or version, you might need to clean up old stuff; use delete_model_version to remove specific historical versions of an asset. You can also get a full list of every version tied to a model with list_model_versions. If you're done with a prediction job that's still running, call cancel_prediction to stop it immediately.

Tracking and Viewing Info:
Use get_model for general info on any ML asset. To see which versions a model has, call list_model_versions; for the details of a single version, call get_model_version. You can also pull a list of all available curated collections using list_collections, which helps when you need to browse multiple assets.

This gives your agent the full toolkit: from searching for models with search_models to setting them live with dedicated deployments, managing their version history, and keeping tabs on every single prediction or training job.

How Replicate Extended MCP Works

1. Subscribe to this server and supply your Replicate API Token.
2. Your AI client sends a command (e.g., 'run the image generator model').
3. The agent executes the necessary tool call, which interacts with Replicate's APIs and returns the results or status directly.
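For a sense of what step 3 looks like on the wire: the Model Context Protocol carries tool invocations as JSON-RPC 2.0 messages with the method `tools/call`. The sketch below builds one such message. The `query` argument name is an assumption about the search_models input schema, and `build_tool_call` is a hypothetical helper; your MCP client normally does this for you.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The "query" argument name is an assumption for illustration.
print(build_tool_call(1, "search_models", {"query": "image generation"}))
```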

The bottom line is you don't write boilerplate code; your AI agent runs the whole ML pipeline for you.

Who Is Replicate Extended MCP For?

ML Engineers, Data Scientists, and Platform Operations staff. You're the person who spends too much time jumping between a local sandbox, a cloud dashboard, and an API playground just to test one model input. This server lets you keep the entire ML lifecycle—from discovery to deployment—inside your agent chat.

ML Engineer

Runs create_prediction multiple times with different parameters to quickly benchmark various model versions before committing to a final build.

Data Scientist

Uses get_training and list_predictions to monitor the status of background training and prediction jobs without leaving their primary IDE environment.

DevOps Engineer

Calls create_deployment and update_model to set up production-ready endpoints, ensuring proper autoscaling and version control for services.

What Changes When You Connect

Track everything in one place. Instead of manually checking logs, use get_prediction to instantly check the status or retrieve output from any prediction ID.
Build reliable endpoints. Use create_deployment to set up private model deployments with specific autoscaling rules, so your service handles load spikes automatically.
Find models fast. You don't need to guess; use search_models to filter Replicate's massive public catalog by keywords or tags.
Control the history. If a deployed model version breaks, you can check all previous iterations with list_model_versions and roll back quickly.
Keep your systems clean. After testing, run delete_model_version to remove obsolete models and keep your account tidy.

Real-World Use Cases

Batch Processing Image Assets

A creative tech needs 50 variations of an AI-generated image. Instead of running 50 separate manual API calls, the agent uses search_models to find the right model, then loops through inputs and calls create_prediction, tracking all IDs via list_predictions. Done.
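The loop the agent runs in this scenario might look roughly like the sketch below. `call_tool` is a hypothetical stand-in for your MCP client's tool invocation, and the `prompt` and `seed` input fields are assumptions; the real field names depend on the model's input schema, which get_model_version reports.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_variations(call_tool: ToolCaller, version: str,
                      prompt: str, count: int) -> List[str]:
    """Submit `count` predictions that differ only by seed; return their IDs in order."""
    prediction_ids = []
    for seed in range(count):
        # "prompt" and "seed" are assumed input names; check the model's schema.
        prediction = call_tool(
            "create_prediction",
            {"version": version, "input": {"prompt": prompt, "seed": seed}},
        )
        prediction_ids.append(prediction["id"])
    return prediction_ids
```

Collecting the IDs as you go gives you a precise list to check with get_prediction, instead of relying only on whatever list_predictions returns as "recent".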

Setting up a Production Endpoint

A DevOps engineer needs a stable endpoint for their internal tool. They use get_model to confirm the model identity, then call create_deployment, specifying resource limits and autoscaling parameters. The endpoint is then reachable through create_deployment_prediction.

Model Troubleshooting

A prediction fails without an obvious error. Instead of guessing at the cause, the engineer uses get_model_version to pull the exact OpenAPI schema required for that version, then checks get_prediction for detailed logs to find the failure point.

Fine-Tuning a Niche Model

A data scientist has 10GB of niche text data. They use create_training to start a fine-tuning job, then wait and monitor progress using get_training. Once complete, they can deploy the new model with create_deployment.

The Tradeoffs

Hardcoding Model IDs

Writing a script that assumes a specific model ID will always work. When Replicate updates or deprecates it, the whole workflow breaks at runtime.

→ Always use get_model first to pull current metadata and ensure you are referencing a valid, active model identifier before calling create_prediction. Check list_model_versions for available alternatives.

Ignoring Deployment Lifecycle

Running predictions directly without setting up a formal deployment. This leads to inconsistent resource allocation and unpredictable scaling failures.

→ Before running high-volume prediction jobs, always use create_deployment first. This establishes a dedicated, controlled environment for reliable inference.

Manual Schema Discovery

Having to consult external documentation every time you need the exact input parameters (e.g., what kind of array or string is expected).

→ Use get_model_version for any model. The result includes the full OpenAPI schema, letting your agent validate inputs automatically.
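As a rough idea of what that automatic validation could involve, here is a standard-library sketch that checks inputs against the schema. It assumes the version's OpenAPI schema describes inputs under `components.schemas.Input`, which is the layout Replicate uses for model versions, and it only covers required fields and primitive types. `validate_inputs` is a hypothetical helper, not something the server exposes.

```python
from typing import Any, Dict, List

# Map JSON Schema primitive type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_inputs(openapi_schema: Dict[str, Any], inputs: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the inputs passed these checks."""
    input_schema = openapi_schema["components"]["schemas"]["Input"]
    properties = input_schema.get("properties", {})
    problems = []
    for name in input_schema.get("required", []):
        if name not in inputs:
            problems.append(f"missing required input: {name}")
    for name, value in inputs.items():
        if name not in properties:
            problems.append(f"unknown input: {name}")
            continue
        declared = properties[name].get("type")
        expected = JSON_TYPES.get(declared)
        if expected is None:
            continue  # no simple type declared (for example a $ref); skip it
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        type_ok = isinstance(value, expected) and not (
            isinstance(value, bool) and expected is not bool
        )
        if not type_ok:
            problems.append(f"{name}: expected {declared}")
    return problems
```

For stricter checks (enums, ranges, nested objects), a full JSON Schema validator is the better tool; this sketch only catches the most common mistakes before a prediction is created.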

When It Fits, When It Doesn't

Use this server if your workflow needs to manage a multi-step ML lifecycle: discovery -> training -> deployment -> prediction. If you only need to fetch static data or read simple text from an external database, don't use this; general data retrieval tools are a better fit. Replicate Extended is the right choice when the core problem is 'How do I run, track, and manage a complex AI model?' Specifically, if you need to test multiple inputs against different versions of the same model, call list_model_versions before running create_prediction. Direct create_prediction calls are fine for testing, but for production traffic, route requests through create_deployment_prediction.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Replicate. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

How we secure it →

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 20 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

cancel_prediction
create_deployment
create_deployment_prediction
create_model
create_prediction
create_training
delete_model_version
get_account
get_collection
get_model
get_model_version
get_prediction
get_training
get_webhook_secret
list_collections
list_hardware
list_model_versions
list_predictions
search_models
update_model

Manually managing ML endpoints is a nightmare. Seriously.

Today, setting up an AI inference endpoint means jumping through hoops: checking the documentation for the right model ID, figuring out the required input format (the schema), and then manually calling the prediction API while praying it doesn't time out. You spend half your time debugging connection issues instead of improving the prompt.

With Replicate Extended, you just ask your agent to run a model. It handles all that setup: it uses `get_model_version` to validate inputs and then executes `create_prediction`. The whole process is wrapped up in one request to your agent.

The Replicate Extended MCP Server: Model & Prediction Ops

You eliminate the need to copy-paste model IDs, track down deprecated versions, or manually set up scaling rules in a separate cloud console. The agent manages all of this behind the scenes.

Your AI client can now treat your entire ML pipeline—from initial `search_models` lookup to final prediction output—as one single conversation flow. It's that simple.

Common Questions About Replicate Extended MCP

How do I find out what inputs a model needs before running `create_prediction`?

Run the get_model_version tool for that specific model ID. The output provides the full OpenAPI schema, showing exactly which parameters and data types are required.

Can I test a new model without setting up a permanent deployment?

Yes. You can use create_prediction directly to run single-shot tests on any available version. This is great for initial benchmarking, but remember to use create_deployment for production scale.

What if a prediction fails and I need the logs?

Use get_prediction with the ID of the failed run. The tool returns the status and often includes detailed error messages or log snippets, helping you pinpoint the failure source.

How do I list all model versions for a single model?

The list_model_versions tool takes the model ID as input. It returns an array of every version that exists for that specific model.

What tool do I use to check what hardware SKUs are available for my deployment?

Run list_hardware first. This command pulls all available hardware specifications, letting you pick the right compute power before creating any deployments or predictions.

How can I verify that an incoming webhook calling my endpoint was genuinely from Replicate?

Use get_webhook_secret to fetch your default secret key. You must use this key to validate all incoming signatures, ensuring the request truly came from Replicate and wasn't spoofed.
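A minimal sketch of that signature check is below. It assumes the scheme Replicate's webhook documentation describes: the secret carries a `whsec_` prefix followed by a base64-encoded key, the signed content is the `webhook-id`, `webhook-timestamp`, and raw body joined with dots, and the `webhook-signature` header holds one or more space-separated `v1,<base64 signature>` entries. Confirm those details against the current docs before relying on this; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def expected_signature(secret: str, webhook_id: str, timestamp: str, body: str) -> str:
    """Compute the base64-encoded HMAC-SHA256 signature for one webhook delivery."""
    # Assumes a "whsec_" prefix on the secret (str.removeprefix needs Python 3.9+).
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{webhook_id}.{timestamp}.{body}".encode()
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def is_valid_webhook(secret: str, webhook_id: str, timestamp: str,
                     body: str, signature_header: str) -> bool:
    """Return True if any signature in the header matches the expected one."""
    expected = expected_signature(secret, webhook_id, timestamp, body)
    for entry in signature_header.split():
        _version, _, candidate = entry.partition(",")
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(candidate, expected):
            return True
    return False
```

Pass the raw request body exactly as received, since re-serializing the JSON can change the bytes and break the signature. Rejecting requests whose timestamp is too old is a sensible extra guard against replayed deliveries.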

I started training a new model; what tool tracks its progress?

You check the job status with get_training. Each call returns the current state of your fine-tuning job, so you can tell whether it is still running, has finished, or has failed.

If I just need to change a model's description without recreating it, which tool do I use?

You'll use the update_model tool. This lets you change metadata for an existing model without having to rebuild or re-upload the entire asset.

How can I check if my prediction has finished and see the output?

Use the get_prediction tool with your Prediction ID. It will return the current status (starting, processing, succeeded, failed, or canceled) along with the output URLs or data once completed.

Can I search for specific types of models like 'image-to-text'?

Yes! Use the search_models tool with your query. It will return a list of public models matching your terms, including their owners and descriptions.

Is it possible to stop a model that is taking too long to run?

Absolutely. Use the cancel_prediction tool with the target Prediction ID to immediately stop the execution and prevent further usage costs.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python