Replicate Extended MCP for AI. Orchestrate ML Inference and Model Deployment.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

How this MCP server connects to your AI agent

Replicate Extended connects your AI agent directly to Replicate's model infrastructure. It lets you run complex machine learning models, such as Stable Diffusion or text generation models, and manage their entire lifecycle via simple commands.

You can search for public assets, create new deployments with custom scaling, monitor training jobs, and get predictions instantly from any client.

Execute Model Predictions

Runs a specific model version with provided inputs to generate outputs (e.g., images, text).

Manage Model Deployments

Creates and updates private deployments, allowing you to control autoscaling rules for production models.

Discover ML Assets

Searches Replicate's public model catalog or lists specific collections to find usable AI tools.

Track Job Status

Retrieves the current status and output for a prediction run or a training job.

Manage Model Lifecycles

Allows creation, updating, and deletion of model versions and metadata.

What AI agents can do with Replicate Extended: 20 Tools for Model Ops

These tools let you orchestrate the entire machine learning lifecycle—from searching public assets to running complex, managed deployments.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Cancel Prediction

Stops a prediction that is currently running on the Replicate platform.

Create Deployment Prediction

Runs a prediction using an established, dedicated deployment endpoint.

Create Deployment

Sets up a new private model deployment, allowing you to specify custom autoscaling rules for production use.

Create Model

Registers and creates a new model within your Replicate account.

Create Prediction

Initiates a new prediction job using a specified model version and input arguments.

Create Training

Starts a training job to fine-tune an existing base model on custom data.

Delete Model Version

Removes a specific, older version of a deployed model.

Get Account

Retrieves basic details about the authenticated user or organization account.

Get Collection

Fetches detailed information for a specific curated model collection on Replicate.

Get Model

Retrieves general details about a specified ML model.

Get Model Version

Gets detailed information for a model version, including its necessary input schema...

Get Prediction

Checks the status and fetches the output data from an existing prediction ID.

Get Training

Gets the current status and progress metrics for a running training job.

Get Webhook Secret

Retrieves the default webhook secret key needed to verify incoming signature...

List Collections

Lists all available curated model collections on Replicate.

List Hardware

Shows a list of available hardware SKUs and their corresponding descriptions for...

List Model Versions

Lists all historical versions associated with a given model identifier.

List Predictions

Fetches a list of the most recent prediction jobs run against your account.

Search Models

Searches Replicate's public catalog for models based on keywords or filters.

Update Model

Changes the metadata (like description or tags) associated with an existing model...

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2 and no manual routing required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Replicate Extended integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "replicate-extended": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Replicate Extended tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"replicate-extended": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

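Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius (transport assumes a Streamable HTTP endpoint; use "sse" otherwise)
client = MultiServerMCPClient({"replicate-extended": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http",
}})
# get_tools() is a coroutine, so run it in an event loop
tools = asyncio.run(client.get_tools())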

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

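from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install 'crewai-tools[mcp]'

# Connect securely to Vinkius (transport value assumes a Streamable HTTP endpoint)
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent; role, goal, and backstory are required fields
    researcher = Agent(
        role="Data Researcher",
        goal="Run and track Replicate models",  # placeholder goal
        backstory="An ML practitioner who works through MCP tools.",  # placeholder backstory
        tools=vinkius_tools,
    )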

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Replicate, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Replicate. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 20 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Manually managing ML endpoints is a nightmare. Seriously. Solved with Vinkius AI Gateway

Today, setting up an AI inference endpoint means jumping through hoops: checking the documentation for the right model ID, figuring out the required input format (the schema), and then manually calling the prediction API while praying it doesn't time out. You spend half your time debugging connection issues instead of improving the prompt.

With Replicate Extended, you just ask your agent to run a model. It handles all that setup: it calls `get_model_version` to validate inputs and then executes `create_prediction`. From your side, the whole process is a single request.

The Replicate Extended MCP Server: Model & Prediction Ops

You eliminate the need to copy-paste model IDs, track down deprecated versions, or manually set up scaling rules in a separate cloud console. The agent manages all of this behind the scenes.

Your AI client can now treat your entire ML pipeline—from initial `search_models` lookup to final prediction output—as one single conversation flow. It's that simple.

Support: 24/7 at support@vinkius.com

Tags: machine-learning, ai-models, gpu-computing, inference-api, stable-diffusion

What your AI can actually do with this

This server connects your agent straight into Replicate's model infrastructure. You can run complex machine learning models, such as Stable Diffusion or text generation models, and manage their entire lifecycle using simple commands through your AI client.

Discovering Assets:
You can search the public catalog for ML tools using search_models based on keywords, and you'll find all available curated model groups by listing them with list_collections. To get the deep details on a specific collection or general model, use get_collection or get_model. You'll also check out your own user profile info using get_account, and if you need to verify incoming signatures for webhooks, call get_webhook_secret.

Running Predictions:
The core job is running models. Use create_prediction to kick off a new prediction run with specific input arguments and model versions. If you've already set up a dedicated deployment endpoint, you can run the task directly using create_deployment_prediction. Once that prediction is done, you check the output status and get the final data by calling get_prediction.

You also track all recent activity by listing past job runs with list_predictions.

Training Models:
When you need to fine-tune an existing base model on custom data, start that work using create_training. To keep tabs on how that training job is going, call get_training, which gives you the current status and progress metrics. You can also manage your own assets by calling create_model to register a new model within your account, or update its metadata like tags and descriptions with update_model.

Before you change how you call an existing model, use get_model_version to grab its detailed input schema and parameters.

Managing Deployments:
For production models, you've got two main tools. You can set up a brand new private deployment using create_deployment, letting you dictate custom autoscaling rules. Once that's live, running predictions through the dedicated endpoint is what create_deployment_prediction does. You can also see what hardware options are available for deployments by listing them with list_hardware.

Model Lifecycles:
You control everything from creation to deletion. After you create a model or version, you might need to clean up old stuff; use delete_model_version to remove specific historical versions of an asset. You can also get a full list of every version tied to a model with list_model_versions. If you're done with a prediction job that's still running, call cancel_prediction to stop it immediately.

Tracking and Viewing Info:
You'll use get_model for general info on any ML asset. To get granular details about a model's versions, call list_model_versions. You can also pull a list of all available curated collections using list_collections, which helps when you need to browse multiple assets.

This gives your agent the full toolkit: from searching for models with search_models to setting them live with dedicated deployments, managing their version history, and keeping tabs on every single prediction or training job.

Built · Hosted · Managed by Vinkius Replicate Extended MCP Server - ML Workflow Automation

Server ID 019e5d4e-b0ee-738a-8d03-c5933f877cb8

Vinkius Inspector

Compliance Grade A+

Score 98.33/100

Who is this actually for?

ML Engineers, Data Scientists, and Platform Operations staff. You're the person who spends too much time jumping between a local sandbox, a cloud dashboard, and an API playground just to test one model input. This server lets you keep the entire ML lifecycle—from discovery to deployment—inside your agent chat.

ML Engineer

Runs create_prediction multiple times with different parameters to quickly benchmark various model versions before committing to a final build.

Data Scientist

Uses get_training and list_predictions to monitor the status of background training and prediction jobs without leaving their primary IDE environment.

DevOps Engineer

Calls create_deployment and update_model to set up production-ready endpoints, ensuring proper autoscaling and version control for services.

What Changes When You Connect

Track everything in one place. Instead of manually checking logs, use get_prediction to instantly check the status or retrieve output from any prediction ID.

Build reliable endpoints. Use create_deployment to set up private model deployments with specific autoscaling rules, so your service handles load spikes automatically.

Find models fast. You don't need to guess; use search_models to filter Replicate's massive public catalog by keywords or tags.

Control the history. If a deployed model version breaks, you can check all previous iterations with list_model_versions and roll back quickly.

Keep your systems clean. After testing, run delete_model_version to remove obsolete models and keep your account tidy.

See it in action

01

Batch Processing Image Assets

A creative tech needs 50 variations of an AI-generated image. Instead of running 50 separate manual API calls, the agent uses search_models to find the right model, then loops through inputs and calls create_prediction, tracking all IDs via list_predictions. Done.
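The loop in this scenario can be sketched roughly as follows. This is only an illustration: `call_tool` is a hypothetical stand-in for whatever function your MCP client uses to invoke a tool, and the argument names `version` and `input` mirror Replicate's HTTP API rather than a confirmed tool schema, so check the tool's actual parameters before relying on them.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_batch(
    call_tool: ToolCaller,
    version: str,
    prompts: list[str],
    base_input: Optional[dict[str, Any]] = None,
) -> list[str]:
    """Start one prediction per prompt and return the prediction IDs in order."""
    prediction_ids = []
    for prompt in prompts:
        # Shared settings first, then the per-run prompt on top.
        arguments = {
            "version": version,
            "input": {**(base_input or {}), "prompt": prompt},
        }
        result = call_tool("create_prediction", arguments)
        # Assumes the result carries the new prediction's ID under "id".
        prediction_ids.append(result["id"])
    return prediction_ids
```

Collecting the IDs as you go means you can look each run up later with get_prediction instead of depending only on the recent-jobs window that list_predictions returns.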

02

Setting up a Production Endpoint

A DevOps engineer needs a stable endpoint for their internal tool. They use get_model to confirm the model identity, then call create_deployment, specifying resource limits and autoscaling parameters. The service is ready instantly.
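As a rough sketch of the checks worth doing before that create_deployment call, the helper below builds the arguments and rejects obvious mistakes. The field names (`name`, `model`, `version`, `hardware`, `min_instances`, `max_instances`) follow Replicate's HTTP API for deployments and are an assumption about the tool's schema; `deployment_arguments` itself is a hypothetical helper, and `hardware_options` is assumed to be the list returned by list_hardware, with a `sku` key on each entry.

```python
from typing import Any


def deployment_arguments(
    name: str,
    model: str,
    version: str,
    hardware: str,
    min_instances: int,
    max_instances: int,
    hardware_options: list[dict[str, Any]],
) -> dict[str, Any]:
    """Validate the settings locally and return arguments for create_deployment."""
    known_skus = {option["sku"] for option in hardware_options}
    if hardware not in known_skus:
        raise ValueError(f"unknown hardware SKU: {hardware}")
    if not 0 <= min_instances <= max_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")
    return {
        "name": name,
        "model": model,
        "version": version,
        "hardware": hardware,
        "min_instances": min_instances,
        "max_instances": max_instances,
    }
```

Failing locally on a bad SKU or an inverted instance range is cheaper than discovering it from a rejected API call.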

03

Model Troubleshooting

The prediction output fails silently. Instead of guessing which tool failed, they use get_model_version to pull the exact OpenAPI schema required for that version, then check get_prediction for detailed logs to find the failure point.
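A small sketch of how the get_prediction result might be condensed for debugging. It assumes the result is a dictionary with `status`, `error`, and `logs` keys, as in Replicate's prediction object; the tool's exact response shape is not confirmed here, and `failure_summary` is a hypothetical helper name.

```python
from typing import Any


def failure_summary(prediction: dict[str, Any], tail: int = 5) -> str:
    """Summarise a prediction's status, error, and the last few log lines."""
    status = prediction.get("status", "unknown")
    error = prediction.get("error") or "no error message"
    log_lines = (prediction.get("logs") or "").splitlines()
    # Keep only the final `tail` log lines, where failures usually surface.
    recent = log_lines[-tail:] if tail > 0 else []
    lines = [f"status: {status}", f"error: {error}"]
    lines += [f"log: {line}" for line in recent]
    return "\n".join(lines)
```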

04

Fine-Tuning a Niche Model

A data scientist has 10GB of niche text data. They use create_training to start a fine-tuning job, then wait and monitor progress using get_training. Once complete, they can deploy the new model with create_deployment.

The honest tradeoffs

Hardcoding Model IDs

Anti-pattern

Writing a script that assumes a specific model ID will always work. When Replicate updates or deprecates it, the whole workflow breaks at runtime.

The Fix

Always use get_model first to pull current metadata and ensure you are referencing a valid, active model identifier before calling create_prediction. Check list_model_versions for available alternatives.
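One way to avoid a hardcoded version is to resolve the newest one at run time. The sketch below assumes list_model_versions returns a list of dictionaries with `id` and `created_at` keys, where `created_at` is an ISO 8601 timestamp in a single consistent format (so string comparison matches chronological order). Both the response shape and the helper name `latest_version_id` are assumptions.

```python
from typing import Any


def latest_version_id(versions: list[dict[str, Any]]) -> str:
    """Return the ID of the most recently created version."""
    if not versions:
        raise ValueError("model has no versions")
    # ISO 8601 timestamps in one format and timezone sort correctly as strings.
    newest = max(versions, key=lambda version: version["created_at"])
    return newest["id"]
```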

Ignoring Deployment Lifecycle

Anti-pattern

Running predictions directly without setting up a formal deployment. This leads to inconsistent resource allocation and unpredictable scaling failures.

The Fix

Before running high-volume prediction jobs, always use create_deployment first. This establishes a dedicated, controlled environment for reliable inference.

Manual Schema Discovery

Anti-pattern

Having to consult external documentation every time you need the exact input parameters (e.g., what kind of array or string is expected).

The Fix

Use get_model_version for any model. The result includes the full OpenAPI schema, letting your agent validate inputs automatically.
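A minimal sketch of what that validation could look like. It assumes the get_model_version result exposes the input schema at `openapi_schema.components.schemas.Input`, which is how Replicate's HTTP API lays it out; the MCP tool's exact response shape is an assumption, and `check_inputs` is a hypothetical helper that only performs a shallow type check.

```python
from typing import Any

# Maps JSON Schema type names to the Python types used for a shallow check.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_inputs(version: dict[str, Any], inputs: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    schema = version["openapi_schema"]["components"]["schemas"]["Input"]
    properties = schema.get("properties", {})
    problems = []

    for name in schema.get("required", []):
        if name not in inputs:
            problems.append(f"missing required input: {name}")

    for name, value in inputs.items():
        if name not in properties:
            problems.append(f"unknown input: {name}")
            continue
        type_name = properties[name].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # No simple type declared (e.g. enum references); skip.
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"{name}: expected {type_name}")
    return problems
```

Running a check like this before create_prediction turns a failed remote job into an immediate, readable error.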

When It Fits, When It Doesn't

Use this server if your workflow needs to manage a multi-step ML lifecycle: discovery -> training -> deployment -> prediction. If you only need to fetch static data or read simple text from an external database, this is the wrong tool; a general data retrieval server fits better. Replicate Extended fits when the core problem is 'How do I run, track, and manage a complex AI model?' If you need to test multiple inputs against different versions of the same model, call list_model_versions before running create_prediction. Direct create_prediction calls are fine for tests and benchmarks; for production traffic, set up a deployment and use create_deployment_prediction.

Questions you might have

How do I find out what inputs a model needs before running `create_prediction`?

Run the get_model_version tool for that specific model ID. The output provides the full OpenAPI schema, showing exactly which parameters and data types are required.

Can I test a new model without setting up a permanent deployment?

Yes. You can use create_prediction directly to run single-shot tests on any available version. This is great for initial benchmarking, but remember to use create_deployment for production scale.

What if a prediction fails and I need the logs?

Use get_prediction with the ID of the failed run. The tool returns the status and often includes detailed error messages or log snippets, helping you pinpoint the failure source.

How do I list all model versions for a single model?

The list_model_versions tool takes the model ID as input. It returns an array of every historical version published for that specific model.

What tool do I use to check what hardware SKUs are available for my deployment?

Run list_hardware first. This command pulls all available hardware specifications, letting you pick the right compute power before creating any deployments or predictions.

How can I verify that an incoming webhook calling my endpoint was genuinely from Replicate?

Use get_webhook_secret to fetch your default secret key. You must use this key to validate all incoming signatures, ensuring the request truly came from Replicate and wasn't spoofed.
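A sketch of the signature check itself, under stated assumptions: Replicate documents a scheme where the signed content is `{webhook-id}.{webhook-timestamp}.{body}`, the secret is a base64 key behind a `whsec_` prefix, and the `webhook-signature` header holds space-separated `v1,<base64 signature>` entries. Treat those details as assumptions to confirm against Replicate's webhook documentation; `verify_webhook` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    secret: str,
    webhook_id: str,
    timestamp: str,
    signature_header: str,
    body: bytes,
    tolerance_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True if any signature in the header matches the request."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale or far-future timestamps to limit replay attacks.
    if abs(current - sent_at) > tolerance_seconds:
        return False

    # Assumed secret format: "whsec_" followed by the base64-encoded key.
    key = base64.b64decode(secret.split("_", 1)[1])
    signed_content = f"{webhook_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()

    # The header may carry several "v1,<signature>" entries separated by spaces.
    candidates = [
        part.split(",", 1)[1] for part in signature_header.split() if "," in part
    ]
    return any(hmac.compare_digest(expected, candidate) for candidate in candidates)
```

Pass the raw request body bytes, not a re-serialised JSON object, since any change in whitespace or key order changes the signature.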

I started training a new model; what tool tracks its progress?

Check the job status with get_training. It returns the current status and progress metrics for your fine-tuning job, so you can tell whether it is still running, finished, or failed.

If I just need to change a model's description without recreating it, which tool do I use?

Use the update_model tool. It changes metadata such as the description or tags on an existing model without rebuilding or re-uploading the entire asset.

How can I check if my prediction has finished and see the output?

Use the get_prediction tool with your Prediction ID. It will return the current status (starting, processing, succeeded, or failed) along with the output URLs or data once completed.
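If you script this check rather than asking the agent each time, a polling loop along these lines is one option. It is a sketch: `get_prediction` here is a hypothetical callable wrapping the tool, and `canceled` is included as a terminal status on the assumption that cancelled runs report it.

```python
import time
from typing import Any, Callable

# "succeeded" and "failed" come from the answer above; "canceled" is an assumption.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], dict[str, Any]],
    prediction_id: str,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the prediction reaches a terminal status, then return it."""
    for _ in range(max_polls):
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in {max_polls} polls")
```

The `max_polls` cap keeps a stuck job from blocking forever; on timeout you can decide whether to keep waiting or stop the run with cancel_prediction.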

Can I search for specific types of models like 'image-to-text'?

Yes! Use the search_models tool with your query. It will return a list of public models matching your terms, including their owners and descriptions.

Is it possible to stop a model that is taking too long to run?

Absolutely. Use the cancel_prediction tool with the target Prediction ID to immediately stop the execution and prevent further usage costs.
