Vinkius
Replicate Extended

Replicate Extended MCP for AI. Orchestrate ML Inference and Model Deployment.

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel
Works with every AI agent you already use

…and any MCP-compatible client

Replicate MCP on Cursor AI Code Editor
Replicate MCP on Claude Desktop App
Replicate MCP on OpenAI Agents SDK
Replicate MCP on Visual Studio Code
Replicate MCP on GitHub Copilot AI Agent
Replicate MCP on Google Gemini AI
Replicate MCP on Lovable AI Development
Replicate MCP on Mistral AI Agents
Replicate MCP on Amazon AWS Bedrock

How this MCP server connects to your AI agent

Replicate Extended connects your AI agent directly to Replicate's model infrastructure. It lets you run machine learning models, such as Stable Diffusion or text-generation models, and manage their entire lifecycle via simple commands.

You can search for public assets, create new deployments with custom scaling, monitor training jobs, and get predictions instantly from any client.

What AI agents can do with Replicate Automation

Cancel prediction

Stops a prediction that is currently running on the Replicate platform.

Create deployment prediction

Runs a prediction using an established, dedicated deployment endpoint.

Create deployment

Sets up a new private model deployment, allowing you to specify custom autoscaling rules for production use.

Execute Model Predictions

Runs a specific model version with provided inputs to generate outputs (e.g., images, text).

Manage Model Deployments

Creates and updates private deployments, allowing you to control autoscaling rules for production models.

Discover ML Assets

Searches Replicate's public model catalog or lists specific collections to find usable AI tools.

Track Job Status

Retrieves the current status and output for a prediction run or a training job.

Manage Model Lifecycles

Allows creation, updating, and deletion of model versions and metadata.

What AI agents can do with Replicate Extended: 20 Tools for Model Ops

These tools let you orchestrate the entire machine learning lifecycle—from searching public assets to running complex, managed deployments.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Cancel Prediction

Stops a prediction that is currently running on the Replicate platform.

Create Deployment Prediction

Runs a prediction using an established, dedicated deployment endpoint.

Create Deployment

Sets up a new private model deployment, allowing you to specify custom autoscaling rules for production use.

Create Model

Registers and creates a new model within your Replicate account.

Create Prediction

Initiates a new prediction job using a specified model version and input arguments.

Create Training

Starts a training job to fine-tune an existing base model on custom data.

Delete Model Version

Removes a specific, older version of a deployed model.

Get Account

Retrieves basic details about the authenticated user or organization account.

Get Collection

Fetches detailed information for a specific curated model collection on Replicate.

Get Model

Retrieves general details about a specified ML model.

Get Model Version

Gets detailed information for a model version, including its necessary input schema...

Get Prediction

Checks the status and fetches the output data from an existing prediction ID.

Get Training

Gets the current status and progress metrics for a running training job.

Get Webhook Secret

Retrieves the default webhook secret key needed to verify incoming signature...

List Collections

Lists all available curated model collections on Replicate.

List Hardware

Shows a list of available hardware SKUs and their corresponding descriptions for...

List Model Versions

Lists all historical versions associated with a given model identifier.

List Predictions

Fetches a list of the most recent prediction jobs run against your account.

Search Models

Searches Replicate's public catalog for models based on keywords or filters.

Update Model

Changes the metadata (like description or tags) associated with an existing model...

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you are ready to connect. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so you do not need to configure any routing yourself.

Claude AI

1. Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

2. Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

3. Start a conversation

Open a new chat. The Replicate Extended integration is available immediately; no restart is needed.
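
If you script your client configuration, the endpoint URL from step 2 could be assembled along these lines. This is a minimal sketch: the URL layout is copied from the endpoint shown above, while `build_endpoint` and the example token are hypothetical and not part of any Vinkius SDK.

```python
from urllib.parse import quote

# Base URL and path layout are copied from the endpoint shown in step 2.
VINKIUS_EDGE = "https://edge.vinkius.com"
PLACEHOLDER = "[YOUR_TOKEN_HERE]"


def build_endpoint(token: str) -> str:
    """Return the MCP endpoint URL for a Vinkius token (hypothetical helper)."""
    token = token.strip()
    if not token or token == PLACEHOLDER:
        raise ValueError("replace the placeholder with your real token")
    # Percent-encode the token so it is safe to embed as a single path segment.
    return f"{VINKIUS_EDGE}/{quote(token, safe='')}/mcp"


# "example-token" is a made-up value for illustration only.
print(build_endpoint("example-token"))
```

Rejecting the literal placeholder catches the most common copy-paste mistake before the client ever tries to connect.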

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

  • Import from OpenAPI, Swagger, or YAML specs
  • Create Agent Skills with progressive disclosure
  • Deploy to edge with MCPFusion framework
  • Built in DLP, auth, and compliance on every call
  • Real time usage dashboard and cost metering
  • Publish to catalog or keep private

Make Your AI Do More

Start with Replicate, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

  • Use this MCP plus 5,100+ others, all in one place
  • Add new capabilities to your AI anytime you want
  • Every connection is secured and compliant automatically
  • Track usage and costs across all your servers
  • Works with Claude, ChatGPT, Cursor, and more
  • New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Replicate. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

  • Cloud Hosted: Managed infra
  • V8 Isolated: Sandboxed per request
  • Zero-Trust Proxy: No stored credentials
  • DLP Enforced: Policy on every call
  • GDPR Compliant: EU data residency
  • Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 20 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
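
To make the protocol concrete: MCP clients invoke tools by sending JSON-RPC 2.0 messages with the method `tools/call`. The sketch below builds such a message for `search_models`, a tool named on this page. The `query` argument name is an assumption, since the tool's actual input schema is not shown here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `search_models` comes from this page; the `query` argument is an assumption.
request = tool_call("search_models", {"query": "image-to-text"})
print(json.dumps(request, indent=2))
```

Your AI client normally builds these messages for you, so this is only useful for understanding what travels over the connection or for debugging with a raw HTTP client.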

Manually managing ML endpoints is a nightmare, solved with Vinkius AI Gateway

Today, setting up an AI inference endpoint means jumping through hoops: checking the documentation for the right model ID, figuring out the required input format (the schema), and then manually calling the prediction API while praying it doesn't time out. You spend half your time debugging connection issues instead of improving the prompt.

With Replicate Extended, you just ask your agent to run a model. The agent handles the setup: it calls `get_model_version` to look up the input schema, checks your inputs against it, and then runs `create_prediction`. From your side, the whole process is a single request.
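
The validate-then-run step might look roughly like this. It is a sketch under assumptions: the schema is a hand-written, JSON-Schema-like stand-in for whatever `get_model_version` returns, and `validate_inputs` is a hypothetical helper, not something the server exposes.

```python
def validate_inputs(schema: dict, inputs: dict) -> list[str]:
    """Check inputs against a JSON-Schema-like dict and return a list of problems."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    properties = schema.get("properties", {})
    problems = [f"missing required input: {name}"
                for name in schema.get("required", []) if name not in inputs]
    for name, value in inputs.items():
        if name not in properties:
            problems.append(f"unknown input: {name}")
            continue
        declared = properties[name].get("type")
        expected = type_map.get(declared)
        if expected is None:
            continue  # type not covered by this sketch, so skip the check
        # bool is a subclass of int in Python, so reject it for numeric fields.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{name} should be {declared}, got {type(value).__name__}")
    return problems


# Hypothetical schema, standing in for the input schema of a model version.
schema = {
    "required": ["prompt"],
    "properties": {"prompt": {"type": "string"}, "num_outputs": {"type": "integer"}},
}

bad = validate_inputs(schema, {"num_outputs": "2"})
good = validate_inputs(schema, {"prompt": "a red fox", "num_outputs": 2})
print(bad)
print(good)
```

An empty list means the inputs are safe to pass on to `create_prediction`; anything else can be shown to the user before a prediction is started.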

The Replicate Extended MCP Server: Model & Prediction Ops

You eliminate the need to copy-paste model IDs, track down deprecated versions, or manually set up scaling rules in a separate cloud console. The agent manages all of this behind the scenes.

Your AI client can now treat your entire ML pipeline—from initial `search_models` lookup to final prediction output—as one single conversation flow. It's that simple.

What your AI can actually do with this

This server connects your agent straight into Replicate's model infrastructure. You can run machine learning models, such as Stable Diffusion or text-generation models, and manage their entire lifecycle using simple commands through your AI client.

Discovering Assets:
Search the public catalog by keyword with search_models, and list every curated model collection with list_collections. For details on a specific collection or model, use get_collection or get_model. get_account returns your own user or organization profile, and get_webhook_secret returns the secret you need to verify incoming webhook signatures.

Running Predictions:
The core job is running models. Use create_prediction to kick off a new prediction run with specific input arguments and model versions. If you've already set up a dedicated deployment endpoint, you can run the task directly using create_deployment_prediction. Once that prediction is done, you check the output status and get the final data by calling get_prediction.

You also track all recent activity by listing past job runs with list_predictions.
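
The create-then-check pattern above usually ends up as a polling loop. The sketch below assumes a `get_prediction` callable that returns a dict with a `status` field; how you actually invoke the tool depends on your MCP client, so the callable is injected and the demo uses a fake. The set of terminal statuses is also an assumption based on Replicate's commonly documented values.

```python
import time
from typing import Callable

# Statuses after which a prediction will not change again (assumed set).
TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for_prediction(get_prediction: Callable[[str], dict], prediction_id: str,
                        interval: float = 1.0, timeout: float = 300.0) -> dict:
    """Poll get_prediction until the prediction reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = get_prediction(prediction_id)
        if prediction.get("status") in TERMINAL:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout}s")
        time.sleep(interval)


# Fake tool for the demo: reports starting, then processing, then succeeded.
_responses = iter([
    {"status": "starting"},
    {"status": "processing"},
    {"status": "succeeded", "output": ["https://example.com/out.png"]},
])
result = wait_for_prediction(lambda _id: next(_responses), "demo-id", interval=0.0)
print(result["status"], result["output"])
```

The timeout matters in practice: without it, a stuck job would keep the loop running, and `cancel_prediction` is the natural follow-up when the deadline passes.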

Training Models:
When you need to fine-tune an existing base model on custom data, start that work using create_training. To keep tabs on how that training job is going, call get_training, which gives you the current status and progress metrics. You can also manage your own assets by calling create_model to register a new model within your account, or update its metadata like tags and descriptions with update_model.

If an existing model needs a tweak, use get_model_version to grab its detailed input schema and parameters.

Managing Deployments:
For production models, you've got two main tools. You can set up a brand new private deployment using create_deployment, letting you dictate custom autoscaling rules. Once that's live, running predictions through the dedicated endpoint is what create_deployment_prediction does. You can also see what hardware options are available for deployments by listing them with list_hardware.
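
As an illustration of what custom autoscaling rules could look like, the sketch below assembles arguments for `create_deployment` and checks the scaling bounds first. The field names mirror Replicate's HTTP API for deployments (`name`, `model`, `version`, `hardware`, `min_instances`, `max_instances`); whether this MCP tool accepts the same names is an assumption, and every value shown is a placeholder.

```python
def deployment_arguments(name: str, model: str, version: str, hardware: str,
                         min_instances: int = 0, max_instances: int = 1) -> dict:
    """Assemble arguments for a create_deployment call, checking the scaling bounds."""
    if min_instances < 0:
        raise ValueError("min_instances cannot be negative")
    if max_instances < 1 or max_instances < min_instances:
        raise ValueError("max_instances must be at least 1 and not below min_instances")
    return {
        "name": name,
        "model": model,
        "version": version,
        "hardware": hardware,  # pick a real SKU from the list_hardware output
        "min_instances": min_instances,
        "max_instances": max_instances,
    }


# All values below are placeholders for illustration.
args = deployment_arguments(
    name="fox-generator",
    model="acme/image-model",
    version="example-version-id",
    hardware="example-hardware-sku",
    min_instances=1,
    max_instances=4,
)
print(args)
```

Keeping `min_instances` at zero lets a deployment scale down completely when idle, while a value of one or more trades cost for lower cold-start latency.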

Model Lifecycles:
You control everything from creation to deletion. After you create a model or version, you might need to clean up old stuff; use delete_model_version to remove specific historical versions of an asset. You can also get a full list of every version tied to a model with list_model_versions. If you're done with a prediction job that's still running, call cancel_prediction to stop it immediately.
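
A cleanup pass over old versions can be sketched as: list the versions, keep the newest few, and delete the rest. The helper below only selects which ids to delete. It assumes each entry from `list_model_versions` carries an `id` and an ISO 8601 `created_at` timestamp, and the version data is invented for the example.

```python
def versions_to_delete(versions: list[dict], keep: int = 3) -> list[str]:
    """Return the ids of all versions except the `keep` most recent ones."""
    if keep < 1:
        raise ValueError("keep at least one version")
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
    newest_first = sorted(versions, key=lambda v: v["created_at"], reverse=True)
    return [v["id"] for v in newest_first[keep:]]


# Invented data standing in for a list_model_versions response.
versions = [
    {"id": "v1", "created_at": "2024-01-10T00:00:00Z"},
    {"id": "v3", "created_at": "2024-03-05T00:00:00Z"},
    {"id": "v2", "created_at": "2024-02-01T00:00:00Z"},
    {"id": "v0", "created_at": "2023-11-20T00:00:00Z"},
]
stale = versions_to_delete(versions, keep=2)
print(stale)
```

Each id in the result would then be passed to `delete_model_version`. Separating selection from deletion makes it easy to review the list before anything is removed.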

Tracking and Viewing Info:
Use get_model for general information on any model. For the list of versions behind a model, call list_model_versions, then get_model_version for the details of a single version. list_collections returns all curated collections, which helps when you need to browse multiple assets.

This gives your agent the full toolkit: from searching for models with search_models to setting them live with dedicated deployments, managing their version history, and keeping tabs on every single prediction or training job.

Built · Hosted · Managed by Vinkius
Replicate Extended MCP Server - ML Workflow Automation
Server ID 019e5d4e-b0ee-738a-8d03-c5933f877cb8
Vinkius Inspector
Compliance Grade A+
Score 98.33/100

Questions you might have

How do I find out what inputs a model needs before running `create_prediction`?

Run the get_model_version tool for that specific model ID. The output provides the full OpenAPI schema, showing exactly which parameters and data types are required.

Can I test a new model without setting up a permanent deployment?

Yes. You can use create_prediction directly to run single-shot tests on any available version. This is great for initial benchmarking, but remember to use create_deployment for production scale.

What if a prediction fails and I need the logs?

Use get_prediction with the ID of the failed run. The tool returns the status and often includes detailed error messages or log snippets, helping you pinpoint the failure source.

How do I list all model versions for a single model?

The list_model_versions tool takes the model identifier as input. It returns an array of every historical version associated with that model.

What tool do I use to check what hardware SKUs are available for my deployment?

Run list_hardware first. This command pulls all available hardware specifications, letting you pick the right compute power before creating any deployments or predictions.

How can I verify that an incoming webhook calling my endpoint was genuinely from Replicate?

Use get_webhook_secret to fetch your default secret key. You must use this key to validate all incoming signatures, ensuring the request truly came from Replicate and wasn't spoofed.

I started a training job; which tool tracks its progress?

Check the job status with get_training. It returns the current status and progress metrics for your fine-tuning job, so you can tell whether it is still running, has finished, or has stalled.

If I just need to change a model's description without recreating it, which tool do I use?

Use the update_model tool. It changes the metadata of an existing model, such as its description or tags, without requiring you to rebuild or re-upload the asset.

How can I check if my prediction has finished and see the output?

Use the get_prediction tool with your Prediction ID. It returns the current status (starting, processing, succeeded, failed, or canceled) along with the output URLs or data once the prediction completes.

Can I search for specific types of models like 'image-to-text'?

Yes! Use the search_models tool with your query. It will return a list of public models matching your terms, including their owners and descriptions.

Is it possible to stop a model that is taking too long to run?

Absolutely. Use the cancel_prediction tool with the target Prediction ID to immediately stop the execution and prevent further usage costs.

Built & Managed by Vinkius · 30s setup · 20 tools

We've already built the connector for Replicate Extended. Just plug in your AI agents and start using Vinkius.

No hosting. No infrastructure. No complex setup.
All 20 tools are live and waiting. You're up and running in seconds.

Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.

Zero hosting required · Full MCP catalog included · Enterprise-grade security · Auto-updated by Vinkius

Built, hosted, and secured by Vinkius. You just connect and go.