Modelbit MCP for AI. Run proprietary ML models directly from chat logic.

Works with every AI agent you already use

Claude
ChatGPT
Cursor
Gemini
Windsurf
VS Code
JetBrains
Vercel
…and any MCP-compatible client

How this MCP server connects to your AI agent

get_inference calls any deployed Modelbit machine learning model directly from your AI agent. You pass structured data—like complex JSON arrays or specific parameters—and immediately receive computed predictions.

It eliminates the need to build custom wrapper code just to test proprietary ML logic inside an LLM chat.

What AI agents can do with Modelbit (ML Model Deployments) Automation

Get inference

Calls a deployed Modelbit machine learning model with specific input parameters, returning structured predictions or computed outputs.

Execute Production Models

The tool runs models written in plain Python or built with frameworks such as PyTorch and scikit-learn through a single call.

Pass Structured Data

You send complex JSON objects or arrays directly to the model for processing.

Enforce Version Control

Specify an exact model version or tag (e.g., 'v2') so results stay reproducible. A floating tag such as 'latest' follows new deployments, so its output can change.

Receive Computed Results

The agent gets the final, calculated output from the model instantly in a structured format.

What AI agents can do with Modelbit (ML Model Deployments) MCP Server: 1 Tool

The `get_inference` tool allows your AI client to execute deployed machine learning models, passing structured data and getting computed results back instantly.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing for you to manage.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Modelbit integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "modelbit-ml-model-deployments": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Modelbit tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint under the "servers" key of your mcp.json file (.vscode/mcp.json for a workspace):

"servers": {
  "modelbit-ml-model-deployments": {
    "type": "http",
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

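Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius over streamable HTTP
server = {"url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp", "transport": "streamable_http"}
client = MultiServerMCPClient({"modelbit": server})
tools = asyncio.run(client.get_tools())  # get_tools() is a coroutine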

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

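from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius over streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Assign the discovered tools to an Agent
with MCPServerAdapter(server_params) as vinkius_tools:
    researcher = Agent(
        role="Data Researcher",
        goal="Answer questions using Modelbit model predictions",
        backstory="An analyst with access to deployed ML models.",
        tools=vinkius_tools,
    )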

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Modelbit (ML Model Deployments), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Modelbit. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Built on the Model Context Protocol (MCP) for Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides one capability, get_inference, that interfaces natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Running ML logic today means jumping between dashboards and API calls. Solved with Vinkius AI Gateway.

Right now, if your AI agent needs a prediction, you're out of luck. You have to leave the chat, jump into the Modelbit dashboard (or whatever service hosts the model), manually input parameters, hit 'run,' copy the resulting JSON/number, and paste it back into the conversation. It’s slow, error-prone, and breaks the flow.

With this MCP server, that whole process vanishes. You tell your agent to run `get_inference`. The agent handles calling Modelbit, passing the complex data, waiting for the prediction, and presenting the final result—all within one conversation thread.

Modelbit (ML Model Deployments) MCP Server: get_inference

The main pain point that disappears is the manual handoff of data. You no longer need to write custom Python or JavaScript glue code in your application layer just to manage the input/output between your agent and the model's API.

You just call `get_inference`. The server handles the communication protocol, versioning, and structured data passing. It lets you treat a complex ML pipeline like it's just another function call.

Support: 24/7, support@vinkius.com

Security: Vinkius Trust Center

SLA: Service Level Agreement

machine-learning

mlops

inference

model-deployment

python-models

What your AI can actually do with this

When you use get_inference, your AI client calls any deployed Modelbit machine learning model right from your agent. You pass structured data—whether it's a complex array of pixels or specific parameters—and immediately get computed predictions back in a clean, usable format. This tool lets you run proprietary ML logic inside an LLM chat without having to write custom wrapper code just for testing.

This server executes production-grade models built using diverse frameworks. You don't care if your model was written in PyTorch, Scikit-learn, or plain Python; the get_inference tool handles running it all through a single call. This means you can test out sophisticated data science concepts directly within your chat flow, treating the ML model like just another function available to your agent.

Passing structured data is key here. You're not sending vague text prompts; you're giving the model exact inputs. You can send complex JSON objects or entire arrays of values, and the tool processes that structure directly. This capability means if your workflow requires analyzing a specific set of coordinates or processing multiple related data points simultaneously, the agent handles it by feeding those structured payloads straight into the deployed model.

Version pinning keeps results reproducible. When you call get_inference, you can specify an exact model version or tag, say 'v2'. Every call then runs the same model definition, so outputs stay consistent across your session. A floating tag such as 'latest' tracks new deployments, so avoid it when you need repeatable results.

When the computation finishes, you receive computed results instantly. The agent doesn't get a messy block of text; it gets the final, calculated output in a structured format that your client can read and act on immediately. This direct access to clean data means you can build complex decision-making paths within your conversational AI, making those predictions part of the ongoing dialogue.

The get_inference tool covers execution across plain Python, PyTorch, and scikit-learn models, and it accepts highly structured inputs like JSON arrays. Each call can target a specific version and returns a clean, computable output ready for your agent to use in its next step. You're essentially making the model an active component of your workflow, not just something the agent talks about.

This means if your application needs deep ML analysis—say, predicting stock movement based on historical JSON inputs or classifying images using a PyTorch-trained network—you don't need a separate API layer. You simply let your agent call get_inference, pass the structured payload, and get the computed prediction right back into the chat session.

It cuts out layers of integration complexity, letting you focus solely on the logic that needs to happen.

Modelbit MCP Server - Run ML Models with get_inference

Built, hosted, and managed by Vinkius

Server ID 019e5d36-d263-734a-bbfe-d289a41c27f0

Vinkius Inspector

Compliance Grade A+

Score 100/100

Here's how it actually works

The bottom line is: your AI client runs complex ML logic without needing custom API wrappers or external code execution.

Subscribe to the Modelbit server and provide your workspace credentials.

Tell your AI client (your agent) to run get_inference, specifying the exact model name, version tag, and the input data payload.

The agent executes the call, and you receive the computed prediction or result directly in the chat interface.
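Under the hood, the second step becomes an MCP `tools/call` request that your client sends for you. Here is a minimal sketch of that request in Python. The argument names `deployment` and `version` are assumptions for illustration: this page only confirms that the tool takes a model name, an optional version string, and a `data` payload, so check the tool schema your client reports for the exact names.

```python
import json


def build_get_inference_request(deployment, data, version=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for the get_inference tool.

    `deployment` and `version` are assumed argument names; only `data`
    is named on this page.
    """
    arguments = {"deployment": deployment, "data": data}
    if version is not None:
        arguments["version"] = version  # optional version string or tag
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_inference", "arguments": arguments},
    }


# Example values are made up; they mirror the fraud-detection scenario.
request = build_get_inference_request(
    deployment="fraud_detection",
    data={"amount": 1250.0, "time": "2024-05-01T14:03:00Z", "region": "EU"},
    version="v2",
)
print(json.dumps(request, indent=2))
```

You normally never write this by hand. The agent builds it from your instruction, which is why naming the model, the version, and the data clearly in your prompt matters.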

Who is this actually for?

ML Engineers who are tired of writing boilerplate glue code just to connect a model demo to an agent; Data Scientists who need to prove out proprietary algorithms within a chat interface; Product Teams prototyping AI features that rely on deep, custom machine learning logic.

Machine Learning Engineer

Integrates production-grade models into AI workflows by calling get_inference directly, skipping the need for custom API wrappers.

Data Scientist

Tests and showcases model outputs in an interactive chat environment, passing complex data structures to prove out algorithms.

Product Manager

Rapidly prototypes AI features that require proprietary ML logic by defining inputs and analyzing get_inference results within the agent interface.

What Changes When You Connect

No custom wrapper code. You connect your agent to production models and run them immediately via get_inference. This saves time building boilerplate API integration layers.

Reproducible outputs. By specifying model versions or tags (e.g., 'v2'), you keep results from changing unexpectedly, which is critical for testing.

Handles complex data natively. You can pass structured inputs—like arrays of pixel values or multi-field JSON objects—straight to the model, letting it do the math.

Supports major frameworks. Whether your model uses PyTorch, Scikit-learn, or pure Python, Modelbit exposes it through one unified endpoint for get_inference.

Test proprietary logic instantly. You can showcase custom ML algorithms inside an AI chat interface without having to deploy a separate web service just for testing.

See it in action

01

Detecting Fraud in Transactions

A user asks, 'What's the risk score on this transaction?' The agent runs get_inference using the 'fraud_detection' model. It passes the transaction details (amount, time, region) as a JSON object and gets back a clear risk assessment, like 'low risk (score: 0.02).'

02

Classifying Satellite Imagery

You need to know what's in an image. The agent uses get_inference on the 'image_classifier' model, passing an array of pixel values. It returns a specific identification and confidence level (e.g., 'high-resolution satellite imagery with 98% confidence').

03

Forecasting Sales Revenue

A PM asks the agent to predict sales for Q4 North region. The agent calls get_inference on the 'sales_forecast' model, feeding it {'region': 'north', 'month': 12}. It immediately replies with a calculated revenue target: '$450,000.'

04

Validating Product Data

You give the agent raw data and ask it to validate it against your proprietary schema. The agent uses get_inference on a validation model, passing the JSON record. It returns whether the data is valid or lists exactly which fields failed.

The honest tradeoffs

Treating ML as general knowledge

Anti-pattern

Asking the agent, 'What's a good sales forecast?' The LLM will give a generic answer because it doesn't know your company’s data or model.

The Fix

You must explicitly call get_inference. You need to pass the specific data and tell the tool which proprietary model (e.g., 'sales_forecast') to run on that data.

Passing raw text when structured data is needed

Anti-pattern

Trying to get a risk score by just saying, 'This transaction seems risky.' The tool doesn't know the exact parameters required for your model.

The Fix

Always structure your input as JSON. Pass all necessary details (region, amount, time) in the data payload when calling get_inference.
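One way to enforce this fix is to build the payload in code before it reaches the tool. The sketch below is only an illustration: the required field names come from the example in this section, and your own deployment's input schema will differ.

```python
# Field names taken from the example in this section (assumed schema).
REQUIRED_FIELDS = ("region", "amount", "time")


def build_payload(record):
    """Return a JSON-ready dict, or raise if the input is not structured."""
    if not isinstance(record, dict):
        raise TypeError("get_inference needs a JSON object, not free text")
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Keep only the fields the model expects.
    return {name: record[name] for name in REQUIRED_FIELDS}


payload = build_payload(
    {"region": "EU", "amount": 1250.0, "time": "14:03", "note": "seems risky"}
)
print(payload)
```

Passing the sentence 'This transaction seems risky' to `build_payload` raises a `TypeError`, which is the failure you want to catch before the model is ever called.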

Relying on default versions

Anti-pattern

Running a critical model and hoping it uses 'the latest one.' If a newer version is deployed to Modelbit, your results could change unexpectedly.

The Fix

Always specify the exact version or tag (e.g., 'v2' or 'stable') when calling get_inference. A pinned version keeps outputs reproducible.
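If you call the tool from your own code, a small guard can stop unpinned calls early. This is a sketch under one assumption: that 'latest' is the only floating tag in your workspace. Add any other tags that can move to the set.

```python
# Tags that can point at a different model over time (assumed list).
FLOATING_TAGS = {"latest"}


def require_pinned_version(version):
    """Return a cleaned version string, or raise if it is missing or floating."""
    if version is None or not version.strip():
        raise ValueError("no version given; pin one such as 'v2'")
    cleaned = version.strip()
    if cleaned.lower() in FLOATING_TAGS:
        raise ValueError(f"'{cleaned}' is a floating tag; pin an exact version")
    return cleaned


print(require_pinned_version(" v2 "))
```

Calling `require_pinned_version("latest")` or `require_pinned_version(None)` raises a `ValueError` instead of quietly running whichever model is current.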

When It Fits, When It Doesn't

Use this server if you have existing, deployed machine learning models and need to expose their results to your AI agent’s reasoning flow. Specifically, if your task requires passing structured data (JSON) into a proprietary model endpoint for computation—like classification, forecasting, or scoring—this is the tool. Don't use it if your logic is simple enough that standard LLM prompting works; don't force get_inference just because you can. Also, don't use it if your models are currently only running in Jupyter notebooks and haven't been fully deployed to Modelbit endpoints yet—the model has to be live first.

Questions you might have

How do I set up get_inference for the first time?

You subscribe to the server and enter your Modelbit Workspace name. If your models are private, you'll also need to provide your API Key in the setup panel.

Does get_inference support different model types (PyTorch vs scikit-learn)?

Yes. The server can call any model deployed on Modelbit, whether it is plain Python, PyTorch, scikit-learn, or another framework.

What data format must I use with get_inference?

You must pass structured data. This means using JSON objects or arrays for the input payload when calling the tool, not just plain text.

Can I test a model version before deploying it?

The get_inference tool supports versioning. You can specify tags (like 'v2') to ensure you are always testing against a known, stable model iteration.

What happens if an ML model fails or encounters bad data when I use get_inference?

The agent receives a structured error message. Modelbit reports specific failure codes and stack traces, telling you exactly which part of the input failed. This lets your AI client retry the call with corrected parameters.
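In MCP, a failed tool call comes back as a normal result with `isError` set to true and the message in its `content` list. Here is a minimal sketch of how a client could separate the two cases. The sample messages are invented, and the exact error text Modelbit returns is not shown on this page.

```python
import json


def parse_tool_result(result):
    """Return (ok, payload) from an MCP tool result dict."""
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    body = "\n".join(texts)
    if result.get("isError"):
        return False, body  # hand the message back so the agent can correct the input
    try:
        return True, json.loads(body)  # structured prediction
    except json.JSONDecodeError:
        return True, body  # plain-text result


# Hypothetical results for illustration only.
success = {"content": [{"type": "text", "text": '{"score": 0.02}'}], "isError": False}
failure = {"content": [{"type": "text", "text": "Missing field: amount"}], "isError": True}

print(parse_tool_result(success))
print(parse_tool_result(failure))
```

When the first element is `False`, the agent can read the message, fix the payload, and call get_inference again.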

How do I secure my model calls when using get_inference in production?

You must use a private Modelbit API Key for secure deployments. By entering this key, you restrict access to your specific workspace and models. This keeps proprietary logic protected from unauthorized client connections.

Are there limitations on the size of data I can pass to get_inference?

While Modelbit handles complex JSON and arrays, input size depends on the model's specific requirements and general platform limits. For extremely large datasets, consider chunking the data or using a dedicated data pipeline before calling get_inference.
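Chunking can be as simple as slicing the input into fixed-size batches and sending one batch per call. A minimal sketch follows; the batch size of 4 is arbitrary, so pick one that fits your model and the platform limits.

```python
def chunked(rows, size):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = list(range(10))  # stand-in for a large list of input records
batches = list(chunked(rows, 4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Each batch then becomes the `data` payload of its own get_inference call, and you combine the predictions afterward.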

What factors affect the latency when I run get_inference?

Latency is determined by three things: network speed, Modelbit's processing time, and the model itself. Complex models or massive input arrays will naturally take longer to compute than simple predictions.
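To see where the time goes for your own models, you can wrap the call in a timer. In this rough sketch, `fake_inference` is a hypothetical stand-in for whatever function sends your real get_inference request.

```python
import time


def timed(call, *args, **kwargs):
    """Run `call` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_inference(values):
    # Hypothetical stand-in for a real get_inference round trip.
    return sum(values) / len(values)


result, seconds = timed(fake_inference, [0.1, 0.2, 0.3])
print(f"result={result:.2f} elapsed={seconds * 1000:.3f} ms")
```

Measured this way, the number covers network time, Modelbit's processing, and the model itself, so compare a small payload against a large one to see which part dominates.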

Can I specify which version of a model to use for inference?

Yes. When using the get_inference tool, you can provide an optional version string (e.g., 'v1', 'latest', or a specific tag) to target a precise deployment.

What format should the input data be in?

The get_inference tool accepts a data parameter which should be a JSON object or array, matching the input schema expected by your Modelbit deployment.

Is an API Key required for all models?

The MODELBIT_API_KEY is optional. It is only required if your Modelbit deployment is private. Public deployments only require the MODELBIT_WORKSPACE name.
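The rule above can be sketched as a small settings loader. Reading the two values from environment variables is an assumption made for illustration. On Vinkius you enter them in the setup panel instead.

```python
import os


def load_modelbit_settings(env=None):
    """Collect Modelbit settings; the workspace is required, the API key is optional."""
    env = os.environ if env is None else env
    workspace = env.get("MODELBIT_WORKSPACE", "").strip()
    if not workspace:
        raise ValueError("MODELBIT_WORKSPACE is required")
    settings = {"workspace": workspace}
    api_key = env.get("MODELBIT_API_KEY", "").strip()
    if api_key:
        settings["api_key"] = api_key  # only needed for private deployments
    return settings


# Hypothetical values for illustration.
print(load_modelbit_settings({"MODELBIT_WORKSPACE": "acme"}))
print(sorted(load_modelbit_settings(
    {"MODELBIT_WORKSPACE": "acme", "MODELBIT_API_KEY": "example-key"}
)))
```

A public deployment runs with the workspace name alone, while a private one needs both values.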
