Baseten MCP. Manage ML deployments and run predictions from your chat.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Baseten MCP Server gives your AI client ML-Ops access to your Baseten account. Use it to list and inspect Baseten models and deployments and run serverless predictions directly from your AI client.

It lets you inspect deployment states, fetch configurations, and execute predictions by pushing tensors or JSON payloads to active deployments.

No need to jump between IDEs and terminal windows to manage your AI infrastructure.

What your AI agents can do

Discover and list models

Uses list_models to retrieve a comprehensive list of all Baseten managed models.

Get specific model details

Uses get_model to fetch specific metadata and configurations for a named Baseten model.

Track deployment instances

Uses list_deployments to find all active inference deployments tied to a specific model.

Check live deployment status

Uses get_deployment to pull granular details about a single, running deployment instance.

Run inference predictions

Uses predict to execute a serverless model prediction by passing structured tensor or JSON data.

Audit workspace secrets

Uses list_secrets to securely list environment secrets without exposing their actual values.

Free for Subscribers

Baseten MCP Server: 6 Tools for ML-Ops

Use these tools to manage Baseten models, check deployment status, and execute predictions without leaving your chat or IDE.

get_deployment

Retrieves detailed status and configuration for a specific, running deployment instance.

get_model

Fetches specific metadata and configuration details for a named Baseten model.

list_deployments

Lists all active deployment instances that match a specific model ID.

list_models

Lists all models currently managed within your Baseten account.

list_secrets

Lists all active workspace environment secrets without revealing their values.

predict

Runs a serverless model inference prediction using provided tensor or JSON input.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Baseten, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

This server gives your AI client ML-Ops access to your Baseten account. You'll use it to list and inspect Baseten models and deployments and to run serverless predictions right from your agent. You can check deployment states, fetch configurations, and execute predictions by pushing tensors or JSON payloads to active deployments.

You don't have to jump between IDEs and terminal windows just to manage your AI infrastructure.

Discovering Models

You can run list_models to pull a complete list of every model Baseten manages for you.
You'll use get_model to fetch specific metadata and configurations for any named Baseten model.

Managing Deployments

list_deployments pulls a list of all active deployment instances linked to a specific model ID.
You'll use get_deployment to pull granular details about a single, running deployment instance.

Running Predictions and Checking Secrets

You'll run predict to execute a serverless model prediction. Just pass structured tensor or JSON data to get the results.
You can use list_secrets to securely list all environment secrets in your workspace without seeing the actual values.

How Baseten MCP Works

1 Subscribe to the Baseten MCP Server and input your Baseten API Key.
2 Your AI client connects, allowing your agent to issue commands like list_models or get_deployment.
3 The server executes the tool call, pulling the live ML-Ops data and returning a structured JSON response to your agent.

The bottom line is, your agent becomes a single ML Operator that talks directly to your Baseten account, letting you manage your entire model lifecycle from one place.
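
Under the hood, step 3 is an MCP `tools/call` request. Here is a minimal sketch of how such a message could be assembled in Python. The `tools/call` method and the `name`/`arguments` fields come from the Model Context Protocol specification. The `deployment_id` argument name and its value are assumptions for illustration, not documented parameters of this server.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `deployment_id` is a hypothetical argument name used for illustration only.
request = build_tool_call("get_deployment", {"deployment_id": "dep_123"})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends this message for you. The sketch only shows what travels over the wire when the agent picks a tool.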

Who Is Baseten MCP For?

This is for the ML Engineer who hates context switching: running a command in the terminal, checking the status in a web dashboard, and then pasting the results back into a notebook. It also suits the Ops Engineer who needs to audit resource limits and the AI Researcher who needs to quickly validate version schemas. The server lets each of them handle the whole workflow without leaving the IDE.

ML Engineer

Uses predict to run test payloads instantly, skipping the setup of local Python notebooks, or uses list_models to check which models are available.

DevOps/SRE

Uses list_deployments and get_deployment to audit running resources and reliably check replica states directly from their core IDE.

AI Researcher

Uses list_models and get_model to inspect version schemas and manage inference pipeline architectures quickly.

What Changes When You Connect

Full-Cycle Control: Use list_models and list_deployments to discover available models and track every active deployment instance, all from one conversation window.
Instant Testing: Run predictions with predict by sending structured payloads. You get real-time results without spinning up local Python notebooks.
Deep Inspection: Need to know exactly how a deployment is running? get_deployment gives you the granular details you need, including replica and autoscaling settings.
Audit Readiness: Use list_secrets to audit environment secrets. You confirm what's provisioned without the agent ever seeing the actual plaintext values.
Targeted Discovery: list_models returns every model in your account, so your agent can filter the result for the specific model ID you need instead of leaving you to scan the full list.
Efficiency: Your agent handles the complex coordination. It calls list_models, then get_model, and then prepares the necessary context for predict—all in one sequence.

Real-World Use Cases

Testing a new model version

A researcher wants to test Model X before deploying it. Instead of writing a local script, they ask their agent to use get_model to confirm the schema, and then use predict with sample JSON data. The agent runs the test and returns the result immediately, confirming the model is ready.

Checking infrastructure health

The SRE needs to verify if the production deployment for the fraud detection model is running correctly. They ask the agent to use list_deployments to find the active IDs, then get_deployment on the primary ID to check the replica count and autoscaling rules. This confirms the service health without logging into the dashboard.
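
The health check in this scenario can be sketched as a small loop over the two deployment tools. This is an illustration under stated assumptions, not the server's documented schema: `call_tool` stands in for whatever your MCP client exposes, and the argument names (`model_id`, `deployment_id`) and result fields (`id`, `active_replicas`, `min_replicas`) are hypothetical.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]  # (tool name, arguments) -> parsed result


def unhealthy_deployments(call_tool: ToolCaller, model_id: str) -> list[str]:
    """Return IDs of deployments running fewer replicas than their minimum."""
    flagged = []
    for summary in call_tool("list_deployments", {"model_id": model_id}):
        detail = call_tool("get_deployment", {"deployment_id": summary["id"]})
        # Field names below are assumed; adapt them to the real response.
        if detail["active_replicas"] < detail["min_replicas"]:
            flagged.append(summary["id"])
    return flagged


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client; returns canned data for the demo."""
    if name == "list_deployments":
        return [{"id": "dep_a"}, {"id": "dep_b"}]
    details = {
        "dep_a": {"active_replicas": 2, "min_replicas": 1},
        "dep_b": {"active_replicas": 0, "min_replicas": 1},
    }
    return details[arguments["deployment_id"]]


print(unhealthy_deployments(fake_call_tool, "abc123"))  # ['dep_b']
```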

Debugging a failed inference

An ML Engineer knows a prediction failed but can't tell why. They ask the agent to use get_model to fetch the model's full configuration, and then use list_secrets to verify if the required API keys are correctly provisioned in the workspace. They find the missing secret.

Updating a workflow dependency

A developer needs to update a pipeline that relies on Model Y. They first use list_models to confirm Model Y's ID, then get_model to check its latest version schema, and finally use predict to ensure the input/output contract hasn't changed.

The Tradeoffs

Sequential API Calls

Calling list_models to get a list of IDs, then writing separate code blocks to call get_model for each ID, and repeating that for deployments.

→ Let your agent handle the sequence. Start with a prompt like, 'Show me the status of all models and their deployments.' The agent will intelligently call list_models and list_deployments to give you the aggregated view.

Assuming Model State

Running predict without first confirming the deployment is active or retrieving the model ID via list_models. This results in an immediate 404 or a failed prediction.

→ Always confirm the model exists first. Use list_models to find the ID, then use get_model to validate the details before attempting to run predict.
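
The resolve-then-predict order can be written as a guard function. This is a minimal sketch, assuming a `call_tool` callable supplied by your MCP client. The argument names (`model_id`, `input`) and the `id`/`name` fields in the list_models result are hypothetical and should be checked against the real tool schemas.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]  # (tool name, arguments) -> parsed result


def predict_by_name(call_tool: ToolCaller, model_name: str, payload: dict) -> Any:
    """Resolve a model name to an ID, validate it, then run a prediction."""
    models = call_tool("list_models", {})
    matches = [m for m in models if m.get("name") == model_name]
    if not matches:
        raise LookupError(f"No model named {model_name!r}; refusing to call predict")
    model_id = matches[0]["id"]
    call_tool("get_model", {"model_id": model_id})  # expected to fail on a bad ID
    return call_tool("predict", {"model_id": model_id, "input": payload})


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client; returns canned data for the demo."""
    if name == "list_models":
        return [{"id": "abc123", "name": "fraud-detector"}]
    if name == "get_model":
        return {"id": arguments["model_id"], "name": "fraud-detector"}
    if name == "predict":
        return {"model_id": arguments["model_id"], "echo": arguments["input"]}
    raise ValueError(f"unknown tool {name}")


result = predict_by_name(fake_call_tool, "fraud-detector", {"amount": 42.0})
print(result)
```

Failing early with a `LookupError` keeps the mistake visible at the discovery step instead of surfacing later as a failed prediction.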

Ignoring Secrets Management

Assuming all necessary API keys are available in the environment without checking the secure workspace settings.

→ Always check the required environment variables first. Use list_secrets to audit the workspace and confirm that the necessary credentials are provisioned and available to the agent.
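
A secrets audit reduces to a set difference between the names you need and the names list_secrets reports. The sketch below assumes each listed entry is a dictionary with a `name` key. That shape and the secret names are hypothetical.

```python
def missing_secrets(required: set[str], listed: list[dict]) -> set[str]:
    """Return required secret names that do not appear in the listing."""
    provisioned = {entry["name"] for entry in listed}
    return required - provisioned


# Hand-written sample in the assumed shape; only names, never values.
listed = [{"name": "HF_TOKEN"}, {"name": "S3_ACCESS_KEY"}]
print(sorted(missing_secrets({"HF_TOKEN", "OPENAI_API_KEY"}, listed)))
# ['OPENAI_API_KEY']
```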

When It Fits, When It Doesn't

Use this server if your workflow requires managing the full ML-Ops lifecycle: discovery, configuration auditing, and live inference. You need a single point of truth to track model versions, deployment health, and secrets.

Don't use this if you just need to run a single, isolated prediction on a known endpoint. If that's the case, a simple API client might suffice. But if the prediction depends on knowing the model's version or the deployment's replica count, you need the full context provided by the get_model, list_deployments, and get_deployment tools.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Baseten. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_deployment, get_model, list_deployments, list_models, list_secrets, predict
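
To confirm that a connected server actually advertises these six tools, you can compare the names against the result of an MCP `tools/list` call. A minimal sketch follows. The `tools` array with a `name` per entry follows the MCP specification, and the sample result is hand-written rather than captured from a real server.

```python
EXPECTED_TOOLS = {
    "get_deployment", "get_model", "list_deployments",
    "list_models", "list_secrets", "predict",
}


def missing_tools(tools_list_result: dict) -> set[str]:
    """Return expected tool names absent from an MCP `tools/list` result."""
    advertised = {tool["name"] for tool in tools_list_result.get("tools", [])}
    return EXPECTED_TOOLS - advertised


# Hand-written sample result, not real server output.
sample = {"tools": [{"name": "list_models"}, {"name": "predict"}]}
print(sorted(missing_tools(sample)))
# ['get_deployment', 'get_model', 'list_deployments', 'list_secrets']
```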

Managing ML models used to mean jumping between five different dashboards.

Today, checking a model's status is a headache. You check the web UI for deployment status. Then you open the CLI to see the active replicas. If you need to validate the model's schema, you copy the ID into a separate notebook. If you need the secrets, you hit a third dashboard. It's a mess of copy-pasting and context switches.

With the Baseten MCP Server, that whole process collapses. Your agent handles the coordination. You ask it to 'Show me the status of Model X,' and it runs `list_models`, `list_deployments`, and `get_deployment`—all in the background. You get one clean, actionable answer.

Baseten MCP Server: Run predictions with `predict`

You no longer need to hand-format every JSON payload. The agent can shape the payload from the model's metadata, so you can focus on the input data. The payload still has to match the tensor shape or JSON structure the deployed model expects.

The difference is simplicity. You just ask for the prediction, and the server executes it directly against the live GPU weights. It's an immediate, single-step command.

Common Questions About Baseten MCP

How do I list all my Baseten models using the `list_models` tool?

You simply ask your agent to list models. The agent calls list_models and returns a list of all managed models and their IDs. This is the starting point for any Baseten task.

Can I use `predict` if I don't know the model ID?

No. You must first use list_models to get the correct model ID. Once you have the ID, you can use get_model to validate its configuration before calling predict.

What does `get_deployment` actually show me?

get_deployment shows the live, granular status of a single deployment. It gives you details like replica counts, autoscaling rules, and the specific version currently running.

Is it safe to check secrets using `list_secrets`?

Yes. list_secrets lists the names of the environment secrets. It confirms they are provisioned without ever exposing the actual secret values, keeping your environment secure.

Does `list_deployments` show all models or just active deployments?

list_deployments shows all active deployments tied to a specific model. It's focused on the running infrastructure, not the model definition itself.

How do I check the current replica state and autoscaling settings using `get_deployment`?

The get_deployment tool provides detailed resource metrics. You see the exact replica count, the autoscaling rules, and the specific deployment version in one call. This helps SREs verify the infrastructure state without logging into the dashboard.

What information does `list_deployments` provide about active inference deployments?

It lists the active deployment instances that match a specific model ID. This tells you which deployments are currently running and accessible, which is key for understanding the scope of your live ML infrastructure.

If my prediction fails, what should I check using `predict`?

First, check the model ID and input structure. The predict tool requires explicit tensor shapes or JSON dictionaries that match the deployed instance. Errors usually point to a payload mismatch or an invalid target model.
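
A payload mismatch is easier to spot if you compare the payload with the fields the model expects before calling predict. The sketch below assumes you already know the expected field names and Python types, for example from get_model output. The `expected` mapping and the field names are hypothetical.

```python
def payload_problems(payload: dict, expected_fields: dict[str, type]) -> list[str]:
    """List mismatches between a payload and the fields a model expects."""
    problems = []
    for field, expected_type in expected_fields.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    problems.extend(f"unexpected field: {f}" for f in payload if f not in expected_fields)
    return problems


expected = {"prompt": str, "max_tokens": int}  # hypothetical input contract
print(payload_problems({"prompt": "hi", "max_tokens": "64", "temp": 0.2}, expected))
# ['max_tokens: expected int, got str', 'unexpected field: temp']
```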

Can the AI agent run a prediction directly against my hosted model?

Yes. When the agent passes a correctly formatted JSON payload to the predict tool, it triggers inference on the model's GPU instances and returns the response data to your editor or chat context.

Is my workspace and environment secret data kept safe?

Yes. list_secrets returns only the names and identifiers of the secrets in your environment. The agent uses them to verify configuration and never sees the plaintext values.

How do I check autoscaling configurations for a deployed model?

Use get_deployment. Tell the agent to target an active deployment ID, and it reports the scaling limits, replica status, and container bounds for that deployment.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)