MLflow MCP for AI. Audit model lineage and performance via conversation.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Connect to your AI in seconds.

MLflow MCP Server connects your AI client to your MLflow tracking data across the machine learning lifecycle. You track training runs, audit model versions in the registry, and inspect performance metrics, all via natural conversation.

It lets you pinpoint exactly which run worked best and why others failed, without ever needing to open a dashboard or write boilerplate code.

What your AI can do

Search experiments

Searches and lists details for every registered MLflow experiment in the system.

Get experiment

Retrieves all configuration details for a specific MLflow Experiment by its unique ID.

Search runs

Finds specific training runs across multiple experiments based on criteria like date or metric threshold.

Search for specific training runs

Find model performance metrics by searching across multiple experiments using the search_runs tool.

Audit registered experiments and metadata

View all registered MLflow experiments and pull detailed configuration data using the search_experiments tool.

Get metrics for a single run

Retrieve parameters and performance metrics associated with one specific atomic training run ID via get_run.

Locate production model versions

Query the Global Model Registry to find models marked as Production or Staging using search_registered_models.

View saved files and artifacts

List all physical storage artifacts associated with a specific run ID by calling list_artifacts.

MLflow (ML Lifecycle Management) MCP Server: 6 Tools for MLOps

These six tools let you query the MLflow server to search experiments, track runs, audit model registries, and inspect artifact lineage using your AI client.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using MLflow (ML Lifecycle Management) on Vinkius

Search Experiments

Searches and lists details for every registered MLflow experiment in the system.

Get Experiment

Retrieves all configuration details for a specific MLflow Experiment by its unique...

Search Runs

Finds specific training runs across multiple experiments based on criteria like date...

Get Run

Pulls the metrics and parameters logged during one precise, atomic training run...

Search Registered Models

Queries the global Model Registry to find model names, versions, and their current...

List Artifacts

Lists all physical files (blobs) saved to disk that belong to a specific model run ID.

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for Streamable HTTP, SSE, and OAuth2, with zero messy routing required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The MLflow integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "mlflow-ml-lifecycle-management": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the MLflow tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"mlflow-ml-lifecycle-management": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

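Use the MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius over streamable HTTP; get_tools() is a coroutine
server = {"url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp", "transport": "streamable_http"}
client = MultiServerMCPClient({"mlflow": server})  # "mlflow" is just a local label
tools = asyncio.run(client.get_tools())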

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

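from crewai import Agent
from crewai_tools import MCPServerAdapter  # ships with the crewai-tools[mcp] extra

# Connect securely to Vinkius (streamable HTTP transport)
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Build and run your Crew inside this block so the connection stays open
with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (role, goal, and backstory are all required)
    researcher = Agent(
        role='Data Researcher',
        goal='Answer questions about MLflow runs and registered models',
        backstory='An analyst who works from MLflow tracking data.',
        tools=vinkius_tools
    )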

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with MLflow (ML Lifecycle Management), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by MLflow. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 6 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Manually tracking model performance means jumping between dashboards and exporting CSVs.

Today, finding a single answer requires clicking through the MLflow UI: checking the Experiment list, then selecting a Run ID, opening its metrics tab, cross-referencing loss curves in one chart, and finally downloading parameters from another. It's slow, tedious, and easy to miss crucial context.

With this MCP server, you just talk to your agent. You say: 'Show me the runs where accuracy was over 90%.' The agent uses `search_runs` and pulls the filtered list instantly, giving you a clean, actionable table right in your chat window.
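To make the request above concrete, here is a minimal sketch of how a question like "accuracy over 90%" maps onto an MLflow search filter string. The `metrics.<name> <op> <value>` form is MLflow's search syntax; the helper name `build_metric_filter` is hypothetical, and whether the hosted `search_runs` tool takes the filter as a raw string argument is an assumption.

```python
from typing import Iterable, Tuple

# Comparison operators that MLflow's search syntax accepts for metrics
ALLOWED_OPS = {">", ">=", "<", "<=", "=", "!="}


def build_metric_filter(clauses: Iterable[Tuple[str, str, float]]) -> str:
    """Join (metric, operator, threshold) triples into one filter string.

    Hypothetical helper: it only formats text and never contacts a server.
    Metric names containing spaces or special characters would need quoting,
    which this sketch does not handle.
    """
    parts = []
    for metric, op, threshold in clauses:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"metrics.{metric} {op} {threshold}")
    # MLflow filters combine clauses with AND only (no OR)
    return " AND ".join(parts)


accuracy_filter = build_metric_filter([("accuracy", ">", 0.9)])
print(accuracy_filter)
```

In practice the agent composes this string for you; seeing the shape helps when a conversational request returns more or fewer runs than expected.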

MLflow MCP Server: Audit model versions with `search_registered_models`

Before this server, knowing which version was 'Production' meant checking a specific badge or relying on the deployment team's checklist. If that metadata was wrong or outdated, you risked deploying a bad model.

Now, your agent uses `search_registered_models` to give you a definitive list of what is officially marked as Production, Staging, or Archived. It’s immediate validation for your entire MLOps pipeline.
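As a rough illustration of that stage check, the sketch below filters a registry listing down to the versions in one stage. The field names (`name`, `latest_versions`, `version`, `current_stage`, `run_id`) follow MLflow's registered-model records, but the exact response shape of this server's `search_registered_models` tool is an assumption, and the sample data is made up.

```python
from typing import Dict, List


def versions_in_stage(models: List[dict], stage: str) -> List[Dict[str, str]]:
    """Return name, version, and source run ID for every version in `stage`."""
    hits = []
    for model in models:
        for version in model.get("latest_versions", []):
            if version.get("current_stage") == stage:
                hits.append({
                    "name": model["name"],
                    "version": version["version"],
                    "run_id": version.get("run_id", ""),
                })
    return hits


# Made-up registry payload, used for illustration only
sample_registry = [
    {
        "name": "Customer-Churn-Classifier",
        "latest_versions": [
            {"version": "12", "current_stage": "Production", "run_id": "run-a"},
            {"version": "13", "current_stage": "Staging", "run_id": "run-b"},
        ],
    },
]

production = versions_in_stage(sample_registry, "Production")
print(production)
```

Keeping the `run_id` alongside each version gives you the handle you need to trace a live model back to its training run.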

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What your AI can actually do with this

Forget clunky dashboards and boilerplate scripts just to check whether your model worked. This server connects your AI client directly to your MLflow tracking system, so your agent can query every stage of your machine learning lifecycle. You can track training runs, audit model versions stored in the registry, and inspect performance metrics, all by just talking to it.

It lets you nail down exactly which run fell short and which one actually hit the mark.

Search for specific training runs: Need to know what happened across ten different experiments? Use search_runs to find training runs across multiple experiments. You can filter the results by date or by a metric threshold to pull up just the runs you need to check.

Audit registered experiments and metadata: Want a full picture of your research? Use search_experiments to list every MLflow experiment recorded in the system. If you need more detail, calling get_experiment with a unique ID pulls the configuration details for that specific experiment.

Get metrics for a single run: When you zero in on one training session, use get_run. This tool grabs every parameter and performance metric logged during that run. It's how you check the exact hyperparameters or loss values to figure out why training stalled.

Locate production model versions: Don't guess whether your model is ready for deployment. Query the Global Model Registry using search_registered_models. It tells you which models are marked as Production or Staging, so you can confirm a version's status before it reaches the main pipeline.

View saved files and artifacts: Every run saves files such as model binaries, plots, and configs; MLflow calls these artifacts. To see them, call list_artifacts with a specific run ID. It lists every file saved under that run, so you can check the plots, metadata, or any other stored file right there in the chat.

How it works: Just connect this server on Vinkius and give your agent access. Your AI client handles all the complex queries behind the scenes. When you ask a question—like, 'What were the parameters for the run that hit 92% accuracy last week?'—the agent uses these tools to pull the data directly from MLflow.

You don't write SQL; you just talk shop and get answers.

Built · Hosted · Managed by Vinkius MLflow MCP Server - Track Model Runs & Metrics

Server ID 019d75d6-3d7f-73d7-8f7a-41a4a42f180b

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

Who is this actually for?

This is for Data Scientists who get stuck clicking through dozens of dashboard tabs just to find a single metric. It's also for MLOps Engineers who need to audit production model lineage quickly, without running manual scripts or fighting with complex web interfaces.

Data Scientist

Uses it to compare loss curves and performance metrics across 10 different experiments by asking the agent directly. They don't want to manually map out every parameter.

ML Engineer

Uses it to audit the model registry, checking which versions are marked as 'Staging' or 'Production' and verifying artifact storage locations via list_artifacts.

AI Operations Team

Uses it for troubleshooting. If a production model fails, they ask the agent to pull detailed metrics from the source run ID (get_run) to see exactly what went wrong.

What Changes When You Connect

Pinpoint failure causes. Instead of manually checking dashboards, you ask your agent to run search_runs to locate the failed run, then get_run to pull its metrics and parameters, so you can see exactly which values were off.

Verify production readiness instantly. Use search_registered_models to see if a model is truly marked 'Production' or if it’s just sitting in an unverified state. This cuts down on deployment risk.

Map out research branches easily. The search_experiments tool lets you list all project experiments, giving you a clear overview of the entire ML pipeline without clicking into every folder.

Track model components. When you find a good run ID, use list_artifacts to get a manifest of every file saved—the pickled model, the confusion matrix, the config YAML. No more guessing what's in the directory.

Deep dive on metrics. The get_run tool pulls raw parameters and performance metrics for a single run. You can feed this data directly into your agent for immediate analysis.

See it in action

01

Debugging model decay

The ops engineer notices the production accuracy dropped by 2%. They ask their agent to use search_runs to pull all runs from the last month, filtering for performance metrics below a certain threshold. The agent finds run 'xyz-456', and using get_run, pulls the parameter logs showing the exact hyperparameter that drifted.
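The last step of this scenario, spotting the hyperparameter that drifted, amounts to diffing the parameter sets of two runs. A minimal sketch follows, assuming each run's parameters arrive as a flat name-to-value mapping (MLflow stores parameter values as strings); the function name and both sample runs are hypothetical.

```python
from typing import Dict, Optional, Tuple


def diff_params(
    baseline: Dict[str, str], candidate: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each parameter that differs to its (baseline, candidate) values.

    A parameter present in only one run shows up with None on the other side.
    """
    changed = {}
    for key in sorted(set(baseline) | set(candidate)):
        old, new = baseline.get(key), candidate.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Made-up parameter sets for a healthy run and a degraded one
healthy_run = {"learning_rate": "0.001", "batch_size": "64", "dropout": "0.2"}
degraded_run = {"learning_rate": "0.01", "batch_size": "64", "dropout": "0.2"}

drift = diff_params(healthy_run, degraded_run)
print(drift)
```

Because the values are compared as strings, "0.10" and "0.1" would count as different; normalize numeric values first if that matters for your runs.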

02

Verifying deployment source

The data scientist needs to know which model version is currently serving predictions. They use search_registered_models and confirm 'Customer-Churn-Classifier' v12 is marked Production. They then ask the agent to pull the source run ID from that registry entry, ensuring full traceability.

03

Understanding project scope

A new team member needs context on all past research efforts. They use search_experiments and get a list of every experiment ever run—'Sentiment Analysis,' 'Image Segmentation v1,' etc.—allowing them to understand the full history without relying on tribal knowledge.

04

Gathering model inputs

The ML engineer needs all source files from a successful run. They provide the Run ID and ask the agent to execute list_artifacts. The agent returns a list of every file, including the model blob (model.pkl) and the environment config (conda.yaml), ensuring nothing is missed.
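A quick way to confirm that "nothing is missed" is to compare the returned manifest against the files you expect. The sketch below assumes each manifest entry carries the `path`, `is_dir`, and `file_size` fields that MLflow records for artifacts; the helper name, the sample manifest, and the required-file list are all illustrative.

```python
from typing import Iterable, List


def missing_files(manifest: List[dict], required: Iterable[str]) -> List[str]:
    """Return the required paths that are absent from the manifest.

    Directory entries are skipped, so only real files count as present.
    """
    present = {entry["path"] for entry in manifest if not entry.get("is_dir", False)}
    return sorted(set(required) - present)


# Made-up manifest resembling a list_artifacts result
sample_manifest = [
    {"path": "model.pkl", "is_dir": False, "file_size": 20480},
    {"path": "conda.yaml", "is_dir": False, "file_size": 312},
    {"path": "plots", "is_dir": True},
]

gaps = missing_files(sample_manifest, ["model.pkl", "conda.yaml", "requirements.txt"])
print(gaps)
```

This only checks the top-level listing; files inside a directory entry such as `plots` would need a second listing scoped to that path.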

The honest tradeoffs

Treating it like a search engine

Anti-pattern

Trying to ask 'Show me the best model for customer churn' without providing enough context, resulting in vague or overly broad results.

The Fix

You need to narrow your scope. First, use search_registered_models to find specific models (e.g., 'Customer-Churn-Classifier'). Then, ask the agent to compare runs using search_runs against a metric like 'AUC' to pinpoint the best version.
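Once the scope is narrowed, "pinpoint the best version" comes down to ranking runs on one metric. Here is a minimal sketch, assuming each run is a dict with a `run_id` and a `metrics` mapping; that shape, the helper name, and the sample values are assumptions for illustration.

```python
from typing import List, Optional


def best_run(runs: List[dict], metric: str, higher_is_better: bool = True) -> Optional[dict]:
    """Pick the run with the best value for `metric`.

    Runs that never logged the metric are ignored; returns None if none did.
    """
    scored = [run for run in runs if metric in run.get("metrics", {})]
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda run: run["metrics"][metric])


# Made-up search results, used for illustration only
sample_runs = [
    {"run_id": "run-a", "metrics": {"AUC": 0.87}},
    {"run_id": "run-b", "metrics": {"AUC": 0.91}},
    {"run_id": "run-c", "metrics": {"loss": 0.3}},
]

winner = best_run(sample_runs, "AUC")
print(winner["run_id"] if winner else "no run logged AUC")
```

Set `higher_is_better=False` for metrics such as loss, where the smallest value wins.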

Ignoring lineage

Anti-pattern

Finding a good model run but not knowing what files were saved or which environment it used.

The Fix

Never just rely on the metrics. After identifying a promising run ID, immediately call list_artifacts to get the manifest of all associated files and use get_run for the full parameter details.

Asking for 'all data'

Anti-pattern

Requesting every metric or artifact from an experiment (e.g., 'Give me everything about Sentiment Analysis'). This floods the conversation with irrelevant noise.

The Fix

Be specific. Use get_experiment to get metadata, and then use search_runs combined with a precise filter—like 'Search for runs in Sentiment Analysis where loss < 0.1'—to focus the results.

When It Fits, When It Doesn't

Use this server if your main pain point is correlating model performance data across time, experiments, or versions. If you are an ML practitioner who needs to answer questions like 'Why did v5 fail when v4 succeeded?' using conversational queries, this tool is a good fit. The key is understanding that it's a data query layer, not a dashboard replacement, so it helps to know which tools to ask for.

Don't use it if your only need is simple file storage; for that, a dedicated cloud artifact service works better. If you are just starting out and don't know the right metrics or run IDs, start with search_experiments to map out your project scope first.

Questions you might have

How do I check if a model version was promoted correctly using search_registered_models?

The search_registered_models tool lets you query the Global Model Registry. You simply ask for models marked 'Production,' and the agent confirms which versions are live, giving you immediate status validation.

What metrics can I get from a single run using get_run?

The get_run tool pulls all recorded parameters and performance metrics for that specific run ID. This includes loss curves, accuracy scores, and any custom scalar values logged during the session.

Do I need to use list_artifacts if I just want the model file?

Yes. While you know the model exists, list_artifacts provides a complete manifest of every physical asset—the model blob and any associated graphs or YAML files—ensuring you retrieve the whole package.

Can I compare metrics across multiple experiments using search_runs?

Yes, search_runs lets you query runs based on criteria that span multiple experiments. You can filter for 'all runs with loss < 0.1' to quickly compare performance trends system-wide.

How do I use get_run to check the exact hyperparameters used for a specific model training session?

You specify the Run ID when calling get_run. This function returns all logged parameters, including the precise hyperparameter values that defined that atomic run.

What is the difference between search_experiments and list_artifacts in terms of scope?

search_experiments lists every registered experiment available. list_artifacts requires a specific Run ID and shows the files saved within that run; they serve completely different tracking purposes.

When should I use search_runs instead of get_run?

Use search_runs when you need an overview—like finding all runs for a given experiment or date range. Use get_run only if you already have the exact, unique Run ID.

Does list_artifacts show metadata alongside the actual model file blob?

Yes, list_artifacts shows the stored path of every saved item along with the basic metadata MLflow records for it, such as whether the entry is a directory and the file size. You see which files are stored, plus details about those artifacts.

Can I see the metrics for a specific training run through my agent?

Yes. Use the get_run tool with a specific Run ID. Your agent will retrieve the detailed telemetry logged during that training session, including scalars like accuracy, loss, or any custom performance metrics you've defined.

How do I check which models are ready for production in the registry?

The search_registered_models tool allows your agent to query the global model registry. You can identify models that have been explicitly promoted to production or staging environments, helping you track deployment states across your project.

Can my agent list the plots or model files saved in a specific run?

Absolutely. Use the list_artifacts tool with a specific Run ID. Your agent will report every file stored for that run, including model blobs (e.g., .pkl, .h5) and saved image plots, so you can locate critical training artifacts quickly.
