# MLflow MCP

> MLflow MCP Server connects your AI client to your MLflow tracking server. You can search training runs, audit model versions in the registry, and inspect performance metrics, all through natural conversation. It lets you pinpoint which run performed best, or why one failed, without opening a dashboard or writing boilerplate code.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** ml-lifecycle, experiment-tracking, model-registry, data-science, reproducibility, mlops

## Description

Look, forget those clunky dashboards and the boilerplate code you write just to check whether your model worked. This server hooks your AI client directly into your MLflow tracking system, giving your agent read access to your experiments, runs, registered models, and artifacts. You can search training runs, audit model versions stored in the registry, and inspect performance metrics, all by just talking to it. It lets you nail down exactly which run was trash and which one actually hit the mark, no sweat.

**Search for specific training runs:** Need to know what happened across ten different experiments? You use `search_runs` to find specific training instances across multiple projects. You can filter those results based on dates or even a metric threshold, instantly pulling up all relevant runs you need to check.

**Audit registered experiments and metadata:** Want a full picture of your research mess? Use `search_experiments` to list every single MLflow experiment recorded in the system. If you need more detail, calling `get_experiment` with a unique ID pulls all the configuration details for that specific experiment.

**Get metrics for a single run:** When you zero in on one training session, you use `get_run`. This tool grabs every parameter and performance metric logged during that single run. It's how you check the exact hyperparameters and metric values to figure out why training stalled out.

**Locate production model versions:** Don't guess if your model is ready for deployment. You query the Model Registry using `search_registered_models`. This tells you which models are marked as Production or Staging, letting you track version deployments before they hit the main pipeline.

**View saved files and artifacts:** Every run saves output files, which MLflow calls artifacts. To see them, you call `list_artifacts` with a specific run ID. This lists every file stored for that run. You can check for plots, metadata, or any other stored file right there in the chat.

**How it works:** Just connect this server on Vinkius and give your agent access. Your AI client handles all the complex queries behind the scenes. When you ask a question—like, 'What were the parameters for the run that hit 92% accuracy last week?'—the agent uses these tools to pull the data directly from MLflow. You don't write SQL; you just talk shop and get answers.
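As a rough sketch of what "behind the scenes" means: MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The helper name `build_tool_call` and the argument names (`filter`, `max_results`) are assumptions for illustration; the server's real input schema comes from its tool listing and may differ.

```python
import json
from itertools import count

# Simple incrementing counter for JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: check the server's tool listing for the real schema.
request = build_tool_call(
    "search_runs",
    {"filter": "metrics.accuracy >= 0.92", "max_results": 5},
)
print(request)
```

Your AI client builds and sends messages like this for you, so you never write them by hand.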

## Tools

### search_experiments
Searches and lists details for every registered MLflow experiment in the system.

### get_experiment
Retrieves all configuration details for a specific MLflow Experiment by its unique ID.

### search_runs
Finds specific training runs across multiple experiments based on criteria like date or metric threshold.
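
Date and metric criteria like these map onto MLflow's search filter syntax, where metrics are addressed as `metrics.<name>` and the run start time as `attributes.start_time` (milliseconds since the Unix epoch). Here is a minimal sketch of composing such a filter. The helper `build_run_filter` is hypothetical, and whether this tool accepts a raw filter string is an assumption.

```python
from datetime import datetime, timezone


def build_run_filter(metric: str, op: str, threshold: float, since: datetime) -> str:
    """Combine a metric threshold and a start date into one MLflow filter string."""
    if op not in {">", ">=", "<", "<=", "=", "!="}:
        raise ValueError(f"unsupported comparator: {op}")
    # MLflow stores run start times as milliseconds since the Unix epoch.
    since_ms = int(since.timestamp() * 1000)
    return f"metrics.{metric} {op} {threshold} and attributes.start_time >= {since_ms}"


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(build_run_filter("accuracy", ">=", 0.92, cutoff))
```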

### get_run
Pulls the metrics and parameters logged during one precise, atomic training run instance.
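
MLflow's REST API returns a run's parameters and metrics as lists of key/value records. If the tool passes that shape through (an assumption), flattening it into dictionaries makes the result easier to scan. The payload below is made up for illustration.

```python
def flatten_run(run: dict) -> dict:
    """Turn MLflow-style key/value lists into plain dictionaries."""
    data = run.get("data", {})
    return {
        "run_id": run.get("info", {}).get("run_id"),
        "params": {p["key"]: p["value"] for p in data.get("params", [])},
        "metrics": {m["key"]: m["value"] for m in data.get("metrics", [])},
    }


# Made-up payload in the shape MLflow's REST API uses for a single run.
sample_run = {
    "info": {"run_id": "0123456789abcdef", "status": "FINISHED"},
    "data": {
        "params": [{"key": "learning_rate", "value": "0.001"}],
        "metrics": [
            {"key": "accuracy", "value": 0.92},
            {"key": "loss", "value": 0.12},
        ],
    },
}

summary = flatten_run(sample_run)
print(summary["params"], summary["metrics"])
```

Note that MLflow stores parameter values as strings, so cast them before doing arithmetic.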

### search_registered_models
Queries the global Model Registry to find model names, versions, and their current deployment status (e.g., Production).
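
In MLflow's stage-based registry, each registered model carries a `latest_versions` list whose entries include a `current_stage`. This is a small sketch of pulling out the Production versions, assuming the tool returns that structure. The helper name and sample data are illustrative, and newer MLflow releases favor aliases over stages, so this applies only where stages are still in use.

```python
from typing import Dict, List


def versions_in_stage(models: List[dict], stage: str = "Production") -> List[Dict[str, str]]:
    """Collect every model version whose current stage matches `stage`."""
    matches = []
    for model in models:
        for version in model.get("latest_versions", []):
            if version.get("current_stage") == stage:
                matches.append({
                    "name": model["name"],
                    "version": version["version"],
                    "run_id": version.get("run_id", ""),
                })
    return matches


# Illustrative registry payload; the run IDs are placeholders.
registry = [
    {"name": "Customer-Churn-Classifier", "latest_versions": [
        {"version": "4", "current_stage": "Production", "run_id": "run-a"},
        {"version": "5", "current_stage": "Staging", "run_id": "run-b"},
    ]},
    {"name": "Demand-Forecaster", "latest_versions": [
        {"version": "2", "current_stage": "Production", "run_id": "run-c"},
    ]},
]

for entry in versions_in_stage(registry):
    print(entry["name"], "v" + entry["version"], entry["run_id"])
```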

### list_artifacts
Lists all physical files (blobs) saved to disk that belong to a specific model run ID.
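
MLflow's artifact listing describes each entry with a `path`, an `is_dir` flag, and a `file_size` for files. Assuming the tool returns entries in that shape, a quick summary might look like the sketch below. The sizes are invented.

```python
def summarize_artifacts(files: list) -> list:
    """Return (path, size_in_bytes) pairs for files, skipping directories."""
    return [
        (entry["path"], int(entry.get("file_size", 0)))
        for entry in files
        if not entry.get("is_dir", False)
    ]


# Invented listing in the shape MLflow uses for artifact entries.
listing = [
    {"path": "model.pkl", "is_dir": False, "file_size": 48213},
    {"path": "confusion_matrix.png", "is_dir": False, "file_size": 20480},
    {"path": "plots", "is_dir": True},
    {"path": "conda.yaml", "is_dir": False, "file_size": 312},
]

for path, size in summarize_artifacts(listing):
    print(f"{path}: {size} bytes")
```

Directories are skipped here; listing their contents would take another call with the directory path.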

## Prompt Examples

**Prompt:** 
```
List all training runs for the 'Sentiment Analysis' experiment
```

**Response:** 
```
Retrieving training history… I've identified 12 runs for 'Sentiment Analysis'. The latest run 'bright-fox-123' achieved 92% accuracy with a 0.12 loss. Other recent runs show similar performance with varying hyperparameters. Would you like to see the parameter logs for the best run?
```

**Prompt:** 
```
What models are currently marked as 'Production' in the registry?
```

**Response:** 
```
Scanning model registry… I've identified 3 models in Production: 'Customer-Churn-Classifier' (v4), 'Product-Recommender-Engine' (v12), and 'Demand-Forecaster' (v2). I can provide the tracking URI or source run ID for any of these production models if you'd like.
```

**Prompt:** 
```
Show me the artifacts saved for run ID 'bright-fox-123'
```

**Response:** 
```
Retrieving artifacts for 'bright-fox-123'… I've identified the following files: 1) 'model.pkl' (Pickled model), 2) 'confusion_matrix.png' (Visualization), 3) 'feature_importance.csv', and 4) 'conda.yaml'. I can provide the source tracking path for these files securely.
```

## Capabilities

### Search for specific training runs
Find model performance metrics by searching across multiple experiments using the `search_runs` tool.

### Audit registered experiments and metadata
View all registered MLflow experiments using the `search_experiments` tool, and pull detailed configuration data for a specific one with `get_experiment`.

### Get metrics for a single run
Retrieve parameters and performance metrics associated with one specific atomic training run ID via `get_run`.

### Locate production model versions
Query the Global Model Registry to find models marked as Production or Staging using `search_registered_models`.

### View saved files and artifacts
List all artifacts stored for a specific run ID by calling `list_artifacts`.

## Use Cases

### Debugging model decay
The ops engineer notices the production accuracy dropped by 2%. They ask their agent to use `search_runs` to pull all runs from the last month, filtering for performance metrics below a certain threshold. The agent finds run 'xyz-456', and using `get_run`, pulls the parameter logs showing the exact hyperparameter that drifted.
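
Spotting the drifted hyperparameter comes down to comparing the parameter sets of two runs. Here is a minimal sketch, assuming you already have each run's parameters as a plain dictionary (for example from `get_run`). The helper name and the sample values are made up for illustration.

```python
def diff_params(baseline: dict, candidate: dict) -> dict:
    """Map each parameter that differs to its (baseline, candidate) values."""
    keys = sorted(set(baseline) | set(candidate))
    return {
        key: (baseline.get(key), candidate.get(key))
        for key in keys
        if baseline.get(key) != candidate.get(key)
    }


# Made-up parameters for a known-good run and the suspect run.
good_run = {"learning_rate": "0.001", "batch_size": "64", "dropout": "0.1"}
suspect_run = {"learning_rate": "0.01", "batch_size": "64", "dropout": "0.1"}

for name, (before, after) in diff_params(good_run, suspect_run).items():
    print(f"{name}: {before} -> {after}")
```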

### Verifying deployment source
The data scientist needs to know which model version is currently serving predictions. They use `search_registered_models` and confirm 'Customer-Churn-Classifier' v4 is marked Production. They then ask the agent to pull the source run ID from that registry entry, ensuring full traceability.

### Understanding project scope
A new team member needs context on all past research efforts. They use `search_experiments` and get a list of every experiment ever run—'Sentiment Analysis,' 'Image Segmentation v1,' etc.—allowing them to understand the full history without relying on tribal knowledge.

### Gathering model inputs
The ML engineer needs all source files from a successful run. They provide the Run ID and ask the agent to execute `list_artifacts`. The agent returns a list of every file, including the model blob (`model.pkl`) and the environment config (`conda.yaml`), ensuring nothing is missed.

## Benefits

- Pinpoint failure causes. Instead of manually checking dashboards, you ask your agent to run `search_runs` to find the failed run, then `get_run` to pull its metrics and parameters, showing you exactly which values were off.
- Verify production readiness instantly. Use `search_registered_models` to see if a model is truly marked 'Production' or if it’s just sitting in an unverified state. This cuts down on deployment risk.
- Map out research branches easily. The `search_experiments` tool lets you list all project experiments, giving you a clear overview of the entire ML pipeline without clicking into every folder.
- Track model components. When you find a good run ID, use `list_artifacts` to get a manifest of every file saved—the pickled model, the confusion matrix, the config YAML. No more guessing what's in the directory.
- Deep dive on metrics. The `get_run` tool pulls raw parameters and performance metrics for a single run. You can feed this data directly into your agent for immediate analysis.

## How It Works

The bottom line is: you talk to your AI client, and it uses these tools to read the MLflow server for answers.

1. Subscribe to the MLflow server on Vinkius.
2. Input your unique MLflow Tracking URI and Tracking Token into the connection settings.
3. Ask your agent a question (e.g., 'What was the accuracy of the Production version of Customer-Churn-Classifier?') and let it execute the required tools.
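
To make step 2 concrete: the Tracking URI is the base address of your MLflow server, and MLflow servers that use token authentication accept the token as an HTTP Bearer credential. The sketch below only prepares a request against MLflow's REST endpoint for fetching a run and does not send it. How Vinkius uses these settings internally is not documented here, so treat this as an illustration of what the two values represent. The URI, token, and run ID are placeholders.

```python
import urllib.request
from urllib.parse import urlencode


def build_get_run_request(tracking_uri: str, token: str, run_id: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated MLflow REST request for one run."""
    query = urlencode({"run_id": run_id})
    url = f"{tracking_uri.rstrip('/')}/api/2.0/mlflow/runs/get?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; substitute your own tracking URI and token.
req = build_get_run_request("https://mlflow.example.com/", "my-token", "abc123")
print(req.full_url)
```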

## Frequently Asked Questions

**How do I check if a model version was promoted correctly using search_registered_models?**
The `search_registered_models` tool lets you query the Global Model Registry. You simply ask for models marked 'Production,' and the agent confirms which versions are live, giving you immediate status validation.

**What metrics can I get from a single run using get_run?**
The `get_run` tool pulls all recorded parameters and performance metrics for that specific run ID. This includes accuracy scores, loss values, and any custom scalar values logged during the session.

**Do I need to use list_artifacts if I just want the model file?**
Yes. `list_artifacts` is how you confirm the model file's exact path, and it also gives you a complete manifest of everything else stored with the run, such as plots or YAML files, so nothing gets overlooked.

**Can I compare metrics across multiple experiments using search_runs?**
Yes, `search_runs` lets you query runs based on criteria that span multiple experiments. You can filter for 'all runs with loss < 0.1' to quickly compare performance trends system-wide.

**How do I use get_run to check the exact hyperparameters used for a specific model training session?**
You specify the Run ID when calling `get_run`. The tool returns all logged parameters, including the precise hyperparameter values used for that run.

**What is the difference between search_experiments and list_artifacts in terms of scope?**
`search_experiments` lists every registered experiment. `list_artifacts` requires a specific Run ID and shows the files saved within that run. One works at the experiment level, the other at the level of a single run.

**When should I use search_runs instead of get_run?**
Use `search_runs` when you need an overview, like finding all runs for a given experiment or date range. Use `get_run` only if you already have the exact, unique Run ID.

**Does list_artifacts show metadata alongside the actual model file blob?**
Yes, `list_artifacts` shows where each saved item is stored along with its associated metadata. You see which files exist, plus details about those artifacts.

**Can I see the metrics for a specific training run through my agent?**
Yes. Use the `get_run` tool with a specific Run ID. Your agent will retrieve the detailed telemetry logged during that training session, including scalars like accuracy, loss, or any custom performance metrics you've defined.

**How do I check which models are ready for production in the registry?**
The `search_registered_models` tool allows your agent to query the global model registry. You can identify models that have been explicitly promoted to production or staging environments, helping you track deployment states across your project.

**Can my agent list the plots or model files saved in a specific run?**
Absolutely. Use the `list_artifacts` tool with a specific Run ID. Your agent will list every stored file, including model files (e.g., .pkl, .h5) and saved image plots, so you can locate critical training artifacts quickly.