# Weights & Biases MCP

> Weights & Biases lets you manage your entire machine learning lifecycle through chat. Track model experiments, monitor real-time training runs, and version control artifacts like datasets and trained models—all without leaving your AI client.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** machine-learning, experiment-tracking, mlops, model-versioning, data-artifacts, model-monitoring

## Description

You're running complex ML pipelines, and you need to know whether the latest hyperparameter change actually hurt performance or was just random fluctuation. This MCP connects directly to your Weights & Biases account and turns dashboard digging into a conversation. Instead of manually filtering through dozens of runs and checking version numbers across separate tabs, you ask your agent, and it finds the metrics you need for a given run, such as final accuracy or loss. You can also list the related artifacts, such as the dataset version used or the model weights produced, so data lineage stays clear.

The whole process stays secure: Vinkius ensures that every tool call generates a cryptographically signed audit trail, so you always know exactly what metrics flowed through and how your budget was spent. The result is fast, actionable answers, with your AI agent acting as an ML research assistant.

## Tools

### get_run_details
Retrieves the summary metrics and configuration for a single run, identified by its run ID.

### list_project_artifacts
Lists all datasets, models, or files versioned within a project.

### list_wandb_projects
Lists every project associated with your account.

### list_project_reports
Fetches a list of saved, collaborative analysis documents for review.

### list_project_runs
Gets a list of all individual training attempts within a specific project.

### list_project_sweeps
Shows the progress and results of automated hyperparameter searches (sweeps).

## Prompt Examples

**Prompt:** 
```
List all runs in my 'transformer-nmt' project for entity 'ai-team'.
```

**Response:** 
```
I found 5 runs in 'transformer-nmt': 'vibrant-sweep-1' (Finished), 'crispy-forest-12' (Running), 'solar-wind-15' (Crashed), and 2 others. Would you like the detailed summary for any of these?
```

**Prompt:** 
```
Get the final accuracy and config for run ID 'vibrant-sweep-1'.
```

**Response:** 
```
Run 'vibrant-sweep-1' summary: accuracy = 0.942, loss = 0.156. Config: learning_rate = 0.001, batch_size = 32, optimizer = 'adam'. It finished 2 hours ago after 50 epochs.
```

**Prompt:** 
```
What artifacts are available in the 'resnet-training' project?
```

**Response:** 
```
In project 'resnet-training', I found: 1. 'imagenet-subset' (Dataset, v3), 2. 'resnet50-weights' (Model, v5), and 3. 'training-logs' (Artifact, v1). Would you like to see the versions or metadata for these?
```

## Capabilities

### List all projects
See every project folder within your WandB account to start browsing experiments.

### Track specific runs
Retrieve a list of individual experiment attempts, showing their status and basic details.

### Get run metrics
Fetch the full summary, including final accuracy, loss values, and hyperparameters for one specific training run.

### Find project artifacts
List all versioned assets—like datasets or model checkpoints—associated with a given project.

### Monitor hyperparameter sweeps
View the progress and results of automated searches that test different combinations of settings.

### Access analysis reports
Retrieve a list of saved, collaborative documents and dashboards for project review.

## Use Cases

### Diagnosing performance regression
A user notices that model accuracy dropped from 0.95 to 0.82. Instead of manually checking logs, they ask their agent to run `list_project_runs` for the project, then use `get_run_details` on the pre-drop and post-drop runs and compare them side by side. The agent points out a subtle change in the learning rate between the two configurations.
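
The comparison in this scenario amounts to a dictionary diff over the two runs' configs. Here is a minimal sketch, assuming `get_run_details` returns each run's config as a flat mapping. The response shape, the `diff_configs` helper, and the example values are illustrative assumptions, not the tool's documented output.

```python
from typing import Any


def diff_configs(before: dict[str, Any], after: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old, new)} for every key whose value differs between two run configs."""
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(before.keys() | after.keys()):
        # A key missing from one config shows up as None on that side.
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical configs for the pre-drop and post-drop runs.
good_run = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "adam"}
bad_run = {"learning_rate": 0.01, "batch_size": 32, "optimizer": "adam"}

print(diff_configs(good_run, bad_run))  # {'learning_rate': (0.001, 0.01)}
```

Only the keys that changed are reported, which keeps the answer short even when a config has dozens of entries.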

### Reproducing old results
A scientist wants to reproduce a paper's findings. They ask their agent about the artifacts for the 'baseline-model' project, which calls `list_project_artifacts`. The agent provides the exact version IDs of the dataset and model weights needed, so the experiment can be rerun against the same inputs.
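
Once the versions are known, it can help to record them as fully qualified references. W&B addresses a specific artifact version with a name of the form `entity/project/name:vN`. The sketch below builds such strings; the `artifact_ref` helper, the entity, and the artifact names and version numbers are hypothetical placeholders.

```python
def artifact_ref(entity: str, project: str, name: str, version: int) -> str:
    """Build a fully qualified artifact reference, e.g. 'ai-team/baseline-model/train-data:v3'."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{entity}/{project}/{name}:v{version}"


# Hypothetical artifact names and versions, as an agent might report them.
pinned = [
    artifact_ref("ai-team", "baseline-model", "train-data", 3),
    artifact_ref("ai-team", "baseline-model", "baseline-weights", 5),
]

for ref in pinned:
    print(ref)
```

Storing the pinned references alongside the reproduction script makes it clear which inputs a rerun was based on.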

### Reviewing team progress
A research lead needs to check on 10 different ongoing experiments. They use `list_project_sweeps` to see which automated searches are running and get a quick summary of optimization progress, without having to log into the platform's web UI.
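
A quick status overview like this can be sketched as a grouping step over the sweep listing. The example assumes `list_project_sweeps` yields one record per sweep with `name` and `state` fields; that shape, the `group_sweeps_by_state` helper, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def group_sweeps_by_state(sweeps: list[dict]) -> dict[str, list[str]]:
    """Group sweep names by state so running and finished searches are easy to tell apart."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for sweep in sweeps:
        # Normalize case so 'RUNNING' and 'running' land in the same bucket.
        grouped[sweep["state"].lower()].append(sweep["name"])
    return dict(grouped)


# Hypothetical listing; real sweep names and states come from the tool response.
sweeps = [
    {"name": "lr-search", "state": "RUNNING"},
    {"name": "batch-size-search", "state": "FINISHED"},
    {"name": "optimizer-search", "state": "running"},
]

print(group_sweeps_by_state(sweeps))
```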

### Auditing project scope
A new team member joins and needs to know what projects exist. They simply ask the agent to call `list_wandb_projects`, getting a complete, current list of all work done by the team.

## Benefits

- Need to compare runs? You can use `list_project_runs` to get a list of all attempts, then use `get_run_details` on any specific run ID for its full metric summary—accuracy, loss, config. It keeps you from manually opening 50 tabs.
- Data provenance is critical. If you need proof of what data trained your model, call `list_project_artifacts`. This shows every versioned dataset and model checkpoint associated with the project.
- Automated search tracking used to mean checking a massive dashboard. Now, use `list_project_sweeps` to monitor hyperparameter optimization progress directly through chat.
- You don't want to start from scratch every time. Use `list_wandb_projects` first to see all your work across different areas of research before diving into any single project.
- Need a full historical picture? You can also use `list_project_reports` to pull up saved analysis and collaborative dashboards, linking documentation directly to the underlying results.

## How It Works

The bottom line is you manage complex ML data by talking to an assistant, instead of navigating confusing web dashboards.

1. Subscribe to this MCP, then enter your WandB API key and base URL.
2. Your AI agent connects using that credential set. You can then ask it questions like, 'What was the accuracy for run X?'
3. The agent executes the necessary calls behind the scenes and delivers a summarized answer right back to your chat window.
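
For readers curious about step 3: MCP clients invoke tools by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for `get_run_details`. The argument names (`entity`, `project`, `run_id`) and the `build_tool_call` helper are assumptions; the tool's actual input schema may differ.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names are hypothetical; check the tool's input schema for the real ones.
request = build_tool_call(
    "get_run_details",
    {"entity": "ai-team", "project": "transformer-nmt", "run_id": "vibrant-sweep-1"},
)
print(request)
```

In normal use your AI client builds and sends these messages for you; the sketch only shows what travels over the wire.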

## Frequently Asked Questions

**How do I check my project list using the Weights & Biases MCP?**
You call `list_wandb_projects`. This gives you a clean, simple rundown of every single project folder within your account. It's the best place to start when you don't know where to look.

**What does `get_run_details` do for my ML experiment?**
It pulls the summary metrics and configuration details for a single run ID. This is essential if you need precise data points such as final loss or hyperparameter values.

**Can I use `list_project_artifacts` to see my datasets?**
Yes, `list_project_artifacts` shows all versioned items in a project. It's how you track data lineage—knowing exactly which dataset version trained your model.

**How can I compare different training runs with this MCP?**
Start by using `list_project_runs` to get all run IDs, then use the `get_run_details` tool on each ID you want to compare. The agent summarizes these details for you.

**How does using `list_project_sweeps` help me track automated hyperparameter searches?**
It lists all ongoing or completed optimization sweeps within a project. This lets you see how your model performed as the sweep varied parameters such as learning rate and batch size.

**What is the purpose of using `list_project_reports` in my ML workflow?**
It gathers all saved analysis reports and dashboards created within a project. This feature helps research teams access pre-compiled, collaborative documentation about model performance.

**If I need to know the exact parameters used for an experiment, how do I use `get_run_details`?**
The tool retrieves full run details, including the precise configuration and hyperparameters used when the training ran. This is crucial for reproducing results or debugging model behavior.

**How can I track data lineage by using `list_project_artifacts`?**
It lists all versioned assets in a project, such as specific datasets and trained models. You can trace dependencies to ensure that every artifact you use is tied to its correct source version.

**Can I check the latest metrics for a specific ML run?**
Yes. Using the `get_run_details` tool, your AI agent can pull the latest logged metrics (like accuracy or loss) and hyperparameters for any specific run ID within your projects.

**Is it possible to list versioned datasets and models?**
Absolutely. The `list_project_artifacts` tool allows you to see all artifacts, including datasets and models, helping you track data lineage and versioning directly through conversation.

**Can I monitor hyperparameter search sweeps via chat?**
Yes. Use the `list_project_sweeps` tool to monitor automated optimization tasks. Your agent will return a list of sweeps in the project so you can track progress without leaving your workspace.