# Comet ML MCP

> Comet ML connects your agent directly to your machine learning research data. You can audit model performance, check specific run parameters, and navigate complex project structures—all by talking to your AI client. Stop leaving the chat window; keep your entire MLOps workflow running right where you are.

## Overview
- **Category:** ship-it
- **Price:** Free
- **Tags:** mlops, experiment-tracking, model-evaluation, llm-monitoring, model-lifecycle, data-science

## Description

Managing an ML experiment used to mean jumping between a dashboard, a terminal, and a spreadsheet just to track one metric. This MCP lets you work through that lifecycle conversationally. You can ask your AI client for performance data across different runs or pull the specific hyperparameters used during training without leaving the chat window. It's designed for deep analysis: listing the workspaces you can access, finding the projects inside each one, and then pulling detailed metrics for any single run you need to audit. When you connect it via Vinkius Marketplace, your agent gains access to this whole catalog of ML data tools, making complex audits as simple as asking a question.

## Tools

### list_workspaces
Lists the workspaces your API key can access. Workspaces are the top-level containers in Comet and hold your projects.

### list_projects
Lists the projects inside a workspace. Projects are the buckets that group related experiments.

### list_experiments
Lists all logged experiments within a specified workspace or project.

### get_experiment
Retrieves detailed information about a specific model run using its unique ID.

### get_experiment_metrics
Returns time-series data for logged numeric metrics, such as loss or precision.

### get_experiment_params
Inspects the specific hyperparameters—like learning rates—that were used to train a model.
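
Any of the tools above is invoked by the AI client through an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below shows roughly what such a request could look like. The helper `build_tool_call` and the argument name `experiment_id` are assumptions made for illustration; the real input schema is whatever this server advertises to the client.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `experiment_id` is a hypothetical argument name, not confirmed by this listing.
request = build_tool_call("get_experiment_params", {"experiment_id": "exp_789"})
print(json.dumps(request, indent=2))
```

In practice the AI client builds and sends this message for you; the sketch is only meant to show what "calling a tool" means underneath the conversation.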

## Prompt Examples

**Prompt:** 
```
List all projects in workspace 'research-team'
```

**Response:** 
```
I found 4 projects in 'research-team': 'NLP-LLM-v2', 'Computer-Vision-Edge', 'Tabular-AutoML', and 'Staging-Tests'. Which one would you like to explore?
```

**Prompt:** 
```
Get current metrics for experiment 'exp_abc123'
```

**Response:** 
```
Retrieving metrics for 'exp_abc123'... Current Accuracy: 0.945, Loss: 0.12, Epoch: 45. The run is still active and performance is trending upwards.
```

**Prompt:** 
```
What hyperparameters were used in experiment 'exp_789'?
```

**Response:** 
```
Experiment 'exp_789' used: learning_rate: 0.001, batch_size: 32, optimizer: 'adam', and model_architecture: 'resnet50'. I have the full list of params if you need them.
```

## Capabilities

### Audit Model Run Performance
Pull high-precision numerical metrics—like accuracy or loss—that were generated during the training cycle.

### Inspect Training Configurations
Extract explicit ML properties, such as batch size and learning rates, used for a specific model run.

### Map Project Hierarchy
Navigate the entire organizational structure by listing the available workspaces and the projects inside them.

### Review Experiment Metadata
List and review details about specific model runs, including performance tags and status updates.

## Use Cases

### Identifying the source of model drift
A data scientist notices their production model performance dropped last week. They use the agent to call `list_experiments` for that time window, narrowing down the failing run ID. Then they call `get_experiment_metrics` on that specific ID to pull loss curves and pinpoint exactly when the performance started degrading.
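
Once the loss curve is in hand, pinpointing the degradation can be a small calculation. The following is a minimal sketch, assuming the metrics arrive as a list of `{"step", "value"}` points; that shape, the function name `first_degradation_step`, and the sample numbers are all hypothetical.

```python
def first_degradation_step(points, tolerance=0.05):
    """Return the first step where loss rises more than `tolerance`
    above its running minimum, or None if it never does."""
    best = float("inf")
    for point in sorted(points, key=lambda p: p["step"]):
        if point["value"] > best + tolerance:
            return point["step"]
        best = min(best, point["value"])
    return None


# Made-up loss curve for illustration only.
loss_curve = [
    {"step": 1, "value": 0.90},
    {"step": 2, "value": 0.55},
    {"step": 3, "value": 0.40},
    {"step": 4, "value": 0.42},
    {"step": 5, "value": 0.61},
]
print(first_degradation_step(loss_curve))  # 5
```

The tolerance keeps small fluctuations (such as step 4 here) from being flagged; a suitable value depends on how noisy the metric is.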

### Verifying a competitor's claimed baseline
An ML Engineer needs to replicate a reported benchmark. They use the MCP to call `list_projects` to find the correct research area, then check specific configuration details using `get_experiment_params` to ensure they are matching the exact batch size and optimizer used.
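
Checking that two configurations match comes down to a dictionary diff. This sketch assumes each run's parameters have already been collected into a plain dict; the helper `diff_params` and the sample values are illustrative, not part of the MCP.

```python
def diff_params(reference: dict, candidate: dict) -> dict:
    """Map each differing key to a (reference, candidate) pair.
    Keys missing on one side show up as None."""
    keys = reference.keys() | candidate.keys()
    return {
        key: (reference.get(key), candidate.get(key))
        for key in sorted(keys)
        if reference.get(key) != candidate.get(key)
    }


# Made-up parameter sets for illustration only.
reported = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "adam"}
mine = {"learning_rate": 0.001, "batch_size": 64, "optimizer": "adam"}
print(diff_params(reported, mine))  # {'batch_size': (32, 64)}
```

An empty result means the two runs used the same logged parameters.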

### Organizing massive project data
An MLOps team is onboarding a new researcher. They ask the agent, 'Show me all research areas for the Q3 rollout.' The MCP first calls `list_workspaces` and then uses `list_projects` on each workspace to provide a complete map of where all related experiments are stored.

### Debugging unexpected run failures
A researcher runs an experiment that times out. They ask the agent to pull the full experiment details via `get_experiment`, then review the logs and configuration to understand why the job failed before rewriting the code.

## Benefits

- You don't need to open the web UI. By using this MCP, you can list all projects with `list_projects` and immediately scope your audit within your chat client.
- Debugging a failed run is faster. Instead of guessing what went wrong, ask your agent to call `get_experiment_params` and check the exact learning rate and other settings the run used.
- Comparing model performance across multiple runs? Use `list_experiments` first to see all trials, then call `get_experiment_metrics` on each one to get clean data points for comparison.
- Navigating massive ML research portfolios is simple. Start with `list_workspaces`, then narrow the focus to a single workspace's projects with `list_projects`.
- Real-time monitoring becomes conversational. When you need to know if a long-running job is done, just ask about its status, and the agent looks up the run for you.
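
The cross-run comparison mentioned in the list above can be sketched as a simple ranking. The example assumes the metric points for each run have been gathered into a dict keyed by experiment ID, each holding `{"step", "value"}` points; that shape, the function `rank_runs`, and the sample data are hypothetical.

```python
def rank_runs(metrics_by_run: dict, higher_is_better: bool = True) -> list:
    """Rank runs by the metric value at each run's last logged step."""
    finals = {
        run_id: max(points, key=lambda p: p["step"])["value"]
        for run_id, points in metrics_by_run.items()
        if points  # skip runs that logged nothing for this metric
    }
    return sorted(finals.items(), key=lambda item: item[1], reverse=higher_is_better)


# Made-up accuracy curves for illustration only.
accuracy = {
    "exp_a": [{"step": 1, "value": 0.71}, {"step": 2, "value": 0.88}],
    "exp_b": [{"step": 1, "value": 0.75}, {"step": 2, "value": 0.93}],
    "exp_c": [],
}
print(rank_runs(accuracy))  # [('exp_b', 0.93), ('exp_a', 0.88)]
```

For metrics such as loss, pass `higher_is_better=False` so the lowest final value ranks first.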

## How It Works

In short, this MCP turns complex, multi-step data retrieval into a single conversation.

1. Subscribe to this MCP on Vinkius and enter your Comet ML API Key (you'll find this in the platform’s Account Settings).
2. Your AI client uses the connection to access the data structure, allowing you to query specific organizational boundaries like projects or workspaces.
3. You ask a question—for instance, 'What were the metrics for experiment X?'—and your agent executes the necessary calls and returns clean, structured answers.
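
To make "the necessary calls" in the steps above more concrete, here is a rough sketch of an agent walking Comet's workspace, project, experiment nesting. The `call_tool` dispatcher is a hypothetical in-memory stand-in for the MCP connection, and the argument names `workspace` and `project`, as well as the sample data, are assumptions.

```python
# In-memory stand-in for the server; a real agent would send `tools/call` requests.
FAKE_DATA = {
    "research-team": {
        "NLP-LLM-v2": ["exp_abc123", "exp_789"],
        "Staging-Tests": [],
    },
}


def call_tool(name: str, arguments: dict) -> list:
    """Hypothetical dispatcher that mimics the three listing tools."""
    if name == "list_workspaces":
        return list(FAKE_DATA)
    if name == "list_projects":
        return list(FAKE_DATA[arguments["workspace"]])
    if name == "list_experiments":
        return FAKE_DATA[arguments["workspace"]][arguments["project"]]
    raise ValueError(f"unknown tool: {name}")


def map_hierarchy() -> dict:
    """Walk workspaces -> projects -> experiments and return the nested map."""
    tree = {}
    for workspace in call_tool("list_workspaces", {}):
        tree[workspace] = {}
        for project in call_tool("list_projects", {"workspace": workspace}):
            tree[workspace][project] = call_tool(
                "list_experiments", {"workspace": workspace, "project": project}
            )
    return tree


print(map_hierarchy())
```

A single question like "where do my experiments live?" can therefore fan out into several tool calls, which the agent chains and summarizes for you.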

## Frequently Asked Questions

**How do I find all the metrics for an experiment using get_experiment_metrics?**
You must specify the exact experiment ID you want to audit. Then, ask your agent to execute `get_experiment_metrics` on that ID, and it will return the performance data over time.

**Do I need to call list_projects before list_workspaces?**
No, it is the other way around. The hierarchy works top-down: workspaces contain projects, and projects contain experiments. Call `list_workspaces` first to find the top-level area, and then call `list_projects` within that workspace's scope.

**Can I check what hyperparameters were used for a model?**
Absolutely. Just ask your agent to use `get_experiment_params`. It will pull the explicit ML properties, like the learning rate and optimizer, that defined that specific run.

**What is the difference between list_experiments and get_experiment?**
`list_experiments` returns the runs in a workspace or project. `get_experiment` lets you drill down to pull the detailed data from one specific run.

**How do I confirm my API key is active using list_workspaces?**
You run `list_workspaces`. The tool validates your credentials by returning a structured array of top-level organizational spaces. This confirms the connection works before you query specific projects or experiments.

**What happens if I use an invalid ID with get_experiment?**
The call returns an API error message stating that the experiment ID does not exist. Your agent passes this failure response directly to your client, so you know exactly which ID needs correcting.
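
On the client side, a failed tool call is signalled in the MCP result itself: the result carries `isError: true` alongside text content describing the problem. Here is a minimal sketch of handling that; the helper `extract_text` and the error wording in the sample payload are made up, since the actual message comes from the Comet API.

```python
def extract_text(result: dict) -> str:
    """Join the text parts of an MCP tool result.
    Raise RuntimeError if the tool reported an error."""
    text = "\n".join(
        part["text"]
        for part in result.get("content", [])
        if part.get("type") == "text"
    )
    if result.get("isError", False):
        raise RuntimeError(text or "tool call failed")
    return text


# Made-up failure payload for illustration only.
failed = {
    "content": [{"type": "text", "text": "Experiment 'exp_missing' not found"}],
    "isError": True,
}
try:
    extract_text(failed)
except RuntimeError as err:
    print(f"Tool error: {err}")
```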

**Can I limit the results when running list_experiments?**
Yes, you pass specific filtering parameters to `list_experiments`. You can specify criteria like date ranges or status codes, so your agent only returns the exact experiment IDs relevant to your task.

**Does get_experiment provide access to raw log traces?**
Yes, this tool retrieves the detailed log traces associated with a specific experiment ID. This lets your agent analyze low-level system events that aren't summarized in the standard metrics.

**Can my agent retrieve real-time metrics from an active ML run?**
Yes. Use the `get_experiment_metrics` tool with the experiment key. The agent will pull the latest logged numeric data points, allowing you to monitor loss, accuracy, and other custom metrics as they are generated.
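
When monitoring an active run, the agent typically only needs the newest value of each metric. The sketch below shows one way to reduce a time series to that summary, assuming each point is a `{"name", "step", "value"}` record; the record shape, the function `latest_values`, and the numbers are hypothetical.

```python
def latest_values(points: list) -> dict:
    """Return the most recent value of each metric, judged by step."""
    latest = {}
    for point in points:
        seen = latest.get(point["name"])
        if seen is None or point["step"] >= seen["step"]:
            latest[point["name"]] = point
    return {name: point["value"] for name, point in latest.items()}


# Made-up metric points for illustration only.
logged = [
    {"name": "loss", "step": 44, "value": 0.13},
    {"name": "accuracy", "step": 44, "value": 0.941},
    {"name": "loss", "step": 45, "value": 0.12},
    {"name": "accuracy", "step": 45, "value": 0.945},
]
print(latest_values(logged))  # {'loss': 0.12, 'accuracy': 0.945}
```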

**How do I audit the parameters used in a specific experiment?**
Provide the experiment key to your agent. The `get_experiment_params` tool extracts all logged ML properties, helping you verify hyperparameters like learning rates, batch sizes, and model architectures.

**Can I see a list of all experiments within a specific project?**
Absolutely. Use the `list_experiments` tool with the project ID. Your agent will surface all ML runs within that project, including their status and metadata, so you can quickly identify the results you need.