Comet ML MCP. Audit model metrics and project data in conversation.

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Comet ML. This server lets your AI agent query your Comet ML experiment data. You can list projects and workspaces, track model metrics, inspect run parameters, and audit specific experiments using natural language.

It connects your AI client directly to your ML research data.

What your AI agents can do

List all projects and workspaces

The agent finds the organizational containers for your ML work by listing all available projects and workspaces in your Comet ML account.

List and inspect experiments

You list all logged experiments within a specific project, getting metadata, tags, and live status details for each run.

Retrieve numeric performance metrics

The agent pulls precise, high-resolution numbers—like accuracy, loss, or F1 scores—that were logged during the training process.

Extract hyperparameters and parameters

You pull specific ML properties, such as the learning rate or optimizer used, directly from an experiment's logs.

Get detailed experiment logs

The agent retrieves specific, explicit log traces using a given Payload ID for deep debugging.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Comet ML MCP Server: 6 Tools for Experiment Tracking

Use these tools to list projects, retrieve metrics, and inspect parameters across your entire machine learning run history.

get_experiment

Retrieves the log traces for a single experiment, identified by its Payload ID.

get_experiment_metrics

Pulls the numeric metrics logged for an experiment, such as accuracy, loss, or F1 score.

get_experiment_params

Pulls the parameters logged for an experiment, such as the learning rate or optimizer.

list_experiments

Lists the experiments logged in a project, with metadata, tags, and status for each run.

list_projects

Lists the projects in your Comet ML account.

list_workspaces

Identifies the top-level grouping spaces where your ML projects are stored.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Comet ML, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Comet ML Server - Track ML Experiments with AI

Your AI agent queries your Comet ML data for you. Instead of logging into the web UI and clicking through dashboards, you ask your agent to do the work. It connects your AI client directly to your ML research data, so projects, experiments, metrics, and parameters are all available from the conversation.

Finding Your Data

Your agent first finds the containers for your ML work. It identifies the top-level grouping spaces, which are your workspaces, and then lists the projects available inside them. This lets you scope your research before you look at any individual run.

Listing and Inspecting Runs

You can list all logged experiments within a specific project, getting metadata, tags, and live status details for every run. You can then check the performance metrics for a run using get_experiment_metrics, pulling precise numbers like accuracy, loss, or F1 scores logged during training. You can also inspect the parameters logged for a run using get_experiment_params.

Debugging and Auditing

For deep debugging, the agent retrieves specific log traces using a given Payload ID via get_experiment. You can audit training configurations by pulling specific ML properties, such as the learning rate or optimizer used, directly from an experiment's logs using get_experiment_params. You can also use list_experiments to enumerate the logged experiments in a project before drilling into a single run.

How It Works

Connect your Comet ML account to your AI client. Your agent then handles all the querying, pulling data points and configurations across your entire ML history. You just talk to your agent, and it gets the data you need.

How Comet ML MCP Works

1. Subscribe to the server and enter your Comet ML API Key. You can find the key under Account Settings > API Keys.
2. Your AI client calls the list_projects or list_workspaces tool to establish the scope of the ML research.
3. Your agent then calls list_experiments and get_experiment_metrics to pull the specific data you need, all through conversation.
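
The sequence above can be sketched in Python. This is a minimal illustration, not the server's actual interface: the `call_tool` callable, the argument names (`workspace`, `project`, `experiment_key`), and the response shapes are all assumptions made for the example, and the canned `fake_call_tool` stands in for a real MCP client so the sketch runs without a connection.

```python
from typing import Any, Callable

# Hypothetical stand-in for an MCP client's tool-call function:
# takes a tool name and an arguments dict, returns the tool's result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def collect_metrics(call_tool: ToolCaller, workspace: str, project: str) -> dict[str, Any]:
    """Scope to a project, list its experiments, then fetch metrics per experiment."""
    # Step 2: establish scope (argument names are assumed, not documented).
    projects = call_tool("list_projects", {"workspace": workspace})
    if project not in projects:
        raise ValueError(f"project {project!r} not found in workspace {workspace!r}")

    # Step 3: list runs, then pull metrics for each one.
    experiments = call_tool("list_experiments", {"workspace": workspace, "project": project})
    return {
        exp["key"]: call_tool("get_experiment_metrics", {"experiment_key": exp["key"]})
        for exp in experiments
    }


def fake_call_tool(name: str, args: dict[str, Any]) -> Any:
    """Canned responses with an assumed shape, for illustration only."""
    if name == "list_projects":
        return ["Model-A", "Model-B"]
    if name == "list_experiments":
        return [{"key": "run-1"}, {"key": "run-2"}]
    if name == "get_experiment_metrics":
        return {"loss": 0.42 if args["experiment_key"] == "run-1" else 0.37}
    raise KeyError(name)


metrics_by_run = collect_metrics(fake_call_tool, "research", "Model-A")
```

In practice the agent performs these calls for you; the sketch only makes the order of operations explicit.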

The bottom line is you talk to your agent, and it executes the necessary API calls to pull ML data from Comet ML directly into your conversation window.

Who Is Comet ML MCP For?

The data scientist who needs to compare models across 50 runs without opening a browser. The ML engineer who needs to audit hyperparameters or check if a training job finished correctly. Anyone whose job involves tracking model performance and ensuring reproducibility in a fast-moving ML environment.

Data Scientist

Compares model metrics (e.g., accuracy, loss) across different experiments to determine the best model architecture or hyperparameter set.

ML Engineer

Verifies training run configurations, checks if the correct logging tags were applied, and audits the learning rates used in a specific experiment.

MLOps Engineer

Monitors the status of active model evaluations, lists all projects, and verifies experiment completion statuses to manage the overall ML pipeline.

What Changes When You Connect

See the full project hierarchy by using list_workspaces and list_projects. You instantly know where to look for your ML research, eliminating the need to manually browse folder structures.
Pinpoint exact performance issues by calling get_experiment_metrics. You retrieve high-precision numbers—like loss or AUC—for direct comparison across multiple runs.
Validate model reproducibility using get_experiment_params. You pull the exact hyperparameters (e.g., batch size, optimizer) that were used, ensuring your results are auditable.
Get a comprehensive view of your work using list_experiments. You list every run, allowing you to check metadata, tags, and current execution status without leaving your agent.
Deep-dive into specific logs with get_experiment. You retrieve explicit log traces by Payload ID, which is necessary for debugging complex, failed runs.
Manage scope instantly. You use list_projects to narrow down your search to a single domain, making it easy to compare experiments across related but separate research efforts.

Real-World Use Cases

Comparing model stability across branches

A data scientist needs to know which of five recent model versions performed best. They ask their agent to use list_projects first, then list_experiments for the 'Model-A' project. Finally, they call get_experiment_metrics for each run to compare the final validation loss and accuracy, identifying the most stable build.
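
The comparison step in this use case might look like the following sketch. It assumes the metrics for each run have already been collected into a dictionary keyed by run; the metric name `val_loss`, the run keys, and the numbers are invented sample data, not output from the server.

```python
def best_run(metrics_by_run: dict[str, dict[str, float]], metric: str = "val_loss") -> str:
    """Return the key of the run with the lowest value for the given metric."""
    # Skip runs that never logged the metric instead of failing on them.
    candidates = {key: m[metric] for key, m in metrics_by_run.items() if metric in m}
    if not candidates:
        raise ValueError(f"no run logged the metric {metric!r}")
    return min(candidates, key=candidates.get)


# Invented sample data for five runs in a 'Model-A' project.
sample_metrics = {
    "run-1": {"val_loss": 0.41, "accuracy": 0.88},
    "run-2": {"val_loss": 0.39, "accuracy": 0.89},
    "run-3": {"val_loss": 0.33, "accuracy": 0.91},
    "run-4": {"val_loss": 0.36, "accuracy": 0.90},
    "run-5": {"accuracy": 0.87},  # this run did not log val_loss
}

winner = best_run(sample_metrics)
```

Lowest final validation loss is only one possible definition of "most stable"; a team could just as well rank on accuracy or on variance across epochs.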

Debugging a failing training job

An ML engineer finds a run with poor metrics. They ask the agent to use get_experiment with the specific run's Payload ID. The agent pulls the explicit Cloud logs, letting the engineer see exactly where the process failed—a critical step that avoids manually sifting through massive log files.

Auditing compliance parameters

An MLOps team member needs to prove that all experiments logged the required version number. They use get_experiment_params to pull the logged parameters for each run and confirm that the 'version' parameter was correctly set for every run in the 'Production' project.
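
A check like this one reduces to a small filter over the parameters returned for each run. The sketch below assumes the parameters have been gathered into a dictionary keyed by run; that shape, the run keys, and the values are hypothetical sample data.

```python
def runs_missing_param(params_by_run: dict[str, dict[str, str]], required: str = "version") -> list[str]:
    """Return the runs where the required parameter is absent or empty."""
    return sorted(
        key for key, params in params_by_run.items()
        if not params.get(required)
    )


# Invented sample data for a 'Production' project.
sample_params = {
    "run-1": {"version": "1.2.0", "learning_rate": "0.001"},
    "run-2": {"learning_rate": "0.01"},  # version never logged
    "run-3": {"version": ""},            # version logged but empty
}

non_compliant = runs_missing_param(sample_params)
```

An empty result means every run carries the parameter; anything else is the list of runs to follow up on.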

Organizing research in a new client

A new researcher needs to understand the scope of the entire organization's ML work. They ask the agent to run list_workspaces first, then list_projects. This gives them a complete map of all existing ML domains before they start their own work.

The Tradeoffs

Copying and pasting API calls

Manually opening the Comet ML web UI, finding the experiment ID, copying the metrics, switching to a spreadsheet, and then manually cross-referencing the parameters from a different tab.

→ Just talk to your agent. You ask it to use list_projects to scope the work, then list_experiments to find the target run, and finally get_experiment_metrics or get_experiment_params to pull the data directly into your chat window.

Ignoring the scope hierarchy

Trying to analyze metrics for a project when you don't know if it's nested under 'Research' or 'Staging', leading to ambiguous results and wasted time.

→ Always start by running list_workspaces to see the top level. Then use list_projects to identify the exact container before attempting to list any experiments.

Treating metrics and parameters as one thing

Assuming that just because you see a metric (like 'Accuracy') you know the exact hyperparameters (like 'learning rate') that generated it. These are stored separately.

→ Call get_experiment_metrics for the performance data, and then call get_experiment_params separately. This pulls the two distinct data sets—metrics and parameters—so you can correlate them precisely.
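
Once both data sets are in hand, correlating them is a join on the experiment key. The sketch below assumes each tool's results have been collected into a dictionary keyed by experiment; the keys and values are invented, and the real response format may differ.

```python
def correlate(
    metrics_by_run: dict[str, dict[str, float]],
    params_by_run: dict[str, dict[str, float]],
) -> list[dict[str, object]]:
    """Join metrics and parameters on experiment key, one row per run present in both."""
    shared_keys = sorted(metrics_by_run.keys() & params_by_run.keys())
    return [
        {"experiment": key, **params_by_run[key], **metrics_by_run[key]}
        for key in shared_keys
    ]


# Invented sample data: run-3 has metrics but no parameters, so it is dropped.
sample_metrics = {
    "run-1": {"accuracy": 0.91},
    "run-2": {"accuracy": 0.88},
    "run-3": {"accuracy": 0.90},
}
sample_params = {
    "run-1": {"learning_rate": 0.001},
    "run-2": {"learning_rate": 0.01},
}

rows = correlate(sample_metrics, sample_params)
```

Keeping only runs that appear in both sets avoids rows with a metric but no settings to explain it.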

When It Fits, When It Doesn't

Use this server if your workflow requires auditing, comparison, or deep inspection of ML experiment data. Specifically, if you need to correlate 'What happened?' (metrics via get_experiment_metrics) with 'Why did it happen?' (hyperparameters via get_experiment_params), this is the tool. It's ideal for MLOps teams and data scientists who rely on reproducibility.

Don't use this if you simply need to run a new training job or upload artifacts. For that, you need the Comet ML platform itself. Also, if your goal is just to find a list of general cloud resources, a general cloud API tool is better, since this is specific to ML runs. If you need to manage user permissions, look for a dedicated identity management tool, as this only handles experiment data.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Comet ML. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_experiment, get_experiment_metrics, get_experiment_params, list_experiments, list_projects, list_workspaces

Tracking ML performance used to mean clicking through dozens of tabs.

Today, to check a model's performance, you have to log into the web UI. You click on the workspace, then the project, then the specific experiment. You pull up the metrics dashboard, and if you want the parameters, you have to switch tabs and copy-paste the learning rate. It's a tedious, copy-pasting headache just to get a clean comparison.

With the Comet ML MCP Server, you ask your agent for the data. You say, 'Give me the metrics and parameters for the top three runs.' The agent handles the navigation and data retrieval, presenting the raw, comparable numbers right in your chat. It cuts the manual workflow to zero clicks.

Comet ML MCP Server: Audit model metrics and project data in conversation.

Manual audits require multiple API calls or web UI navigations. You have to call the list endpoint, get the ID, and then call the metric endpoint with the ID. It's a brittle, multi-step process prone to human error.

Now, you talk to your agent. You state the goal—'I need to compare the loss between Model A and Model B.' The agent handles the whole sequence, making the necessary calls (`list_experiments`, `get_experiment_metrics`) and giving you the final, actionable comparison in one go. It's a single command, not a sequence of steps.

Common Questions About Comet ML MCP

How do I use the `list_workspaces` tool with Comet ML MCP Server?

You start by asking your agent to run list_workspaces. This gives you the highest-level view of your ML organization, showing all major grouping spaces before you narrow down to a project.

What is the difference between `get_experiment_metrics` and `get_experiment_params`?

Metrics track the performance results (e.g., accuracy, loss) during a run. Parameters track the input settings (e.g., learning rate, batch size). You use them to answer 'How well?' and 'With what settings?' respectively.

Can I list all experiments in a project using `list_experiments`?

Yes. You ask the agent to use list_experiments within a specific project. This gives you a structured array of all runs, allowing you to audit metadata and status for every experiment.

Does `get_experiment` retrieve all logs?

No. get_experiment retrieves specific, explicit Cloud logging traces based on a Payload ID. You need the ID to get the logs, making it a deep-dive tool, not a general log getter.

What if I don't know the project name for `list_projects`?

You should start by running list_workspaces. This helps you find the parent container for the project you're looking for, giving you the correct scope to narrow your search.

How do I use `list_projects` to find all my datasets?

The list_projects tool finds all organizational containers for your ML work. You can filter the results by a specific date range or by the primary owner to narrow down your search.

What inputs does `get_experiment_metrics` require?

This tool requires the experiment ID and the specific metric name you want to track. It returns high-precision numeric values for that metric across the run's duration.

What happens if I try to list non-existent experiments using `list_experiments`?

The server returns a clear error message detailing the invalid experiment ID. Your AI agent can then use that feedback to correct the ID and retry the request.

Can my agent retrieve real-time metrics from an active ML run?

Yes. Use the 'get_experiment_metrics' tool with the experiment key. The agent pulls the latest logged numeric values, so you can monitor loss, accuracy, and other custom metrics as they are generated.

How do I audit the parameters used in a specific experiment?

Provide the experiment key to your agent. The 'get_experiment_params' tool extracts all logged ML properties, helping you verify hyperparameters like learning rates, batch sizes, and model architectures.

Can I see a list of all experiments within a specific project?

Absolutely. Use the 'list_experiments' tool with the project ID. Your agent will surface all ML runs within that project, including their status and metadata, so you can quickly identify the results you need.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)