Comet ML MCP. Audit model metrics and project data in conversation.
Works with the AI agents you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
This Comet ML server lets your AI agent query your machine learning experiment data in natural language. You can list workspaces and projects, list experiments, retrieve logged model metrics, and inspect run parameters to audit specific experiments.
It connects your AI client directly to your ML research data.
What your AI agents can do
The agent finds the organizational containers for your ML work by listing all available projects and workspaces in your Comet ML account.
You list all logged experiments within a specific project, getting metadata, tags, and live status details for each run.
The agent pulls precise, high-resolution numbers—like accuracy, loss, or F1 scores—that were logged during the training process.
You pull specific ML properties, such as the learning rate or optimizer used, directly from an experiment's logs.
The agent retrieves specific, explicit log traces using a given Payload ID for deep debugging.
Comet ML MCP Server: 6 Tools for Experiment Tracking
Use these tools to list projects, retrieve metrics, and inspect parameters across your entire machine learning run history.
get_experiment
Retrieves Cloud logging traces for an experiment using a specific Payload ID.
get_experiment_metrics
Retrieves the numeric metrics (e.g., accuracy, loss) logged for an experiment.
get_experiment_params
Retrieves the parameters logged for an experiment, such as the learning rate or optimizer.
list_experiments
Lists the experiments logged in a project, with metadata, tags, and status for each run.
list_projects
Lists the projects in your Comet ML account.
list_workspaces
Identifies the top-level grouping spaces where your ML projects are stored.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP server. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Comet ML, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Comet ML Server - Track ML Experiments with AI
Your AI agent queries your Comet ML experiment data for you. Instead of logging into the web UI and clicking through dashboards, you ask your agent to retrieve what you need. It connects your AI client directly to your ML research data.
Finding Your Data
Your agent first finds the containers for your ML work. You can list your workspaces, which are the top-level grouping spaces, and then the projects inside them. This lets you scope your research quickly.
Listing and Inspecting Runs
You can list all logged experiments within a specific project, getting metadata, tags, and live status details for every run. You can then check a run's performance using get_experiment_metrics, pulling precise numbers like accuracy, loss, or F1 scores logged during training. You can also inspect the parameters logged for a run using get_experiment_params.
Debugging and Auditing
For deep debugging, the agent retrieves specific log traces using a given Payload ID via get_experiment. You can audit training configurations by pulling specific ML properties, such as the learning rate or optimizer used, directly from an experiment's logs using get_experiment_params. Before either step, list_experiments helps you locate the run you want to inspect.
How It Works
Connect your Comet ML account to your AI client. Your agent then handles all the querying, pulling data points and configurations across your entire ML history. You just talk to your agent, and it gets the data you need.
How Comet ML MCP Works
1. Subscribe to the server and enter your Comet ML API key. You can find the key under Account Settings > API Keys.
2. Your AI client calls the list_projects or list_workspaces tool to establish the scope of the ML research.
3. You then follow up by calling list_experiments and get_experiment_metrics to pull the specific data you need, all through conversation.
The bottom line is you talk to your agent, and it executes the necessary API calls to pull ML data from Comet ML directly into your conversation window.
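Under the hood, each of those steps is an MCP `tools/call` request, a JSON-RPC 2.0 message naming the tool and its arguments. The Python sketch below builds the messages a client might send for steps 2 and 3. The tool names come from this page; the argument names (`workspace`, `project`, `experiment_key`) and their values are hypothetical, because the page does not document the tools' input schemas.

```python
import json
from typing import Any


def tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names and values: a real client learns the
# actual input schemas from the server via the MCP `tools/list` method.
calls = [
    tool_call(1, "list_workspaces", {}),
    tool_call(2, "list_projects", {"workspace": "my-team"}),
    tool_call(3, "list_experiments", {"project": "model-a"}),
    tool_call(4, "get_experiment_metrics", {"experiment_key": "abc123"}),
]

for line in calls:
    print(line)
```

In practice your AI client builds and sends these messages itself; you only type the question.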
Who Is Comet ML MCP For?
The data scientist who needs to compare models across 50 runs without opening a browser. The ML engineer who needs to audit hyperparameters or check if a training job finished correctly. Anyone whose job involves tracking model performance and ensuring reproducibility in a fast-moving ML environment.
Compares model metrics (e.g., accuracy, loss) across different experiments to determine the best model architecture or hyperparameter set.
Verifies training run configurations, checks if the correct logging tags were applied, and audits the learning rates used in a specific experiment.
Monitors the status of active model evaluations, lists all projects, and verifies experiment completion statuses to manage the overall ML pipeline.
What Changes When You Connect
- See the full project hierarchy by using list_workspaces and list_projects. You immediately know where to look for your ML research, without manually browsing folder structures.
- Pinpoint exact performance issues by calling get_experiment_metrics. You retrieve high-precision numbers, like loss or AUC, for direct comparison across multiple runs.
- Validate model reproducibility using get_experiment_params. You pull the exact hyperparameters (e.g., batch size, optimizer) that were used, so your results are auditable.
- Get a comprehensive view of your work using list_experiments. You list every run and check its metadata, tags, and current execution status without leaving your agent.
- Deep-dive into specific logs with get_experiment. You retrieve explicit log traces by Payload ID, which helps when debugging complex, failed runs.
- Manage scope quickly. You use list_projects to narrow your search to a single project, or to pick out related but separate research efforts whose experiments you want to compare.
Real-World Use Cases
Comparing model stability across branches
A data scientist needs to know which of five recent model versions performed best. They ask their agent to use list_projects first, then list_experiments for the 'Model-A' project. Finally, they call get_experiment_metrics for each run to compare the final validation loss and accuracy, identifying the most stable build.
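Once get_experiment_metrics has returned each run's validation-loss history, the final comparison is simple arithmetic. A minimal Python sketch, assuming the values have already been collected into a dict keyed by run name; the run names and numbers are made up for illustration, and "stability" is approximated here as the spread of the last three loss values, which is only one possible definition.

```python
from statistics import pstdev

# Hypothetical validation-loss histories, one list per run (oldest first).
val_loss = {
    "run-1": [0.92, 0.61, 0.48, 0.47, 0.46],
    "run-2": [0.88, 0.55, 0.41, 0.45, 0.39],
    "run-3": [0.95, 0.70, 0.52, 0.50, 0.50],
}


def summarize(history: list[float], window: int = 3) -> tuple[float, float]:
    """Return (final loss, population std. dev. of the last `window` values)."""
    tail = history[-window:]
    return history[-1], pstdev(tail)


summary = {run: summarize(h) for run, h in val_loss.items()}
best_final = min(summary, key=lambda run: summary[run][0])
most_stable = min(summary, key=lambda run: summary[run][1])
print(f"lowest final loss: {best_final}, most stable tail: {most_stable}")
```

With this sample data the run with the lowest final loss is not the most stable one, which is why it pays to look at both numbers before choosing a build.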
Debugging a failing training job
An ML engineer finds a run with poor metrics. They ask the agent to use get_experiment with the run's Payload ID. The agent pulls the Cloud logs, letting the engineer see where the process failed without manually sifting through massive log files.
Auditing compliance parameters
An MLOps team member needs to prove that every experiment logged the required version number. They use get_experiment_params to pull each run's logged parameters and confirm that the 'version' parameter was set for every run in the 'Production' project.
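The audit itself reduces to a membership check over each run's parameters. A Python sketch, assuming the agent has gathered the output of get_experiment_params into one dict per experiment key and that values arrive as strings; the keys, parameter names, and values are hypothetical.

```python
# Hypothetical parameters per experiment key, as an agent might collect
# them from get_experiment_params for every run in the 'Production' project.
params_by_run = {
    "exp-001": {"version": "2.3.1", "learning_rate": "0.001"},
    "exp-002": {"version": "", "learning_rate": "0.0005"},
    "exp-003": {"learning_rate": "0.001"},
}


def missing_param(runs: dict[str, dict[str, str]], name: str) -> list[str]:
    """Return the runs where parameter `name` is absent or blank."""
    return sorted(
        run for run, params in runs.items() if not params.get(name, "").strip()
    )


offenders = missing_param(params_by_run, "version")
print("runs missing 'version':", offenders or "none")
```

Treating a blank value the same as a missing key catches runs that logged the parameter name without a usable value.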
Organizing research in a new client
A new researcher needs to understand the scope of the entire organization's ML work. They ask the agent to run list_workspaces first, then list_projects. This gives them a complete map of all existing ML domains before they start their own work.
The Tradeoffs
Copying and pasting API calls
Manually opening the Comet ML web UI, finding the experiment ID, copying the metrics, switching to a spreadsheet, and then manually cross-referencing the parameters from a different tab.
→
Just talk to your agent. You ask it to use list_projects to scope the work, then list_experiments to find the target run, and finally get_experiment_metrics or get_experiment_params to pull the data directly into your chat window.
Ignoring the scope hierarchy
Trying to analyze metrics for a project when you don't know if it's nested under 'Research' or 'Staging', leading to ambiguous results and wasted time.
→
Always start by running list_workspaces to see the top level. Then use list_projects to identify the exact container before attempting to list any experiments.
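Running list_workspaces and then list_projects amounts to searching a workspace-to-projects mapping, which also exposes the ambiguity described above. In this Python sketch the mapping stands in for the two tools' responses; the workspace and project names are hypothetical.

```python
# Hypothetical result of list_workspaces followed by list_projects
# for each workspace: workspace name -> project names.
hierarchy = {
    "Research": ["model-a", "tokenizer-sweep"],
    "Staging": ["model-a", "release-candidates"],
}


def find_workspaces(tree: dict[str, list[str]], project: str) -> list[str]:
    """Return every workspace that contains a project with this name."""
    return [ws for ws, projects in tree.items() if project in projects]


matches = find_workspaces(hierarchy, "model-a")
if len(matches) > 1:
    # The same project name exists in several workspaces: ask which one.
    print(f"'model-a' is ambiguous; found in {matches}")
```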
Treating metrics and parameters as one thing
Assuming that just because you see a metric (like 'Accuracy') you know the exact hyperparameters (like 'learning rate') that generated it. These are stored separately.
→
Call get_experiment_metrics for the performance data, and then call get_experiment_params separately. This pulls the two distinct data sets—metrics and parameters—so you can correlate them precisely.
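Correlating the two data sets means joining them on the experiment key. A Python sketch, assuming the agent has already collected one final-loss value per run from get_experiment_metrics and the parameters per run from get_experiment_params, with parameter values arriving as strings; all keys and numbers are hypothetical.

```python
# Hypothetical, already-retrieved data keyed by experiment key.
final_loss = {"exp-1": 0.42, "exp-2": 0.37, "exp-3": 0.51}
params = {
    "exp-1": {"learning_rate": "0.001", "optimizer": "adam"},
    "exp-2": {"learning_rate": "0.0005", "optimizer": "adam"},
    "exp-3": {"learning_rate": "0.01", "optimizer": "sgd"},
}

# Inner join: keep only runs that have both a metric and parameters.
rows = [
    (key, float(params[key]["learning_rate"]), params[key]["optimizer"], loss)
    for key, loss in final_loss.items()
    if key in params
]
rows.sort(key=lambda row: row[3])  # lowest (best) final loss first

for key, lr, optimizer, loss in rows:
    print(f"{key}: lr={lr:g} optimizer={optimizer} final_loss={loss:.2f}")
```

The inner join drops any run that is missing either half, so a gap in logging shows up as a missing row rather than a silent mismatch.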
When It Fits, When It Doesn't
Use this server if your workflow requires auditing, comparison, or deep inspection of ML experiment data. Specifically, if you need to correlate 'What happened?' (metrics via get_experiment_metrics) with 'Why did it happen?' (hyperparameters via get_experiment_params), this is the tool. It's ideal for MLOps teams and data scientists who rely on reproducibility.
Don't use this if you simply need to run a new training job or upload artifacts. For that, you need the Comet ML platform itself. Also, if your goal is just to find a list of general cloud resources, a general cloud API tool is better, since this is specific to ML runs. If you need to manage user permissions, look for a dedicated identity management tool, as this only handles experiment data.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Comet ML. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: managed infra
- V8 Isolated: sandboxed per request
- Zero-Trust Proxy: no stored credentials
- DLP Enforced: policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Available Capabilities
Tracking ML performance used to mean clicking through dozens of tabs.
Today, to check a model's performance, you have to log into the web UI. You open the workspace, then the project, then the specific experiment. You pull up the metrics dashboard, and if you want the parameters, you have to switch tabs and copy-paste the learning rate. It's a tedious copy-and-paste routine just to get a clean comparison.
With the Comet ML MCP Server, you ask your agent for the data. You say, 'Give me the metrics and parameters for the top three runs.' The agent handles the navigation and data retrieval, presenting the raw, comparable numbers right in your chat. It cuts the manual workflow to zero clicks.
Manual audits require multiple API calls or trips through the web UI. You have to call the list endpoint, get the ID, and then call the metrics endpoint with that ID. It's a brittle, multi-step process prone to human error.
Now, you talk to your agent. You state the goal: 'I need to compare the loss between Model A and Model B.' The agent handles the whole sequence, making the necessary calls (`list_experiments`, `get_experiment_metrics`) and giving you the final, actionable comparison in one go. You make a single request, and the agent works through the steps.
Common Questions About Comet ML MCP
How do I use the `list_workspaces` tool with Comet ML MCP Server?
You start by asking your agent to run list_workspaces. This gives you the highest-level view of your ML organization, showing all major grouping spaces before you narrow down to a project.
What is the difference between `get_experiment_metrics` and `get_experiment_params`?
Metrics track the performance results (e.g., accuracy, loss) during a run. Parameters track the input settings (e.g., learning rate, batch size). You use them to answer 'How well?' and 'With what settings?' respectively.
Can I list all experiments in a project using `list_experiments`?
Yes. You ask the agent to use list_experiments within a specific project. This gives you a structured array of all runs, allowing you to audit metadata and status for every experiment.
Does `get_experiment` retrieve all logs?
No. get_experiment retrieves specific, explicit Cloud logging traces based on a Payload ID. You need the ID to get the logs, making it a deep-dive tool, not a general log getter.
What if I don't know the project name for `list_projects`?
You should start by running list_workspaces. This helps you find the parent container for the project you're looking for, giving you the correct scope to narrow your search.
How do I use `list_projects` to find all my projects?
The list_projects tool finds all organizational containers for your ML work. You can filter the results by a specific date range or by the primary owner to narrow down your search.
What inputs does `get_experiment_metrics` require?
This tool requires the experiment ID and the specific metric name you want to track. It returns high-precision numeric values for that metric across the run's duration.
What happens if I try to list non-existent experiments using `list_experiments`?
The server returns a clear error message detailing the invalid experiment ID. Your AI agent can then use that feedback to correct the ID and retry the request.
Can my agent retrieve real-time metrics from an active ML run?
Yes. Use the 'get_experiment_metrics' tool with the experiment key. The agent pulls the latest logged numeric values, letting you monitor loss, accuracy, and other custom metrics as they are logged.
How do I audit the parameters used in a specific experiment?
Provide the experiment key to your agent. The 'get_experiment_params' tool extracts all logged ML properties, helping you verify hyperparameters like learning rates, batch sizes, and model architectures.
Can I see a list of all experiments within a specific project?
Absolutely. Use the 'list_experiments' tool with the project ID. Your agent will surface all ML runs within that project, including their status and metadata, so you can quickly identify the results you need.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.