Weights & Biases MCP for AI. Track model metrics and artifacts via chat.
Works with every AI agent you already use
…and any MCP-compatible client

Connect to your AI in seconds.
Weights & Biases lets you inspect your machine learning work through chat. Review model experiments, check on training runs, and browse versioned artifacts such as datasets and trained models, all without leaving your AI client.
What your AI can do
See every project folder within your WandB account to start browsing experiments.
Retrieve a list of individual experiment attempts, showing their status and basic details.
Fetch the full summary, including final accuracy, loss values, and hyperparameters for one specific training run.
List all versioned assets—like datasets or model checkpoints—associated with a given project.
View the progress and results of automated searches that test different combinations of settings.
Retrieve a list of saved, collaborative documents and dashboards for project review.
Weights & Biases: 6 Tools for Experiment Tracking
Use these tools to list projects, track specific run metrics, monitor hyperparameter sweeps, and manage model artifacts.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Weights & Biases on Vinkius
Get Run Details
Retrieves the full metrics and configuration for one particular run ID.
List Project Artifacts
Lists all datasets, models, or files versioned within a project.
List Wandb Projects
Lists every single project folder associated with your account.
List Project Reports
Fetches a list of saved, collaborative analysis documents for review.
List Project Runs
Gets a list of all individual training attempts within a specific project.
List Project Sweeps
Shows the progress and results of automated hyperparameter search tests.
Security and governance baked right in.
Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so there is no routing to configure yourself.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Weights & Biases, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Weights & Biases. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 6 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
The painful way of checking ML performance history
Today, diagnosing a poor run requires an archaeological dig. You open the dashboard, click on Project A, find Run 42, and copy its hyperparameters into a spreadsheet. Then you have to manually jump over to the Artifacts tab to see which version of the dataset was used for that specific attempt. If you're comparing two runs, you do this entire process twice, copying six different sets of IDs just to confirm lineage.
With this MCP, the manual clicking and copy-pasting goes away. You ask your agent a single question, for example, 'Compare the metrics between the last successful run and the one before it.' The agent compiles the answer from the performance data returned by `get_run_details` and confirms the related artifacts via `list_project_artifacts`. It's just conversation.
Get Model Metrics with get_run_details
Before, you had to navigate deep into a run's dedicated page, find the performance chart, and then scroll through the config panel just to grab the learning rate. It was slow work.
Now, tell your agent: 'Get the final accuracy and config for run ID X.' You get that specific data point delivered immediately in plain text. No clicking required; you just ask.
What your AI can actually do with this
You're running complex ML pipelines. You need to know if the latest change in hyperparameters actually hurt performance or if it was just a random fluctuation. This MCP connects directly to your Weights & Biases account, turning deep dashboard diving into simple conversation. Instead of manually filtering through dozens of runs and checking version numbers across separate tabs, you talk to your agent.
It finds the specific metrics you need for any given run, such as final accuracy or loss values. You can also list the related artifacts, like the dataset version used or the model weights created, so data lineage stays clear. The whole process stays secure: Vinkius ensures that every tool call generates a cryptographically signed audit trail, so you always know exactly what metrics flowed through and how your budget was spent.
It’s about getting actionable answers instantly, making your AI agent an actual ML research assistant.
Here's how it actually works
The bottom line is you manage complex ML data by talking to an assistant, instead of navigating confusing web dashboards.
Subscribe to this MCP, then enter your WandB API key and base URL.
Your AI agent connects using that credential set. You can then ask it questions like, 'What was the accuracy for run X?'
The agent executes the necessary calls behind the scenes and delivers a summarized answer right back to your chat window.
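For readers curious what "the necessary calls behind the scenes" might look like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The Python sketch below builds such a request. The argument names (`entity`, `project`, `run_id`) and their values are assumptions for illustration only; the real input schema is whatever this server advertises for get_run_details.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the names and values below are illustrative,
# not taken from the server's published schema.
request = build_tool_call(
    "get_run_details",
    {"entity": "my-team", "project": "image-classifier", "run_id": "abc123"},
)
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends these requests for you; the sketch only shows the shape of what travels between the client and the server.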
Who is this actually for?
ML engineers who are tired of clicking through 50 different dashboard tabs to compare two model versions, and data scientists who need proof of data lineage for publication-grade research.
ML engineers need to quickly compare hyperparameters and summary metrics across dozens of runs to find the best-performing model.
Data scientists must track which specific versioned dataset was used to train a model, linking it directly back to the final results.
Research leads want to monitor shared project sweeps and access saved reports without having to manually gather data from multiple sources.
What Changes When You Connect
Need to compare runs? You can use list_project_runs to get a list of all attempts, then use get_run_details on any specific run ID for its full metric summary—accuracy, loss, config. It keeps you from manually opening 50 tabs.
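As a rough illustration of that list-then-inspect flow, the Python sketch below picks the best run from a set of run summaries. The record shape (an `id` plus a `summary` dictionary) and all values are assumptions made for the example; the actual fields returned by list_project_runs and get_run_details may differ.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run whose summary has the best value for `metric`.

    Runs that never logged the metric are skipped rather than treated as zero.
    """
    scored = [run for run in runs if metric in run.get("summary", {})]
    if not scored:
        raise ValueError(f"no run reports the metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda run: run["summary"][metric])


# Illustrative data only; not real experiment results.
runs = [
    {"id": "run-a", "summary": {"accuracy": 0.91, "loss": 0.31}},
    {"id": "run-b", "summary": {"accuracy": 0.94, "loss": 0.22}},
    {"id": "run-c", "summary": {"loss": 0.40}},  # never logged accuracy
]

print(best_run(runs, "accuracy")["id"])                      # run-b
print(best_run(runs, "loss", higher_is_better=False)["id"])  # run-b
```

Skipping runs that lack the metric, instead of defaulting them to zero, avoids a crashed run winning a "lowest loss" comparison by accident.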
Data provenance is critical. If you need proof of what data trained your model, call list_project_artifacts. This shows every versioned dataset and model checkpoint associated with the project.
Automated search tracking used to mean checking a massive dashboard. Now, use list_project_sweeps to monitor hyperparameter optimization progress directly through chat.
You don't want to start from scratch every time. Use list_wandb_projects first to see all your work across different areas of research before diving into any single project.
Need a full historical picture? You can also use list_project_reports to pull up saved analysis and collaborative dashboards, linking documentation directly to the underlying results.
See it in action
Diagnosing performance regression
A user notices model accuracy dropped from 0.95 to 0.82. Instead of manually checking logs, they ask their agent to run list_project_runs for the project. They then use get_run_details on the pre-drop and post-drop runs side-by-side. The agent immediately points out a subtle change in the learning rate configured in the hyperparameters.
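The side-by-side comparison in this scenario amounts to a dictionary diff over the two runs' configurations. Here is a minimal Python sketch, assuming each run's hyperparameters arrive as a flat dictionary; the keys and values are made up for illustration and are not real output from get_run_details.

```python
MISSING = "<missing>"  # marker for a key present in only one of the two configs


def diff_configs(before: dict, after: dict) -> dict:
    """Return {key: (before_value, after_value)} for every key whose value differs."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, MISSING)
        new = after.get(key, MISSING)
        if old != new:
            changes[key] = (old, new)
    return changes


# Illustrative configs for the run before and after the accuracy drop.
pre_drop = {"learning_rate": 0.001, "batch_size": 64, "optimizer": "adam"}
post_drop = {"learning_rate": 0.01, "batch_size": 64, "optimizer": "adam"}

for key, (old, new) in diff_configs(pre_drop, post_drop).items():
    print(f"{key}: {old} -> {new}")  # learning_rate: 0.001 -> 0.01
```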
Reproducing old results
A scientist wants to reproduce a paper's findings. They ask their agent for the artifacts in the 'baseline-model' project, and the agent calls list_project_artifacts. It returns the exact version IDs of the dataset and model weights, so the scientist can rerun the experiment on the same inputs.
Reviewing team progress
A research lead needs to check on 10 different ongoing experiments. They use list_project_sweeps to see which automated searches are running and get a quick summary of optimization progress, without having to log into the platform's web UI.
Auditing project scope
A new team member joins and needs to know what projects exist. They simply ask the agent to call list_wandb_projects, getting a complete, current list of all work done by the team.
The honest tradeoffs
Checking data lineage manually
Copying model IDs from one tab and cross-referencing them with dataset versions on another page to confirm they match.
Use the agent. First, call list_project_artifacts to see all available assets. Then, use a single query to verify that specific artifacts were used in a run by referencing their names or version IDs.
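The verification step described here is essentially a membership check. The Python sketch below assumes artifacts are referenced as `name:version` strings (the `name:vN` form W&B uses for artifact versions) and that you already have both the references a run used and the project's artifact list; the specific names are hypothetical.

```python
def check_lineage(used_refs: list, project_artifacts: list) -> dict:
    """Split a run's artifact references into those found in the project and those not."""
    known = set(project_artifacts)
    return {
        "verified": [ref for ref in used_refs if ref in known],
        "unknown": [ref for ref in used_refs if ref not in known],
    }


# Hypothetical references; real names come from your own project.
project_artifacts = ["train-data:v2", "train-data:v3", "resnet:v6"]
used_by_run = ["train-data:v3", "resnet:v7"]

result = check_lineage(used_by_run, project_artifacts)
print(result)  # {'verified': ['train-data:v3'], 'unknown': ['resnet:v7']}
```

Anything that lands in `unknown` is worth a follow-up question to the agent, since it may point to a different project or a mistyped version.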
Forgetting project scope
Trying to find metrics for 'Project Alpha' but getting confused because the account has 15 projects and it isn't clear which one to start with.
Start by calling list_wandb_projects. This grounds your query and ensures you are only looking at runs within the correct scope.
Missing critical run context
Asking for 'the best performance' without knowing if that metric was measured after 50 epochs or 100. The answer is incomplete.
Use get_run_details and ask for specific metrics, like 'What was the final accuracy and loss on run ID X?' This yields concrete data rather than vague summaries.
When It Fits, When It Doesn't
Use this MCP if your job requires linking deep technical metadata: comparing runs, tracking dataset versions, or monitoring hyperparameter sweeps, with verifiable metrics for reproducibility. Don't use it if you just want to send a quick message or update a status; use a dedicated messaging MCP instead. If the data you need is simple (e.g., 'What are our team members doing today?'), this isn't necessary either; stick to communication tools. However, if your core process involves tracking model iterations across time and different datasets, this MCP is a strong fit.
Questions you might have
How do I check my project list using the Weights & Biases MCP?
You call list_wandb_projects. This gives you a simple rundown of every project folder within your account. It's the best place to start when you don't know where to look.
What does get_run_details do for my ML experiment?
It pulls all the summary metrics and configuration details for a single run ID. This is essential if you need precise data points like final loss or hyperparameter values.
Can I use list_project_artifacts to see my datasets?
Yes, list_project_artifacts shows all versioned items in a project. It's how you track data lineage: knowing exactly which dataset version trained your model.
How can I compare different training runs with this MCP?
Start by using list_project_runs to get all run IDs, then use the get_run_details tool on each ID you want to compare. The agent summarizes these details for you.
How does using `list_project_sweeps` help me track automated hyperparameter searches?
It lists all ongoing or completed optimization sweeps within a project. This lets you see how your model performed while the sweep automatically varied parameters such as learning rate and batch size.
What is the purpose of using `list_project_reports` in my ML workflow?
It gathers all saved analysis reports and dashboards created within a project. This feature helps research teams access pre-compiled, collaborative documentation about model performance.
If I need to know the exact parameters used for an experiment, how do I use `get_run_details`?
The tool retrieves full run details, including the precise configuration and hyperparameters used when the training ran. This is crucial for reproducing results or debugging model behavior.
How can I track data lineage by using `list_project_artifacts`?
It lists all versioned assets in a project, such as specific datasets and trained models. You can trace dependencies to ensure that every artifact you use is tied to its correct source version.
Can I check the latest metrics for a specific ML run?
Yes. Using the get_run_details tool, your AI agent can pull the latest logged metrics (like accuracy or loss) and hyperparameters for any specific run ID within your projects.
Is it possible to list versioned datasets and models?
Absolutely. The list_project_artifacts tool allows you to see all artifacts, including datasets and models, helping you track data lineage and versioning directly through conversation.
Can I monitor hyperparameter search sweeps via chat?
Yes. Use the list_project_sweeps tool to monitor automated optimization tasks. Your agent will return a list of sweeps in the project so you can track progress without leaving your workspace.
Powerful workflows you can unlock today
We've already built the connector for Weights & Biases. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 6 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.