Weights & Biases MCP. Analyze ML runs and artifacts via natural chat.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Weights & Biases MCP Server lets you manage complex ML experiments through natural conversation. Instead of manually clicking through dashboards, your AI agent talks to your WandB account to list projects, monitor live runs, check hyperparameter sweeps, and pull full metrics for specific model versions (artifacts).
It turns tedious dashboard navigation into direct chat queries.
What your AI agents can do
List WandB projects
It lists every project you have set up in WandB under your user or team account.
List project runs
You can get a list of individual model runs within a specific project, showing their status (running, finished, crashed).
Get run details
It pulls the full summary data for any single run ID, including final accuracy, loss values, and the exact hyperparameters used.
List project artifacts
This tool lists versioned datasets, trained model weights, or custom files associated with a specific project.
List project sweeps
It tracks automated optimization runs (sweeps), showing which combinations of parameters are currently being tested.
List project reports
You can fetch a list of analysis reports or collaboration dashboards you've saved within WandB.
Weights & Biases: 6 Tools for ML Ops
These tools let your AI agent interact with every core feature of WandB—from listing projects to pulling deep run metrics.
get_run_details
It retrieves all metrics, loss values, and configuration parameters for a single specific experiment run ID.
list_project_artifacts
This tool lists every versioned asset, such as datasets or model weights, available in a given project.
list_project_reports
It finds and lists all saved analysis reports and dashboards tied to a specific project.
list_project_runs
This tool lists every completed or active experiment run within a project.
list_project_sweeps
It monitors automated hyperparameter search runs, showing progress and status for optimization sweeps.
list_wandb_projects
This tool lists every project you have created under your Weights & Biases account.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Weights & Biases, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Forget clicking through a dozen dashboards just to check one loss value or find that specific dataset version. This MCP Server connects your AI agent straight into your Weights & Biases account. Your agent handles the messy dashboard navigation for you. You simply chat with it, and it runs the necessary queries against all your live model data.
You can use list_wandb_projects to list every project under your WandB account, which gives you an overview of everything you track across team and personal accounts. From there, list_project_runs shows all experiment runs for a given project: individual model attempts that finished cleanly, crashed, or are still running right now.
If you need the nitty-gritty on a specific attempt, the get_run_details tool pulls the full summary data for any single run ID. This includes final accuracy metrics, loss values, and even the exact set of hyperparameters that were used. It’s like getting the whole cheat sheet for one model version.
Need to keep an eye on automated tuning? You can track optimization runs using list_project_sweeps. This tool monitors hyperparameter searches, showing you which parameter combinations are currently being tested as part of your sweep process. It keeps tabs on how far along those automated tests are going.
When it comes to the actual materials, you'll use list_project_artifacts to list all versioned assets tied to a specific project. That means you can pull up every dataset version, trained model weight file, or custom piece of data associated with that work. On top of that, if you saved any analysis reports or collaboration dashboards, the list_project_reports tool finds and lists those saved items for the project.
This setup means your agent doesn't just read a single number; it builds a narrative around your entire ML lifecycle—from listing all available projects to checking specific runs, monitoring sweeps, pulling artifact details, and finding saved reports. It takes what used to be tedious dashboard work and turns it into direct chat queries.
How Weights & Biases MCP Works
- 1 Subscribe to this server, then input your private WandB API Key and optional Base URL.
- 2 Your AI client authenticates with the server. You tell your agent what you need, e.g., 'What ran in my project last week?'
- 3 The agent selects the right tool (like list_project_runs), calls it, gets the data, and then uses that information to answer you conversationally.
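At the protocol level, step 3 is a JSON-RPC request. MCP clients invoke tools with the `tools/call` method, and the sketch below builds one such request for list_project_runs. The argument names `entity` and `project`, and their values, are assumptions for illustration; the server's actual input schema may differ.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP JSON-RPC 2.0 request that invokes a single tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real parameter names may differ.
request = build_tool_call(
    1, "list_project_runs", {"entity": "my-team", "project": "resnet"}
)
print(json.dumps(request, indent=2))
```

In practice your AI client constructs and sends this message for you; it is shown here only to make the tool-selection step concrete.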
The bottom line is, the MCP server acts as a wrapper around the WandB API. You ask in plain language; the server makes the API calls, and your agent gives you clean answers.
Who Is Weights & Biases MCP For?
ML Engineers who are tired of clicking through dozens of dashboards just to compare loss values across ten runs. Data Scientists who need guaranteed data lineage before publishing a model. Research Leads needing a single pane of glass for shared project results.
ML Engineer
Uses this to quickly check the status of parallel training jobs using list_project_runs, or to pull configuration details from get_run_details when debugging a failure.
Data Scientist
Runs checks on data lineage by calling list_project_artifacts to confirm the correct, versioned dataset was used for training. Keeps reproducibility simple.
Research Lead
Uses this to track overall optimization progress across multiple team members by monitoring automated sweeps using list_project_sweeps, and reviewing shared reports via list_project_reports.
What Changes When You Connect
- Stop clicking through dashboards. Instead of manually filtering a dashboard to find the loss value from run 'X', you just ask your agent, and it calls get_run_details directly. You get the exact metrics instantly.
- Track data lineage easily. Need to know which version of the dataset was used? Use list_project_artifacts. It shows every saved model weight and dataset version linked to the project.
- Monitor optimization progress in chat. Instead of opening a separate 'Sweeps' tab, use list_project_sweeps to get status updates on automated hyperparameter searches right in your conversation thread.
- Get an overview without digging deep. Use list_wandb_projects first. It lists every project you manage, giving you the scope of work before you even start asking detailed questions about runs.
- Centralized reporting access. Never lose a key insight again. list_project_reports lets your agent pull up saved collaboration dashboards and analysis reports instantly.
Real-World Use Cases
Debugging a failed training run
A model fails during testing, but you don't know why. You ask the agent: 'What were the hyperparameters for the last run in my project?' The agent runs list_project_runs to find the ID, then uses get_run_details to pull the full config and loss metrics, letting you pinpoint if it was a bad learning rate or an unexpected crash.
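The two-step lookup in this scenario can be sketched as follows. The `call_tool` function is a hypothetical stand-in for the MCP client, and the response shapes (`id`, `state`, `created_at`, `config`, `summary`) are assumptions, not the server's documented output.

```python
from datetime import datetime

# Canned responses standing in for the MCP server (hypothetical shapes).
FAKE_RUNS = [
    {"id": "a1b2c3", "state": "finished", "created_at": "2024-05-01T10:00:00"},
    {"id": "d4e5f6", "state": "crashed", "created_at": "2024-05-03T09:30:00"},
]
FAKE_DETAILS = {
    "d4e5f6": {
        "config": {"learning_rate": 0.5, "batch_size": 32},
        "summary": {"loss": 7.9},
    },
}


def call_tool(name, arguments):
    """Hypothetical stand-in for an MCP client call; returns canned data."""
    if name == "list_project_runs":
        return FAKE_RUNS
    if name == "get_run_details":
        return FAKE_DETAILS[arguments["run_id"]]
    raise ValueError(f"unknown tool: {name}")


def last_run_config(project):
    """Find the most recent run, then fetch its hyperparameters."""
    runs = call_tool("list_project_runs", {"project": project})
    latest = max(runs, key=lambda r: datetime.fromisoformat(r["created_at"]))
    details = call_tool("get_run_details", {"run_id": latest["id"]})
    return latest["id"], latest["state"], details["config"]


run_id, state, config = last_run_config("resnet")
print(run_id, state, config)
```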
Checking for data changes
The model suddenly performs worse. You ask: 'Did we update our input dataset?' The agent uses list_project_artifacts to list all available datasets in the project, letting you compare the version numbers and confirm if a new, unapproved artifact was introduced.
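A minimal sketch of that version comparison, assuming the listing returns artifact names in W&B's `name:vN` form. The two listings below are made-up sample data, and names without a `:vN` suffix are not handled.

```python
def latest_versions(artifact_names):
    """Map each artifact name to its highest version number."""
    latest = {}
    for full_name in artifact_names:
        # "train-data:v3" -> ("train-data", ":v", "3")
        name, _, version = full_name.rpartition(":v")
        latest[name] = max(latest.get(name, -1), int(version))
    return latest


def changed_artifacts(before, after):
    """Return {name: (old_version, new_version)} for artifacts that moved."""
    old, new = latest_versions(before), latest_versions(after)
    return {
        name: (old.get(name), version)
        for name, version in new.items()
        if old.get(name) != version
    }


# Made-up listings from two points in time.
last_week = ["train-data:v1", "train-data:v2", "resnet-weights:v0"]
today = last_week + ["train-data:v3"]
changes = changed_artifacts(last_week, today)
print(changes)
```

An artifact that is entirely new shows up with `None` as its old version.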
Reviewing team progress
It's time for a weekly review. You ask: 'Show me the status of all active model sweeps.' The agent calls list_project_sweeps, giving you a quick summary of optimization progress across multiple parallel runs, saving hours of manual checking.
Starting a new project
You're starting fresh. You ask: 'What projects do I have?' The agent uses list_wandb_projects to provide an immediate list of your work areas, helping you scope out where the data lives.
The Tradeoffs
Trying to compare runs manually
Opening 10 different tabs in WandB's dashboard and copy/pasting loss values into a spreadsheet just to see which run was best. It's slow, error-prone, and tedious.
→
Instead, ask the agent to use list_project_runs to get all IDs, then tell it: 'Give me the summary for those five runs.' The agent handles the batch data retrieval using multiple tool calls.
Forgetting artifact versions
Assuming that because a project exists, you have the right model weights. You might end up training on an old, deprecated dataset.
→
Always check list_project_artifacts first. It makes you look at the versioning system and helps you confirm that you are using the explicitly approved model or dataset.
Asking for vague summaries
Simply asking 'How is the model doing?' The agent can't guess what metrics matter right now, leading to generic, unhelpful responses.
→
Be specific. Use get_run_details by giving a run ID and demanding: 'What was the final accuracy AND what were the hyperparameters used?'
When It Fits, When It Doesn't
You should use this server if your workflow involves constant monitoring, detailed comparison of experimental runs, or tracking versioned assets (datasets/models). If you need to know why a model performed a certain way—was it the learning rate? Was it the dataset version?—this is essential. The strength here is turning complex, multi-step dashboard filtering into simple conversation.
Don't use this if your only goal is basic data logging or simply reading static documentation. If you just need to read text reports and don't care about run metrics, a standard API call might suffice. But if the core of your job revolves around ML Ops (comparing loss curves, checking artifact versions via list_project_artifacts, or monitoring hyperparameter sweeps with list_project_sweeps), this server is a strong fit.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Weights & Biases. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Dashboard Overload: Finding a single metric shouldn't take ten clicks.
Right now, checking model performance means navigating to the 'Runs' tab, finding the ID you need, clicking into it, then scrolling down through epochs and metrics. If you want to compare that same loss value across five different runs, you copy the number, paste it into a spreadsheet, and repeat four more times. It’s slow, tedious data entry.
With this MCP server, your agent does all the clicking. You tell it: 'Compare the final accuracy of these three models.' The agent uses `list_project_runs` to gather the IDs, then fires off multiple calls using `get_run_details`. It presents you with a clean comparison table right in the chat. Done.
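The comparison step might look like the sketch below once the run details are in hand. The run records are made-up sample data, and the `summary` and `config` field names are assumptions about the response shape.

```python
def comparison_table(runs, metric="accuracy"):
    """Return text rows ranking runs by a summary metric, best first."""
    ranked = sorted(runs, key=lambda r: r["summary"][metric], reverse=True)
    header = f"{'run':<10}{metric:>10}{'lr':>10}"
    rows = [
        f"{r['id']:<10}{r['summary'][metric]:>10.3f}"
        f"{r['config']['learning_rate']:>10.4f}"
        for r in ranked
    ]
    return [header] + rows


# Made-up results, as if gathered from three get_run_details calls.
runs = [
    {"id": "run-a", "summary": {"accuracy": 0.912}, "config": {"learning_rate": 0.001}},
    {"id": "run-b", "summary": {"accuracy": 0.934}, "config": {"learning_rate": 0.0005}},
    {"id": "run-c", "summary": {"accuracy": 0.897}, "config": {"learning_rate": 0.01}},
]
table = comparison_table(runs)
print("\n".join(table))
```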
Weights & Biases MCP Server: Get artifact details from chat.
Manually tracking datasets and model weights is hellish. You have to remember which version was used for which training run, cross-referencing multiple tabs and folders just to confirm data integrity. It's a nightmare of manual record-keeping.
Now, you simply ask your agent: 'What artifacts are available in the resnet project?' The agent calls `list_project_artifacts` and gives you an immediate list with version numbers, dataset types, and model tags. You know exactly what data is where.
Common Questions About Weights & Biases MCP
How do I check all my projects using the Weights & Biases MCP Server?
Run list_wandb_projects. This tool immediately lists every project you have set up under your account or team. It’s the fastest way to see what work areas are available.
Can I get the full config for a run using get_run_details?
Yes, that's exactly what get_run_details does. You provide the specific run ID, and it returns the run's metrics and configuration, including hyperparameters such as the learning rate, batch size, and optimizer used in that specific experiment.
What is list_project_artifacts for?
list_project_artifacts tracks data lineage. It lists every versioned asset—like a model or dataset—associated with your project so you never lose track of what was used when.
Does list_project_sweeps show me optimization progress?
Yes, list_project_sweeps monitors automated hyperparameter searches. It shows the current status and progress of those sweeps in a single view, so you don't have to check multiple runs individually.
How do I find reports using list_project_reports?
list_project_reports retrieves all saved analysis dashboards. It gives you a catalog of your team's documented insights, making collaboration easier than digging through emails or shared drives.
What happens if my W&B API key doesn't have the right scope for list_project_artifacts?
The server will return an authorization error. You must ensure your API key has read access to the specific project or entity you are trying to inspect. Check your W&B settings to confirm permissions before running any artifact listing tools.
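On the client side, a failed call can be recognized by the `isError` flag that MCP tool results carry alongside their content. The sketch below checks that flag; the error text in the sample result is invented, and the real server's message will differ.

```python
def tool_text(result):
    """Join the text parts of an MCP tool result."""
    return "\n".join(
        part["text"]
        for part in result.get("content", [])
        if part.get("type") == "text"
    )


def explain_result(result):
    """Return the tool output, or a hint when the call failed."""
    text = tool_text(result)
    if result.get("isError"):
        return f"Tool call failed: {text} Check that the API key can read this project."
    return text


# Invented example of a failed list_project_artifacts result.
failed = {
    "isError": True,
    "content": [{"type": "text", "text": "Permission denied."}],
}
message = explain_result(failed)
print(message)
```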
Can I filter results when using list_project_runs?
Yes, you can pass filters for status and time ranges. For example, asking for 'failed' runs or only those completed after a certain date helps narrow down the output immediately.
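The same narrowing can be illustrated client-side. This is only a sketch of what status and date filters do: the parameter names `state` and `created_after` and the run fields are assumptions, not the tool's documented inputs.

```python
from datetime import datetime


def filter_runs(runs, state=None, created_after=None):
    """Keep runs matching an optional state and an optional start date."""
    selected = []
    for run in runs:
        if state is not None and run["state"] != state:
            continue
        created = datetime.fromisoformat(run["created_at"])
        if created_after is not None and created <= created_after:
            continue
        selected.append(run)
    return selected


# Made-up run listing.
runs = [
    {"id": "r1", "state": "finished", "created_at": "2024-05-01T08:00:00"},
    {"id": "r2", "state": "crashed", "created_at": "2024-05-03T12:00:00"},
    {"id": "r3", "state": "running", "created_at": "2024-05-04T09:00:00"},
]
crashed = filter_runs(runs, state="crashed")
recent = filter_runs(runs, created_after=datetime(2024, 5, 2))
print([r["id"] for r in crashed], [r["id"] for r in recent])
```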
Does get_run_details provide live performance metrics?
No, it provides a snapshot of the run's details at the time you call it. While it includes final summary metrics like accuracy and loss, if the run is still active, you will see its current state rather than real-time streaming data.
Can I check the latest metrics for a specific ML run?
Yes. Using the get_run_details tool, your AI agent can pull the latest logged metrics (like accuracy or loss) and hyperparameters for any specific run ID within your projects.
Is it possible to list versioned datasets and models?
Absolutely. The list_project_artifacts tool allows you to see all artifacts, including datasets and models, helping you track data lineage and versioning directly through conversation.
Can I monitor hyperparameter search sweeps via chat?
Yes. Use the list_project_sweeps tool to monitor automated optimization tasks. Your agent will return a list of sweeps in the project so you can track progress without leaving your workspace.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.