Weights & Biases MCP. Analyze ML runs and artifacts via natural chat.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client.

Just plug in your AI agents and start using Vinkius.

Weights & Biases MCP Server lets you manage complex ML experiments through natural conversation. Instead of manually clicking through dashboards, your AI agent talks to your WandB account to list projects, monitor runs, check hyperparameter sweeps, pull full metrics for specific runs, and list versioned artifacts such as datasets and model weights.

It turns tedious dashboard navigation into direct chat queries.

What your AI agents can do

List all projects

It lists every project you have set up in WandB under your user or team account.

View all experiment runs

You can get a list of individual model runs within a specific project, showing their status (running, finished, crashed).

Get detailed run metrics

It pulls the full summary data for any single run ID, including final accuracy, loss values, and the exact hyperparameters used.

List project artifacts

This tool lists versioned datasets, trained model weights, or custom files associated with a specific project.

Monitor hyperparameter sweeps

It tracks automated optimization runs (sweeps), showing which combinations of parameters are currently being tested.

List saved reports and dashboards

You can fetch a list of analysis reports or collaboration dashboards you've saved within WandB.

Free for Subscribers

Weights & Biases: 6 Tools for ML Ops

These tools let your AI agent interact with every core feature of WandB—from listing projects to pulling deep run metrics.

get_run_details

Retrieves all metrics, loss values, and configuration parameters for a single experiment run, identified by its run ID.

list_project_artifacts

Lists every versioned asset, such as datasets or model weights, available in a given project.

list_project_reports

Finds and lists all saved analysis reports and dashboards tied to a specific project.

list_project_runs

Lists every completed or active experiment run within a project.

list_project_sweeps

Monitors automated hyperparameter search runs, showing progress and status for optimization sweeps.

list_wandb_projects

Lists every project you have created under your main Weights & Biases account.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Weights & Biases, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Forget clicking through a dozen dashboards just to check one loss value or find that specific dataset version. This MCP Server connects your AI agent straight into your Weights & Biases account. Your agent handles the messy dashboard navigation for you. You simply chat with it, and it runs the necessary queries against all your live model data.

You can use list_wandb_projects to list every project you've set up under your WandB account. This gives you a complete overview of everything you're tracking across team or personal accounts. From there, list_project_runs shows all experiment runs for a given project: every individual training attempt, whether it finished cleanly, crashed, or is still running.

If you need the nitty-gritty on a specific attempt, the get_run_details tool pulls the full summary data for any single run ID. This includes final accuracy metrics, loss values, and even the exact set of hyperparameters that were used. It’s like getting the whole cheat sheet for one model version.

Need to keep an eye on automated tuning? Use list_project_sweeps. This tool monitors hyperparameter searches, showing which parameter combinations are currently being tested and how far along each sweep is.

When it comes to the actual materials, you'll use list_project_artifacts to list all versioned assets tied to a specific project. That means you can pull up every dataset version, trained model weight file, or custom piece of data associated with that work. On top of that, if you saved any analysis reports or collaboration dashboards, the list_project_reports tool finds and lists those saved items for the project.

This setup means your agent doesn't just read a single number; it builds a narrative around your entire ML lifecycle—from listing all available projects to checking specific runs, monitoring sweeps, pulling artifact details, and finding saved reports. It takes what used to be tedious dashboard work and turns it into direct chat queries.

How Weights & Biases MCP Works

1. Subscribe to this server, then input your private WandB API Key and optional Base URL.
2. Your AI client authenticates with the server. You tell your agent what you need, e.g., 'What ran in my project last week?'
3. The agent selects the right tool (like list_project_runs), calls it, gets the data, and then uses that information to answer you conversationally.

The bottom line: the MCP server acts as a wrapper around the WandB API. You talk like a human; the server makes the API calls, and your agent turns the results into clean answers.
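Step 3 above can be sketched as the message an MCP client sends once the agent has picked a tool. MCP is built on JSON-RPC 2.0, and tool invocations use the `tools/call` method with the tool `name` and its `arguments`. The argument key `project` and the value `resnet` are assumptions for illustration; this page does not show the tools' input schemas.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name: the real input schema is not documented here.
request = build_tool_call("list_project_runs", {"project": "resnet"})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends this message for you; the sketch only shows what travels between the client and the server.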

Who Is Weights & Biases MCP For?

ML Engineers who are tired of clicking through dozens of dashboards just to compare loss values across ten runs. Data Scientists who need guaranteed data lineage before publishing a model. Research Leads needing a single pane of glass for shared project results.

Machine Learning Engineer

Uses this to quickly check the status of parallel training jobs using list_project_runs, or to pull configuration details from get_run_details when debugging a failure.

Data Scientist

Runs checks on data lineage by calling list_project_artifacts to ensure the correct, versioned dataset was used for training. Keeps reproducibility simple.

Research Team Lead

Uses this to track overall optimization progress across multiple team members by monitoring automated sweeps using list_project_sweeps, and reviewing shared reports via list_project_reports.

What Changes When You Connect

Stop clicking through dashboards. Instead of manually filtering a dashboard to find the loss value from run 'X', you just ask your agent, and it calls get_run_details directly. You get the exact metrics instantly.
Track data lineage easily. Need to know which version of the dataset was used? Use list_project_artifacts. It shows every saved model weight and dataset version linked to the project.
Monitor optimization progress in chat. Instead of opening a separate 'Sweeps' tab, use list_project_sweeps to get status updates on automated hyperparameter searches right in your conversation thread.
Get an overview without digging deep. Use list_wandb_projects first. It lists every project you manage, giving you the scope of work before you even start asking detailed questions about runs.
Centralized reporting access. Never lose a key insight again. list_project_reports lets your agent pull up saved collaboration dashboards and analysis reports instantly.

Real-World Use Cases

Debugging a failed training run

A model fails during testing, but you don't know why. You ask the agent: 'What were the hyperparameters for the last run in my project?' The agent runs list_project_runs to find the ID, then uses get_run_details to pull the full config and loss metrics, letting you pinpoint if it was a bad learning rate or an unexpected crash.
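The two-step chain in this scenario might look like the sketch below. The `call_tool` callable, the argument names (`project`, `run_id`), and the response fields (`id`, `created_at`, `state`, `config`, `summary`) are hypothetical stand-ins; the server's actual schema may differ.

```python
from typing import Any, Callable


def latest_run_details(call_tool: Callable[[str, dict], Any], project: str) -> dict:
    """Find the most recent run in a project, then fetch its details."""
    runs = call_tool("list_project_runs", {"project": project})
    if not runs:
        raise ValueError(f"no runs found in project {project!r}")
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    latest = max(runs, key=lambda run: run["created_at"])
    return call_tool("get_run_details", {"run_id": latest["id"]})


# Stand-in for a real MCP client; returns canned data for illustration only.
def fake_call_tool(name: str, arguments: dict) -> Any:
    if name == "list_project_runs":
        return [
            {"id": "run-a", "created_at": "2024-05-01T10:00:00Z", "state": "finished"},
            {"id": "run-b", "created_at": "2024-05-03T09:30:00Z", "state": "crashed"},
        ]
    return {
        "id": arguments["run_id"],
        "config": {"learning_rate": 0.1},
        "summary": {"loss": 2.7},
    }


details = latest_run_details(fake_call_tool, "resnet")
print(details["id"], details["config"])
```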

Checking for data changes

The model suddenly performs worse. You ask: 'Did we update our input dataset?' The agent uses list_project_artifacts to list all available datasets in the project, letting you compare the version numbers and confirm if a new, unapproved artifact was introduced.
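A minimal sketch of that version comparison, assuming list_project_artifacts returns entries with `name`, `type`, and `version` fields, and that versions follow the `v0`, `v1`, ... naming W&B uses for artifacts. The response shape and the `approved` mapping are assumptions, not the server's documented output.

```python
def latest_versions(artifacts: list[dict]) -> dict[str, int]:
    """Map each artifact name to its highest version number."""
    latest: dict[str, int] = {}
    for artifact in artifacts:
        number = int(artifact["version"].lstrip("v"))  # "v3" -> 3
        name = artifact["name"]
        latest[name] = max(number, latest.get(name, -1))
    return latest


def changed_since(artifacts: list[dict], approved: dict[str, int]) -> dict[str, int]:
    """Return artifacts whose latest version is newer than the approved one."""
    return {
        name: version
        for name, version in latest_versions(artifacts).items()
        if version > approved.get(name, -1)
    }


# Canned data for illustration only.
artifacts = [
    {"name": "train-data", "type": "dataset", "version": "v0"},
    {"name": "train-data", "type": "dataset", "version": "v1"},
    {"name": "resnet-weights", "type": "model", "version": "v0"},
]
print(changed_since(artifacts, approved={"train-data": 0, "resnet-weights": 0}))
```

Here only `train-data` is reported, because a `v1` exists while `v0` was the approved version.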

Reviewing team progress

It's time for a weekly review. You ask: 'Show me the status of all active model sweeps.' The agent calls list_project_sweeps, giving you a quick summary of optimization progress across multiple parallel runs, saving hours of manual checking.

Starting a new project

You're starting fresh. You ask: 'What projects do I have?' The agent uses list_wandb_projects to provide an immediate list of your work areas, helping you scope out where the data lives.

The Tradeoffs

Trying to compare runs manually

Opening 10 different tabs in WandB's dashboard and copy/pasting loss values into a spreadsheet just to see which run was best. It's slow, error-prone, and tedious.

→ Instead, ask the agent to use list_project_runs to get all IDs, then tell it: 'Give me the summary for those five runs.' The agent handles the batch data retrieval using multiple tool calls.

Forgetting artifact versions

Assuming that because a project exists, you have the right model weights. You might end up training on an old, deprecated dataset.

→ Always check list_project_artifacts first. It shows which versions exist, so you can confirm that the agent is working with the approved model or dataset.

Asking for vague summaries

Simply asking 'How is the model doing?' The agent can't guess what metrics matter right now, leading to generic, unhelpful responses.

→ Be specific. Give get_run_details a run ID and ask: 'What was the final accuracy, and which hyperparameters were used?'

When It Fits, When It Doesn't

You should use this server if your workflow involves constant monitoring, detailed comparison of experimental runs, or tracking versioned assets (datasets/models). If you need to know why a model performed a certain way—was it the learning rate? Was it the dataset version?—this is essential. The strength here is turning complex, multi-step dashboard filtering into simple conversation.

Don't use this if your only goal is basic data logging or reading static documentation. If you just need to read text reports and don't care about run metrics, a standard API call might suffice. But if the core of your job revolves around ML Ops (comparing loss curves, checking artifact versions via list_project_artifacts, or monitoring hyperparameter sweeps with list_project_sweeps), this server is a strong fit.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Weights & Biases. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 6 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_run_details
list_project_artifacts
list_project_reports
list_project_runs
list_project_sweeps
list_wandb_projects

Dashboard Overload: Finding a single metric shouldn't take ten clicks.

Right now, checking model performance means navigating to the 'Runs' tab, finding the ID you need, clicking into it, then scrolling down through epochs and metrics. If you want to compare that same loss value across five different runs, you copy the number, paste it into a spreadsheet, and repeat four more times. It's slow, tedious data entry.

With this MCP server, your agent does all the clicking. You tell it: 'Compare the final accuracy of these three models.' The agent uses `list_project_runs` to gather the IDs, then fires off multiple calls using `get_run_details`. It presents you with a clean comparison table right in the chat. Done.
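The comparison table the agent assembles could be built along these lines. The sketch assumes each `get_run_details` result exposes an `id` and a `summary` mapping with an `accuracy` key; those field names are hypothetical, and the sample values are made up for illustration.

```python
def comparison_table(run_details: list[dict], metric: str = "accuracy") -> str:
    """Render runs as a plain-text table, highest metric value first."""
    rows = sorted(
        ((run["id"], run["summary"][metric]) for run in run_details),
        key=lambda row: row[1],
        reverse=True,
    )
    # Pad the first column to the longest run id (or the header, if longer).
    width = max([len("run")] + [len(run_id) for run_id, _ in rows])
    lines = [f"{'run':<{width}}  {metric}"]
    lines += [f"{run_id:<{width}}  {value:.3f}" for run_id, value in rows]
    return "\n".join(lines)


# Canned results standing in for three get_run_details responses.
runs = [
    {"id": "run-a", "summary": {"accuracy": 0.91}},
    {"id": "run-b", "summary": {"accuracy": 0.94}},
    {"id": "run-c", "summary": {"accuracy": 0.89}},
]
table = comparison_table(runs)
print(table)
```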

Weights & Biases MCP Server: Get artifact details from chat.

Manually tracking datasets and model weights is hellish. You have to remember which version was used for which training run, cross-referencing multiple tabs and folders just to confirm data integrity. It's a nightmare of manual record-keeping.

Now, you simply ask your agent: 'What artifacts are available in the resnet project?' The agent calls `list_project_artifacts` and gives you an immediate list with version numbers, dataset types, and model tags. You know exactly what data is where.

Common Questions About Weights & Biases MCP

How do I check all my projects using the Weights & Biases MCP Server?

Run list_wandb_projects. This tool immediately lists every project you have set up under your account or team. It’s the fastest way to see what work areas are available.

Can I get the full config for a run using get_run_details?

Yes, that's exactly what get_run_details does. You provide the specific run ID, and it returns the run's full configuration, including the learning rate, batch size, and optimizer used in that experiment, along with its logged metrics.

What is list_project_artifacts for?

list_project_artifacts tracks data lineage. It lists every versioned asset—like a model or dataset—associated with your project so you never lose track of what was used when.

Does list_project_sweeps show me optimization progress?

Yes, list_project_sweeps monitors automated hyperparameter searches. It shows the current status and progress of those sweeps in a single view, so you don't have to check multiple runs individually.

How do I find reports using list_project_reports?

list_project_reports retrieves all saved analysis dashboards. It gives you a catalog of your team's documented insights, making collaboration easier than digging through emails or shared drives.

What happens if my W&B API key doesn't have the right scope for list_project_artifacts?

The server will return an authorization error. You must ensure your API key has read access to the specific project or entity you are trying to inspect. Check your W&B settings to confirm permissions before running any artifact listing tools.

Can I filter results when using list_project_runs?

Yes, you can pass filters for status and time ranges. For example, asking for 'failed' runs or only those completed after a certain date helps narrow down the output immediately.
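If you post-process a run list yourself, the same kind of narrowing can be sketched client-side. The field names `state` and `created_at` are assumptions; how the server itself filters, and with which parameters, is not shown on this page.

```python
from datetime import datetime, timezone
from typing import Optional


def filter_runs(
    runs: list[dict],
    state: Optional[str] = None,
    after: Optional[datetime] = None,
) -> list[dict]:
    """Keep runs that match a state and were created after a given moment."""
    kept = []
    for run in runs:
        created = datetime.fromisoformat(run["created_at"])
        if state is not None and run["state"] != state:
            continue
        if after is not None and created <= after:
            continue
        kept.append(run)
    return kept


# Canned data for illustration only.
runs = [
    {"id": "run-a", "state": "finished", "created_at": "2024-05-01T10:00:00+00:00"},
    {"id": "run-b", "state": "crashed", "created_at": "2024-05-03T09:30:00+00:00"},
    {"id": "run-c", "state": "crashed", "created_at": "2024-04-20T08:00:00+00:00"},
]
cutoff = datetime(2024, 5, 1, tzinfo=timezone.utc)
recent_crashes = filter_runs(runs, state="crashed", after=cutoff)
print([run["id"] for run in recent_crashes])
```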

Does get_run_details provide live performance metrics?

No, it provides a snapshot of the run's details at the time you call it. While it includes final summary metrics like accuracy and loss, if the run is still active, you will see its current state rather than real-time streaming data.

Can I check the latest metrics for a specific ML run?

Yes. Using the get_run_details tool, your AI agent can pull the latest logged metrics (like accuracy or loss) and hyperparameters for any specific run ID within your projects.

Is it possible to list versioned datasets and models?

Absolutely. The list_project_artifacts tool allows you to see all artifacts, including datasets and models, helping you track data lineage and versioning directly through conversation.

Can I monitor hyperparameter search sweeps via chat?

Yes. Use the list_project_sweeps tool to monitor automated optimization tasks. Your agent will return a list of sweeps in the project so you can track progress without leaving your workspace.

Fine-Tune AI Models Using MCP Servers

GPT-4 costs $30 per 1M tokens for your classification task. Fine-tune a $0.20/M model on Together AI that scores 96% accuracy, track every experiment in W&B, and save $29.80 per million tokens.

Together AI, Weights & Biases, Google Sheets

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)