# Arize AI MCP

> Arize AI gives your agent full visibility into ML observability data so it can monitor model performance. You can detect data drift, analyze execution spans, and troubleshoot prediction quality in real time, all through natural conversation.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** ml-observability, model-monitoring, data-drift, ai-performance, telemetry, troubleshooting

## Description

ML models don't run in a vacuum. They break when the world changes and their inputs shift, a phenomenon known as data drift. Instead of logging into a dedicated observability dashboard to check model health or trace performance spikes, you simply talk to your agent. This MCP lets your AI client drive complex machine learning monitoring workflows in natural language. You can list active projects and retrieve detailed execution spans to pinpoint exactly where a prediction went wrong. Need to validate a new model? Ask the agent to create an evaluation dataset or check the existing ones. The whole process, from managing core ML infrastructure to analyzing performance anomalies, happens in one conversational flow via Vinkius, so your AI client acts like a dedicated MLOps engineer.

## Tools

### create_dataset
Creates a new dataset for model evaluation.

### get_model
Retrieves specific metadata details about a machine learning model.

### list_datasets
Lists all available datasets within your ML observability account.

### list_experiments
Retrieves a list of recorded machine learning experiments and their outcomes.

### list_projects
Lists all active tracking projects within the ML environment.

### list_spans
Retrieves detailed records of model execution spans and telemetry data.
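
Behind the scenes, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request, as defined by the Model Context Protocol specification. The minimal sketch below builds such a message. The `project_id` argument name is an assumption for illustration; the real input schema for each tool is whatever the server advertises through `tools/list`.

```python
import itertools
import json
from typing import Any

# Monotonically increasing JSON-RPC request IDs.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument name: check the server's tools/list schema for the real one.
print(build_tool_call("list_spans", {"project_id": "1024"}))
```

Your MCP client normally builds and sends these messages for you; the sketch only shows what the agent is doing when it calls a tool such as `list_spans`.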

## Prompt Examples

**Prompt:** 
```
List all active ML projects in my Arize account.
```

**Response:** 
```
I've retrieved your tracing projects. You currently have 3 active projects, including 'Production Classifier' (ID: 1024) and 'Beta Recommender'. Which one would you like to inspect for recent traces?
```

**Prompt:** 
```
Show the recent execution spans for project '1024'.
```

**Response:** 
```
Accessing telemetry data... I found 5 recent spans for project 1024. Most executions are showing low latency (avg 120ms). One span is flagged with a 'Schema Mismatch' warning. Shall I retrieve the detailed metadata for that trace?
```

**Prompt:** 
```
Create a new dataset 'Q2_Eval_Data' for model evaluation.
```

**Response:** 
```
Dataset created! I've added 'Q2_Eval_Data' to your Arize environment (dataset ID: 'ds_456'). You can now use it to evaluate your model versions. Shall I list your existing datasets?
```

## Capabilities

### Monitor Project Status
List and track all active machine learning tracing projects.

### Analyze Model Spans
Retrieve detailed, real-time telemetry data for model execution spans to find performance bottlenecks.

### Manage Evaluation Datasets
Create new datasets and list existing ones for rigorous model validation and evaluation.

### Audit Model Metadata
Get detailed metadata about specific ML models to coordinate organizational AI strategy.

### Review Experiment History
Access and track historical machine learning experiments for performance and quality analysis.

## Use Cases

### Debugging a Prediction Failure
An ML Engineer notices an increase in prediction errors and asks the agent, 'Show me the recent execution spans for Project Alpha.' The agent uses `list_spans` to return telemetry data, immediately flagging that 40% of failures are due to a schema mismatch detected at the input layer.
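
Once `list_spans` returns, a breakdown like the one in this scenario is a simple aggregation. The sketch below assumes each span is a dict with hypothetical `status` and `error_type` fields (the actual span schema may differ) and uses made-up sample spans, not real Arize output.

```python
from collections import Counter


def failure_breakdown(spans: list[dict]) -> dict[str, float]:
    """Return each error type's share of all failed spans, as a percentage."""
    # Assumed fields: "status" ("OK" or "ERROR") and "error_type".
    failures = [s for s in spans if s.get("status") == "ERROR"]
    if not failures:
        return {}
    counts = Counter(s.get("error_type", "unknown") for s in failures)
    return {etype: 100 * n / len(failures) for etype, n in counts.most_common()}


# Invented sample data for illustration only.
spans = [
    {"status": "ERROR", "error_type": "schema_mismatch"},
    {"status": "ERROR", "error_type": "timeout"},
    {"status": "OK"},
    {"status": "ERROR", "error_type": "schema_mismatch"},
    {"status": "ERROR", "error_type": "null_feature"},
    {"status": "ERROR", "error_type": "timeout"},
]
print(failure_breakdown(spans))
```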

### Starting a New Evaluation Cycle
A Data Scientist needs to validate Model Beta against new Q3 data. They tell their agent, 'Create a dataset for Q3 evaluation.' The agent uses `create_dataset` and returns the new dataset's ID so the scientist can proceed with validation checks.

### Reviewing Project Scope
An AI Developer is onboarding to a new ML product and needs to know what’s running. They ask, 'List all active projects.' The agent uses `list_projects`, giving them an immediate overview of the entire operational scope.

## Benefits

- Instantly check performance metrics. Instead of navigating to a 'Spans' tab, you can ask your agent to list spans for specific projects and immediately see if there are latency warnings.
- Automated validation workflow. You don't have to set up data sources by hand; the agent creates the evaluation datasets so you can start model validation right away.
- Track model health over time. Need to know how a model performed after an update? Use `list_experiments` to review historical runs and understand drift across different versions.
- Maintain organizational alignment. You can use `get_model` to pull detailed metadata on any ML model, helping coordinate your overall AI strategy without opening multiple portals.
- Centralized oversight. The agent handles everything from listing active projects (`list_projects`) to pulling span telemetry (`list_spans`) for instant performance reporting.

## How It Works

The bottom line is you don't need to learn a new dashboard; you just talk about it.

1. Subscribe to this MCP and retrieve your API Key from your Arize dashboard (Settings > API).
2. Connect the key to any MCP-compatible client, giving your agent access to the model observability tools.
3. Use natural language commands with your agent: 'Show me recent spans for project X' or 'List all active projects.' The agent executes the calls and returns actionable performance reports.
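
A common practice is to keep the API key out of source code and out of any client config checked into version control. The minimal sketch below reads it from an environment variable; the variable name `ARIZE_API_KEY` and the helper function are hypothetical, and how the key is finally passed to the server depends on your MCP client.

```python
import os


def load_arize_api_key(var_name: str = "ARIZE_API_KEY") -> str:
    """Read the Arize API key from the environment instead of hardcoding it.

    `ARIZE_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail fast with a hint about where the key comes from.
        raise RuntimeError(
            f"{var_name} is not set; copy the key from Settings > API in Arize."
        )
    return key
```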

## Frequently Asked Questions

**How do I check model performance using the `list_spans` tool?**
You ask your agent to retrieve spans for a specific project ID or time range. The system uses `list_spans` to pull telemetry data, letting you see latency and error rates instantly.
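
As a rough illustration of what "latency and error rates" means here, the sketch below summarizes a batch of spans. The `latency_ms` and `status` field names are assumptions, not the documented response format of `list_spans`, and the sample values are invented.

```python
from statistics import mean


def summarize_spans(spans: list[dict]) -> dict[str, float]:
    """Compute average latency (ms) and error rate (%) over a batch of spans."""
    if not spans:
        return {"avg_latency_ms": 0.0, "error_rate_pct": 0.0}
    # Assumed fields: "latency_ms" (number) and "status" ("OK" or "ERROR").
    avg_latency = mean(s["latency_ms"] for s in spans)
    errors = sum(1 for s in spans if s["status"] == "ERROR")
    return {
        "avg_latency_ms": float(avg_latency),
        "error_rate_pct": 100 * errors / len(spans),
    }


sample = [
    {"latency_ms": 100, "status": "OK"},
    {"latency_ms": 140, "status": "OK"},
    {"latency_ms": 120, "status": "ERROR"},
    {"latency_ms": 110, "status": "OK"},
]
print(summarize_spans(sample))
```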

**Does the `create_dataset` tool handle all my data types?**
Not necessarily. Check the documentation for `create_dataset` to confirm that your specific data source type is supported for evaluation.

**What if I forget the model's ID? Can I still use `get_model`?**
No, you generally need an identifier. If you don't have it, run `list_projects` first; the project details may give you enough context to identify the correct model to pass to `get_model`.

**What is the difference between listing datasets and listing experiments?**
Datasets (`list_datasets`) are the raw data used to test models, while experiments (`list_experiments`) track the performance and results of specific model runs against that data.
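
To make the distinction concrete, here is a hypothetical data model in which each experiment records the ID of the dataset it was evaluated against. The class and field names (including `accuracy`) are illustrative assumptions, not Arize's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    dataset_id: str
    name: str


@dataclass
class Experiment:
    experiment_id: str
    dataset_id: str  # the dataset this run was evaluated against
    model_version: str
    accuracy: float  # hypothetical result metric


def experiments_for(dataset: Dataset, experiments: list[Experiment]) -> list[Experiment]:
    """Return the experiments that ran against `dataset`, best accuracy first."""
    runs = [e for e in experiments if e.dataset_id == dataset.dataset_id]
    return sorted(runs, key=lambda e: e.accuracy, reverse=True)


q2 = Dataset("ds_456", "Q2_Eval_Data")
runs = [
    Experiment("exp_1", "ds_456", "v1", 0.87),
    Experiment("exp_2", "ds_999", "v1", 0.91),
    Experiment("exp_3", "ds_456", "v2", 0.90),
]
print([e.experiment_id for e in experiments_for(q2, runs)])  # ['exp_3', 'exp_1']
```

One dataset can back many experiments, which is why comparing runs against the same dataset is the usual way to judge one model version against another.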

**Before running `list_projects`, what credentials do I need to authenticate my agent?**
You must first retrieve your API Key from your Arize dashboard. This key authenticates your connection, allowing your AI client to access all project and tracing data via the MCP.

**If an ML run fails, how can I use `list_spans` to pinpoint the failure point?**
The tool lists execution spans and flags their status. Look for any 'ERROR' or warning statuses within the span details to identify exactly where the prediction failed or drifted.
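
Parent spans often inherit an error from a failing child, so the deepest failing span is usually the better place to start. The sketch below assumes hypothetical `span_id`, `parent_id`, and `status` fields on each span; adapt them to whatever `list_spans` actually returns.

```python
def failure_origins(spans: list[dict]) -> list[dict]:
    """Return ERROR spans with no ERROR children: the likely points of failure."""
    # Assumed fields: "span_id", "parent_id" (None for the root), "status".
    # Collect every span that has at least one failing child.
    errored_parents = {
        s["parent_id"] for s in spans
        if s["status"] == "ERROR" and s["parent_id"] is not None
    }
    # A failing span with no failing children is where the error originated.
    return [
        s for s in spans
        if s["status"] == "ERROR" and s["span_id"] not in errored_parents
    ]


trace = [
    {"span_id": "a", "parent_id": None, "name": "predict", "status": "ERROR"},
    {"span_id": "b", "parent_id": "a", "name": "load_features", "status": "OK"},
    {"span_id": "c", "parent_id": "a", "name": "validate_schema", "status": "ERROR"},
    {"span_id": "d", "parent_id": "c", "name": "cast_types", "status": "OK"},
]
print([s["name"] for s in failure_origins(trace)])  # ['validate_schema']
```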

**When I use `list_projects`, can I retrieve more than just the project name, like its purpose or owner?**
Yes, it returns detailed metadata for each active ML tracing project. This includes context about who owns the project and what scope of models it monitors.

**When running `list_experiments`, can I filter the results by a specific data environment (e.g., 'staging')?**
You can apply filters to narrow down your list of experiments. Filtering by environment or date range lets you focus only on model runs relevant to staging or production.
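
The exact filter parameters `list_experiments` accepts are not listed here. As a fallback, you can filter the returned list client-side, as in the sketch below; the `environment` and `created` fields are assumptions, and the sample runs are invented.

```python
from datetime import date
from typing import Optional


def filter_experiments(
    experiments: list[dict],
    environment: str,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[dict]:
    """Keep experiments from one environment, optionally within [start, end]."""
    # Assumed fields: "environment" (str) and "created" (datetime.date).
    return [
        e for e in experiments
        if e["environment"] == environment
        and (start is None or e["created"] >= start)
        and (end is None or e["created"] <= end)
    ]


runs = [
    {"name": "baseline", "environment": "staging", "created": date(2024, 3, 2)},
    {"name": "retrain", "environment": "production", "created": date(2024, 4, 10)},
    {"name": "tuned", "environment": "staging", "created": date(2024, 4, 15)},
]
staging_april = filter_experiments(runs, "staging", start=date(2024, 4, 1))
print([e["name"] for e in staging_april])  # ['tuned']
```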