Arize AI MCP for AI. Monitor ML Model Drift via Conversation

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Connect to your AI in seconds.

This MCP server gives your agent visibility into your Arize AI ML observability data. You can detect data drift, analyze execution spans, and troubleshoot prediction quality in real time, all through natural conversation.

What your AI can do

Monitor Project Status

List and track all active machine learning tracing projects.

Analyze Model Spans

Retrieve detailed, real-time telemetry data for model execution spans to find performance bottlenecks.

Manage Evaluation Datasets

Create and manage the required datasets needed for rigorous model validation and evaluation.

Audit Model Metadata

Get detailed metadata about specific ML models to coordinate organizational AI strategy.

Review Experiment History

Access and track historical machine learning experiments for performance and quality analysis.

Arize AI: 6 Tools for ML Observability

These tools let your agent manage the full lifecycle of an ML project, from creating validation datasets to monitoring real-time model performance spans.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Arize AI on Vinkius

Create Dataset

Creates a new, designated dataset for model evaluation purposes.

Get Model

Retrieves specific metadata details about a machine learning model.

List Datasets

Lists all available datasets within your ML observability account.

List Experiments

Retrieves a list of recorded machine learning experiments and their outcomes.

List Projects

Lists all active tracking projects within the ML environment.

List Spans

Retrieves detailed records of model execution spans and telemetry data.
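Under the hood, an MCP client invokes each of these tools with a JSON-RPC `tools/call` request. Here is a minimal sketch of that envelope. The snake_case tool names are the ones used on this page; the helper name `build_tool_call` and the empty default arguments are illustrative assumptions, because the page does not document each tool's parameters.

```python
import json

# Tool names as they appear on this page.
ARIZE_TOOLS = {
    "create_dataset", "get_model", "list_datasets",
    "list_experiments", "list_projects", "list_spans",
}


def build_tool_call(name, arguments=None, request_id=1):
    """Return a JSON-RPC 2.0 `tools/call` request as a dict."""
    if name not in ARIZE_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Argument names are tool-specific and not documented on this page.
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call("list_projects"), indent=2))
```

In practice your AI client builds and sends this request for you; the sketch only shows what a "tool call" means on the wire.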

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you are ready to connect. Vinkius hosts the backend and supports Streamable HTTP, SSE, and OAuth2 out of the box, so there is no routing to configure yourself.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Arize AI integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "arize-ai-alternative": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Arize AI tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"arize-ai-alternative": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.
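If you generate client configs from a script, a small helper can fill in the token and refuse to emit the placeholder by mistake. This is a sketch only: the function name `server_entry` and the validation rule are assumptions, and the entry key and URL pattern are copied from the snippet above.

```python
import json

ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def server_entry(token, name="arize-ai-alternative"):
    """Build the JSON entry shown above with the token filled in."""
    # Reject empty values and the literal [YOUR_TOKEN_HERE] placeholder.
    if not token or "[" in token or "]" in token:
        raise ValueError("replace [YOUR_TOKEN_HERE] with a real token")
    return {name: {"url": ENDPOINT_TEMPLATE.format(token=token)}}


if __name__ == "__main__":
    # "demo-token" is a placeholder value, not a working credential.
    print(json.dumps(server_entry("demo-token"), indent=2))
```

Because the token is part of the URL, treat the generated file like any other secret and keep it out of version control.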

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

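Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius ("arize-ai" is only a local label for this server)
client = MultiServerMCPClient({"arize-ai": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http",
}})
tools = asyncio.run(client.get_tools())  # get_tools() is a coroutine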

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

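from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Assign the discovered tools to an Agent while the connection is open
with MCPServerAdapter(server_params) as vinkius_tools:
    researcher = Agent(
        role='Data Researcher',
        # goal and backstory are required by Agent; these values are examples
        goal='Answer questions about ML projects using Arize AI data',
        backstory='An analyst who monitors model health.',
        tools=vinkius_tools,
    )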

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Arize AI, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Arize AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 6 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Debugging ML models today means jumping between too many dashboards.

Right now, if your model gives a weird prediction, you're stuck. You have to manually log into the observability portal, find the correct project ID, check for data drift alerts in one tab, and then cross-reference performance spikes in another. It’s clicking through three or four separate dashboards just to get a single answer.

With this MCP, your AI acts as that coordinator. You ask it directly: 'Why did Project Beta fail today?' The agent handles the calls—it checks the spans for recent errors and compares them against the defined datasets. What you get is a clean report explaining the root cause.
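The coordination described above can be sketched as a short chain of tool calls. Everything about the data shapes here is assumed for illustration: the `call_tool` callable, the `project_id` argument, and the `id`, `name`, and `status` fields are hypothetical, since this page does not document tool parameters or responses.

```python
def triage(project_name, call_tool):
    """Chain the tool calls an agent might make for one failing project.

    `call_tool(name, arguments)` is a caller-supplied function that invokes an
    MCP tool and returns parsed records; its shape is assumed here.
    """
    projects = call_tool("list_projects", {})
    project = next((p for p in projects if p["name"] == project_name), None)
    if project is None:
        return f"No project named {project_name!r} was found."

    # Hypothetical argument name; check the tool's real schema before use.
    spans = call_tool("list_spans", {"project_id": project["id"]})
    errors = [s for s in spans if s.get("status") == "ERROR"]
    datasets = call_tool("list_datasets", {})

    return (
        f"{project_name}: {len(errors)} of {len(spans)} spans errored; "
        f"datasets available: {len(datasets)}."
    )


if __name__ == "__main__":
    # Canned responses stand in for a live server.
    fake = {
        "list_projects": [{"id": "p1", "name": "Project Beta"}],
        "list_spans": [{"status": "ERROR"}, {"status": "OK"}],
        "list_datasets": [{"id": "d1"}],
    }
    print(triage("Project Beta", lambda name, args: fake[name]))
```

Injecting `call_tool` keeps the sketch runnable without a server and makes the call order easy to test.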

Using `list_projects` gives instant visibility into your entire ML estate.

Before, figuring out which projects were even running required manually checking status reports or digging through account-level settings. You'd spend time compiling a list just to understand the scope of the problem.

Now, you simply prompt for it. The agent executes `list_projects`, giving you an immediate, structured list of every active tracing project. It’s that simple.

Support: 24/7, support@vinkius.com

Security: Vinkius Trust Center

SLA: Service Level Agreement

What your AI can actually do with this

ML models don't run in a vacuum. They degrade when the world changes and their inputs shift away from the data they were trained on, which is data drift. Instead of logging into dedicated observability dashboards to check model health or trace performance spikes, you simply talk to your agent. This MCP lets your AI client run machine learning monitoring workflows using natural language.

You can programmatically list active projects and retrieve high-fidelity execution spans, pinpointing exactly where a prediction went wrong. Need to validate a new model? Use the agent to create or check existing datasets for evaluation. The whole process—from managing core ML infrastructure to analyzing performance anomalies—gets wrapped up in one conversational flow via Vinkius, making your AI client act like a dedicated MLOps engineer.
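To make "data drift" concrete, here is a small worked example of one common drift measure, the Population Stability Index (PSI). This is an illustration of the concept only: the page does not say which drift metric the platform or these tools compute, and the binning scheme below is an assumption.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the range of `expected`; values of `actual`
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = list(range(100))
    shifted = [v + 50 for v in baseline]
    print(psi(baseline, baseline))  # identical samples give no drift
    print(psi(baseline, shifted))   # a shifted sample gives a positive score
```

A score of zero means the two distributions fill the bins identically; the further the production inputs move from the baseline, the larger the score.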

Built · Hosted · Managed by Vinkius

Arize AI MCP - Monitor ML Model Performance

Server ID 019dd0bb-d52e-73d9-b2db-32e86b093f07

Vinkius Inspector

Compliance Grade A+

Score 100/100

What Changes When You Connect

Instantly check performance metrics. Instead of navigating to a 'Spans' tab, you can ask your agent to list spans for specific projects and immediately see if there are latency warnings.

Automated validation workflow. You don't have to manually manage data sources; the agent handles creating datasets so you can start high-fidelity model validation right away.

Track model health over time. Need to know how a model performed after an update? Use list_experiments to review historical runs and understand drift across different versions.

Maintain organizational alignment. You can use get_model to pull detailed metadata on any ML model, helping coordinate your overall AI strategy without opening multiple portals.

Centralized oversight. The agent handles everything from listing active projects (list_projects) to verifying API connectivity for instant performance reporting.

See it in action

01

Debugging a Prediction Failure

An ML Engineer notices an increase in prediction errors and asks the agent, 'Show me the recent execution spans for Project Alpha.' The agent uses list_spans to return telemetry data, immediately flagging that 40% of failures are due to a schema mismatch detected at the input layer.
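A figure like "40% of failures are due to a schema mismatch" is a simple grouping over the returned spans. A minimal sketch, assuming each span is a dict with a `status` field and an `error_type` field (both field names are hypothetical; the real span schema is not shown on this page):

```python
from collections import Counter


def failure_breakdown(spans):
    """Return each error type's share of all failed spans, as percentages."""
    failed = [s for s in spans if s.get("status") == "ERROR"]
    if not failed:
        return {}
    counts = Counter(s.get("error_type", "unknown") for s in failed)
    return {kind: round(100 * n / len(failed), 1) for kind, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample data, sized to mirror the scenario above.
    sample = (
        [{"status": "ERROR", "error_type": "schema_mismatch"}] * 4
        + [{"status": "ERROR", "error_type": "timeout"}] * 6
        + [{"status": "OK"}] * 10
    )
    print(failure_breakdown(sample))
```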

02

Starting a New Evaluation Cycle

A Data Scientist needs to validate Model Beta against new Q3 data. They tell their agent, 'Create a dataset for Q3 evaluation.' The agent uses create_dataset, providing the necessary ID so the scientist can proceed with validation checks.
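When a tool such as create_dataset returns, an MCP client receives a result containing a list of content items plus an `isError` flag. The sketch below pulls the text out of such a result. The result envelope follows the MCP specification; the JSON payload with an `id` key is a hypothetical example, since this page says an ID comes back but not in what format.

```python
import json


def tool_text(result):
    """Join the text items of an MCP `tools/call` result, raising on tool errors."""
    text = "".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool call failed")
    return text


if __name__ == "__main__":
    # Hypothetical response body; the real payload format is not documented here.
    result = {
        "content": [{"type": "text", "text": json.dumps({"id": "ds_123"})}],
        "isError": False,
    }
    dataset_id = json.loads(tool_text(result))["id"]
    print(dataset_id)
```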

03

Reviewing Project Scope

An AI Developer is onboarding to a new ML product and needs to know what’s running. They ask, 'List all active projects.' The agent uses list_projects, giving them an immediate overview of the entire operational scope.

The honest tradeoffs

Over-reliance on Dashboards

Anti-pattern

Spending twenty minutes clicking through multiple tabs and filtering reports in a dashboard just to confirm if model drift occurred.

The Fix

Ask your agent. Use natural language commands with the MCP, like 'Check for data drift in Project Alpha.' The agent handles the necessary calls (e.g., list_spans) and gives you a direct answer.

Forgetting Dataset Management

Anti-pattern

Assuming that raw model output is sufficient for validation, which leads to poorly managed or incomplete test datasets.

The Fix

Before starting any evaluation, prompt the agent to create_dataset and confirm the ID. This ensures your data source is tracked and ready for rigorous testing.

Mixing Up Model IDs

Anti-pattern

Manually referencing an old model version number found in an email, without knowing if that version was actually used or monitored.

The Fix

Use get_model to pull the accurate and current metadata for a specific ML model. This confirms its status and helps coordinate your strategy.

Questions you might have

How do I check model performance using the `list_spans` tool?

You ask your agent to retrieve spans for a specific project ID or time range. The system uses list_spans to pull telemetry data, letting you see latency and error rates instantly.
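If you want the same latency and error-rate summary in your own code, it can be computed from the span records. A rough sketch, assuming each span carries a `latency_ms` number and a `status` string (hypothetical field names, not confirmed by this page), and using the nearest-rank method for the 95th percentile:

```python
import math


def summarize(spans):
    """Return error rate and nearest-rank p95 latency for a list of spans."""
    if not spans:
        return {"error_rate": 0.0, "p95_latency_ms": None}
    latencies = sorted(s["latency_ms"] for s in spans)
    # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-indexed.
    rank = max(1, math.ceil(95 * len(latencies) / 100))
    errors = sum(1 for s in spans if s.get("status") == "ERROR")
    return {
        "error_rate": errors / len(spans),
        "p95_latency_ms": latencies[rank - 1],
    }


if __name__ == "__main__":
    demo = [
        {"latency_ms": i, "status": "ERROR" if i % 10 == 0 else "OK"}
        for i in range(1, 21)
    ]
    print(summarize(demo))
```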

Does the `create_dataset` tool handle all my data types?

The dataset management tools help maintain a coordinated ML infrastructure. You should check the documentation for create_dataset to ensure your specific data source type is supported for evaluation.

What if I forget the model's ID? Can I still use `get_model`?

No, you generally need an identifier. If you can list projects first using list_projects, you might find contextual information that helps you identify the correct model for get_model.

What is the difference between listing datasets and listing experiments?

Datasets (list_datasets) are the raw data used to test models, while experiments (list_experiments) track the performance and results of specific model runs against that data.

Before running `list_projects`, what credentials do I need to authenticate my agent?

You must first retrieve your API Key from your Arize dashboard. This key authenticates your connection, allowing your AI client to access all project and tracing data via the MCP.

If an ML run fails, how can I use `list_spans` to pinpoint the failure point?

The tool lists execution spans and flags their status. Look for any 'ERROR' or warning statuses within the span details to identify exactly where the prediction failed or drifted.
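To pinpoint where a run first went wrong, it usually helps to look at the earliest failing span rather than every error. A minimal sketch, assuming spans expose a `status` field and a `start_time` in ISO 8601 UTC format, which sorts correctly as plain text (both are assumptions about the span schema):

```python
def first_failure(spans):
    """Return the earliest span whose status is ERROR, or None if none failed."""
    failed = [s for s in spans if s.get("status") == "ERROR"]
    return min(failed, key=lambda s: s["start_time"], default=None)


if __name__ == "__main__":
    # Made-up trace: the input check fails before the prediction step does.
    trace = [
        {"name": "predict", "status": "ERROR", "start_time": "2024-01-01T00:00:02Z"},
        {"name": "validate_input", "status": "ERROR", "start_time": "2024-01-01T00:00:01Z"},
        {"name": "load_model", "status": "OK", "start_time": "2024-01-01T00:00:00Z"},
    ]
    print(first_failure(trace)["name"])
```

Later errors are often knock-on effects, so the first failing span is a reasonable place to start reading.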

When I use `list_projects`, can I retrieve more than just the project name, like its purpose or owner?

Yes, it returns detailed metadata for each active ML tracing project. This includes context about who owns the project and what scope of models it monitors.

When running `list_experiments`, can I filter the results by a specific data environment (e.g., 'staging')?

You can apply filters to narrow down your list of experiments. Filtering by environment or date range lets you focus only on model runs relevant to staging or production.
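This page does not state the filter parameter names or whether filtering happens on the server. As a fallback, the same narrowing can be done client-side on the returned list. The sketch below assumes each experiment record has an `environment` string and a `date` in YYYY-MM-DD form; both field names are hypothetical.

```python
from datetime import date


def filter_experiments(experiments, environment=None, start=None, end=None):
    """Keep experiments matching an environment and an inclusive date range."""
    kept = []
    for exp in experiments:
        run_date = date.fromisoformat(exp["date"])
        if environment and exp.get("environment") != environment:
            continue
        if start and run_date < start:
            continue
        if end and run_date > end:
            continue
        kept.append(exp)
    return kept


if __name__ == "__main__":
    runs = [
        {"name": "a", "environment": "staging", "date": "2024-03-01"},
        {"name": "b", "environment": "production", "date": "2024-03-02"},
        {"name": "c", "environment": "staging", "date": "2024-04-15"},
    ]
    march_staging = filter_experiments(runs, "staging", end=date(2024, 3, 31))
    print([r["name"] for r in march_staging])
```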

How do I find my Arize API Key?

Can I track model drift via AI?

Yes! Use the list_experiments tool to retrieve data on active model evaluations and track performance variations programmatically.

How do I retrieve telemetry traces?

Use the list_spans tool to retrieve high-fidelity execution spans and traces for your ML projects directly from the platform.
