Arize AI MCP. Analyze ML performance and track data drift in chat.


Arize AI connects your AI client directly to your Machine Learning and LLM observability platform. Monitor model performance, track data drift, and manage telemetry by listing models, fetching real-time metrics, or running evaluation checks.

It gives your agent the full ML Ops toolkit to analyze prediction health without opening a dashboard.

What your AI agents can do

Check Model Performance and Drift

Fetch real-time performance metrics and detect data drift for any tracked ML model using get_metrics.

Log Model Artifacts

Push raw telemetry logs, predictions, and inferences directly into Arize using ingest_log.

Manage ML Context

List available model spaces (list_spaces), deployment environments (list_environments), and tracked models (list_models).

Run Automated Model Evaluations

Trigger specific LLM evaluation runs (run_eval) against static datasets to test for issues like toxicity or PII.

Validate Data Assets

List available evaluation datasets (list_datasets) or retrieve specific dataset metadata (get_dataset).

Inspect Model Metadata

Get detailed inputs, outputs, and features for a specific tracked model using get_model.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Arize AI MCP Server: 10 Tools for ML Model Ops

These tools give your AI agent the full command set to manage model lifecycles, check performance metrics, and validate data assets in the Arize platform.

get_dataset

Retrieves specific details and metadata for a static evaluation dataset.

get_metrics

Fetches current observability metrics and performance data for a specified ML model.

get_model

Gets detailed metadata, including inputs, outputs, and features, for a specific tracked model.

ingest_log

Pushes raw telemetry logs and inference data into Arize for immediate tracking and analysis.

list_datasets

Lists all available static evaluation datasets loaded in the system.

list_environments

Lists all configured deployment environments (e.g., Production, Training, Verification) used for model segregation.

list_evals

Lists all automated evaluation runs that have been completed or are scheduled.

list_models

Lists all active and tracked Machine Learning models or LLMs in the workspace.

list_spaces

Lists all accessible workspaces, used to separate different models and telemetry datasets.

run_eval

Triggers a custom, automated evaluation run for an LLM against a specified dataset.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Arize AI, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Arize AI connects your AI client straight to your Machine Learning and LLM observability platform. You'll monitor model performance, track data drift, and handle telemetry by having your agent use the full ML Ops toolkit. It lets your agent analyze prediction health without you ever having to open a dashboard.

Model Performance and Drift

Your agent can fetch current observability metrics and performance data for any tracked ML model using get_metrics, and the same call is how you detect data drift on that model. To keep your ML context organized, your agent can list all active and tracked Machine Learning models or LLMs in the workspace with list_models.

You can also list all accessible workspaces, which keeps different models and telemetry datasets separated, using list_spaces. For deployment segregation, your agent can list all configured deployment environments, like Production or Training, via list_environments. If you need to see exactly what a model is built with, your agent can get detailed metadata, including inputs, outputs, and features, for a specific tracked model using get_model.

Logging and Data Validation

Don't waste time pushing logs manually; your agent pushes raw telemetry logs, predictions, and inferences directly into Arize using ingest_log for immediate tracking and analysis. You can validate your data assets by listing all available static evaluation datasets loaded in the system with list_datasets or getting specific dataset metadata using get_dataset.

To test your models, your agent can list all automated evaluation runs that have been completed or are scheduled with list_evals. It can also use run_eval to start a custom, automated evaluation run for an LLM against a specified dataset, testing for issues like toxicity or PII.

Context Management

Your agent can list all available evaluation datasets with list_datasets or grab specific dataset metadata with get_dataset. It can list all active and tracked ML models or LLMs in the workspace using list_models. You'll also be able to list all accessible workspaces with list_spaces and all configured deployment environments with list_environments.

For more details on a specific model's structure, your agent uses get_model to retrieve inputs, outputs, and features.

How Arize AI MCP Works

1. First, tell your agent to list the relevant spaces or ML models using list_spaces or list_models to establish context.
2. Next, use get_metrics or get_model to pull specific data points, such as performance scores or schema details, for validation.
3. Finally, trigger the action, whether that is running a check with run_eval or pushing logs with ingest_log, to complete the workflow.

In short, your agent executes a sequence of API calls to manage the workflow, so you stay in the chat interface.
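As a rough illustration of that sequence, here is a minimal sketch of the requests an MCP client might send. The `tools/call` envelope follows the Model Context Protocol's JSON-RPC shape; the argument names (`space_id`, `model_id`, `environment`, `dataset_id`) and the placeholder values are assumptions, since this page does not document the tool schemas.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request for the given tool name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values below are hypothetical placeholders.
workflow = [
    # Step 1: establish context.
    tool_call("list_spaces", {}),
    tool_call("list_models", {"space_id": "SPACE_ID"}),
    # Step 2: pull data for one model.
    tool_call("get_metrics", {"model_id": "MODEL_ID", "environment": "Production"}),
    # Step 3: trigger the action.
    tool_call("run_eval", {"model_id": "MODEL_ID", "dataset_id": "DATASET_ID"}),
]

for request in workflow:
    print(json.dumps(request))
```

In practice your AI client builds and sends these requests for you; the sketch only makes the order of operations explicit.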

Who Is Arize AI MCP For?

This is for the ML Engineer who spends too much time clicking between dashboards. It's for the Data Scientist who needs to validate a model's performance before deployment. It's for the AI Product Manager who needs to prove output safety without leaving their chat client.

ML Engineer

Uses ingest_log to stream inference telemetry and get_metrics to query performance degradation flags directly in the terminal.

Data Scientist

Manages baseline evaluation datasets using list_datasets and triggers custom scoring loops via run_eval.

AI Product Manager

Monitors output toxicity and drift rates across multiple LLM integrations by calling list_models and get_metrics.

What Changes When You Connect

Real-time drift detection: Call get_metrics to see whether your model's performance is degrading due to data drift, without waiting for a dashboard to refresh.
Immediate logging: Use ingest_log to push raw inference data. You don't need to copy/paste logs; your agent handles the structured payload.
Structured context: Before checking anything, use list_spaces to ensure your agent is querying the right workspace. This prevents mixing production data with training data.
Safety checks: Need to know if the model output is toxic or hallucinated? run_eval triggers an automated check against ground truth baselines and flags those issues.
Auditability: Use list_environments to understand exactly which deployment stage (Production, Verification) the model metrics come from. This is crucial for compliance.
Schema validation: get_model pulls the full metadata—inputs, outputs, and features—so you know exactly what the model expects before you call it.

Real-World Use Cases

Debugging a sudden model performance drop

The fraud detection model started flagging too many false positives. Instead of jumping into the GUI, the ML engineer tells their agent to get_metrics for the model. The agent returns the recent prediction drift metrics, showing the exact feature that changed. The problem is found and logged in seconds.

Testing a new LLM prompt safely

A product manager wants to update the customer bot's prompt, but needs to check for toxicity first. They ask the agent to list_datasets and then run_eval, targeting the new prompt against the 'Toxicity-Benchmark' dataset. The agent reports the toxicity score before the code ever goes live.

Validating data for a new ML feature

A data scientist needs to know if the data used for the new feature is clean. They use list_datasets to find the right ground truth data, then call get_dataset to pull the metadata. Comparing that metadata with the inputs returned by get_model confirms whether the feature's input schema matches the model's requirements.

Tracking live service behavior

The team is running a beta feature. Every request needs to be tracked. The engineer uses ingest_log to push the raw predictions and inferences from the live service into Arize. This provides a continuous, auditable log stream for later analysis.

The Tradeoffs

Guessing the scope

Calling get_metrics without first knowing whether the model is running in Production or Training can pull stale or irrelevant metrics. That wastes time and gives a false sense of security.

→ Always call list_environments first. This forces the agent to confirm the deployment context. Then, use list_models to scope the specific model ID before calling get_metrics. Start with context, then data.
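One way to enforce that ordering is a small guard that refuses to build a get_metrics call until the environment and model have been confirmed. This is a sketch only: the function name, the argument names, and the placeholder values are hypothetical, and the two sets stand in for whatever list_environments and list_models actually return.

```python
def scoped_get_metrics(model_id, environment, known_environments, known_model_ids):
    """Return a get_metrics call only after the scope has been confirmed."""
    if environment not in known_environments:
        raise ValueError(f"unknown environment: {environment!r}")
    if model_id not in known_model_ids:
        raise ValueError(f"unknown model: {model_id!r}")
    return {
        "name": "get_metrics",
        "arguments": {"model_id": model_id, "environment": environment},
    }


# Placeholders for values that would come from list_environments and list_models.
environments = {"Production", "Training", "Verification"}
model_ids = {"MODEL_ID"}

call = scoped_get_metrics("MODEL_ID", "Production", environments, model_ids)
print(call)
```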

Ignoring data provenance

Running a run_eval check and assuming the results are based on the latest data is risky. You might be using an outdated dataset version, which makes the entire evaluation worthless.

→ First, call list_datasets to see all available baselines. Then, use get_dataset to verify the exact dataset ID you need. Finally, pass that verified ID to run_eval to guarantee the right ground truth is used.
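The same list, verify, run order can be expressed as a tiny planner. This is a hedged sketch: `plan_eval` and the `dataset_id` and `model_id` argument names are assumptions for illustration, and the set of listed IDs stands in for the output of list_datasets.

```python
def plan_eval(model_id: str, dataset_id: str, listed_dataset_ids: set) -> list:
    """Plan get_dataset then run_eval, but only for a dataset that was listed."""
    if dataset_id not in listed_dataset_ids:
        raise ValueError(f"dataset {dataset_id!r} was not returned by list_datasets")
    return [
        # Verify the dataset metadata before using it as ground truth.
        {"name": "get_dataset", "arguments": {"dataset_id": dataset_id}},
        # Pass the verified ID to the evaluation run.
        {"name": "run_eval", "arguments": {"model_id": model_id, "dataset_id": dataset_id}},
    ]


listed = {"DATASET_ID"}  # placeholder for IDs returned by list_datasets
steps = plan_eval("MODEL_ID", "DATASET_ID", listed)
print([step["name"] for step in steps])
```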

Overloading the agent's memory

Asking the agent to list everything (list_spaces, list_models, list_datasets, list_environments) in one prompt produces a wall of text that is unreadable and forces you to re-ask questions anyway.

→ Break it up. Use a sequence. Start by defining the scope: list_spaces. Once you have the space, narrow it down: list_models. Never try to query the entire system at once.

When It Fits, When It Doesn't

Use this server if you need to manage the full, traceable lifecycle of an ML model in a conversational way. If your job involves checking data drift, validating LLM outputs (toxicity, hallucination), or comparing model performance across different environments (Production vs. Staging), this is your tool. You need a single interface for ML Ops.

Don't use this if you just need to look up a simple piece of data (e.g., a single user's record). For that, a simple database query tool is better. If you only care about code structure, a static code analyzer is sufficient. This server is for observability and model governance.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Arize AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

get_dataset, get_metrics, get_model, ingest_log, list_datasets, list_environments, list_evals, list_models, list_spaces, run_eval

ML Ops monitoring shouldn't feel like navigating a dozen dashboards.

Right now, checking model drift means context-switching. You jump from your IDE to the Arize dashboard, then you click 'Metrics,' then you filter by 'Data Drift,' and finally, you wait for the chart to load. It's a multi-step, manual process that kills flow.

With this MCP server, you just tell your agent, 'Show me the drift on the payment model.' The agent executes `list_models` and `get_metrics` in the background, and the results, the key numbers and graphs, appear in the chat.
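To make that background step concrete, here is a minimal sketch of how a phrase like "the payment model" could be resolved to a model ID before get_metrics is called. The shape of the list_models output (`id` and `name` keys) and the sample entries are invented placeholders, not the server's documented response.

```python
def resolve_model_id(query: str, models: list) -> str:
    """Return the ID of the single model whose name contains the query text."""
    matches = [m for m in models if query.lower() in m["name"].lower()]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one model matching {query!r}, found {len(matches)}"
        )
    return matches[0]["id"]


# Hypothetical list_models output; names and IDs are placeholders.
models = [
    {"id": "model-a", "name": "payment-fraud"},
    {"id": "model-b", "name": "support-bot"},
]

model_id = resolve_model_id("payment", models)
next_call = {"name": "get_metrics", "arguments": {"model_id": model_id}}
print(next_call)
```

Raising on zero or multiple matches keeps the agent from silently querying the wrong model.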

Arize AI MCP Server: Model & Data Ops

You no longer need to manually push logs or run evaluation scripts outside of your chat session. The agent handles `ingest_log` for raw telemetry and executes `run_eval` against your static datasets. It manages the whole pipeline automatically.

The model lifecycle, from data ingestion to final evaluation, is now governed by a few simple commands. It's built for operational speed, not for GUI exploration.

Common Questions About Arize AI MCP

How do I check for data drift using the get_metrics tool?

The get_metrics tool fetches real-time observability metrics for an ML model. You pass the model ID and the environment, and the tool returns specific performance and data quality metrics, including prediction drift.
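For readers scripting against the server, the sketch below shows one way to read a tool result. MCP tool results carry a `content` list of typed items and an optional `isError` flag; the assumption here is that get_metrics returns its metrics as JSON text, and the field names and values in the sample are invented for illustration.

```python
import json


def tool_text(result: dict) -> str:
    """Join the text items of an MCP tool result, raising if the call failed."""
    if result.get("isError"):
        raise RuntimeError("tool call reported an error")
    return "\n".join(
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    )


# Placeholder result: the JSON fields and numbers are hypothetical.
sample_result = {
    "content": [
        {
            "type": "text",
            "text": json.dumps(
                {"model_id": "MODEL_ID", "environment": "Production", "prediction_drift": 0.0}
            ),
        }
    ],
    "isError": False,
}

metrics = json.loads(tool_text(sample_result))
print(metrics["prediction_drift"])
```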

Can I run an evaluation without calling list_datasets first?

Only if you already have the dataset ID. The run_eval tool requires a defined dataset ID, so the usual path is to use list_datasets to find the available ground truth datasets, and then use get_dataset to validate the correct ID before triggering the run.

What is the difference between list_models and list_spaces?

list_spaces shows the top-level workspaces or containers for your data (e.g., 'Finance' or 'Customer Service'). list_models lists the specific, tracked ML models or LLMs that live within one of those spaces. Use list_spaces first, then list_models within the space you picked.

How does the ingest_log tool work?

The ingest_log tool accepts a structured payload of raw telemetry logs. You send the agent the logs, and it formats them correctly and pushes them into the Arize platform for tracking.
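The exact payload schema is not documented on this page, so the following is only a sketch of what a structured telemetry record might look like. Every field name in `InferenceLog`, the `logs` wrapper key, and the sample values are assumptions.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class InferenceLog:
    """One inference record; all field names here are hypothetical."""
    model_id: str
    environment: str
    prediction_id: str
    features: dict
    prediction: str
    # Default to the current UTC time in ISO 8601 format.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def ingest_log_arguments(records: list) -> dict:
    """Wrap records into an arguments object for an ingest_log call."""
    if not records:
        raise ValueError("at least one log record is required")
    return {"logs": [asdict(record) for record in records]}


record = InferenceLog(
    model_id="MODEL_ID",
    environment="Production",
    prediction_id="p-1",
    features={"amount": 12.5},  # placeholder feature value
    prediction="fraud",
)
print(ingest_log_arguments([record]))
```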

How do I use the list_environments tool to see how model deployments are segregated?

The list_environments tool shows the configured deployment areas (like Production, Training, or Verification). This lets you know exactly where a model is running and what kind of data it's using.

When should I use the get_dataset tool instead of listing all datasets with list_datasets?

Use get_dataset when you know the exact name of the evaluation dataset you need. It lets you pull specific metadata without having to scroll through a full list of available datasets.

How do I trigger an evaluation run using the run_eval tool?

You initiate an evaluation by calling run_eval and providing the target datasets and the model ID. This starts a custom check against static data without manual dashboard interaction.

Can my AI automatically trigger a hallucination evaluation on a new dataset?

Yes! You can ask your agent to retrieve the specific Ground Truth dataset ID, formulate a testing payload, and invoke the run_eval tool natively. Arize will process the asynchronous scoring internally and log the evaluation securely.

How can I quickly check if a production model is experiencing data drift?

Just tell your agent: 'Fetch the primary metrics for model X'. The AI uses the get_metrics query to immediately surface latency degradation, prediction drift flags, and incoming data quality indexes without opening the browser.

Is it possible to track telemetry simultaneously for both local development and production environments?

Absolutely. Arize enforces strict separation using Spaces and Environments. You can instruct your AI agent to query the list_environments tool, figure out the sandbox ID, and push manual test logs strictly to the sandbox scope during debugging sessions, keeping production metrics clean.
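A minimal sketch of that sandbox scoping follows. The environment records (`id` and `name` keys), the name "Sandbox", and the `environment_id` argument are all hypothetical; substitute whatever list_environments actually returns.

```python
def pick_environment_id(environments: list, wanted: str) -> str:
    """Find an environment ID by case-insensitive name match."""
    for env in environments:
        if env["name"].lower() == wanted.lower():
            return env["id"]
    raise LookupError(f"no environment named {wanted!r}")


def sandbox_ingest_arguments(environments: list, logs: list) -> dict:
    """Scope test logs to the sandbox so production metrics stay clean."""
    return {
        "environment_id": pick_environment_id(environments, "sandbox"),
        "logs": logs,
    }


# Placeholder for a list_environments result.
environments = [
    {"id": "env-prod", "name": "Production"},
    {"id": "env-sbx", "name": "Sandbox"},
]

arguments = sandbox_ingest_arguments(environments, [{"prediction_id": "test-1"}])
print(arguments)
```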

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)