Arize AI MCP. Analyze ML performance and track data drift in chat.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Arize AI connects your AI client directly to your Machine Learning and LLM observability platform. Monitor model performance, track data drift, and manage telemetry by listing models, fetching real-time metrics, or running evaluation checks.
It gives your agent the full ML Ops toolkit to analyze prediction health without opening a dashboard.
What your AI agents can do
Fetch real-time performance metrics and detect data drift for any tracked ML model using get_metrics.
Push raw telemetry logs, predictions, and inferences directly into Arize using ingest_log.
List available model spaces (list_spaces), deployment environments (list_environments), and tracked models (list_models).
Trigger specific LLM evaluation runs (run_eval) against static datasets to test for issues like toxicity or PII.
List available evaluation datasets (list_datasets) or retrieve specific dataset metadata (get_dataset).
Get detailed inputs, outputs, and features for a specific tracked model using get_model.
Arize AI MCP Server: 10 Tools for ML Model Ops
These tools give your AI agent the full command set to manage model lifecycles, check performance metrics, and validate data assets in the Arize platform.
get_dataset
Retrieves specific details and metadata for a static evaluation dataset.
get_metrics
Fetches current observability metrics and performance data for a specified ML model.
get_model
Gets detailed metadata, including inputs, outputs, and features, for a specific tracked model.
ingest_log
Pushes raw telemetry logs and inference data into Arize for immediate tracking and analysis.
list_datasets
Lists all available static evaluation datasets loaded in the system.
list_environments
Lists all configured deployment environments (e.g., Production, Training, Verification) used for model segregation.
list_evals
Lists all automated evaluation runs that have been completed or are scheduled.
list_models
Lists all active and tracked Machine Learning models or LLMs in the workspace.
list_spaces
Lists all accessible workspaces, used to separate different models and telemetry datasets.
run_eval
Initiates and triggers a custom, automated evaluation run for an LLM against a specified dataset.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Arize AI, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Arize AI connects your AI client straight to your Machine Learning and LLM observability platform. You'll monitor model performance, track data drift, and handle telemetry by having your agent use the full ML Ops toolkit. It lets your agent analyze prediction health without you ever having to open a dashboard.
Model Performance and Drift
Your agent can fetch current observability metrics and performance data for any tracked ML model using get_metrics, and the same call surfaces data drift on that model. To keep your ML context organized, your agent can list all active and tracked Machine Learning models or LLMs in the workspace with list_models.
You can also list all accessible workspaces, which keeps different models and telemetry datasets separated, using list_spaces. For deployment segregation, your agent can list all configured deployment environments, like Production or Training, via list_environments. If you need to see exactly what a model is built with, your agent can get detailed metadata, including inputs, outputs, and features, for a specific tracked model using get_model.
Logging and Data Validation
Don't waste time pushing logs manually; your agent pushes raw telemetry logs, predictions, and inferences directly into Arize using ingest_log for immediate tracking and analysis. You can validate your data assets by listing all available static evaluation datasets loaded in the system with list_datasets or getting specific dataset metadata using get_dataset.
To test your models, your agent can list all automated evaluation runs that have been completed or are scheduled with list_evals, and it can initiate a custom, automated evaluation run for an LLM against a specified dataset to test for issues like toxicity or PII using run_eval.
Context Management
Your agent can list all available evaluation datasets with list_datasets or grab specific dataset metadata with get_dataset. It can list all active and tracked ML models or LLMs in the workspace using list_models. You'll also be able to list all accessible workspaces with list_spaces and all configured deployment environments with list_environments.
For more details on a specific model's structure, your agent uses get_model to retrieve inputs, outputs, and features.
How Arize AI MCP Works
1. First, tell your agent to list the necessary ML models or spaces using list_models or list_spaces to establish context.
2. Next, use get_metrics or get_model to pull specific data points, like performance scores or schema details, for validation.
3. Finally, trigger the action, whether it's running a check with run_eval or pushing logs with ingest_log, to complete the workflow.
The bottom line is, your agent executes a sequence of API calls to manage the entire lifecycle, keeping you in the chat interface.
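The three steps above can be sketched as a small sequence of tool calls. This is a minimal illustration, not the server's actual interface: `call_tool`, the `fake_call_tool` stand-in, the `model_id` argument name, and the response shapes are all assumptions made so the example runs without a live connection.

```python
from typing import Any, Callable

# Hypothetical stand-in for an MCP client session: takes a tool name and
# arguments and returns the tool result. A real agent would send these over MCP.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def run_workflow(call_tool: ToolCaller, model_name: str) -> dict[str, Any]:
    """Establish context with list_models, then pull data with get_metrics."""
    models = call_tool("list_models", {})
    # Assumed response shape: a list of {"id": ..., "name": ...} records.
    match = next((m for m in models if m["name"] == model_name), None)
    if match is None:
        raise LookupError(f"model {model_name!r} not found")
    metrics = call_tool("get_metrics", {"model_id": match["id"]})
    return {"model_id": match["id"], "metrics": metrics}


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """In-memory fake so the sketch runs without a server."""
    if name == "list_models":
        return [{"id": "m-1", "name": "payment-model"}]
    if name == "get_metrics":
        return {"model_id": arguments["model_id"], "drift_flag": False}
    raise ValueError(f"unknown tool {name}")


result = run_workflow(fake_call_tool, "payment-model")
```

Passing the caller in as a parameter keeps the sequencing logic separate from the transport, so the same function could be pointed at a real MCP session later.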
Who Is Arize AI MCP For?
This is for the ML Engineer who spends too much time clicking between dashboards. It's for the Data Scientist who needs to validate a model's performance before deployment. It's for the AI Product Manager who needs to prove output safety without leaving their chat client.
- ML Engineer: Uses ingest_log to stream inference telemetry and get_metrics to query performance degradation flags directly in the terminal.
- Data Scientist: Manages baseline evaluation datasets using list_datasets and triggers custom scoring loops via run_eval.
- AI Product Manager: Monitors output toxicity and drift rates across multiple LLM integrations by calling list_models and get_metrics.
What Changes When You Connect
- Real-time drift detection: Call get_metrics to instantly see if your model's performance is degrading due to data drift. This is faster than waiting for dashboard refreshes.
- Immediate logging: Use ingest_log to push raw inference data. You don't need to copy/paste logs; your agent handles the structured payload.
- Structured context: Before checking anything, use list_spaces to ensure your agent is querying the right workspace. This prevents mixing production data with training data.
- Safety checks: Need to know if the model output is toxic? run_eval triggers an automated check using ground truth baselines, flagging issues like hallucination.
- Auditability: Use list_environments to understand exactly which deployment stage (Production, Verification) the model metrics come from. This is crucial for compliance.
- Schema validation: get_model pulls the full metadata (inputs, outputs, and features) so you know exactly what the model expects before you call it.
Real-World Use Cases
Debugging a sudden model performance drop
The fraud detection model started flagging too many false positives. Instead of jumping into the GUI, the ML engineer tells their agent to get_metrics for the model. The agent returns the recent prediction drift metrics, showing the exact feature that changed. The problem is found and logged in seconds.
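As a rough illustration of how an agent might narrow a metrics response down to "the exact feature that changed", the sketch below ranks features by drift score. The per-feature score dictionary, the feature names, and the 0.2 threshold are all invented for the example; the real get_metrics response format is not documented here.

```python
def most_drifted(
    feature_drift: dict[str, float], threshold: float = 0.2
) -> list[tuple[str, float]]:
    """Return features whose drift score exceeds the threshold, worst first."""
    flagged = [
        (name, score) for name, score in feature_drift.items() if score > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up scores for illustration; real values would come from get_metrics.
sample = {"amount": 0.05, "merchant_category": 0.41, "device_type": 0.27}
flagged = most_drifted(sample)
```

With these sample scores, `merchant_category` comes out first, which is the kind of single answer the engineer in this scenario is looking for.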
Testing a new LLM prompt safely
A product manager wants to update the customer bot's prompt, but needs to check for toxicity first. They ask the agent to list_datasets and then run_eval, targeting the new prompt against the 'Toxicity-Benchmark' dataset. The agent reports the toxicity score before the code ever goes live.
Validating data for a new ML feature
A data scientist needs to know if the data used for the new feature is clean. They use list_datasets to find the right ground truth data, then call get_dataset to pull the metadata. Comparing that metadata against the inputs returned by get_model confirms whether the feature's input schema matches the model's requirements.
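The schema comparison in this use case comes down to a set difference. The sketch below assumes the column names and model inputs have already been pulled out of the get_dataset and get_model responses; the field names and sample values are hypothetical.

```python
def schema_gaps(
    dataset_columns: list[str], model_inputs: list[str]
) -> dict[str, list[str]]:
    """Report inputs the model expects but the dataset lacks, and extras."""
    dataset, expected = set(dataset_columns), set(model_inputs)
    return {
        "missing_from_dataset": sorted(expected - dataset),
        "unused_by_model": sorted(dataset - expected),
    }


# Hypothetical values standing in for fields extracted from the
# get_dataset and get_model responses.
gaps = schema_gaps(
    dataset_columns=["amount", "country", "device"],
    model_inputs=["amount", "country", "merchant_id"],
)
```

An empty `missing_from_dataset` list is the signal that the dataset covers everything the model expects.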
Tracking live service behavior
The team is running a beta feature. Every request needs to be tracked. The engineer uses ingest_log to push the raw predictions and inferences from the live service into Arize. This provides a continuous, auditable log stream for later analysis.
The Tradeoffs
Guessing the scope
Calling get_metrics without first knowing if the model is running in Production or Training. You might pull stale or irrelevant metrics, wasting time and giving a false sense of security.
→
Always call list_environments first. This forces the agent to confirm the deployment context. Then, use list_models to scope the specific model ID before calling get_metrics. Start with context, then data.
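One way to enforce "context, then data" is to build the get_metrics arguments only after the environment and model have been confirmed. This is a sketch under assumptions: the `model_id` and `environment` argument names and the shape of the list results are hypothetical, not taken from the server's schema.

```python
def scoped_metrics_args(
    environments: list[str],
    models: list[dict],
    model_name: str,
    environment: str,
) -> dict:
    """Build get_metrics arguments only after the scope has been confirmed."""
    if environment not in environments:
        raise ValueError(
            f"unknown environment {environment!r}; expected one of {environments}"
        )
    ids = [m["id"] for m in models if m["name"] == model_name]
    if len(ids) != 1:
        raise ValueError(
            f"expected exactly one model named {model_name!r}, found {len(ids)}"
        )
    return {"model_id": ids[0], "environment": environment}


# Sample values standing in for list_environments and list_models results.
envs = ["Production", "Training", "Verification"]
models = [{"id": "m-42", "name": "fraud-detector"}]
args = scoped_metrics_args(envs, models, "fraud-detector", "Production")
```

Failing loudly on an unknown environment or an ambiguous model name is the point: the metrics call never runs against a guessed scope.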
Ignoring data provenance
Running a run_eval check and assuming the results are based on the latest data. You might be using an outdated dataset version, making the entire evaluation worthless.
→
First, call list_datasets to see all available baselines. Then, use get_dataset to verify the exact dataset ID you need. Finally, pass that verified ID to run_eval to guarantee the right ground truth is used.
Overloading the agent's memory
Asking the agent to list everything—list_spaces, list_models, list_datasets, list_environments—all in one prompt. The resulting wall of text is unreadable and forces you to re-ask questions anyway.
→
Break it up. Use a sequence. Start by defining the scope: list_spaces. Once you have the space, narrow it down: list_models. Never try to query the entire system at once.
When It Fits, When It Doesn't
Use this server if you need to manage the full, traceable lifecycle of an ML model in a conversational way. If your job involves checking data drift, validating LLM outputs (toxicity, hallucination), or comparing model performance across different environments (Production vs. Training), this is your tool. It gives you a single interface for ML Ops.
Don't use this if you just need to look up a simple piece of data (e.g., a single user's record). For that, a simple database query tool is better. If you only care about code structure, a static code analyzer is sufficient. This server is for observability and model governance.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Arize AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Available Capabilities
ML Ops monitoring shouldn't feel like navigating a dozen dashboards.
Right now, checking model drift means context-switching. You jump from your IDE to the Arize dashboard, then you click 'Metrics,' then you filter by 'Data Drift,' and finally, you wait for the chart to load. It's a multi-step, manual process that kills flow.
With this MCP server, you just tell your agent, 'Show me the drift on the payment model.' The agent executes `list_models` and `get_metrics` in the background, and the results—the key numbers and graphs—appear right here, instantly.
Arize AI MCP Server: Model & Data Ops
You no longer need to manually push logs or run evaluation scripts outside of your chat session. The agent handles `ingest_log` for raw telemetry and executes `run_eval` against your static datasets. It manages the whole pipeline automatically.
The model lifecycle, from data ingestion to final evaluation, is now governed by a few simple commands. It's built for operational speed, not for GUI exploration.
Common Questions About Arize AI MCP
How do I check for data drift using the get_metrics tool?
The get_metrics tool fetches real-time observability metrics for an ML model. You pass the model ID and the environment, and the tool returns specific performance and data quality metrics, including prediction drift.
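Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for get_metrics. The envelope follows the MCP specification, but the argument names `model_id` and `environment` and the sample values are assumptions; the server's own input schema is the authority.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# `model_id` and `environment` are assumed argument names, not taken from
# the server's published schema.
request = tools_call_request(
    1, "get_metrics", {"model_id": "fraud-detector", "environment": "Production"}
)
```

In practice your AI client constructs this message for you; it is shown here only to make clear what "passing the model ID and the environment" amounts to.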
Can I run an evaluation without first listing the datasets with list_datasets?
No. The run_eval tool requires a defined dataset ID. You must first use list_datasets to find the available ground truth datasets, and then use get_dataset to validate the correct ID before triggering the run.
What is the difference between list_models and list_spaces?
list_spaces shows the top-level workspaces (e.g., 'Finance' or 'Customer Service'). list_models then lists the specific ML models or LLMs tracked within one of those spaces. Use list_spaces first, then list_models.
How does the ingest_log tool work?
The ingest_log tool accepts a structured payload of raw telemetry logs. You send the agent the logs, and it formats them correctly and pushes them into the Arize platform for tracking.
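To give a sense of what a "structured payload" could look like, the sketch below validates a batch of inference records and wraps them in one argument dictionary. Every field name (`model_id`, `prediction_id`, `logs`, and so on) and every sample value is hypothetical; the real ingest_log input schema may differ.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class InferenceLog:
    # All field names here are hypothetical; check the ingest_log input
    # schema exposed by the server before relying on them.
    model_id: str
    environment: str
    prediction_id: str
    features: dict
    prediction: float
    timestamp: str


def build_ingest_payload(records: list[InferenceLog]) -> dict:
    """Validate records and wrap them in a single ingest_log argument dict."""
    if not records:
        raise ValueError("nothing to ingest")
    envs = {r.environment for r in records}
    if len(envs) != 1:
        # Mixing environments in one batch would blur production and test data.
        raise ValueError(f"batch mixes environments: {sorted(envs)}")
    return {"environment": envs.pop(), "logs": [asdict(r) for r in records]}


record = InferenceLog(
    model_id="beta-ranker",
    environment="Production",
    prediction_id="req-001",
    features={"cart_size": 3},
    prediction=0.82,
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
)
payload = build_ingest_payload([record])
```

Rejecting mixed-environment batches up front is a design choice in this sketch, meant to mirror the separation between production and test telemetry described elsewhere on this page.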
How do I use the list_environments tool to see how model deployments are segregated?
The list_environments tool shows the configured deployment areas (like Production, Training, or Verification). This lets you know exactly where a model is running and what kind of data it's using.
When should I use the get_dataset tool instead of listing all datasets with list_datasets?
Use get_dataset when you know the exact name of the evaluation dataset you need. It lets you pull specific metadata without having to scroll through a full list of available datasets.
How do I trigger an evaluation run using the run_eval tool?
You initiate an evaluation by calling run_eval and providing the target datasets and the model ID. This starts a custom check against static data without manual dashboard interaction.
Can my AI automatically trigger a hallucination evaluation on a new dataset?
Yes! You can ask your agent to retrieve the specific Ground Truth dataset ID, formulate a testing payload, and invoke the run_eval tool natively. Arize will process the asynchronous scoring internally and log the evaluation securely.
How can I quickly check if a production model is experiencing data drift?
Just tell your agent: 'Fetch the primary metrics for model X'. The AI uses the get_metrics query to immediately surface latency degradation, prediction drift flags, and incoming data quality indexes without opening the browser.
Is it possible to track telemetry simultaneously for both local development and production environments?
Absolutely. Arize enforces strict separation using Spaces and Environments. You can instruct your AI agent to query the list_environments tool, figure out the sandbox ID, and push manual test logs strictly to the sandbox scope during debugging sessions, keeping production metrics clean.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.