Arize AI MCP for AI. Monitor model performance and data drift instantly.
Works with every AI agent you already use
…and any MCP-compatible client
Connect to your AI in seconds.
Arize AI connects your agent to ML observability. You monitor LLM performance, track model metrics, and check data drift right from your terminal or IDE.
It lets you ingest raw inference logs and run automated evaluations against static datasets without opening a dashboard. This is for engineers who need real-time visibility into their models.
What your AI can do
List datasets
Returns a list of all available static evaluation datasets for testing.
List environments
Lists configured deployment environments (like Production or Training) used to segment model data.
List evals
Shows a list of automated evaluation runs that have been executed against models.
List all active ML models and retrieve their detailed configuration schemas.
Fetch current observability metrics, including performance scores and data quality reports for any tracked model.
List available static evaluation datasets or retrieve specific dataset metadata for testing purposes.
Push raw logs, predictions, and inferences into the platform for immediate visualization and drift analysis.
List configured deployment environments, such as Production or Verification, to ensure data segregation.
Arize AI with 10 Tools
These tools let you interact with the entire Arize observability platform: list models, fetch performance metrics, manage datasets, and trigger automated model evaluations.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Arize AI on Vinkius
List Datasets
Returns a list of all available static evaluation datasets for testing.
List Environments
Lists configured deployment environments (like Production or Training) used to...
List Evals
Shows a list of automated evaluation runs that have been executed against models.
Get Dataset
Retrieves details for a specific static dataset used in evaluations.
Get Model
Gets metadata, inputs, and outputs for a specific tracked machine learning model.
Ingest Log
Accepts raw telemetry data (payload_json) and sends it into the Arize logging system.
Get Metrics
Fetches real-time observability metrics and performance scores for an ML model.
List Models
Lists all ML models or LLMs currently being tracked within the platform space.
Run Eval
Triggers an automated evaluation run for LLM checks using configured ground truth...
List Spaces
Returns a list of accessible workspaces, which separate different model telemetry...
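Under the hood, an MCP client invokes any of the tools above with a JSON-RPC 2.0 `tools/call` request. The sketch below shows the shape of that message. The tool names come from the list above, but the `model_name` argument key is an assumption, since this page does not document each tool's argument schema.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build an MCP `tools/call` request in JSON-RPC 2.0 form."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "model_name" is a hypothetical argument key; check the tool's published
# input schema before relying on it.
request = build_tool_call("get_metrics", {"model_name": "Fraud-Detection-v2"})
print(json.dumps(request, indent=2))
```

In normal use your AI client builds and sends these messages for you; the sketch is only meant to show what a tool invocation looks like on the wire.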
Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for Streamable HTTP, SSE, and OAuth2, with zero messy routing required.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Arize AI, then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Arize AI. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 10 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
Tracking model behavior used to be a multi-tab headache.
Today, if you want to know why your LLM output dipped in quality, you have a lot of clicking ahead of you. You jump into the Arize dashboard, find the right Space, pull up the correct Model, and then hunt through tabs for drift metrics or raw logs that explain the drop. It’s tedious, slow work.
With this MCP, the agent does it all. You just ask: 'What's wrong with Model X?' The system responds by fetching live metrics, checking data quality, and pointing you straight to the problem—no clicking required.
Get model status checks directly via `get_model`.
Before writing a single line of code that interacts with an ML service, you had to check the documentation and manually confirm the expected inputs and outputs. That process was prone to human error.
Now, you simply ask the agent to run `get_model`. It gives you the full metadata in plain text, right where you're working. That’s how you eliminate boilerplate checks.
What your AI can actually do with this
You can connect this MCP to any agent client, giving it full access to your ML observability platform. Forget switching context into heavy graphical dashboards just to see if an LLM prompt hallucinated or if performance dipped. Now, your AI acts like a dedicated MLOps engineer talking to you in plain English.
Need to know what models are running? You can ask the agent to list all tracked ML models. Want to check data quality? It fetches real-time metrics and shows prediction drift flags. The system also lets you push raw logs, predictions, and inferences directly into Arize for immediate tracking using ingest_log.
For governance, you can browse organizational spaces and deployment environments via list_environments, keeping track of Production versus Training data.
Beyond monitoring, the agent handles testing. You can list automated evaluation runs or even trigger a custom check using run_eval against static datasets. It’s about making your ML telemetry workflow conversational; it just works.
Here's how it actually works
The bottom line is you don't need a GUI; your AI client handles the API calls and reports back what it finds.
Subscribe to this MCP and provide your Arize API Key and Space ID.
Reference a model by name (e.g., 'Fraud-Detection-v2') so the agent knows where to look for metrics.
Ask the agent to perform an action, like fetching drift metrics or listing active models.
Who is this actually for?
ML Engineers, Data Scientists, and AI Product Managers. You’re the person staring at dashboards until 2 AM trying to figure out why model drift spiked overnight. This MCP lets you query those metrics directly from your terminal.
You need to push inference telemetry and rapidly check for performance degradation flags without leaving your IDE or command line.
You manage baseline evaluation datasets, triggering custom scoring loops asynchronously while reviewing model output toxicity rates.
You monitor usage metrics and output quality across multiple LLM integrations to ensure the product stays aligned with user expectations.
What Changes When You Connect
Stop context-switching. You don't have to leave your terminal or IDE just because you need to check get_metrics for prediction drift. Your agent does the heavy lifting, keeping your focus on coding.
Better governance means knowing where your data comes from. Use list_environments and list_spaces to separate Production telemetry from Training runs, which is critical for clean audits.
ingest_log allows you to push raw inference payloads programmatically. This guarantees that every piece of observed behavior gets tracked in Arize for later analysis.
When you need assurance on model output quality, the agent can list automated evaluation runs (list_evals) or even kick off a new check using run_eval against ground truth data.
The system provides deep visibility into your entire ML stack. You get to see everything from the initial schema definition via get_model all the way through live performance tracking.
See it in action
Debugging a Production Drift Spike
A user notices model accuracy dropped in production. Instead of diving into the UI, they ask their agent to check get_metrics for the specific model and then use list_environments to confirm if the issue is isolated to the active deployment space.
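The triage in this scenario boils down to comparing drift across environments. Here is a minimal sketch of that comparison, assuming the agent has already collected `get_metrics` results per environment. The response shape, the `drift_score` field, and the 0.2 threshold are all hypothetical, because this page does not document the actual payload.

```python
def environments_over_threshold(
    metrics_by_env: dict[str, dict[str, float]], threshold: float
) -> list[str]:
    """Return the names of environments whose drift score exceeds the threshold."""
    return sorted(
        env
        for env, metrics in metrics_by_env.items()
        if metrics.get("drift_score", 0.0) > threshold
    )


# Hypothetical values standing in for get_metrics results, keyed by the
# environment names that list_environments would return.
sample = {
    "Production": {"drift_score": 0.31, "accuracy": 0.84},
    "Training": {"drift_score": 0.04, "accuracy": 0.93},
}

flagged = environments_over_threshold(sample, threshold=0.2)
print(flagged)  # ['Production']
```

If only the production environment is flagged, the problem is likely in live traffic rather than in the model itself.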
Setting up a New Evaluation Benchmark
A data scientist needs to test an LLM against new toxicity rules. They first run list_datasets to find available benchmarks, then use get_dataset to confirm the schema, and finally trigger the check with run_eval.
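The three steps in this scenario can be sketched as: pick a dataset from the listing, confirm its columns, then assemble the arguments for `run_eval`. The result shapes (`id`, `name`, `columns`) and the `dataset_id` argument key are assumptions made for illustration and are not documented on this page.

```python
from typing import Any


def pick_dataset(datasets: list[dict[str, Any]], keyword: str) -> dict[str, Any]:
    """Return the single dataset whose name contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    matches = [d for d in datasets if needle in str(d.get("name", "")).lower()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {keyword!r}, found {len(matches)}")
    return matches[0]


def has_columns(dataset_detail: dict[str, Any], required: set[str]) -> bool:
    """Check that a dataset detail record lists every required column."""
    return required <= set(dataset_detail.get("columns", []))


# Hypothetical stand-ins for list_datasets and get_dataset results.
listing = [
    {"id": "ds-1", "name": "toxicity-benchmark"},
    {"id": "ds-2", "name": "qa-golden-set"},
]
detail = {"id": "ds-1", "columns": ["prompt", "expected_label"]}

chosen = pick_dataset(listing, "toxicity")
run_eval_arguments = None
if has_columns(detail, {"prompt", "expected_label"}):
    # "dataset_id" is a hypothetical argument key for run_eval.
    run_eval_arguments = {"dataset_id": chosen["id"]}
print(run_eval_arguments)
```

Refusing to proceed on zero or multiple matches keeps the evaluation from silently running against the wrong benchmark.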
Capturing Live Inference Data
A developer writes a new feature that makes many calls. They don't want to manually record everything; they simply use ingest_log to push the entire payload stream, guaranteeing Arize sees every single prediction.
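As a rough illustration of what pushing a payload stream involves, the sketch below wraps each inference record in an MCP `tools/call` request for `ingest_log`. The `payload_json` argument name comes from the tool description; treating it as a JSON-encoded string, and the record fields used here, are assumptions.

```python
import json
from typing import Any, Iterable


def ingest_log_calls(records: Iterable[dict[str, Any]], first_id: int = 1) -> list[dict[str, Any]]:
    """Wrap each inference record in an MCP tools/call request for ingest_log."""
    calls = []
    for offset, record in enumerate(records):
        calls.append({
            "jsonrpc": "2.0",
            "id": first_id + offset,
            "method": "tools/call",
            "params": {
                "name": "ingest_log",
                # Assumption: payload_json carries the record as a JSON string.
                "arguments": {"payload_json": json.dumps(record, sort_keys=True)},
            },
        })
    return calls


# Hypothetical inference records; real field names depend on your model schema.
records = [
    {"prediction_id": "p-001", "prediction": "fraud", "latency_ms": 42},
    {"prediction_id": "p-002", "prediction": "not_fraud", "latency_ms": 38},
]
calls = ingest_log_calls(records)
print(len(calls), calls[0]["params"]["arguments"]["payload_json"])
```

Giving every request its own `id` lets the client match each response to the record that produced it, so a failed ingest can be retried individually.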
Auditing Model Readiness
A product manager needs proof that a model is stable before release. They ask the agent to list all active models (list_models), check the candidate model's current performance metrics using get_metrics, and confirm it's running in a verified environment.
The honest tradeoffs
Manual Dashboard Clicking
A developer runs the same test five times, manually copying results from one dashboard tab to another for comparison.
Use ingest_log repeatedly with your agent. This streams all payloads directly into Arize, giving you a single source of truth for comparison and historical analysis.
Forgetting Environment Context
A scientist runs an evaluation using production data when they meant to use the dedicated 'Training' environment.
Always verify your boundaries. Use list_environments before any run, and confirm your workspace via list_spaces.
Assuming Model Schema is Static
The agent fails because the developer didn't realize the model had changed its required inputs or output fields.
Always query the schema first. Run get_model to confirm the precise inputs, outputs, and features before attempting any action.
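A schema check like this one can be sketched as a comparison between the model's declared inputs and the record you are about to send. The metadata shape below (an `inputs` list of field names) is a hypothetical stand-in for whatever `get_model` actually returns.

```python
from typing import Any


def schema_problems(model_metadata: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """List the mismatches between a record and the model's declared inputs."""
    expected = set(model_metadata.get("inputs", []))
    supplied = set(record)
    problems = [f"missing input: {name}" for name in sorted(expected - supplied)]
    problems += [f"unexpected field: {name}" for name in sorted(supplied - expected)]
    return problems


# Hypothetical get_model result and candidate record.
metadata = {"name": "Fraud-Detection-v2", "inputs": ["amount", "merchant_id", "country"]}
record = {"amount": 129.5, "merchant_id": "m-77", "currency": "EUR"}

problems = schema_problems(metadata, record)
print(problems)  # ['missing input: country', 'unexpected field: currency']
```

An empty list means the record matches the declared inputs, so the follow-up action can go ahead.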
When It Fits, When It Doesn't
Use this MCP if your primary bottleneck is context-switching between development tools (like VS Code or a terminal) and observability platforms (Arize). You need a programmatic way to query performance metrics (get_metrics), track live data streams (ingest_log), and manage model lifecycles conversationally. Don't use this if you just need to view historical, static reports; for that, the native Arize UI is fine. If your goal is merely listing models without checking their health, list_models handles it, but combining it with get_metrics provides the real value.
Questions you might have
How do I use the ingest_log tool with Arize AI?
You pass a payload JSON structure to ingest_log. The agent handles structuring your raw telemetry logs into the valid format and pushing them directly to Arize for analysis.
Can I list all monitored ML models with list_models?
Yes, running list_models retrieves a complete list of every tracked ML or LLM model in your current workspace, helping you narrow down where the issue is occurring.
What's the difference between getting metrics and listing environments?
get_metrics gives quantitative data (performance scores, drift rates) for a specific model. list_environments just shows you the names of available deployment contexts like 'Production' or 'Staging'.
Do I need to use run_eval if I want to test my LLM?
No, not always. If you have a specific dataset and just need metrics, get_metrics might suffice. However, using run_eval triggers the formal evaluation process against ground truth baselines.
How do I use list_spaces to see all my available workspaces?
It lists every organizational space you have access to in Arize. This lets your agent pinpoint exactly which model or telemetry dataset needs monitoring, keeping your work properly segmented.
What information does get_model need about my tracked ML model?
The tool requires the specific name and ID of the model you are tracking. This confirms the metadata, defining all inputs, outputs, and features so your agent knows exactly what to monitor.
What does list_environments show me about my deployment stages?
It shows defined contexts like Production, Training, or Verification. You can use this to restrict monitoring to a specific lifecycle stage, which is critical for accurate reporting before going live.
After running list_datasets, how do I get the details on a particular dataset using get_dataset?
Pick the dataset you want from the list_datasets output and ask the agent to fetch it with get_dataset. The tool retrieves all metadata for the specified dataset, so you get details like row counts, column names, and schema information without having to guess.
Can my AI automatically trigger a hallucination evaluation on a new dataset?
Yes! You can ask your agent to retrieve the specific Ground Truth dataset ID, formulate a testing payload, and invoke the run_eval tool natively. Arize will process the asynchronous scoring internally and log the evaluation securely.
How can I quickly check if a production model is experiencing data drift?
Just tell your agent: 'Fetch the primary metrics for model X'. The AI uses the get_metrics query to immediately surface latency degradation, prediction drift flags, and incoming data quality indexes without opening the browser.
Is it possible to track telemetry simultaneously for both local development and production environments?
Absolutely. Arize enforces strict separation using Spaces and Environments. You can instruct your AI agent to query the list_environments tool, figure out the sandbox ID, and push manual test logs strictly to the sandbox scope during debugging sessions, keeping production metrics clean.
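The "figure out the sandbox ID" step might look like the sketch below: resolve an environment by name from a `list_environments` result and fail loudly when it is missing, so test logs never fall back to production. The `id` and `name` fields and the sample environment IDs are hypothetical.

```python
def resolve_environment(environments: list[dict[str, str]], name: str) -> str:
    """Return the id of the named environment, refusing to fall back silently."""
    for env in environments:
        if env.get("name", "").lower() == name.lower():
            return env["id"]
    known = ", ".join(e.get("name", "?") for e in environments)
    raise LookupError(f"environment {name!r} not found; known: {known}")


# Hypothetical list_environments result.
environments = [
    {"id": "env-prod", "name": "Production"},
    {"id": "env-sbx", "name": "Sandbox"},
]

sandbox_id = resolve_environment(environments, "sandbox")
print(sandbox_id)  # env-sbx
```

Raising an error instead of defaulting to the first environment is the design choice that keeps production metrics clean during debugging.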
We've already built the connector for Arize AI. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 10 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.