Datadog AI MCP. Audit model usage, track costs, and manage incidents.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Datadog AI (LLM Observability) MCP Server lets your agent monitor and audit LLM performance metrics. Track token usage, query model latency, and audit prompts directly from your AI workflow.
List monitors, check incidents, and analyze global AI expenses—all without leaving your dev environment.
What your AI agents can do
Create event
Creates a new event record.
Create monitor
Creates a new monitoring alert.
List AI monitors
Retrieves a list of existing AI monitors.
Query specific LLM timeseries (like datadog.llm_observability.tokens) to get average usage and performance data for your agent.
Retrieve detailed payload contents, allowing you to search through exact prompts and response traces for debugging.
List existing AI monitors or create new ones to automatically alert when model performance dips below required thresholds.
Get a list of dashboards showing aggregate AI expenses across different providers (OpenAI, Anthropic, etc.).
Check for current service disruptions or active incidents that could break your multi-agent workflow.
Pull a timeline of deployment marks to see exactly when the LLM model used by your agent was switched.
Datadog AI MCP Server: 10 Tools for Observability
These tools give your agent direct access to Datadog's LLM observability data, allowing you to query metrics, search traces, and manage monitoring alerts.
create_event
Creates a new event record.
create_monitor
Creates a new monitoring alert.
list_ai_monitors
Retrieves a list of existing AI monitors.
list_dashboards
Enumerates available dashboards.
list_events
Lists events, such as deployment marks.
list_incidents
Retrieves incident history and active outages.
list_service_accounts
Lists active service accounts.
query_metrics
Queries specific metrics, such as `datadog.llm_observability.tokens`.
search_llm_spans
Searches for LLM spans using a JSON payload.
submit_series
Submits a metric series.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Datadog AI (LLM Observability), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Yo, this Datadog AI MCP Server lets your agent keep a tight grip on LLM performance. You can track everything—token usage, query latency, and even audit prompts—right from your dev environment. It’s basically a whole suite for keeping your AI workflows running clean.
Analyze Token and Latency Metrics
Your agent can run query_metrics against specific LLM timeseries, like datadog.llm_observability.tokens, so you get the average usage and performance data you need.
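As a rough illustration of what such a query might look like, here is a minimal sketch that builds a Datadog-style query string. The `build_metric_query` helper and the `model` tag are assumptions made for this example; the exact arguments that query_metrics accepts are not documented on this page.

```python
from typing import Dict, Optional


def build_metric_query(
    metric: str,
    aggregator: str = "avg",
    tags: Optional[Dict[str, str]] = None,
) -> str:
    """Return a Datadog-style query string such as avg:metric{key:value}."""
    # With no tag filter, the scope falls back to {*}, meaning "every source".
    scope = ",".join(f"{key}:{value}" for key, value in sorted((tags or {}).items())) or "*"
    return f"{aggregator}:{metric}{{{scope}}}"


# Hypothetical usage: the `model` tag is an assumption about how your spans are tagged.
query = build_metric_query("datadog.llm_observability.tokens", tags={"model": "gpt-4"})
print(query)
```

Keeping the query construction in one helper makes it easy to swap the aggregator or add tag filters without hand-editing strings.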
Search Specific LLM Prompts and Traces
Need to debug a specific prompt or trace? Your agent can run search_llm_spans with a JSON payload to hunt down the exact spans you care about. If you need a broader view, your agent can run list_events to pull events such as deployment marks.
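To make the "JSON payload" idea concrete, here is a sketch of what a search payload could look like. Every field name (`query`, `from`, `to`, `limit`) and the `status:error` filter are hypothetical placeholders, since the page does not document the schema that search_llm_spans expects.

```python
import json
from datetime import datetime, timedelta, timezone


def build_span_search(query: str, end: datetime, lookback_minutes: int = 15, limit: int = 25) -> dict:
    """Build a search payload covering `lookback_minutes` up to `end`."""
    start = end - timedelta(minutes=lookback_minutes)
    return {
        # All four field names below are hypothetical placeholders.
        "query": query,
        "from": start.isoformat(),
        "to": end.isoformat(),
        "limit": limit,
    }


error_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up timestamp
payload = build_span_search("status:error", end=error_time)
print(json.dumps(payload, indent=2))
```

Anchoring the window on the time of the failure keeps the result set small enough to read in a chat window.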
Manage AI Alert Monitors
Your agent can list_ai_monitors to see what AI monitors are already set up, or it can use create_monitor to set up a new alert. If you want to check what dashboards are available, your agent can list_dashboards.
Report Global AI Spending
To get a picture of your total AI spend, your agent can list_dashboards to enumerate the available dashboards. It can also call submit_series to submit a metric series.
Track Active Service Outages
If you gotta know if something's broken, your agent can list_incidents to retrieve incident history and active outages. It can also list_service_accounts to see which service accounts are active.
View Model Deployment History
To see when the LLM model used by your agent changed, your agent can list_events to pull deployment marks, or it can create_event to record a new event. It's all about knowing what's running and when it started running.
How Datadog AI MCP Works
1. Subscribe to the server and input your Datadog API Key, APP Key, and Site.
2. Your AI client uses the server's tools to run specific queries (e.g., `query_metrics` or `list_incidents`).
3. The server returns structured data (metrics, logs, or event lists) that your agent uses to answer questions or take action.
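Steps 2 and 3 can be sketched at the protocol level. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request and read back a result containing `content` items. The sample response below is made up for illustration, and the empty arguments for list_incidents are an assumption.

```python
import json
from typing import Any, Dict, List


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def extract_text(response: Dict[str, Any]) -> List[str]:
    """Return the text items of a tools/call response, raising on errors."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return [item["text"] for item in result.get("content", []) if item.get("type") == "text"]


request = build_tool_call(1, "list_incidents", {})  # argument shape is an assumption
print(json.dumps(request))

# Made-up response, shaped like an MCP tool result.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "No active incidents."}], "isError": False},
}
print(extract_text(sample_response))
```

In practice your MCP client builds and sends these messages for you; the sketch only shows what travels over the wire.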
The bottom line is, your agent can talk to your live AI infrastructure data through natural conversation, getting metrics and alerts back immediately.
Who Is Datadog AI MCP For?
The ML Ops Engineer who needs to know why the model cost spiked last night. The SRE who gets paged at 2 AM because an agent workflow failed. The FinOps analyst tracking global LLM spend. If you build agents, you need this visibility.
- ML Ops Engineer: Audits prompt logs, traces AI model performance across different versions, or checks search_llm_spans for specific errors.
- SRE: Sets up alerts using create_monitor and checks list_incidents to track outages affecting agent workflows.
- FinOps Analyst: Analyzes dashboards via list_dashboards to graph and track global AI infrastructure expenses and usage patterns.
What Changes When You Connect
- See exactly how much an agent costs. Use `query_metrics` to pull token counts and latency data for specific models, so you know if the cost spike was due to usage or inefficiency.
- Debug model failures instantly. Use `search_llm_spans` to retrieve the full payload, letting you examine the exact prompt logic and response traces when something goes wrong.
- Get ahead of outages. Use `list_incidents` to check for active service disruptions, preventing your multi-agent workflow from failing silently.
- Manage alerts with `create_monitor`. Set thresholds that automatically flag when an AI response rate drops or when token usage plateaus, so you don't miss performance dips.
- Understand your budget. Run `list_dashboards` to see widgets that graph global AI expenses across providers like OpenAI or Anthropic, keeping FinOps happy.
- Know the history. Use `list_events` to pull deployment marks, telling you precisely when the LLM model was switched, which is key for root cause analysis.
Real-World Use Cases
Investigating a Sudden Cost Spike
The FinOps team noticed the OpenAI bill was 30% higher than usual. They ask their agent to run list_dashboards to graph global AI expenses. The agent finds the spike correlates with a specific model version, which they then confirm using list_events to pull the exact deployment mark.
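The correlation step in this scenario boils down to two small calculations: how large the jump is, and which deployment mark came most recently before it. The sketch below assumes events arrive as `(unix_timestamp, title)` pairs; that shape, the helper names, and all sample values are made up for illustration.

```python
from typing import List, Optional, Tuple


def percent_change(baseline: float, current: float) -> float:
    """Return the change from baseline to current as a percentage."""
    return (current - baseline) * 100.0 / baseline


def latest_event_before(spike_ts: int, events: List[Tuple[int, str]]) -> Optional[Tuple[int, str]]:
    """Return the most recent event at or before spike_ts, or None."""
    earlier = [event for event in events if event[0] <= spike_ts]
    return max(earlier, key=lambda event: event[0]) if earlier else None


# Made-up sample data.
events = [
    (1700000000, "deploy: model v1"),
    (1700003000, "deploy: model v2"),
    (1700009000, "deploy: model v3"),
]
print(percent_change(1000.0, 1300.0))            # a 30% increase
print(latest_event_before(1700003600, events))   # the v2 deployment
```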
Debugging Agent Logic Failure
An agent fails to complete a task. The developer asks the agent to run search_llm_spans using the error time. The agent returns the full payload, showing the exact prompt and response trace that caused the failure, allowing immediate debugging.
Monitoring Critical Service Health
An SRE needs to know if the multi-agent orchestration is currently blocked. They ask the agent to run list_incidents. If an active outage is reported, they know to pause the workflow and fix the underlying infrastructure.
Verifying Model Performance Drift
The MLOps team suspects the model is degrading. They ask the agent to list_ai_monitors to check existing alerts. If none exist, they use create_monitor to set up a new alert that triggers if the LLM latency goes above 1 second.
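A monitor for that 1-second threshold might be described with a definition like the one below. The field names follow Datadog's monitor API (`name`, `type`, `query`, `message`, `options`), but the metric name `example.llm.latency_seconds` is a placeholder and whether create_monitor accepts exactly this shape is an assumption.

```python
from typing import Any, Dict


def latency_monitor(metric: str, threshold_s: float = 1.0, window: str = "last_5m") -> Dict[str, Any]:
    """Build a metric-alert definition that fires when average latency exceeds threshold_s."""
    return {
        "name": f"LLM latency above {threshold_s:g}s",
        "type": "metric alert",
        # Datadog monitor query form: <time aggregation>(<window>):<metric query> <comparator> <threshold>
        "query": f"avg({window}):avg:{metric}{{*}} > {threshold_s:g}",
        "message": "LLM latency is above the agreed threshold.",
        "options": {"thresholds": {"critical": threshold_s}},
    }


# Placeholder metric name; substitute the latency metric your account actually reports.
monitor = latency_monitor("example.llm.latency_seconds")
print(monitor["query"])
```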
The Tradeoffs
Filtering Raw Logs Manually
Manually downloading large log files and running regex searches to find all 'token usage' or 'error' messages. This is slow, misses trends, and requires massive data handling.
→
Use query_metrics to get aggregate token counts, or use search_llm_spans to filter logs down to specific prompts and response traces. Never try to manually correlate massive log dumps.
Guessing Service Status
Calling list_service_accounts and assuming that if an account is listed, it is currently healthy and fully functional for the agent workflow.
→
Always check list_incidents first. This tool provides the definitive status of active service disruptions, letting you know if the service account is blocked or under maintenance.
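The "check incidents first" rule can be expressed as a small gate in front of your workflow. This sketch assumes each incident is a dict with `title` and `state` keys and that an unresolved incident has state `"active"`; the real output shape of list_incidents may differ.

```python
from typing import Any, Dict, List


def active_incidents(incidents: List[Dict[str, Any]]) -> List[str]:
    """Return the titles of incidents whose state is 'active'."""
    return [item.get("title", "untitled") for item in incidents if item.get("state") == "active"]


def safe_to_run(incidents: List[Dict[str, Any]]) -> bool:
    """Return True only when no incident is currently active."""
    return not active_incidents(incidents)


# Made-up sample data.
incidents = [
    {"title": "Gateway timeouts", "state": "active"},
    {"title": "Old outage", "state": "resolved"},
]
print(active_incidents(incidents))
print(safe_to_run(incidents))
```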
Over-relying on single-point checks
Only checking list_ai_monitors to see if an alert is active, and ignoring whether the underlying service is actually experiencing a performance dip.
→
Run query_metrics to get the real-time performance data (latency, tokens). Then, use create_monitor to make sure that specific metric is watched, not just that a generic monitor exists.
When It Fits, When It Doesn't
Use this if you need to answer operational questions about how your AI systems are running—specifically costs, performance, or failures. You need to track metrics (like tokens or latency) or audit specific prompts.
Don't use this if your goal is simply to manage user permissions or billing records outside of AI usage. For those, a general cloud billing tool is better. If you only need to check generic system health (e.g., 'is the database up?'), list_incidents on its own is often sufficient and the rest of the toolset will go unused. If you need AI-specific metrics, though, this server is the right fit.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Datadog. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Tracking LLM usage shouldn't feel like digging through a dumpster fire.
Before this server, checking model usage meant logging into Datadog, navigating to the LLM Observability dashboard, and manually running queries for tokens, latency, and provider costs. It was a multi-step process involving context switching, copy-pasting time ranges, and fighting slow dashboard loading times.
Now, your agent runs `query_metrics` directly. You just ask: 'What was the average token usage for GPT-4 over the last hour?' and the agent gives you the number, the spike time, and the latency, right in the chat window. No clicks, no dashboards, just answers.
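Behind an answer like that sits some simple arithmetic over the returned timeseries. The sketch below assumes the data arrives as `(timestamp_ms, value)` pairs; that shape and the sample numbers are made up for illustration.

```python
from typing import List, Tuple


def summarize(points: List[Tuple[int, float]]) -> Tuple[float, int, float]:
    """Return (average value, timestamp of the peak, peak value)."""
    if not points:
        raise ValueError("no data points to summarize")
    values = [value for _, value in points]
    peak_ts, peak_value = max(points, key=lambda point: point[1])
    return sum(values) / len(values), peak_ts, peak_value


# Made-up token counts at 10-minute intervals.
points = [
    (1700000000000, 1200.0),
    (1700000600000, 1500.0),
    (1700001200000, 4800.0),
    (1700001800000, 1300.0),
]
average, spike_ts, spike_value = summarize(points)
print(average, spike_ts, spike_value)
```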
Datadog AI MCP Server: Audit model performance and spending.
You no longer have to wait for a manual report to understand why costs jumped. You can use `search_llm_spans` to pull the full prompt and response trace for every failure, or run `list_incidents` to see if a service outage is the root cause of the poor performance.
The difference is context. You move from seeing a red alert on a graph to seeing the exact line of code, the exact prompt, and the exact service failure that caused the alert. It’s immediate, traceable, and actionable.
Common Questions About Datadog AI MCP
How do I check token usage with the Datadog AI MCP Server?
Run query_metrics using the appropriate metric name. You specify the metric, time range, and aggregation function, and the agent returns the average token usage and latency.
What is the purpose of the search_llm_spans tool?
It lets you search for specific LLM spans using a JSON payload. This is how you retrieve the full prompt and response traces needed for deep debugging.
How can I list active AI monitors using list_ai_monitors?
Simply ask the agent to run list_ai_monitors. It returns a list of all existing monitors and their current status (e.g., 'Alert' or 'OK').
Does the Datadog AI MCP Server help with billing?
Yes. Use list_dashboards to enumerate dashboards that graph global AI expenses across providers, helping you track your spending patterns.
What happens if my agent workflow fails? Should I use list_incidents?
Yes. Running list_incidents checks for active service disruptions. This tells you if the failure is due to a system-wide outage, rather than a bad prompt or bad code.
How do I use `list_service_accounts` to manage agent permissions?
The list_service_accounts tool shows active accounts, letting you verify which service credentials your agent is using. This is crucial for security audits and ensuring your agent has the minimum permissions it needs to run.
Can I use `query_metrics` to track specific performance data for a model?
Yes, query_metrics lets you track high-precision telemetry like token counts and latency directly. You can query specific metrics, such as datadog.llm_observability.tokens, to get granular performance data.
What does the `search_llm_spans` tool actually find in my logs?
The search_llm_spans tool retrieves detailed APM payload contents, giving you access to literal prompt logic and response traces. This lets you pinpoint exactly where an LLM interaction failed or performed unexpectedly.
Can my agent check token usage for a specific LLM model?
Yes. Use the 'query_metrics' tool with a query like 'avg:datadog.llm_observability.tokens{model:gpt-4}'. The agent will retrieve the numeric timeseries data directly from Datadog's metrics engine.
How do I search for specific prompt text in my logs?
Use the 'search_llm_spans' tool. Provide a search query matching your prompt identifiers. The agent will pull the matching spans, including the literal prompt text, from your Datadog data.
Can I see if there are any active incidents affecting my AI services?
Absolutely. The 'list_incidents' tool tracks outages and service disruptions in real-time. This allows your agent to identify exactly which external factors might be blocking your multi-agent orchestration pipelines.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.
More in this category
AssemblyAI
Transcribe and audit audio — manage speech-to-text jobs via AI.
Redis Vector
Equip your AI to autonomously manage embeddings, run KNN similarity searches, and administrate vector indexes natively inside your Redis stack.
Midjourney
AI image generation — create, upscale, vary, and blend images using Midjourney's Imagine API.
You might also like
QStash (Serverless Message Queue)
Manage serverless messaging, task scheduling, and webhooks via Upstash QStash — publish messages, manage queues, and handle DLQs directly.
PractiTest
Bring your end-to-end QA management to your AI — list tests, instances, test sets, requirements, and trace logical software defects natively.
HRBlade
Streamline recruitment with an ATS that manages job postings, candidate pipelines, and interview scheduling for growing teams.