Datadog AI MCP. Audit model usage, track costs, and manage incidents.


Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel


Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Datadog AI (LLM Observability) MCP Server lets your agent monitor and audit LLM performance metrics. Track token usage, query model latency, and audit prompts directly from your AI workflow.

List monitors, check incidents, and analyze global AI expenses—all without leaving your dev environment.

What your AI agents can do


Analyze Token and Latency Metrics

Query specific LLM timeseries (like datadog.llm_observability.tokens) to get average usage and performance data for your agent.

Search Specific LLM Prompts and Traces

Retrieve detailed payload contents, allowing you to search through exact prompts and response traces for debugging.

Manage AI Alert Monitors

List existing AI monitors or create new ones to automatically alert when model performance dips below required thresholds.

Report Global AI Spending

Get a list of dashboards showing aggregate AI expenses across different providers (OpenAI, Anthropic, etc.).

Track Active Service Outages

Check for current service disruptions or active incidents that could break your multi-agent workflow.

View Model Deployment History

Pull a timeline of deployment marks to see exactly when the LLM model used by your agent was switched.


Free for Subscribers


Datadog AI MCP Server: 10 Tools for Observability

These tools give your agent direct access to Datadog's LLM observability data, allowing you to query metrics, search traces, and manage monitoring alerts.

create event

Creates a new event record, such as a deployment mark noting when a model was switched.

create monitor

Creates a new monitor that alerts when a metric, such as LLM latency, crosses a threshold you set.

list ai monitors

Retrieves existing AI monitors and their current status (e.g., 'Alert' or 'OK').

list dashboards

Enumerates available dashboards, including those that graph AI spending across providers.

list events

Lists recorded events, such as deployment marks showing when the LLM model was switched.

list incidents

Retrieves active incidents and service outages.

list service accounts

Lists active service accounts so you can verify which credentials your agent uses.

query metrics

Queries metric timeseries, such as `datadog.llm_observability.tokens`, from Datadog.

search llm spans

Searches LLM spans using a JSON payload to retrieve specific prompts and response traces.

submit series

Submits metric timeseries data points to Datadog.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private


Make Your AI Do More

Start with Datadog AI (LLM Observability), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

This Datadog AI MCP Server lets your agent keep a tight grip on LLM performance. You can track token usage and query latency, and even audit prompts, right from your dev environment. It's a complete suite for keeping your AI workflows running clean.

Analyze Token and Latency Metrics

Your agent can run query_metrics against specific LLM timeseries, like datadog.llm_observability.tokens, so you get the average usage and performance data you need.
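A query_metrics query combines an aggregation, a metric name, and a tag scope in Datadog's metric query syntax, as in avg:datadog.llm_observability.tokens{model:gpt-4}. The sketch below assembles such a string. `build_metric_query` is a hypothetical helper, not part of the server, and the metric name is this page's own example rather than one verified against your account.

```python
def build_metric_query(metric, aggregation="avg", tags=None, group_by=None):
    """Assemble a Datadog metric query string (hypothetical helper).

    Example output: avg:datadog.llm_observability.tokens{model:gpt-4}
    """
    if aggregation not in {"avg", "sum", "min", "max"}:
        raise ValueError(f"unsupported aggregation: {aggregation}")
    # Multiple tags are comma-separated (AND); no tags means the {*} scope.
    scope = ",".join(f"{key}:{value}" for key, value in (tags or {}).items()) or "*"
    query = f"{aggregation}:{metric}{{{scope}}}"
    if group_by:
        # "by {tag}" splits the result into one timeseries per tag value.
        query += " by {" + ",".join(group_by) + "}"
    return query


print(build_metric_query("datadog.llm_observability.tokens", tags={"model": "gpt-4"}))
```

Passing group_by=["model"] instead of a tag filter returns one series per model, which makes version-to-version comparisons easier.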

Search Specific LLM Prompts and Traces

Need to debug a specific prompt or trace? Your agent can run search_llm_spans with a JSON payload to hunt down the exact prompt and response trace involved.

Manage AI Alert Monitors

Your agent can run list_ai_monitors to see which AI monitors are already set up and their current status, or use create_monitor to set up a new alert that fires when model performance crosses a threshold.

Report Global AI Spending

To get a picture of your total AI spend, your agent can run list_dashboards to find the dashboards that graph AI expenses across providers. It can also use submit_series to send metric data points to Datadog.
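The page doesn't spell out submit_series' input. Assuming it forwards to Datadog's metric submission (v2 series) endpoint, a daily spend figure could be sent as a gauge point shaped like the body below. The metric name `ai.spend.daily_usd` and the helper `gauge_series` are hypothetical.

```python
import time

# Datadog v2 series metric types: 0 unspecified, 1 count, 2 rate, 3 gauge.
GAUGE = 3


def gauge_series(metric, value, tags=None, timestamp=None):
    """Build a v2 series body carrying a single gauge data point."""
    # Datadog expects POSIX timestamps in seconds.
    ts = int(timestamp if timestamp is not None else time.time())
    return {
        "series": [
            {
                "metric": metric,
                "type": GAUGE,
                "points": [{"timestamp": ts, "value": float(value)}],
                "tags": list(tags or []),
            }
        ]
    }


payload = gauge_series("ai.spend.daily_usd", 12.5, tags=["provider:openai"])
```

A gauge fits here because each point is a snapshot of the day's total, not an increment to be summed.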

Track Active Service Outages

If you need to know whether something is broken, your agent can run list_incidents to retrieve active incidents and outages. It can also run list_service_accounts to confirm which service credentials the agent is using.

View Model Deployment History

To see when the LLM model used by your agent changed, your agent can run list_events to pull deployment marks, or use create_event to record a new event, such as a model switch. It's all about knowing what's running and when it started running.
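To make model switches easy to find later, an event could record each one. The body below uses the fields of Datadog's v1 events API (title, text, tags, alert_type). Whether create_event accepts exactly this shape is an assumption, and the `deployment` tag and the `model_switch_event` helper are hypothetical conventions.

```python
def model_switch_event(old_model, new_model, service):
    """Build an event body that marks a model switch for one service."""
    return {
        "title": f"Model switched: {old_model} -> {new_model}",
        "text": f"{service} now calls {new_model} (previously {old_model}).",
        # Tags make the event filterable when listing events later.
        "tags": [f"service:{service}", f"model:{new_model}", "deployment"],
        "alert_type": "info",
    }


event = model_switch_event("gpt-4", "gpt-4o", "support-agent")
```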

How Datadog AI MCP Works

1 Subscribe to the server and input your Datadog API Key, Application Key, and Site.
2 Your AI client uses the server's tools to run specific queries (e.g., query_metrics or list_incidents).
3 The server returns structured data—metrics, logs, or event lists—that your agent uses to answer questions or take action.
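Under the hood, step 2 is an MCP `tools/call` request sent as JSON-RPC 2.0. Your client builds these for you; the sketch only shows the message shape. Tool names come from this page, while their argument schemas are not documented here, so the empty arguments are placeholders and `tool_call` is a hypothetical helper.

```python
import itertools
import json

# JSON-RPC requests need an id so the response can be matched to the request.
_request_ids = itertools.count(1)


def tool_call(name, arguments=None):
    """Serialize an MCP tools/call request for one tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


print(tool_call("list_incidents"))
```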

The bottom line is, your agent can talk to your live AI infrastructure data through natural conversation, getting metrics and alerts back immediately.

Who Is Datadog AI MCP For?

The MLOps Engineer who needs to know why the model cost spiked last night. The SRE who gets paged at 2 AM because an agent workflow failed. The FinOps analyst tracking global LLM spend. If you build agents, you need this visibility.

MLOps Engineer

Audits prompt logs, traces model performance across versions, and uses search_llm_spans to find specific errors.

SRE (Site Reliability Engineer)

Sets up alerts using create_monitor and checks list_incidents to track outages affecting agent workflows.

FinOps Analyst

Analyzes dashboards via list_dashboards to graph and track global AI infrastructure expenses and usage patterns.

What Changes When You Connect

See exactly how much an agent costs. Use query_metrics to pull token counts and latency data for specific models, so you know if the cost spike was due to usage or inefficiency.
Debug model failures instantly. Use search_llm_spans to retrieve the full payload, letting you examine the exact prompt logic and response traces when something goes wrong.
Get ahead of outages. Use list_incidents to check for active service disruptions, preventing your multi-agent workflow from failing silently.
Manage alerts with create_monitor. Set thresholds that automatically flag when an AI response rate drops or when token usage plateaus, so you don't miss performance dips.
Understand your budget. Run list_dashboards to see widgets that graph global AI expenses across providers like OpenAI or Anthropic, keeping FinOps happy.
Know the history. Use list_events to pull deployment marks, telling you precisely when the LLM model was switched, which is key for root cause analysis.
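Root-cause analysis often comes down to finding the last model switch before a spike. A minimal sketch, assuming list_events returns dicts with a POSIX-seconds `date_happened` field and a `tags` list (Datadog's v1 event fields). The real output shape, the `deployment` tag, and the `last_switch_before` helper are assumptions.

```python
def last_switch_before(events, spike_ts, tag="deployment"):
    """Return the newest event carrying `tag` at or before spike_ts, or None."""
    candidates = [
        event
        for event in events
        if tag in event.get("tags", []) and event["date_happened"] <= spike_ts
    ]
    # default=None covers the case where nothing was deployed before the spike.
    return max(candidates, key=lambda event: event["date_happened"], default=None)


events = [
    {"title": "Model switched: gpt-4 -> gpt-4o", "date_happened": 1_700_000_000, "tags": ["deployment"]},
    {"title": "Config reload", "date_happened": 1_700_000_500, "tags": ["config"]},
    {"title": "Model switched: gpt-4o -> gpt-4", "date_happened": 1_700_009_000, "tags": ["deployment"]},
]
print(last_switch_before(events, spike_ts=1_700_003_600)["title"])
```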

Real-World Use Cases

Investigating a Sudden Cost Spike

The FinOps team noticed the OpenAI bill was 30% higher than usual. They ask their agent to run list_dashboards to find the dashboards that graph global AI expenses. The agent finds the spike correlates with a specific model version, which they then confirm using list_events to pull the exact deployment mark.
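The 30% figure is the change over the baseline, (after - before) / before. The sketch below computes it from per-model totals, such as token counts you might pull with query_metrics grouped by a model tag, and names the model whose total grew most. The `attribute_increase` helper is hypothetical and the input numbers are illustrative, not real billing data.

```python
def attribute_increase(before, after):
    """Return (percent change overall, model with the largest absolute increase).

    before/after map model name -> total usage or cost for a period.
    """
    total_before = sum(before.values())
    if total_before == 0:
        raise ValueError("baseline period has no usage")
    total_after = sum(after.values())
    percent = 100.0 * (total_after - total_before) / total_before
    # Include models that appear in only one period (new or retired versions).
    models = set(before) | set(after)
    growth = {model: after.get(model, 0) - before.get(model, 0) for model in models}
    return percent, max(growth, key=growth.get)


print(attribute_increase({"gpt-4": 100, "gpt-4o": 100}, {"gpt-4": 100, "gpt-4o": 160}))
```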

Debugging Agent Logic Failure

An agent fails to complete a task. The developer asks the agent to run search_llm_spans using the error time. The agent returns the full payload, showing the exact prompt and response trace that caused the failure, allowing immediate debugging.
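Searching "using the error time" usually means a window around that moment. A minimal sketch of search_llm_spans arguments, assuming the tool takes a query string plus ISO-8601 `from`/`to` bounds. Those argument names, the `status:error` query, and the `span_search_args` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def span_search_args(error_time, query, minutes=5):
    """Build hypothetical search arguments covering +/- `minutes` around an error."""
    if error_time.tzinfo is None:
        # Treat naive timestamps as UTC instead of the local timezone.
        error_time = error_time.replace(tzinfo=timezone.utc)
    window = timedelta(minutes=minutes)
    return {
        "query": query,
        "from": (error_time - window).astimezone(timezone.utc).isoformat(),
        "to": (error_time + window).astimezone(timezone.utc).isoformat(),
    }


args = span_search_args(datetime(2024, 5, 1, 2, 0), "status:error")
```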

Monitoring Critical Service Health

An SRE needs to know if the multi-agent orchestration is currently blocked. They ask the agent to run list_incidents. If an active outage is reported, they know to pause the workflow and fix the underlying infrastructure.
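The pause decision can be a simple check over the returned incidents. A sketch, assuming each incident comes back as a dict with `title` and `state` keys and that `active` marks an ongoing incident; neither the output shape nor the state names are documented on this page, and both helpers are hypothetical.

```python
def active_incidents(incidents, ongoing_states=("active",)):
    """Return titles of incidents whose state counts as ongoing."""
    return [item["title"] for item in incidents if item.get("state") in ongoing_states]


def should_pause_workflow(incidents):
    """Pause the multi-agent workflow while any incident is still ongoing."""
    return bool(active_incidents(incidents))


incidents = [
    {"title": "LLM gateway 5xx errors", "state": "active"},
    {"title": "Dashboard latency", "state": "resolved"},
]
print(should_pause_workflow(incidents))
```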

Verifying Model Performance Drift

The MLOps team suspects the model is degrading. They ask the agent to list_ai_monitors to check existing alerts. If none exist, they use create_monitor to set up a new alert that triggers if the LLM latency goes above 1 second.
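A latency alert like this maps onto a Datadog metric monitor: a query of the form avg(last_5m):avg:metric{scope} > threshold plus a matching critical threshold. The body below follows the fields of Datadog's public monitor API; that create_monitor accepts it as-is is an assumption, and the metric name `llm.request.latency` (taken to be in seconds) and the `latency_monitor` helper are hypothetical.

```python
def latency_monitor(metric, threshold_s=1.0, window="last_5m", scope="*"):
    """Build a metric-alert monitor that fires when average latency exceeds threshold_s."""
    # The threshold in the query must match options.thresholds.critical,
    # so both are derived from the same value.
    query = f"avg({window}):avg:{metric}{{{scope}}} > {threshold_s}"
    return {
        "name": f"LLM latency above {threshold_s}s",
        "type": "metric alert",
        "query": query,
        "message": "Average LLM latency crossed the alert threshold.",
        "options": {"thresholds": {"critical": threshold_s}},
    }


monitor = latency_monitor("llm.request.latency")
```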

The Tradeoffs

Filtering Raw Logs Manually

Manually downloading large log files and running regex searches to find all 'token usage' or 'error' messages. This is slow, misses trends, and requires massive data handling.

→ Use query_metrics to get aggregate token counts, or use search_llm_spans to filter logs down to specific prompts and response traces. Never try to manually correlate massive log dumps.

Guessing Service Status

Calling list_service_accounts and assuming that if an account is listed, it is currently healthy and fully functional for the agent workflow.

→ Always check list_incidents first. It reports active service disruptions, so you know whether an outage or maintenance is blocking the workflow even when the account itself looks fine.

Over-relying on single-point checks

Only checking list_ai_monitors to see if an alert is active, and ignoring whether the underlying service is actually experiencing a performance dip.

→ Run query_metrics to get the real-time performance data (latency, tokens). Then, use create_monitor to make sure that specific metric is watched, not just that a generic monitor exists.

When It Fits, When It Doesn't

Use this if you need to answer operational questions about how your AI systems are running, specifically costs, performance, or failures: for example, when you need to track metrics (like tokens or latency) or audit specific prompts.

Don't use this if your goal is simply to manage user permissions or billing records outside of AI usage; a general cloud billing tool is better for those. And if you only need to check generic system health (e.g., 'is the database up?'), list_incidents alone is often sufficient. But if you need AI-specific metrics, this server is the right fit.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Datadog. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: managed infrastructure
V8 Isolated: sandboxed per request
Zero-Trust Proxy: no stored credentials
DLP Enforced: policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction


Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_event create_monitor list_ai_monitors list_dashboards list_events list_incidents list_service_accounts query_metrics search_llm_spans submit_series

Tracking LLM usage shouldn't feel like digging through a dumpster fire.

Before this server, checking model usage meant logging into Datadog, navigating to the LLM Observability dashboard, and manually running queries for tokens, latency, and provider costs. It was a multi-step process involving context switching, copy-pasting time ranges, and fighting slow dashboard loading times.

Now, your agent runs `query_metrics` directly. You just ask: 'What was the average token usage for GPT-4 over the last hour?' and the agent gives you the number, the spike time, and the latency, right in the chat window. No clicks, no dashboards, just answers.
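Turning a timeseries into "the number and the spike time" is a small reduction. A sketch, assuming query_metrics passes through Datadog's v1 query response, where each series carries a `pointlist` of [timestamp_ms, value] pairs and gaps appear as null; `summarize_pointlist` is a hypothetical helper and the sample points are made up.

```python
def summarize_pointlist(pointlist):
    """Return (average, peak value, peak timestamp in ms), or None if no data."""
    # Skip gaps, which Datadog reports as null (None in Python).
    points = [(ts, value) for ts, value in pointlist if value is not None]
    if not points:
        return None
    average = sum(value for _, value in points) / len(points)
    peak_ts, peak = max(points, key=lambda point: point[1])
    return average, peak, peak_ts


series = {"pointlist": [[1000, 10.0], [2000, None], [3000, 30.0], [4000, 20.0]]}
print(summarize_pointlist(series["pointlist"]))
```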

Datadog AI MCP Server: Audit model performance and spending.

You no longer have to wait for a manual report to understand why costs jumped. You can use `search_llm_spans` to pull the full prompt and response trace for a failing request, or run `list_incidents` to see if a service outage is the root cause of the poor performance.

The difference is context. You move from seeing a red alert on a graph to seeing the exact prompt, the response trace, and any service incident behind the alert. It’s immediate, traceable, and actionable.

Common Questions About Datadog AI MCP

How do I check token usage with the Datadog AI MCP Server?

Run query_metrics using the appropriate metric name. You specify the metric, time range, and aggregation function, and the agent returns the average token usage and latency.

What is the purpose of the search_llm_spans tool?

It lets you search for specific LLM spans using a JSON payload. This is how you retrieve the full prompt and response traces needed for deep debugging.

How can I list active AI monitors using list_ai_monitors?

Simply ask the agent to run list_ai_monitors. It returns a list of all existing monitors and their current status (e.g., 'Alert' or 'OK').

Does the Datadog AI MCP Server help with billing?

Yes. Use list_dashboards to enumerate dashboards that graph global AI expenses across providers, helping you track your spending patterns.

What happens if my agent workflow fails? Should I use list_incidents?

Yes. Running list_incidents checks for active service disruptions. This tells you if the failure is due to a system-wide outage, rather than a bad prompt or bad code.

How do I use `list_service_accounts` to manage agent permissions?

The list_service_accounts tool shows active accounts, letting you verify which service credentials your agent is using. This is crucial for security audits and ensuring your agent has the minimum permissions it needs to run.

Can I use `query_metrics` to track specific performance data for a model?

Yes, query_metrics lets you track high-precision telemetry like token counts and latency directly. You can query specific metrics, such as datadog.llm_observability.tokens, to get granular performance data.

What does the `search_llm_spans` tool actually find in my logs?

The search_llm_spans tool retrieves detailed APM payload contents, giving you access to literal prompt logic and response traces. This lets you pinpoint exactly where an LLM interaction failed or performed unexpectedly.

Can my agent check token usage for a specific LLM model?

Yes. Use the 'query_metrics' tool with a query like 'avg:datadog.llm_observability.tokens{model:gpt-4}'. The agent will retrieve the numeric timeseries data directly from Datadog's metrics engine.

How do I search for specific prompt text in my logs?

Use the 'search_llm_spans' tool. Provide a search query matching your prompt identifiers. The agent returns the matching spans, including the literal prompt text and response, from your Datadog data.

Can I see if there are any active incidents affecting my AI services?

Absolutely. The 'list_incidents' tool tracks outages and service disruptions in real time. This allows your agent to identify exactly which external factors might be blocking your multi-agent orchestration pipelines.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)