Datadog MCP. Query metrics, search logs, and check alerts from chat.
The Datadog MCP server connects your AI agent directly to your infrastructure monitoring stack. Query performance metrics, search logs for specific errors, and check system monitor status using natural conversation.
You get real-time visibility into application health without opening a dashboard.
What your AI agents can do
Get dashboard
Retrieves all widget configurations, template variables, and layout structures for a specific dashboard visualization.
Get monitor
Pulls the notification settings, threshold values, and historical status changes for a single monitor ID.
List dashboards
Lists available dashboards by title, type (timeboard/screenboard), and direct access URL.
List monitors
List all configured monitors and check their operational state (alert, warning, ok) in a single command.
Query metrics
Query historical metrics data within a defined UNIX timestamp range to track usage patterns like CPU or latency.
Search logs
Search massive log collections using query syntax to find entries matching error codes, user IDs, or service names.
List downtimes
List scheduled downtimes and scope tags so you know when the system is legitimately offline for planned work.
List hosts
Get a list of all monitored hosts, including agent versions and which cloud provider they report to.
Datadog MCP Server: 11 Tools for Observability
Analyze metrics, search logs, check monitor status, and manage infrastructure data using these eleven specific tools.
get_dashboard
Retrieves all widget configurations, template variables, and layout structures for a specific dashboard visualization.
get_monitor
Pulls the notification settings, threshold values, and historical status changes for a single monitor ID.
list_dashboards
Lists available dashboards by title, type (timeboard/screenboard), and direct access URL.
list_downtimes
Returns known planned maintenance periods, including scope tags and current status.
list_events
Gathers a collection of recent system events, such as alerts or deployment completions, along with their priority level.
list_hosts
Provides metadata for all monitored infrastructure hosts, showing agent version and cloud provider tags.
list_monitors
Filters the list of monitors by state (alert, warn, ok) to quickly see what's currently broken or needs attention.
list_slos
Retrieves Service Level Objective definitions, showing target percentages and current compliance status for defined services.
mute_monitor
Temporarily silences an alert monitor, setting a specific expiration time for the silence period.
query_metrics
Fetches time-series data (like CPU usage or request counts) across multiple dimensions within a specified historical timeframe.
search_logs
Searches through structured application logs using query syntax, returning entries with timestamps and status levels.
What you can do with this MCP connector
You're connecting your AI client straight into your infrastructure monitoring stack. This isn't a dashboard you have to open and click through; it's direct, conversational access to real-time application health data. You can query performance metrics, search logs for errors, and check system monitor status without the usual headache of building complex queries.
Checking System Health Status
The list_monitors tool lets you quickly see what's running rough right now. It filters all your configured monitors by their operational state—you can immediately pull up a list showing only alerts, warnings, or green 'ok' statuses. You don't have to scroll through hundreds of rules just to find the broken ones.
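To make the filtering step concrete, here is a minimal Python sketch of what narrowing a monitor list by state could look like on the client side. The sample records and the field names (`id`, `name`, `overall_state`) are assumptions for illustration, not the documented shape of what list_monitors returns.

```python
from collections import Counter

# Hypothetical sample of a list_monitors result.
# Field names are assumptions made for this sketch.
SAMPLE_MONITORS = [
    {"id": 101, "name": "checkout latency", "overall_state": "Alert"},
    {"id": 102, "name": "disk space", "overall_state": "OK"},
    {"id": 103, "name": "queue depth", "overall_state": "Warn"},
    {"id": 104, "name": "auth errors", "overall_state": "OK"},
]


def unhealthy(monitors, states=("alert", "warn")):
    """Return the monitors whose state is in `states` (case-insensitive)."""
    wanted = {s.lower() for s in states}
    return [m for m in monitors if m["overall_state"].lower() in wanted]


def state_counts(monitors):
    """Count monitors per lower-cased state."""
    return Counter(m["overall_state"].lower() for m in monitors)


if __name__ == "__main__":
    for monitor in unhealthy(SAMPLE_MONITORS):
        print(monitor["id"], monitor["name"], monitor["overall_state"])
```

Run against the sample data, this keeps monitors 101 and 103 and drops the two that are OK.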
Once you identify an issue using list_monitors, you can drill down with the get_monitor tool. This pulls all the nitty-gritty details for a single monitor ID, including its current notification settings, what its specific threshold values are set to, and a full history of every status change it’s reported over time.
If an alert is annoying you right now but you know it'll clear up in twenty minutes, you can use mute_monitor to temporarily silence the alarm for a fixed amount of time.
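A mute with a fixed duration comes down to computing an expiry time. The Python sketch below turns "twenty minutes from now" into a UNIX timestamp. The argument names `monitor_id` and `end` are hypothetical, since this page does not list the parameters mute_monitor accepts.

```python
from datetime import datetime, timedelta, timezone


def mute_until(minutes, now=None):
    """Return the UNIX timestamp (in seconds) at which a mute should expire."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    now = now or datetime.now(timezone.utc)
    return int((now + timedelta(minutes=minutes)).timestamp())


def mute_arguments(monitor_id, minutes, now=None):
    """Build an argument dict for a mute request.

    The keys "monitor_id" and "end" are assumptions for this sketch.
    """
    return {"monitor_id": monitor_id, "end": mute_until(minutes, now)}


if __name__ == "__main__":
    print(mute_arguments(101, 20))
```

Passing an explicit `now` keeps the calculation repeatable, which helps when you want to check the expiry before the agent sends the request.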
Analyzing Performance Trends Over Time
The query_metrics tool lets you pull time-series data—think CPU usage or total request counts—across multiple dimensions. You specify a historical timeframe using defined UNIX timestamps, and it spits out the raw performance data for that window. This helps you track patterns like load spikes or latency creep over days or weeks.
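Since query_metrics works on UNIX timestamps, a phrase like "the last 4 hours" has to become a start and end value in seconds. Here is a small Python sketch of that conversion. The argument names `query`, `from`, and `to` are assumptions, and the query string is only an example of Datadog's metric query syntax.

```python
import time


def last_hours_window(hours, now=None):
    """Return (start, end) UNIX timestamps in seconds covering the last `hours` hours."""
    end = int(now if now is not None else time.time())
    start = end - int(hours * 3600)
    return start, end


def query_metrics_arguments(query, hours, now=None):
    """Build an argument dict for a metrics query.

    The keys "query", "from", and "to" are assumed names for this sketch.
    """
    start, end = last_hours_window(hours, now)
    return {"query": query, "from": start, "to": end}


if __name__ == "__main__":
    print(query_metrics_arguments("avg:system.cpu.user{*}", hours=4))
```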
To get a bird's-eye view of your metrics setup, use get_dashboard. It retrieves all widget configurations, template variables, and the whole layout structure for any dashboard visualization, letting your AI client know exactly what data points are available to analyze. Meanwhile, if you just need to see what dashboards exist in the system—maybe you can't find where that key metric is displayed—list_dashboards shows all available boards by title, type (like a timeboard or screenboard), and even gives you the direct access URL for each one.
Finding Specific Errors in Application Logs
The search_logs tool lets your agent search through massive collections of structured application logs using query syntax. You can find entries matching specific error codes, user IDs, or particular service names by specifying the criteria. The results return timestamps and status levels right alongside the log entry.
For a broader view of what's happening system-wide, list_events gathers recent system events—this includes everything from high-priority alerts to successful deployment completions—along with their defined priority level. If you need to know about planned outages so you don't freak out when the system goes down, list_downtimes returns known maintenance periods, including scope tags and the current status of that scheduled work.
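A log search usually needs two things: a query string and a time window. The Python sketch below builds both. The `service:`, `status:`, and quoted-phrase forms follow Datadog's log search syntax; the helper names and the idea of passing ISO 8601 strings as the window are assumptions about how you might feed search_logs.

```python
from datetime import datetime, timedelta, timezone


def iso_window(center, minutes_before=5, minutes_after=5):
    """Return (start, end) ISO 8601 strings around a timezone-aware datetime."""
    start = center - timedelta(minutes=minutes_before)
    end = center + timedelta(minutes=minutes_after)
    return start.isoformat(), end.isoformat()


def log_query(service, status=None, text=None):
    """Compose a log search query from a service name, a status, and a phrase."""
    parts = [f"service:{service}"]
    if status:
        parts.append(f"status:{status}")
    if text:
        parts.append(f'"{text}"')
    return " ".join(parts)


if __name__ == "__main__":
    center = datetime(2024, 5, 1, 10, 15, tzinfo=timezone.utc)
    print(log_query("auth", status="error", text="connection refused"))
    print(iso_window(center))
```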
Auditing Infrastructure Metadata and Compliance
You can audit every connected machine with the list_hosts tool. It returns metadata for every monitored host, such as which agent version it runs and which cloud provider it reports to, which is useful when you have to check compliance across multiple environments.
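As a rough illustration of that kind of audit, the Python sketch below groups hosts by agent version and flags the ones below a minimum. The sample records and their field names (`name`, `agent_version`, `cloud_provider`) are hypothetical, not the documented output of list_hosts.

```python
from collections import defaultdict

# Hypothetical host records; field names are assumptions for this sketch.
SAMPLE_HOSTS = [
    {"name": "web-1", "agent_version": "7.52.0", "cloud_provider": "aws"},
    {"name": "web-2", "agent_version": "7.48.1", "cloud_provider": "aws"},
    {"name": "db-1", "agent_version": "7.52.0", "cloud_provider": "gcp"},
]


def parse_version(version):
    """Turn a dotted numeric version such as "7.52.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def hosts_by_agent_version(hosts):
    """Map each agent version to the host names running it."""
    groups = defaultdict(list)
    for host in hosts:
        groups[host["agent_version"]].append(host["name"])
    return dict(groups)


def outdated(hosts, minimum):
    """Return names of hosts whose agent version is below `minimum`."""
    floor = parse_version(minimum)
    return [h["name"] for h in hosts if parse_version(h["agent_version"]) < floor]


if __name__ == "__main__":
    print(hosts_by_agent_version(SAMPLE_HOSTS))
    print(outdated(SAMPLE_HOSTS, "7.50.0"))
```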
To see how well your services are actually performing against internal goals, use list_slos. It retrieves Service Level Objective definitions, letting you check target percentages and the current compliance status for every service you've defined. Finally, you can get a quick rundown of all active monitors using list_monitors to keep track of what needs attention.
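A target percentage and a current compliance figure are enough to estimate how much error budget is left. The Python sketch below uses the common definition (budget = 100 minus target); the function name and the assumption that list_slos gives you both numbers as percentages are illustrative only.

```python
def error_budget_remaining(target_pct, current_pct):
    """Return the unspent share of the error budget.

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the SLO is currently breached.
    """
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be between 0 and 100 (exclusive)")
    budget = 100.0 - target_pct   # allowed unreliability
    spent = 100.0 - current_pct   # observed unreliability
    return (budget - spent) / budget


if __name__ == "__main__":
    # A 99.9% target with 99.95% measured leaves about half the budget.
    print(round(error_budget_remaining(99.9, 99.95), 3))
```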
How Datadog MCP Works
1. Connect the Datadog integration to your AI client.
2. Authorize access using your Datadog API Key, APP Key, and Site URL.
3. Ask your agent a question: 'What was the average request latency for Service X over the last 4 hours?'
The bottom line is you talk to it like you're talking to an on-call teammate, and the agent runs all the necessary API calls in the background.
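Behind the scenes, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below builds such a message so you can see what the agent is doing for you. The tool name comes from this page; the argument names and values are assumptions.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # The argument keys below are hypothetical; check the tool's schema.
    request = tool_call(
        "query_metrics",
        {"query": "avg:system.cpu.user{*}", "from": 1699985600, "to": 1700000000},
    )
    print(json.dumps(request, indent=2))
```

In practice you never write this message by hand; the client generates it from your question and the tool's declared input schema.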
Who Is Datadog MCP For?
The DevOps Engineer who's tired of clicking through 5 different dashboards at 2 AM. The SRE needing quick visibility into active alerts during an incident. Or any developer who just wants to know why the user complained about slowness, without writing a single query.
- During incident response: use list_monitors and query_metrics to quickly correlate a spike in latency with the state of other related services.
- For routine health checks: run list_hosts or audit monitor configurations with get_monitor without ever leaving the terminal or IDE.
- For user-reported bugs: give search_logs the error message, and the agent finds the exact stack trace from the last hour.
What Changes When You Connect
- Instantly audit the system's status. Instead of clicking through dashboards to check if a service is up, use list_monitors or get_monitor to confirm health in seconds.
- Pinpoint failure sources with precision. Use search_logs to filter logs by ISO time boundaries and identify exactly which error code triggered the issue, saving hours of manual log review.
- Track performance over time. The query_metrics tool lets you pull full time-series data for CPU usage or latency across any host ID, letting you spot subtle degradation patterns that are buried in massive datasets.
- Manage alerts without leaving your flow. If a critical service needs temporary silence due to maintenance, the agent can run mute_monitor, handling the API interaction instantly.
- Understand scope and dependencies. Use list_hosts to get host metadata (agent version, tags, cloud provider) so you know exactly what infrastructure is involved in the incident.
Real-World Use Cases
Diagnosing an intermittent spike in latency
A user reports slowness. You ask your agent to check metrics: 'Show me request latency for User Auth Service over the last 2 hours.' The agent uses query_metrics and finds a spike starting at 10:15 AM. Knowing when it started, you then use search_logs to filter logs around that precise timestamp range to find related database connection errors.
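The hand-off from metrics to logs in this scenario can be sketched in a few lines of Python: find the first point that jumps well above a baseline, then derive a time window to search logs in. The data shape (a list of `(unix_ts, value)` pairs), the threshold rule, and the helper names are all assumptions for illustration.

```python
def first_spike(points, factor=2.0, baseline_n=3):
    """Return the timestamp of the first point above `factor` times the baseline.

    The baseline is the mean of the first `baseline_n` values.
    Returns None when no later point crosses the threshold.
    """
    if len(points) <= baseline_n:
        raise ValueError("need more points than the baseline size")
    baseline = sum(value for _, value in points[:baseline_n]) / baseline_n
    for ts, value in points[baseline_n:]:
        if value > factor * baseline:
            return ts
    return None


def window_around(ts, seconds=300):
    """Return a (start, end) window of UNIX timestamps centered on `ts`."""
    return ts - seconds, ts + seconds


# Hypothetical latency series in milliseconds, one point per minute.
SAMPLE_SERIES = [(0, 100.0), (60, 110.0), (120, 90.0),
                 (180, 105.0), (240, 400.0), (300, 420.0)]

if __name__ == "__main__":
    spike_ts = first_spike(SAMPLE_SERIES)
    print(spike_ts, window_around(spike_ts, seconds=120))
```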
Auditing a suspicious monitor alert
A junior engineer flags an alert for 'High Disk Space.' Before escalating, they ask the agent to run get_monitor on that specific ID. The agent returns not only the threshold but also the historical change log and status history, confirming if it's a recurring false positive or a real issue.
Investigating why a deployment failed
Following a bad deploy, you ask your agent to list events (list_events). This shows the deployment marker. You then use search_logs with the deployment ID and status code '503' to immediately find all failing requests related to that specific release.
Planning for planned maintenance downtime
The team needs to know if a scheduled database upgrade will conflict with an ongoing alert. You use list_downtimes to check the schedule and then run list_monitors to confirm which services are actively monitored during that specific time window.
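Checking for a conflict is an interval-overlap test. The Python sketch below treats each downtime and the planned work as half-open `[start, end)` ranges of UNIX timestamps; the record fields (`id`, `scope`, `start`, `end`) are hypothetical stand-ins for whatever list_downtimes returns.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def conflicting_downtimes(downtimes, window_start, window_end):
    """Return the downtimes that overlap the given window."""
    return [
        d for d in downtimes
        if overlaps(d["start"], d["end"], window_start, window_end)
    ]


# Hypothetical downtime records; field names are assumptions for this sketch.
SAMPLE_DOWNTIMES = [
    {"id": 1, "scope": "service:db", "start": 1000, "end": 2000},
    {"id": 2, "scope": "service:web", "start": 5000, "end": 6000},
]

if __name__ == "__main__":
    for downtime in conflicting_downtimes(SAMPLE_DOWNTIMES, 1500, 2500):
        print(downtime["id"], downtime["scope"])
```

Using half-open ranges means a window that starts exactly when a downtime ends is not reported as a conflict.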
The Tradeoffs
Over-reliance on dashboards
Scrolling through a dashboard looking for the needle in the haystack. You see CPU is high, but you don't know if it was due to an application bug or a scheduled job.
→ Don't just look at the graph. Use query_metrics to isolate the CPU data and then use search_logs (filtering by time range) to find specific error messages that correlate with the spike, getting the root cause instead of just the symptom.
Guessing the scope
Opening three different tabs—one for metrics, one for logs, and one for hosts—and manually cross-referencing timestamps to figure out what went wrong.
→ Start by running list_hosts to get an authoritative list of all active nodes. Then, use the host tags you found to narrow your focus when calling both query_metrics and search_logs, keeping everything contained.
Assuming service status is constant
Thinking a service is fine because it's green now. You forget that the monitor itself might be misconfigured, leading to false positives or negatives.
→ Always verify the monitoring setup. Use list_monitors and then run get_monitor for any critical service to check its threshold values and current state history before trusting the color.
When It Fits, When It Doesn't
Use this MCP Server if your primary goal is correlation: finding a link between a metric change (e.g., latency spiking) AND an error log entry that happened at the exact same time, or determining which infrastructure component (list_hosts) caused the failure. This tool is best for deep Root Cause Analysis and incident response.
Don't use it if you just need to know general business metrics (like quarterly revenue). For those, a dedicated BI tool works better. If your pain point is merely navigating many tabs or running basic lookups without context, the agent helps. But if the actual root cause of failure is procedural—a human forgot to update a firewall rule—this tool will only tell you that the monitored system failed, not why.
If you need to check status and metrics for multiple services across different domains (e.g., AWS, Kubernetes, internal app), the combination of list_hosts, query_metrics, and search_logs lets you do it from a single conversation.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Datadog. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 11 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Context switching kills efficiency.
Today, diagnosing a failure means jumping between the Grafana dashboard to see metrics, then opening the ELK stack to search logs, and finally checking the CI/CD pipeline status. You spend half your time copying timestamps and service names into different panes just to correlate what happened.
With this MCP server, you give one command: 'Why did checkout fail last night?' The agent handles all the context switching—it hits `query_metrics` for latency, runs `search_logs` for errors, and checks if any monitors were alerting—and it gives you a single narrative answer.
Datadog MCP Server: Find specific service data in chat.
You no longer need to remember the exact API endpoint or query language syntax. You don't have to manually call `list_monitors` and then take that ID to use it elsewhere. The agent knows how all the pieces fit together.
It’s simple: you state the problem, and the system executes the entire diagnostic workflow for you.
Common Questions About Datadog MCP
How do I check if a service is currently alerting using list_monitors?
Run list_monitors. This tool filters results by operational state, so you can immediately see which monitors are in the 'Alert' or 'Warning' status without having to manually filter the entire list.
What is the difference between query_metrics and search_logs?
query_metrics handles numeric, time-series data (CPU %, request count). search_logs searches the text of your log entries. You use them together to correlate when a metric spiked with what error message was written.
Can I find out what hosts are connected to the system using list_hosts?
Yes, list_hosts provides metadata for all monitored infrastructure. It shows things like the agent version and which cloud provider each host reports to.
How do I check scheduled maintenance windows with list_downtimes?
Use list_downtimes. This returns scope tags and recurring schedules, letting you quickly verify if the current degradation might be due to planned work rather than an outage.
I need to temporarily silence an alert; how do I use mute_monitor?
Use mute_monitor to silence a specific monitor. You can mute it either until a set time or indefinitely.
What does get_dashboard tell me about visualization structures?
get_dashboard resolves all widget configurations, template variables, and layout structures for a given dashboard. It’s useful when you need to understand exactly how a visual panel is built.
How do I check our service reliability goals using list_slos?
list_slos returns defined Service Level Objectives (SLOs). You get target percentages, time windows, and the current compliance status for both metric-based and monitor-based goals.
I want to review historical incidents. How does list_events help?
list_events returns a collection of events. You can pull titles, priority levels, and source identifiers from these records, which helps map out the timeline of an issue.
Can my agent query specific Datadog metrics using DDQL?
Yes. Use the 'query_metrics' tool. Provide your DDQL query string and the target time range. The agent will fetch the numeric timeseries data directly from Datadog's telemetry datastores.
How do I search for a specific error message across my application logs?
Use the search_logs tool. Provide a query matching your error string and an ISO time boundary. The agent will retrieve the structured log entries matching those parameters to help you identify failures.
Can I see which monitors are currently in an alert state?
Absolutely. The list_monitors tool allows you to filter by group state (e.g., 'alert,warn'). The agent pulls your configured monitors to show you which services are currently unhealthy.