# Better Stack MCP for AI Agents

> Better Stack automates incident response and uptime monitoring. Connect your Better Stack account to any AI client so it can act as a Level 1 SRE, diagnosing downtime, checking monitor status, and managing escalations through natural conversation.

## Overview
- **Category:** ship-it
- **Price:** Free
- **Tags:** incident-management, on-call-scheduling, downtime-alerts, escalation-chains, sre-automation

## Description

This MCP lets your AI agent act as an on-call engineer, so you can manage critical system alerts without leaving your chat window. Instead of jumping between monitoring dashboards and terminal sessions during an outage, your agent handles the initial triage. It can check which monitors are failing, whether the check is a simple HTTP endpoint ping or a complex DNS probe. If there's an active issue, the agent lists the firing incidents, pulls their full timelines, and can acknowledge them to stop the paging cascade. It also reports who is currently scheduled on call, so you know exactly who to talk to next. Connecting through Vinkius means your preferred AI client can access hundreds of other tools too, giving you a complete operational picture right where you work.

## Tools

### list_monitors
Lists all uptime monitors configured in Better Stack.

### get_monitor
Retrieves specific details for one monitor, like its ping type or latency rules.

### list_incidents
Gets a list of all explicitly reported system incidents.

### get_incident
Fetches the full technical payload and timeline for a specific incident.

### acknowledge_incident
Changes the status of an ongoing incident to 'acknowledged' in Better Stack, which stops automated paging notifications.

### resolve_incident
Forces a specific reported outage into a resolved state within the monitoring system.

### list_heartbeats
Lists all passive tracking endpoints and cron jobs used to validate background worker status.

### get_heartbeat
Pulls explicit technical details about a single, passive heartbeat check.

### list_status_pages
Lists all configured public-facing status pages for your global services.

### list_on_call
Retrieves the current on-call team schedules and routing calendars.

## Prompt Examples

**Prompt:** 
```
I'm seeing a lot of alerts today. What active incidents are going on?
```

**Response:** 
```
**System Incident Report**

*   **ID #651239:** Critical API Down (Started 4 minutes ago)
    *   Status: Active, Escalating via Phone Call.
    *   Action: Would you like me to acknowledge this incident?
*   **ID #7801:** Database Connection Fluctuation
    *   Status: Resolved. Acknowledged on 9/3/24.
```

## Capabilities

### List all active monitors
Retrieves a comprehensive list of every uptime monitor configured in Better Stack.

### Get specific monitor details
Fetches the full technical definition and status for any single monitoring check, like HTTP pings or latency rules.

### List current system incidents
Provides a list of all known or active outages currently recorded in Better Stack.

### Inspect incident timelines
Pulls the full, detailed history and root cause payload for a specific reported outage.

### Acknowledge an ongoing alert
Flags an active system incident as acknowledged in Better Stack, temporarily stopping automated paging alerts.

### Force resolve an issue
Manually sets a specific reported outage to a resolved state within the monitoring platform.

### Check scheduled workers
Lists all passive heartbeats and background worker checks to ensure core system processes are running correctly.

### List status pages
Reads the configuration details for public-facing status dashboards across your global infrastructure.

### View on-call schedules
Retrieves current active shifts and team rotation calendars to identify who is responsible right now.

## Use Cases

### The API endpoint is throwing 502 errors.
A developer asks their agent, 'Why is our main API failing?' The agent calls list_monitors, then uses get_monitor to pinpoint the exact HTTP endpoint check that's failing. It then pulls the incident timeline with get_incident to help trace the root cause.

### It's 3 AM and nothing is working.
The on-call engineer asks, 'Who should I talk to about this downtime?' The agent uses list_on_call to confirm that John Doe is the current Level 1 on-call responder. This saves the engineer from digging through old rotation schedules.

### We need to update the incident report.
An engineering manager asks, 'What happened with Incident #8012?' The agent retrieves the detailed timeline payload using get_incident, providing all necessary logs and status changes for a comprehensive post-mortem.

### We have background workers that might be failing.
A backend developer asks, 'Check our cron jobs.' The agent calls list_heartbeats and verifies that all passive tracking endpoints are reporting green. If one is down, the agent pulls the specific details via get_heartbeat so the developer can start on a fix.
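
The heartbeat check described above can be sketched as a small two-step flow. This is a minimal illustration, not the MCP's actual implementation: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the field and argument names (`status`, `id`, `heartbeat_id`) are assumptions, since the source does not document the tool schemas.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], Any]


def failing_heartbeats(call_tool: ToolCaller) -> list:
    """List heartbeats, then fetch details for each one not reporting 'up'."""
    details = []
    for hb in call_tool("list_heartbeats", {}):
        # 'status' and 'id' are assumed field names, not confirmed by the source.
        if hb.get("status") != "up":
            details.append(call_tool("get_heartbeat", {"heartbeat_id": hb["id"]}))
    return details


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client, returning canned data for the demo."""
    if name == "list_heartbeats":
        return [
            {"id": "hb-1", "name": "nightly-backup", "status": "up"},
            {"id": "hb-2", "name": "queue-worker", "status": "down"},
        ]
    if name == "get_heartbeat":
        return {"id": arguments["heartbeat_id"], "status": "down"}
    raise ValueError(f"unexpected tool: {name}")


down = failing_heartbeats(fake_call_tool)
print([hb["id"] for hb in down])
```

Passing the tool caller in as a parameter keeps the flow testable without a live Better Stack account.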

## Benefits

- Stop switching context during an outage. Your agent handles incident triage, such as listing active incidents or inspecting monitor details, without you leaving the chat.
- Quickly determine who needs to be paged. You can pull active shifts and team schedules using the list_on_call tool, so you instantly know who's responsible for a failing system component.
- Reduce alert fatigue and noise. By acknowledging an incident (acknowledge_incident) or forcing it resolved (resolve_incident), your agent stops automated paging for issues that are already being handled.
- Deep dive into failures without logging in. Need to audit why something broke? Use get_incident to pull the full technical timeline payload for any reported issue.
- Verify background services immediately. The list_heartbeats tool lets you check passive tracking endpoints, ensuring critical workers haven't silently failed.
- Maintain public transparency while debugging privately. You can read your configured status pages (list_status_pages) to check what customers see while an incident is in progress.

## How It Works

The MCP gives your AI client a single, conversational entry point to your Better Stack monitors, incidents, heartbeats, status pages, and on-call schedules.

1. Connect your Better Stack API token through Vinkius, granting your AI client the necessary permissions.
2. Tell your agent what you need: for instance, 'Show me all active incidents' or 'Who is on call today?'
3. The agent runs the appropriate tool and delivers the actionable data directly in conversation.
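
Under the hood, step 3 corresponds to an MCP `tools/call` request, which the protocol encodes as a JSON-RPC 2.0 message. The sketch below only builds that message for `list_incidents`. The helper name `build_tool_call` is hypothetical, and the empty `arguments` object is an assumption, since the source does not list the tool's parameters. Your AI client normally constructs and sends this for you.

```python
import json
from itertools import count

# Simple monotonically increasing request IDs for the JSON-RPC envelope.
_ids = count(1)


def build_tool_call(name, arguments=None):
    """Return a JSON-RPC 2.0 request dict for an MCP `tools/call`."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# 'Show me all active incidents' would map to a call like this one.
request = build_tool_call("list_incidents")
print(json.dumps(request, indent=2))
```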

## Frequently Asked Questions

**How can I use the Better Stack MCP for AI Agents to check if my service is down?**
You ask your agent to list all monitors. It will give you a clean breakdown of every single uptime check, showing which HTTP endpoints or DNS probes are currently failing so you know exactly where the problem lies.

**Can I use the Better Stack MCP for AI Agents to find out who is on call right now?**
Yes. Your agent uses the list_on_call tool to pull up the active rotation calendars. You'll get a clear name and role, so you don't waste time calling people outside their shift.

**What if I want to know why an old incident happened?**
You can ask your agent about a specific incident, and it will use get_incident to retrieve the full timeline payload. It pulls all the historical data, including error codes and timestamps, so you have everything for a post-mortem report.

**Does the Better Stack MCP for AI Agents help me stop constant notifications?**
Absolutely. If an incident is confirmed or has already been addressed, your agent can call acknowledge_incident or resolve_incident to update the alert status and prevent unnecessary paging.
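
As a rough sketch of that choice, the helper below acknowledges an incident that is still being worked on and resolves one that is already fixed. Everything except the two tool names is an assumption: `quiet_incident` and `call_tool` are hypothetical, and the `incident_id` argument name is not confirmed by the source.

```python
def quiet_incident(call_tool, incident_id, fixed=False):
    """Acknowledge an incident still being worked on; resolve one already fixed."""
    tool = "resolve_incident" if fixed else "acknowledge_incident"
    # 'incident_id' is an assumed argument name for both tools.
    return call_tool(tool, {"incident_id": incident_id})


calls = []


def recording_call_tool(name, arguments):
    """Stand-in MCP client that records each call instead of sending it."""
    calls.append((name, arguments))
    return {"ok": True}


quiet_incident(recording_call_tool, "651239")            # confirmed, still open
quiet_incident(recording_call_tool, "7801", fixed=True)  # already addressed
print(calls)
```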

**Does this MCP work for multiple services? Like my front end AND back end?**
Yes. It manages all monitors across your entire fleet, whether they are tracking HTTP endpoints or background cron heartbeats. You get one view of everything connected to Better Stack.