# SRE MCP Servers
Discover 8 MCP servers tagged with SRE on the Vinkius App Catalog.
Prometheus
14 tools. Monitor your infrastructure with Prometheus: run PromQL queries, analyze metrics, and manage time-series data directly from your AI agent.
Opsgenie
11 tools. Automate incident management with Opsgenie: manage alerts, track on-call schedules, and coordinate incidents directly from any AI agent.
Better Stack MCP
10 tools. Monitor uptime and incidents with Better Stack: list monitors, heartbeats, and on-call schedules directly from any AI agent.
Incident.io MCP
10 tools. Manage incidents, roles, and on-call schedules through the Incident.io API.
Reversibility Architect Prover
1 tool. LLMs often suggest irreversible architectural changes. This engine is a 6-pivot cognitive trap that forces the agent to map data rollbacks, blast radius, and canary deployments before executing any change.
Incident Postmortem Prover MCP
1 tool. Most postmortems fail because of vague timelines, symptom-level root causes, and action items with no owner. This tool enforces SRE-grade rigor: minute-by-minute timeline reconstruction, systemic 5-Whys analysis, root-cause isolation, accountable action items with owners and deadlines, and detection of recurring historical patterns.
Kubernetes Architecture Prover MCP Server
1 tool. An AI generated Kubernetes manifests for a payment service with no resource requests or limits, no Pod Security Standards, a single replica with no PodDisruptionBudget (PDB), and zero NetworkPolicies, so every pod could reach every other pod. At 3 AM the payment pod was OOM-killed after a logging sidecar with no memory ceiling consumed the available memory. This tool enforces resource governance, security hardening, reliability design, observability instrumentation, and network restriction on every workload.
Migration Strategy Prover MCP Server
1 tool. An AI recommended a big-bang database migration over the weekend. There was no dependency map, even though 7 services read from that database; no rollback plan beyond "just restore from backup"; and no data validation for 2.3 million records with timezone-dependent timestamps. The migration ran Saturday at 2 AM. By 4 AM, 3 downstream services were returning stale data, the backup turned out to be 6 hours old, and 14,000 customer records had corrupted timestamps. The result: a 72-hour incident that ran through Monday morning. This tool enforces risk assessment, rollback definition, data integrity verification, cutover planning, and stakeholder alignment.