# Prometheus MCP

> Prometheus MCP lets you talk to your monitoring system. Instead of building dashboards or running complex PromQL in a terminal, just ask your agent about service health, historical trends, and resource usage. It gives you instant access to time-series data analysis from right inside your chat window.

## Overview
- **Category:** loved-by-devs
- **Price:** Free
- **Tags:** prometheus, promql, metrics, observability, monitoring, sre

## Description

Running an infrastructure check used to mean jumping between Grafana, the command line, and documentation pages just to answer one simple question. Now you can connect your Prometheus instance via this MCP and treat your monitoring stack like a conversation. You ask your agent about performance, whether that is the average CPU usage over the last hour or whether a specific service is currently failing. The agent writes the PromQL for both instant queries and historical range queries, so you don't have to.

This connection doesn't just display numbers; the agent summarizes raw metrics in natural language you can act on. If you also work with other monitoring systems, you'll appreciate that this MCP stays focused on Prometheus. By connecting through Vinkius, your agent can read metrics and configuration from your Prometheus instance, letting you skip manual dashboard building for ad hoc questions. It feels like having an SRE or DevOps engineer answering you directly in chat.

## Tools

### clean_tombstones
Removes data that has already been marked as deleted from disk and cleans up the remaining tombstones. Requires admin permissions.

### delete_series
Deletes specific time series data within a defined time range. Also requires admin permissions.

### get_label_values
Retrieves every unique value associated with a specified label name.

### get_labels
Lists all available labels attached to the metrics in your environment.

### get_metadata
Pulls detailed metadata about a metric, including its unit and type, scraped from monitored targets.

### query_range
Evaluates a PromQL expression to show how metrics have changed over an extended period of time.

### query
Runs a PromQL query to get the metric value at one specific moment in time.

### find_series
Locates all time series data that match your specified label selectors.

### create_snapshot
Creates a complete snapshot of all current metric data, requiring admin permissions.

### get_status_buildinfo
Retrieves general build information about the Prometheus instance itself.

### get_status_config
Displays the currently loaded YAML configuration settings for the monitoring stack.

### get_status_flags
Shows all configured flag values set within Prometheus.

### get_status_runtimeinfo
Provides general runtime details and operational information about the monitoring service.

### get_status_tsdb
Retrieves cardinality statistics for the Time Series Database (TSDB).
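
The tool names above line up closely with Prometheus's HTTP API. The sketch below shows one plausible mapping. The paths are Prometheus's documented v1 endpoints, but pairing each tool with a path is an assumption, not something this listing confirms, and `TOOL_ENDPOINTS`, `is_admin_tool`, and `endpoint_url` are hypothetical names used only for illustration.

```python
# Assumed mapping from this MCP's tool names to Prometheus HTTP API paths.
# The paths are real Prometheus v1 endpoints; the tool-to-path pairing is a guess.
TOOL_ENDPOINTS = {
    "query": ("GET", "/api/v1/query"),
    "query_range": ("GET", "/api/v1/query_range"),
    "find_series": ("GET", "/api/v1/series"),
    "get_labels": ("GET", "/api/v1/labels"),
    "get_label_values": ("GET", "/api/v1/label/{label_name}/values"),
    "get_metadata": ("GET", "/api/v1/metadata"),
    "get_status_config": ("GET", "/api/v1/status/config"),
    "get_status_flags": ("GET", "/api/v1/status/flags"),
    "get_status_runtimeinfo": ("GET", "/api/v1/status/runtimeinfo"),
    "get_status_buildinfo": ("GET", "/api/v1/status/buildinfo"),
    "get_status_tsdb": ("GET", "/api/v1/status/tsdb"),
    "create_snapshot": ("POST", "/api/v1/admin/tsdb/snapshot"),
    "delete_series": ("POST", "/api/v1/admin/tsdb/delete_series"),
    "clean_tombstones": ("POST", "/api/v1/admin/tsdb/clean_tombstones"),
}


def is_admin_tool(tool_name: str) -> bool:
    """True when the assumed endpoint lives under the TSDB admin API."""
    _method, path = TOOL_ENDPOINTS[tool_name]
    return path.startswith("/api/v1/admin/")


def endpoint_url(base_url: str, tool_name: str, **path_params: str) -> tuple[str, str]:
    """Return (HTTP method, full URL) for a tool, filling any path placeholders."""
    method, path = TOOL_ENDPOINTS[tool_name]
    return method, base_url.rstrip("/") + path.format(**path_params)


print(endpoint_url("http://localhost:9090", "get_label_values", label_name="job"))
print([name for name in TOOL_ENDPOINTS if is_admin_tool(name)])
```

On a stock Prometheus server the admin endpoints only respond when the server is started with `--web.enable-admin-api`, which is one likely reason the three admin tools are called out separately.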

## Prompt Examples

**Prompt:** 
```
Run an instant query for 'up' to see which targets are currently reachable.
```

**Response:** 
```
I've executed the `query`. All 5 targets are currently returning a value of 1, meaning they are all 'up' and healthy.
```
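
To make the exchange above concrete, here is a minimal sketch of how an instant-query result for `up` could be turned into that kind of summary. The JSON follows the response shape of Prometheus's instant query API, but the targets and values are made up for illustration, and `split_targets` is a hypothetical helper rather than part of the MCP.

```python
import json

# Illustrative payload in the shape Prometheus returns for an instant query.
# The instances and values are invented sample data.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "api", "instance": "10.0.0.1:8080"},
       "value": [1700000000, "1"]},
      {"metric": {"__name__": "up", "job": "api", "instance": "10.0.0.2:8080"},
       "value": [1700000000, "0"]}
    ]
  }
}
""")


def split_targets(response: dict) -> tuple[list[str], list[str]]:
    """Split targets into (up, down) lists based on the sample value."""
    up, down = [], []
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "<unknown>")
        _timestamp, value = sample["value"]  # the value arrives as a string
        (up if float(value) == 1.0 else down).append(instance)
    return up, down


up_targets, down_targets = split_targets(SAMPLE_RESPONSE)
print(f"{len(up_targets)} up, {len(down_targets)} down: {down_targets}")
```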

**Prompt:** 
```
Show me the average CPU usage for the last 30 minutes using query_range.
```

**Response:** 
```
I've fetched the range data. The average CPU usage across your nodes peaked at 45% about 10 minutes ago and has since stabilized at 20%.
```

**Prompt:** 
```
What is the metadata for the metric 'http_requests_total'?
```

**Response:** 
```
Using `get_metadata`, I found that 'http_requests_total' is a Counter metric. It tracks the total number of HTTP requests received, partitioned by status code and method.
```
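
Behind an answer like this sits a small metadata record per metric. As a rough sketch, the payload below mirrors the shape of Prometheus's metric metadata API (type, help text, unit); the help string is invented, and `describe_metric` is a hypothetical helper for illustration.

```python
# Illustrative payload in the shape of Prometheus's /api/v1/metadata response.
# The help text is invented sample data.
SAMPLE_METADATA = {
    "status": "success",
    "data": {
        "http_requests_total": [
            {"type": "counter", "help": "Total number of HTTP requests.", "unit": ""}
        ]
    },
}


def describe_metric(response: dict, metric: str) -> str:
    """Render one metric's metadata as a short plain-English sentence."""
    entries = response["data"].get(metric, [])
    if not entries:
        return f"No metadata found for {metric}."
    entry = entries[0]  # different targets can report differing metadata
    unit = entry.get("unit") or "unspecified"
    return f"{metric} is a {entry['type']} (unit: {unit}): {entry['help']}"


print(describe_metric(SAMPLE_METADATA, "http_requests_total"))
```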

## Capabilities

### Analyze historical performance trends
Retrieve complex metric expressions over specific time windows using PromQL.

### Find current system status
Get instant readings of metrics at a single point in time, like checking if a target is currently up or down.

### Inspect data structure and labels
Discover all available labels for a metric, or retrieve its metadata to learn its unit and type.

### Manage time-series data (Admin)
Perform administrative tasks like creating snapshots of current metrics or cleaning up old, deleted data entries.

## Use Cases

### Diagnosing a sudden latency spike
The user needs to know why requests slowed down and asks, 'Show me the average request duration over the last 15 minutes.' The MCP executes `query_range` and returns the trend over that window, so the agent can identify the exact period when performance dropped.
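
For a question like this, the agent has to pick a PromQL expression. A common pattern for average duration divides the rate of a histogram's `_sum` series by the rate of its `_count` series. The sketch below builds that expression; the metric name `http_request_duration_seconds` is an assumption about your instrumentation, and `avg_duration_promql` is a hypothetical helper.

```python
def avg_duration_promql(metric_base: str, window: str = "5m", selector: str = "") -> str:
    """Build PromQL for mean request duration from a histogram or summary.

    metric_base is the metric name without the _sum/_count suffix.
    selector is an optional label matcher list such as 'job="api"'.
    """
    sel = f"{{{selector}}}" if selector else ""
    return (
        f"sum(rate({metric_base}_sum{sel}[{window}])) / "
        f"sum(rate({metric_base}_count{sel}[{window}]))"
    )


# Assumed metric name; substitute whatever your services actually export.
print(avg_duration_promql("http_request_duration_seconds", selector='job="api"'))
```

Evaluated through `query_range` over the last 15 minutes, this expression produces one series whose rise marks when the slowdown began.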

### Verifying service readiness during deployment
Before deploying code, the developer asks, 'Is the user authentication endpoint currently returning 200 OK?' The MCP runs `query` to check the current status of that specific metric, confirming system availability immediately.

### Investigating resource leak patterns
The team needs to determine if memory usage is slowly creeping up. They ask for a historical trend over three days using `query_range`, allowing the agent to analyze the data and pinpoint potential memory leaks.
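
One simple way to decide whether memory is "creeping up" is to fit a straight line through the returned samples and look at its slope. This is a minimal sketch of that idea using a least-squares fit; it is not necessarily how the agent reasons about the data, the samples are synthetic, and `slope_per_hour` is a hypothetical helper.

```python
def slope_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (unix_seconds, value) samples, in units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("all samples share one timestamp")
    return covariance / variance * 3600  # per-second slope scaled to per-hour


# Synthetic data: memory grows by 10 MiB every hour across three days.
MIB = 1024 * 1024
samples = [(hour * 3600.0, 500.0 * MIB + hour * 10.0 * MIB) for hour in range(72)]
growth_mib_per_hour = slope_per_hour(samples) / MIB
print(f"memory grows by about {growth_mib_per_hour:.1f} MiB per hour")
```

PromQL's `deriv()` function applies the same kind of linear regression on the server side, so the agent could also push this calculation into the query itself.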

### Checking monitoring stack integrity
A platform team member needs to verify if the Prometheus configuration has changed. They use the MCP's status tools (like `get_status_config`) to pull the YAML settings and confirm compliance with internal standards.

## Benefits

- Stop context switching. Instead of navigating to a separate dashboard tool just to check latency, you ask your agent in the chat, and it executes the necessary query instantly. You get answers without leaving your workspace.
- Understand metrics deeply using `get_metadata`. If you aren't sure what 'http_requests_total' measures or if it's a counter or gauge, run this tool to get clear documentation on units and types.
- Audit system health with status tools. Use the MCP to check the loaded configuration (`get_status_config`) or review runtime information (`get_status_runtimeinfo`) without needing SSH access to the server.
- Troubleshoot historical issues using `query_range`. Need to know if CPU usage spiked last Tuesday? Specify a time window and get the full trend data back in plain English.
- Administer your monitoring stack safely. If you need to reclaim disk space from deleted series or create a snapshot of current data, use tools like `clean_tombstones` or `create_snapshot` through simple prompts.

## How It Works

Your monitoring data becomes available through natural language prompts, with no manual dashboard construction required.

1. You subscribe to the MCP and provide your Prometheus server URL and any necessary authentication tokens.
2. Your AI client recognizes the connection and lets you ask questions about system health or performance metrics in plain English.
3. The agent translates your request into the correct PromQL query, executes it against the live data, and presents a clear, conversational answer.
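
Under the hood, step 3 ends with the AI client sending a tool call to the MCP server. The Model Context Protocol uses JSON-RPC 2.0 with a `tools/call` method for this. The sketch below builds such a request; the argument name `query` is an assumption about this tool's input schema, and `build_tool_call` is a hypothetical helper.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name: the tool's real input schema may differ.
request = build_tool_call(1, "query", {"query": 'up{job="api"}'})
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this message for you; you only ever see the plain-English question and answer.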

## Frequently Asked Questions

**How do I query Prometheus metrics using the Prometheus MCP?**
You ask your agent a question in plain English. The agent interprets your request, builds the necessary PromQL expression, and runs the query for you.

**Can I use the Prometheus MCP to check service uptime?**
Yes, you can. You simply ask the agent about the 'up' metric for a specific target or service to see if it is currently reporting a value of 1 (healthy).

**What does `get_metadata` do in the Prometheus MCP?**
`get_metadata` lets you check what a metric actually measures. It reports the metric's type (for example, counter or gauge), its help text, and its unit.

**Is `query_range` different from `query` in this MCP?**
`query` provides a single point-in-time value. You use `query_range` when you need to see how the metric changed over an entire period of time, giving you a trend.
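
The difference also shows up in the data that comes back. In Prometheus's HTTP API, an instant query returns a `vector` with one `value` per series, while a range query returns a `matrix` with a list of `values` per series. The sketch below normalizes both shapes; the payloads are invented samples and `to_points` is a hypothetical helper.

```python
def to_points(response: dict) -> dict[tuple, list[tuple[float, float]]]:
    """Normalize vector and matrix results to {labels: [(timestamp, value), ...]}."""
    data = response["data"]
    series = {}
    for item in data["result"]:
        labels = tuple(sorted(item["metric"].items()))
        if data["resultType"] == "vector":
            pairs = [item["value"]]   # one sample per series
        elif data["resultType"] == "matrix":
            pairs = item["values"]    # many samples per series
        else:
            raise ValueError(f"unsupported resultType: {data['resultType']}")
        series[labels] = [(float(ts), float(value)) for ts, value in pairs]
    return series


# Invented sample payloads for illustration.
INSTANT = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"job": "api"}, "value": [1700000000, "1"]}]}}
RANGE = {"status": "success", "data": {"resultType": "matrix", "result": [
    {"metric": {"job": "api"}, "values": [[1700000000, "1"], [1700000060, "0"]]}]}}

print(to_points(INSTANT))
print(to_points(RANGE))
```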

**Do I need admin rights for all tools in the Prometheus MCP?**
No. Basic querying and metadata retrieval work without elevated permissions. However, administrative actions like `create_snapshot`, `delete_series`, and `clean_tombstones` require admin permissions.