# HyperDX MCP

> HyperDX (Open Source Observability) connects your agent directly to your infrastructure data. It lets you search logs, manage alert rules, and inspect dashboards using natural conversation. Stop jumping between tabs to correlate metrics with error streams; just ask your AI client for the full picture.

## Overview
- **Category:** devops-cicd
- **Price:** Free
- **Tags:** observability, logging, monitoring, alerts, dashboards

## Description

This MCP gives your agent deep visibility into application performance and system health. You can query structured logs, review raw event data, and check dashboard trends without leaving your chat interface. Debugging an issue? Your agent handles the complex queries; you just ask it for logs from a specific service or time window. Adjusting monitoring? You manage alert rules directly through conversation, creating new alerts or deleting old ones. You can also list every available dashboard and drill into the details of any one of them. Connecting your HyperDX instance through the Vinkius catalog adds this observability layer to whatever AI client you use.

## Tools

### list_alerts
Gets a list of all alert rules currently set up in your organization.

### create_alert
Creates a new alert rule that triggers when the results of a search query cross a threshold within an evaluation interval.

### list_dashboards
Retrieves the names and details for every dashboard available to you.

### delete_alert
Removes an existing alert rule using its unique identifier.

### list_events
Gathers structured logs or spans from HyperDX using specific query filters.

### get_dashboard
Retrieves the detailed metrics for a single dashboard, identified by its ID.

### list_logs
Retrieves a list of general application logs based on criteria like service name or error level.

## Prompt Examples

**Prompt:** 
```
List all logs with level:error for the auth service from the last hour.
```

**Response:** 
```
I've retrieved the logs for the auth service. I found 12 error entries in the last hour, mostly related to 'Invalid Token' exceptions. Would you like to see the full details of the most recent one?
```

**Prompt:** 
```
Create a new alert named 'High Error Rate' that triggers when errors exceed 50 in 5 minutes.
```

**Response:** 
```
Successfully created the alert 'High Error Rate'. It is configured to monitor the query 'level:error' and will trigger if the count exceeds 50 within a 5-minute interval.
```

**Prompt:** 
```
Show me all my available dashboards in HyperDX.
```

**Response:** 
```
I found 3 dashboards in your organization: 'System Health', 'API Performance', and 'User Analytics'. Which one would you like to inspect in detail?
```

## Capabilities

### Inspect system dashboards
Retrieve metrics, trends, and visualizations from all available organizational dashboards.

### Search application logs and events
Query structured logs or raw event spans with specific filters to debug issues as they happen.

### Manage alert rules
List, create, or delete system alerts to keep track of performance regressions and service health.

## Use Cases

### Investigating a sudden performance drop
A developer notices slow response times. They ask their agent to 'Show me error logs for the checkout service from the last 2 hours.' The agent calls `list_logs` and points out a spike in timeouts, surfacing the likely cause without the developer having to write complex queries.

### Auditing an incident response
An SRE needs to know what happened during last night's deployment. They ask their agent to 'List all events for service X from 1:00 AM to 2:00 AM.' The agent uses `list_events` and provides the full timeline of structured spans.

### Setting up better monitoring
A DevOps engineer wants to prevent future outages. They ask their agent, 'Create an alert if the error count exceeds 50 within a 15-minute window.' The agent calls `create_alert` and configures the rule.
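Conceptually, a count-type rule like this asks whether the number of matching events inside one evaluation window is greater than the threshold. The sketch below illustrates that idea over a list of error timestamps. It is not HyperDX's evaluator: the helper name `windows_exceeding` and the choice of back-to-back (tumbling) windows aligned to the Unix epoch are assumptions for illustration, and HyperDX's actual windowing may differ.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def windows_exceeding(timestamps, threshold, interval):
    """Return the start times of windows whose event count exceeds threshold.

    Hypothetical helper: events are bucketed into consecutive, non-overlapping
    windows of length `interval`, aligned to the Unix epoch.
    """
    step = interval.total_seconds()
    counts = Counter()
    for ts in timestamps:
        # Index of the window this event falls into.
        counts[int(ts.timestamp() // step)] += 1
    return sorted(
        datetime.fromtimestamp(idx * step, tz=timezone.utc)
        for idx, n in counts.items()
        if n > threshold  # "exceeds" means strictly greater than
    )


# 60 errors in the first 10 minutes, then 5 more in the following window.
base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [base + timedelta(seconds=10 * i) for i in range(60)]
events += [base + timedelta(minutes=16, seconds=i) for i in range(5)]
print(windows_exceeding(events, threshold=50, interval=timedelta(minutes=15)))
```

Only the 12:00 window fires here; the later window holds just 5 errors, well under the threshold of 50.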

### Reviewing system health before a call
A team lead needs to prepare for an incident meeting. They ask their agent to 'Show me all dashboards and get details on System Health.' The agent uses `list_dashboards` followed by `get_dashboard`, giving the full overview instantly.

## Benefits

- Stop context switching. You can get logs and dashboard metrics in one chat session, eliminating the need to jump between monitoring tools.
- Manage alerts directly from your agent with `list_alerts`, `create_alert`, and `delete_alert`. Set thresholds for high error rates without touching a UI.
- Debugging is faster when you can query structured events via `list_events` and general logs via `list_logs`, both filtered by time ranges such as 1h or 24h.
- Quickly understand system health. Use `list_dashboards` first, then call `get_dashboard` to view deep metrics on the specific component you need.
- It's an SRE assistant that never sleeps. Your agent handles the complex querying of historical data, so you stay focused on fixing the problem.

## How It Works

In short, you talk to it the way an on-call engineer talks to a teammate, rather than filling out query forms in a monitoring UI.

1. Subscribe to this MCP in Vinkius and enter your HyperDX API key.
2. Your AI client sends a request (e.g., 'Show me errors for the auth service').
3. The MCP server runs the matching tool against your HyperDX data and returns the filtered logs, events, or dashboard metrics directly in your chat.
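Under the hood, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for step 2 using only the Python standard library. The argument keys `query` and `from` follow the examples elsewhere on this page, but the exact argument schema is an assumption; check the schema the server reports through `tools/list`. The helper name `tools_call` is hypothetical.

```python
import json
from itertools import count

# MCP requires each request ID to be unique within a session.
_ids = count(1)


def tools_call(name, arguments):
    """Build an MCP `tools/call` JSON-RPC request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Step 2: "Show me errors for the auth service" (argument names assumed).
print(tools_call("list_logs", {"query": "service:auth level:error", "from": "1h"}))
```

Your AI client normally builds and sends this message for you; seeing it spelled out just makes clear that each conversational request maps to one named tool plus a small argument object.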

## Frequently Asked Questions

**How do I find general errors using `list_logs`?**
Ask your agent to call `list_logs` with query filters such as `level:error` or `service:auth` to narrow down the results immediately.
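Filters are written as `field:value` terms, and the `create_alert` answer further down uses a space-separated query such as `service:auth level:error`. A small helper like the hypothetical `build_query` below can assemble that string from a dictionary. It assumes that space-separated terms are combined with AND, and it does not quote values that contain spaces, so treat it as a sketch rather than a full query builder.

```python
def build_query(filters):
    """Join field:value pairs into a space-separated search query.

    Hypothetical helper; values containing spaces are not quoted here.
    """
    return " ".join(f"{field}:{value}" for field, value in filters.items())


print(build_query({"service": "auth", "level": "error"}))  # service:auth level:error
```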

**Can I see all available dashboards using `list_dashboards`?**
Yes, calling `list_dashboards` retrieves a complete inventory of every dashboard in the organization. Note the ID of the dashboard you want, then pass it to `get_dashboard` to inspect its metrics.

**How do I create an alert using `create_alert`?**
You tell your agent to execute `create_alert`. You provide parameters like the query and the threshold (e.g., 'errors exceed 50 in 5 minutes'), and it handles setting up the rule.

**Is `list_events` different from `list_logs`?**
Yes. `list_logs` retrieves general application logs based on simple search criteria, while `list_events` pulls structured events or spans, which are usually more detailed and more useful for deep debugging.

**What happens if I use `delete_alert` but don't know the alert ID?**
You must supply the unique ID of the rule you want to remove. The system requires this specific identifier because it doesn't support deleting alerts by name or pattern.

**How do I use `get_dashboard` if I only know the dashboard's purpose, not its ID?**
First, run `list_dashboards` to get a list of IDs. Then, pass the specific ID you need to `get_dashboard`. This allows your agent to retrieve detailed metrics for that single board.
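The two-step lookup can be sketched as a helper that scans the `list_dashboards` result for a matching name and returns the ID to pass to `get_dashboard`. The response shape used here (a list of objects with `id` and `name` keys) and the helper name `find_dashboard_id` are assumptions; adapt them to what the server actually returns. The sample names come from the prompt example above, and the IDs are made up.

```python
def find_dashboard_id(dashboards, name):
    """Return the ID of the dashboard whose name matches (case-insensitive).

    `dashboards` is assumed to be the parsed list_dashboards result:
    a list of dicts with "id" and "name" keys.
    """
    matches = [d["id"] for d in dashboards if d["name"].lower() == name.lower()]
    if not matches:
        raise LookupError(f"no dashboard named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"dashboard name {name!r} is ambiguous: {matches}")
    return matches[0]


# Illustrative data; real IDs come from list_dashboards.
dashboards = [
    {"id": "d1", "name": "System Health"},
    {"id": "d2", "name": "API Performance"},
    {"id": "d3", "name": "User Analytics"},
]
print(find_dashboard_id(dashboards, "system health"))  # d1
```

Raising on an ambiguous name, rather than picking the first match, keeps the agent from silently inspecting the wrong board.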

**Can I filter logs using relative time ranges when running `list_logs`?**
Yes, you can use relative parameters like '1h' or '24h'. This is ideal for troubleshooting because it lets your agent query data based on a duration without needing specific ISO 8601 timestamps.
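To see what a relative window covers, the sketch below converts a duration string such as `1h` or `24h` into the ISO 8601 start timestamp it corresponds to. It is illustrative only: the accepted units (`m`, `h`, `d`) and the helper name `relative_to_iso` are assumptions, not the server's documented parser.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes, mapped to timedelta keyword arguments.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def relative_to_iso(duration, now=None):
    """Convert '15m', '1h', '24h', or '7d' into an ISO 8601 UTC start time."""
    match = re.fullmatch(r"(\d+)([mhd])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(**{_UNITS[unit]: amount})
    return start.isoformat(timespec="seconds")


now = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)
print(relative_to_iso("24h", now))  # 2024-01-01T09:30:00+00:00
```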

**Is there a way to see the full configuration of an event stream using `list_events`?**
No. `list_events` returns the structured events and spans that match your query; it does not expose stream configuration. The only tool that accepts an ID for deeper inspection is `get_dashboard`, and it applies to dashboards, not event streams.

**Can I search for specific errors in my logs using this server?**
Yes! Use the `list_logs` tool with a query like `level:error`. You can also specify a time range using the `from` parameter (e.g., '1h' or '24h') to narrow down the results.

**How do I set up a new alert for a specific service?**
You can use the `create_alert` tool. You'll need to provide a name, the search query (e.g., `service:auth level:error`), a threshold value, the type of alert (like 'count'), and the evaluation interval (e.g., '5m').
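Put together, the arguments for such a request might look like the dictionary below. The keys mirror the fields listed in this answer (`name`, `query`, `threshold`, `type`, `interval`), but their exact spelling and value types are assumptions; confirm them against the tool's input schema. The hypothetical `alert_arguments` helper also checks that the interval follows the `5m` style before anything is sent.

```python
import re


def alert_arguments(name, query, threshold, interval, alert_type="count"):
    """Assemble create_alert arguments (key names assumed), checking basic formats."""
    if not name:
        raise ValueError("alert name must not be empty")
    # Accept intervals like '5m', '1h', or '1d' (assumed format).
    if re.fullmatch(r"\d+[mhd]", interval) is None:
        raise ValueError(f"interval should look like '5m', got {interval!r}")
    return {
        "name": name,
        "query": query,
        "threshold": threshold,
        "type": alert_type,
        "interval": interval,
    }


args = alert_arguments("High Error Rate", "service:auth level:error", 50, "5m")
print(args)
```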

**Is it possible to delete an alert rule if it's no longer needed?**
Yes, simply use the `delete_alert` tool and provide the unique ID of the alert rule you wish to remove.