# New Relic AI (LLM Observability) MCP

> New Relic AI (LLM Observability) lets you pull performance data, token costs, and user feedback for your LLMs using natural conversation. Instead of logging into dashboards to check p95 latency or calculating total USD spend by hand, you ask your agent for the metrics directly. Track every chat completion, audit model behavior, and verify infrastructure health, all in one place.

## Overview
- **Category:** loved-by-devs
- **Price:** Free
- **Tags:** llm-monitoring, token-cost-tracking, performance-analytics, ai-observability, latency-tracking

## Description

You run complex AI agents that use Large Language Models (LLMs). Things break, costs spike unexpectedly, or performance dips when nobody is looking. This MCP connects New Relic AI to your existing agent workflow, giving you full visibility into everything happening under the hood. You can ask for total token usage across all models in dollars and cents. Need to know why responses slow down? Check the p95 latency metrics instantly. Want to audit model behavior? Review raw chat completion messages to understand exactly what the LLM saw or generated. This access means you don't have to jump between cost dashboards, performance monitoring tools, and logs just to get a complete picture. By connecting this MCP via Vinkius, your agent becomes an operational detective for your AI stack.

## Tools

### list_alert_policies
Lists the alert policies configured in your New Relic account.

### list_apm_apps
Retrieves a list of currently running APM applications to validate service status.

### custom_nrql
Runs sophisticated, read-only queries using the New Relic Query Language (NRQL) for deep data insights.

### list_dashboards
Lists the dashboards accessible with your configured New Relic credentials.

### query_llm_errors
Identifies and lists specific error logs related to LLM processing.

### query_llm_costs
Calculates the precise monetary cost of tokens used by your agents over a specified period.
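
The arithmetic behind a token cost figure can be sketched as follows. This is only an illustration of the calculation, not the tool's actual implementation: the model names, the per-1,000-token prices, and the helper names `token_cost` and `cost_by_model` are all hypothetical.

```python
from decimal import Decimal

# Hypothetical USD prices per 1,000 tokens; real prices depend on your provider.
PRICE_PER_1K = {
    "model-a": {"prompt": Decimal("0.005"), "completion": Decimal("0.015")},
    "model-b": {"prompt": Decimal("0.0005"), "completion": Decimal("0.0015")},
}


def token_cost(model, prompt_tokens, completion_tokens):
    """Return the USD cost of one request as an unrounded Decimal."""
    price = PRICE_PER_1K[model]
    raw = price["prompt"] * prompt_tokens + price["completion"] * completion_tokens
    return raw / 1000


def cost_by_model(usage):
    """Sum request costs per model. `usage` holds (model, prompt, completion) tuples."""
    totals = {}
    for model, prompt_tokens, completion_tokens in usage:
        cost = token_cost(model, prompt_tokens, completion_tokens)
        totals[model] = totals.get(model, Decimal("0")) + cost
    return totals


usage = [
    ("model-a", 1000, 500),
    ("model-b", 2000, 1000),
    ("model-a", 3000, 0),
]
totals = cost_by_model(usage)
grand_total = sum(totals.values(), Decimal("0"))
print(totals, grand_total)
```

Using `Decimal` rather than floats keeps small per-request amounts from accumulating rounding error when they are summed over a long period.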

### query_llm_events
Retrieves a bounded set of recent LLM event records, such as chat completions and prompts.

### query_llm_feedback
Gathers human-submitted feedback and rating scores associated with LLM outputs.

### query_llm_latency
Measures the speed of your LLMs by retrieving p95 latency metrics and average response times.
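
To see why p95 and the average are reported together, here is a minimal sketch that computes both from a list of response times. New Relic computes these values server-side; the nearest-rank method, the sample data, and the function names below are assumptions made for illustration.

```python
def p95(samples):
    """Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def average(samples):
    """Arithmetic mean of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(samples) / len(samples)


# Hypothetical response times in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [400] * 18 + [2500, 6000]

print(average(latencies_ms))  # mean response time in ms
print(p95(latencies_ms))      # 95th percentile in ms
```

In this made-up sample the average stays under one second while the p95 is 2.5 seconds, which is the kind of tail slowness an average alone would hide.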

### post_custom_event
Sends custom telemetry rows to track unique internal states or behaviors within your agent workflow.
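
A custom event is essentially a flat record with an event type and a set of attributes. The sketch below shows one plausible shape for such a record. The `eventType` field follows New Relic's Event API convention, but the helper `build_custom_event`, the `AgentState` event type, and the attribute names are hypothetical, and the parameters this tool actually accepts may differ.

```python
import json
import time


def build_custom_event(event_type, attributes, timestamp=None):
    """Build one flat event dict with an `eventType` and a Unix timestamp."""
    if not event_type:
        raise ValueError("event_type must not be empty")
    for key, value in attributes.items():
        # Keep the record flat: nested structures are rejected in this sketch.
        if isinstance(value, (dict, list, tuple, set)):
            raise ValueError(f"attribute {key!r} must be a flat value")
    event = dict(attributes)
    event["eventType"] = event_type
    event["timestamp"] = int(timestamp if timestamp is not None else time.time())
    return event


# Hypothetical agent state recorded at a fixed timestamp for reproducibility.
event = build_custom_event(
    "AgentState",
    {"step": "retrieval", "retries": 2},
    timestamp=1700000000,
)
print(json.dumps(event, sort_keys=True))
```

Setting `eventType` after copying the attributes means a stray `eventType` key in the attributes cannot override the intended type.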

## Prompt Examples

**Prompt:** 
```
Show me the last 5 LLM events for the 'OpenAI' vendor
```

**Response:** 
```
Retrieving LLM events… I've identified 5 recent messages through the OpenAI module. Highlights: 1) Chat completion 'task-123' (Tokens: 1,500, Cost: $0.03), 2) Prompt 'User Query Alpha' (Status: Success). Would you like to see the literal prompt text for any of these?
```

**Prompt:** 
```
What is my total LLM token cost for the last 24 hours?
```

**Response:** 
```
Retrieving cost metrics… Your total LLM token spend for the last 24 hours is $12.45. This is distributed across 'gpt-4o' ($8.50), 'gpt-3.5-turbo' ($2.45), and 'claude-3-sonnet' ($1.50). Spend is trending 5% lower than yesterday.
```

**Prompt:** 
```
Run NRQL: SELECT count(*) FROM LlmEvent WHERE duration > 2 SINCE 1 hour ago
```

**Response:** 
```
Executing custom NRQL query… I've identified 12 LLM events in the last hour that exceeded 2 seconds in duration. This represents approximately 3% of your total traffic. Would you like me to facet these slow events by model or region?
```
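
The NRQL string in the prompt above can also be assembled programmatically, for example when the threshold or time window varies. This is a minimal sketch: `LlmEvent` and `duration` are taken from the prompt above, while the helper name `build_slow_events_query` and the `model` facet attribute are assumptions, and the event types in your account may be named differently.

```python
def build_slow_events_query(min_duration_seconds, since="1 hour ago", facet=None):
    """Build a NRQL query counting LLM events slower than a threshold."""
    # Reject non-numeric thresholds so arbitrary text cannot end up in the query.
    if isinstance(min_duration_seconds, bool) or not isinstance(min_duration_seconds, (int, float)):
        raise TypeError("min_duration_seconds must be a number")
    parts = [
        "SELECT count(*) FROM LlmEvent",
        f"WHERE duration > {min_duration_seconds}",
    ]
    if facet:
        parts.append(f"FACET {facet}")  # group the counts by an attribute
    parts.append(f"SINCE {since}")
    return " ".join(parts)


print(build_slow_events_query(2))
print(build_slow_events_query(2, facet="model"))
```

The first call reproduces the query from the prompt, and the second adds a `FACET` clause, matching the follow-up the agent offers in its response.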

## Capabilities

### Audit LLM Performance Metrics
Get average response times and the 95th percentile latency data to ensure your models remain fast.

### Track Token Expenditure
Calculate precise USD costs for all token usage across your entire AI infrastructure.

### Review Model Interactions
Retrieve detailed chat completion messages and original prompts to audit model behavior in real time.

### Measure User Satisfaction
Fetch chronological user feedback and 1-5 rating scores provided by human supervisors.

### Execute Custom Queries
Run advanced, read-only queries using the New Relic Query Language (NRQL) against your AI datasets.

### Monitor Infrastructure Health
Examine active APM apps, dashboards, and alert policies to check overall system integrity.

## Use Cases

### Debugging an unexpected cost spike
An AI Engineer notices their LLM costs are higher than normal. They ask the agent, 'What was my total token spend last week?' The agent executes `query_llm_costs` and reports that a specific integration caused a massive spike in usage, allowing the engineer to immediately pinpoint the source.

### Checking user acceptance of new prompts
An Observability Lead wants to know if recent prompt changes affected quality. They ask the agent for recent user feedback. The agent runs `query_llm_feedback` and returns a list of ratings showing that user satisfaction dropped sharply after the change was deployed.

### Validating system readiness before launch
A DevOps team member needs to ensure all monitoring is active. They instruct the agent to run `list_apm_apps` and check `list_alert_policies`. The agent confirms that all necessary applications are running and alert triggers are correctly configured.

### Analyzing slow agent responses
An AI Engineer reports that the chat sometimes feels sluggish. They ask the agent to run `query_llm_latency`, which returns metrics showing that the average response time exceeds 2 seconds during peak usage hours.

## Benefits

- Stop guessing about spending. Use `query_llm_costs` to get the exact dollar amount of your token usage, giving you tight control over infrastructure spend.
- Debug slowness fast. Running `query_llm_latency` provides p95 latency metrics and average response times so you know exactly when your LLM responses become slower than acceptable.
- Audit model behavior instantly. Instead of digging through raw logs, use the agent to retrieve detailed chat completion messages, allowing you to verify what the LLM saw or generated.
- Measure quality with real data. `query_llm_feedback` pulls in human supervisor ratings and feedback messages, letting you spot quality regressions immediately after deployment.
- Stay ahead of system decay. Running `list_apm_apps` and `list_dashboards` lets DevOps check the structural health of your entire environment without leaving the chat window.

## How It Works

You talk to your agent as you would to a teammate, and it handles the monitoring data retrieval for you.

1. Subscribe to this MCP and enter your New Relic API Key and Account ID.
2. Connect your preferred AI client (Claude, Cursor, or any compatible agent) to Vinkius.
3. Ask a natural language question about your LLM activity. Your agent executes the necessary queries and reports back with performance metrics or cost breakdowns.

## Frequently Asked Questions

**How does New Relic AI (LLM Observability) track token costs?**
This MCP uses `query_llm_costs` to calculate your total LLM token spend. It reports USD consumption across different models and services, so you never lose track of what your usage costs.

**Can I check my LLM performance latency with this MCP?**
Yes, use `query_llm_latency`. It pulls p95 latency metrics and average response times, helping you pinpoint exactly when your agent's responses slow down.

**What kind of data can I audit with New Relic AI (LLM Observability)?**
You can audit chat completion messages for model behavior and human supervisor feedback using `query_llm_feedback`. You can also record internal agent states with `post_custom_event` so they are available for later analysis.

**Is New Relic AI (LLM Observability) read-only?**
Mostly. The query tools, including `custom_nrql`, run strictly read-only queries, so you can pull insights without risking changes to your live infrastructure. The one exception is `post_custom_event`, which writes custom telemetry to your account.

**Does this MCP help with general system health checks?**
It does. You can use tools like `list_apm_apps` and `list_alert_policies` to check the operational status of your entire environment, not just the LLM component.