# Helicone Observability MCP

> Helicone provides deep observability into your LLM usage, and this MCP connects it directly to any MCP-compatible AI client. It lets you track every request, analyze costs broken down by user or feature, spot real-time latency spikes, and manage prompt versions without logging into a separate dashboard. You get full visibility into all your upstream LLM calls, all from a conversation with your agent.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** llm-observability, request-logging, cost-analysis, latency-tracking, prompt-management, ai-gateway

## Description

Running an AI application means managing complexity, especially around cost and performance. This MCP puts that data within reach: instead of hopping between billing portals and log viewers, you ask your agent questions about its own activity. You can find out exactly how much the system spent yesterday, or pinpoint which LLM provider is causing a latency spike during peak hours. You can also trace complex multi-step workflows to see exactly where an agent failed or slowed down. If you already use Vinkius for other services, adding this MCP puts all your AI infrastructure data in one place, right inside your conversation.

## Tools

### query_costs
Calculates total LLM spending and breaks it down by the properties that drive charges, such as model, user, or feature tag.

### query_feedback
Inspects stored user feedback data to see what users liked or disliked about the output.

### query_latency
Retrieves performance metrics such as Time To First Token (TTFT) and average latency, showing how quickly requests were processed.

### log_feedback
Logs user critiques or feedback directly into the system for model improvement.

### query_prompts
Retrieves detailed logs of the prompts sent to your LLM APIs.

### list_properties
Lists the custom properties (request tags such as feature or environment) recorded by the gateway, which you can then use to filter other queries.

### query_requests
Retrieves records of the requests that passed through the Helicone gateway, including their prompts, outputs, and status.

### query_sessions
Retrieves sessions: groups of related requests that make up a multi-step workflow or multi-turn conversation, so you can trace how an agent progressed.

### query_users
Lists the users interacting with your platform, along with their activity history.

### get_prompt_versions
Retrieves historical versions of a prompt, allowing you to compare changes over time.
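Comparing two prompt versions is essentially a text diff. A minimal sketch using Python's standard `difflib`, assuming you have already fetched the instruction text of two versions; the helper name and the prompt strings below are hypothetical placeholders, not real `get_prompt_versions` output:

```python
import difflib


def diff_prompt_versions(old: str, new: str, old_label: str = "v4", new_label: str = "v5") -> list[str]:
    """Return unified-diff lines showing what changed between two prompt versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # lines come from splitlines(), so no trailing newlines needed
        )
    )


# Hypothetical instruction text for two versions of 'customer-service-bot'
v4 = "You are a support agent.\nAnswer politely."
v5 = "You are a support agent.\nAnswer politely.\nOnly cite the help-center articles provided."

for line in diff_prompt_versions(v4, v5):
    print(line)
```

Lines prefixed with `+` were added in the newer version and lines prefixed with `-` were removed, which makes changes such as updated grounding rules easy to spot.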

## Prompt Examples

**Prompt:** 
```
How much did we spend on GPT-4o yesterday?
```

**Response:** 
```
Yesterday, you spent a total of $42.15 on GPT-4o across 12,450 requests. The largest portion came from the 'Customer Support' feature tag ($28.40). Would you like a breakdown by user?
```

**Prompt:** 
```
Show me the 10 slowest requests from the last hour
```

**Response:** 
```
I've identified the 10 slowest requests. The highest TTFT was 4.2s for an 'anthropic.claude-3-opus' call. Average latency for these 10 is 3.1s. Would you like to inspect the prompt for the slowest one?
```

**Prompt:** 
```
List all versions for the 'customer-service-bot' prompt
```

**Response:** 
```
Found 5 versions for 'customer-service-bot'. Version 5 (latest) was deployed 2 days ago with updated grounding rules. Version 4 was active for 3 months. I can fetch the exact instruction text for any version.
```

## Capabilities

### Analyze Spending
Break down total LLM spending by specific models or user groups to understand your exact operational burn rate.
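A cost breakdown amounts to summing per-request cost grouped by one dimension. A minimal sketch, assuming hypothetical request records with `model`, `feature`, `user_id`, and `cost_usd` fields (illustrative names, not Helicone's actual response schema):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical fields; the real query_costs response shape may differ.
    model: str
    feature: str
    user_id: str
    cost_usd: float


def cost_breakdown(records: list[RequestRecord], key: str) -> dict[str, float]:
    """Sum cost per value of one dimension ('model', 'feature', or 'user_id'),
    ordered from most to least expensive."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += r.cost_usd
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


records = [
    RequestRecord("gpt-4o", "customer-support", "u1", 0.012),
    RequestRecord("gpt-4o", "search", "u2", 0.004),
    RequestRecord("gpt-4o-mini", "customer-support", "u1", 0.001),
]
print(cost_breakdown(records, "feature"))
print(cost_breakdown(records, "model"))
```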

### Measure Performance
Identify the slowest parts of a call, measuring Time To First Token (TTFT) and pinpointing latency issues across different AI providers.
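TTFT is the delay between sending a request and receiving the first token; total latency runs until the response completes. A minimal sketch of computing both, picking the slowest requests, and averaging TTFT per provider, assuming hypothetical per-request timestamps in milliseconds (not Helicone's actual field names):

```python
from dataclasses import dataclass


@dataclass
class TimedRequest:
    # Hypothetical fields: timestamps in milliseconds.
    request_id: str
    provider: str
    sent_ms: int
    first_token_ms: int
    completed_ms: int

    @property
    def ttft_ms(self) -> int:
        # Time To First Token: request sent -> first token received
        return self.first_token_ms - self.sent_ms

    @property
    def latency_ms(self) -> int:
        # Total latency: request sent -> response complete
        return self.completed_ms - self.sent_ms


def slowest(requests: list[TimedRequest], n: int = 10) -> list[TimedRequest]:
    """The n requests with the highest total latency, slowest first."""
    return sorted(requests, key=lambda r: r.latency_ms, reverse=True)[:n]


def mean_ttft_by_provider(requests: list[TimedRequest]) -> dict[str, float]:
    """Average TTFT per provider, to spot which upstream is dragging performance down."""
    by_provider: dict[str, list[int]] = {}
    for r in requests:
        by_provider.setdefault(r.provider, []).append(r.ttft_ms)
    return {p: sum(v) / len(v) for p, v in by_provider.items()}


reqs = [
    TimedRequest("a", "openai", 0, 300, 1200),
    TimedRequest("b", "anthropic", 0, 4200, 5000),
    TimedRequest("c", "openai", 0, 500, 900),
]
print([r.request_id for r in slowest(reqs, n=2)])
print(mean_ttft_by_provider(reqs))
```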

### Inspect Prompts
View proxy logs to see the exact instructions or data your agent sent to the LLM APIs.

### Review Conversations
Isolate and analyze entire multi-turn conversation histories to debug complex, chained agentic processes.
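A session groups related requests under a shared session ID, and each request can carry a path describing its position in the workflow. A minimal sketch of replaying one session's steps in order and locating the first failed step (here, any non-2xx status), assuming hypothetical record fields `session_id`, `path`, `started_ms`, and `status`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionStep:
    # Hypothetical fields for one logged request inside a session.
    session_id: str
    path: str        # e.g. "/plan/search" for a nested step
    started_ms: int
    status: int      # HTTP status returned for this LLM call


def trace(records: list[SessionStep], session_id: str) -> list[SessionStep]:
    """All steps of one session, in the order they started."""
    steps = [r for r in records if r.session_id == session_id]
    return sorted(steps, key=lambda r: r.started_ms)


def first_failure(records: list[SessionStep], session_id: str) -> Optional[SessionStep]:
    """The earliest step with a non-2xx status, or None if every step succeeded."""
    for step in trace(records, session_id):
        if not 200 <= step.status < 300:
            return step
    return None


log = [
    SessionStep("s1", "/plan", 0, 200),
    SessionStep("s1", "/plan/search", 150, 200),
    SessionStep("s2", "/plan", 50, 200),
    SessionStep("s1", "/plan/answer", 400, 500),
]
print([s.path for s in trace(log, "s1")])
print(first_failure(log, "s1"))
```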

### Track Users and Feedback
Identify your most active human users or log specific user critiques (like thumbs up/down) to improve the core model grounding.
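Thumbs up/down feedback becomes useful once it is aggregated. A minimal sketch that computes a positive-feedback rate per user from hypothetical `(user_id, thumbs_up)` pairs; the data shape is illustrative and not what `query_feedback` actually returns:

```python
from collections import Counter


def positive_rate(feedback: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of thumbs-up ratings per user, from 0.0 to 1.0."""
    totals: Counter = Counter()
    ups: Counter = Counter()
    for user_id, thumbs_up in feedback:
        totals[user_id] += 1
        if thumbs_up:
            ups[user_id] += 1
    return {user: ups[user] / totals[user] for user in totals}


# Hypothetical ratings: True = thumbs up, False = thumbs down
ratings = [("u1", True), ("u1", False), ("u2", True), ("u1", True)]
print(positive_rate(ratings))
```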

## Use Cases

### The billing surprise
A Product Owner needs to explain a sudden spike in AI costs. Instead of pulling messy spreadsheets, they ask their agent: 'Show me why our spending jumped last week.' The agent uses `query_costs` and immediately provides a breakdown by feature tag and user group.

### The sluggish chat interface
An LLM Engineer notices the chat interface feels sluggish during complex queries. They ask the agent to check performance, triggering `query_latency`. The results show that one specific model provider is causing a 3-second delay, allowing them to switch providers.

### The confusing agent failure
A Data Scientist has an agent fail in a multi-step process. They ask the agent to trace the interaction history, which executes `query_sessions`. The results reveal that the second LLM call was using outdated instructions, pointing them toward checking `get_prompt_versions`.

### The flaky authentication bug
A DevOps team member suspects an auth issue. They ask the agent to check recent activity, triggering `query_requests`. The output shows certain API calls failing at the gateway with permission errors, so they run `list_properties` to see which properties (such as environment or feature) the failing requests share.

## Benefits

- Stop guessing about costs. Use `query_costs` to break down every dollar spent on models, making billing transparent for product owners.
- Pinpoint slow spots immediately. Run `query_latency` to measure Time To First Token (TTFT) and figure out which LLM provider is dragging your performance down.
- Improve prompts over time. Use `get_prompt_versions` to see every iteration of a prompt's instructions, so you never lose historical context on refinement.
- Debug complex workflows easily. The agent can use `query_sessions` to trace entire multi-step conversations and isolate exactly where the logic broke.
- Understand your audience better. Use `query_users` to see who uses the system most often, and `log_feedback` and `query_feedback` to capture and review what they actually think of the output.

## How It Works

In short, you get natural-language access to your entire LLM operational dashboard.

1. First, subscribe to this MCP and provide your Helicone API Key.
2. Next, connect it to any MCP-compatible client (like Claude or Cursor).
3. Then, talk to the agent. It calls the tools above to answer questions about costs, latency, or specific prompts.

## Frequently Asked Questions

**How do I check my spending using query_costs?**
Ask your agent to run `query_costs`. It returns a breakdown of your current LLM spending, showing exactly which models and features are driving the most charges.

**Can I use query_latency to find performance issues?**
Yes. Running `query_latency` measures Time To First Token (TTFT) and average latency across calls, helping you pinpoint which upstream LLM provider is slowing things down.

**What does query_sessions do for debugging?**
`query_sessions` lets the agent retrieve sessions, which group the related requests that make up a multi-step workflow or multi-turn conversation. That makes it the tool for tracing how an agent progressed through its tasks and where it went wrong.

**How do I see which users are active with query_users?**
Ask the agent to run `query_users`. It lists the users who have interacted with your system, so you can confirm you're tracking usage from all sources.

**How do I use get_prompt_versions to audit a prompt's instruction text?**
It fetches the exact historical versions of your prompts. You can compare changes, see when grounding rules were updated, and pinpoint exactly what instructions the model received at any given time.

**What does query_prompts retrieve about the API inputs?**
It retrieves detailed logs of every prompt sent to your LLM APIs. You can inspect the exact prompts and outputs directly from your agent, which is key for debugging complex workflows.

**How do I use log_feedback to gather user critique data?**
`log_feedback` records user ratings such as thumbs up or down. This data feeds offline Human-in-the-Loop evaluation and helps improve model grounding over time.

**What information does query_requests provide about my API usage?**
It returns a record of every request made through your gateway, giving you a comprehensive view of activity so you can monitor the total volume and context of all interactions.

**Can I see the exact prompt that caused a specific error?**
Yes. Use the `query_requests` tool to fetch direct prompts and outputs from the proxy logs. You can filter by status or custom tags to find the exact interaction that needs debugging.
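Filtering by status or tag is a simple predicate over request records. A minimal sketch, assuming hypothetical log entries that carry an HTTP `status`, the logged `prompt`, and a `properties` dict of custom tags (illustrative names, not the actual `query_requests` schema):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoggedRequest:
    # Hypothetical shape of one proxy-log entry.
    request_id: str
    status: int
    prompt: str
    properties: dict[str, str] = field(default_factory=dict)


def find_errors(
    logs: list[LoggedRequest],
    tag_key: Optional[str] = None,
    tag_value: Optional[str] = None,
) -> list[LoggedRequest]:
    """Requests with a non-2xx status, optionally limited to one custom tag value."""
    matches = []
    for r in logs:
        if 200 <= r.status < 300:
            continue  # successful call, not an error
        if tag_key is not None and r.properties.get(tag_key) != tag_value:
            continue  # error, but not in the tag we are investigating
        matches.append(r)
    return matches


logs = [
    LoggedRequest("r1", 200, "Summarize the ticket", {"feature": "checkout"}),
    LoggedRequest("r2", 429, "Translate the reply", {"feature": "checkout"}),
    LoggedRequest("r3", 500, "Classify the intent", {"feature": "search"}),
]
for r in find_errors(logs, "feature", "checkout"):
    print(r.request_id, r.status, r.prompt)
```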

**How do I track costs for a specific customer ID?**
Ask your agent to `query_costs` and include your customer identity in the filter. Helicone maps costs per model and user, allowing you to see exactly how much each client is burning in LLM tokens.
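Per-customer costs depend on requests being tagged with a user ID when they are sent through Helicone. A minimal sketch of building those request headers, assuming Helicone's header conventions (`Helicone-Auth`, `Helicone-User-Id`, and `Helicone-Property-<Name>` for custom properties); verify the exact names against Helicone's current documentation. The helper name and example values are hypothetical:

```python
from typing import Optional


def helicone_headers(
    helicone_api_key: str,
    user_id: str,
    properties: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Hypothetical helper: headers that attribute a proxied LLM request to a
    customer and tag it with custom properties (header names assumed from Helicone docs)."""
    headers = {
        "Helicone-Auth": f"Bearer {helicone_api_key}",
        # Lets costs and usage be attributed to this customer
        "Helicone-User-Id": user_id,
    }
    for name, value in (properties or {}).items():
        # Each custom property becomes a filterable dimension, e.g. a feature tag
        headers[f"Helicone-Property-{name}"] = value
    return headers


h = helicone_headers("<your-helicone-api-key>", "customer-123", {"Feature": "customer-support"})
print(h)
```

These headers would be sent alongside your normal LLM API request; once requests carry a user ID, asking the agent for costs filtered by that customer has data to work with.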

**Can my agent log human feedback into Helicone?**
Absolutely. Use the `log_feedback` tool to inject offline Human-in-the-Loop verdicts or text critiques directly into Helicone's database, helping you refine your model's grounding over time.