Helicone (LLM Observability) MCP. Audit every API call, track costs, and debug agent performance.

Helicone (LLM Observability) tracks your AI usage in real time. Monitor requests, analyze costs per model or user, and measure latency across all LLM providers.

You can also track multi-turn session graphs, manage prompt versions, and log user feedback directly through your agent. It gives you full visibility into your AI spend and performance.

What your AI agents can do

Analyze LLM Costs

Tracks spending by model, user, or custom metadata properties so you can monitor your AI burn rate in real time.

Measure API Latency

Determines the Time To First Token (TTFT) and isolates performance issues linked to specific LLM providers.

Inspect API Requests

Retrieves deep proxy logs, showing the exact prompts and outputs sent to LLM APIs by your agent.

Trace Agent Sessions

Isolates and analyzes multi-turn graph traces connecting consecutive LLM calls to debug complex agentic workflows.

Manage Prompts

Accesses managed prompt versions and tracks changes in your AI instruction logic over time.

Identify User Activity

Tracks precise LLM interactions using Helicone tags to identify the most active human clients.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Helicone (LLM Observability) MCP Server: 10 Tools

Use these tools to query costs, analyze performance, track user activity, and manage prompts across your entire LLM stack.

get_prompt_versions

Retrieves all managed versions of a specific prompt template.

list_properties

Identifies active properties associated with the gateway authentication.

log_feedback

Records user feedback (Thumbs Up/Down) and offline critiques for model improvement.

query_costs

Calculates spending based on various account and usage properties.

query_feedback

Inspects stored user feedback data and critiques.

query_latency

Generates performance data, including Time To First Token (TTFT) metrics.

query_prompts

Retrieves detailed logs related to prompt usage and limits.

query_requests

Gets a list of all API requests made through the platform gateway.

query_sessions

Lists all structured rules and metrics from multi-turn agent sessions.

query_users

Dispatches a validation check and lists user history data.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Helicone (LLM Observability), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Helicone's observability server lets your agent keep a tight leash on your AI usage. You'll get full visibility into where you're spending money and why things feel sluggish. query_costs lets you calculate spending by model, user, or custom metadata, so you'll always know your AI burn rate. query_latency gives you performance data, including Time To First Token (TTFT) metrics, letting you pinpoint slowness tied to specific LLM providers. query_requests retrieves deep proxy logs, showing the exact prompts and outputs your agent sends to LLM APIs.

You can trace complex workflows with query_sessions, which lists structured rules and metrics from multi-turn agent sessions. To manage your prompts, get_prompt_versions retrieves every managed version of a specific prompt template. You can track user activity with query_users, which validates and lists user history data. You'll keep track of user sentiment and help fine-tune models by calling log_feedback, which records user feedback (Thumbs Up/Down) and offline critiques.

To see what users are telling you, query_feedback inspects all stored user feedback and critiques. query_prompts gets detailed logs related to prompt usage and limits. You can also check what's running on the gateway using list_properties, which identifies active properties associated with the gateway authentication.

How Helicone (LLM Observability) MCP Works

1. Subscribe to the Helicone server and enter your API key.
2. Connect your AI client (Claude, Cursor, or any MCP-compatible client).
3. Use your agent to query performance data. You get real-time insights into costs, latency, and usage patterns.

The bottom line is you get a single, natural chat interface to audit and optimize your entire LLM infrastructure.

Who Is Helicone (LLM Observability) MCP For?

This is for the LLM Engineer who needs to debug prompt performance and measure TTFT latency across multiple providers. It's for the Product Owner monitoring AI spending and calculating costs per user. Data Scientists use it to analyze user feedback and improve model response quality. DevOps/SREs rely on it to ensure the availability and reliability of the AI gateway.

LLM Engineer

Uses the server to debug prompt performance and measure TTFT latency across different upstream LLM providers.

Product Owner

Monitors AI spending and calculates costs per user, feature, or organization.

Data Scientist

Analyzes user feedback and logs critiques to improve model response quality.

DevOps/SRE

Ensures the availability and reliability of the AI gateway and proxy layers.

What Changes When You Connect

See your AI burn rate in real time. The query_costs tool lets you break down spending by model, user, or custom metadata, so you know exactly what you're paying for.
Pinpoint slow parts of your system. Use query_latency to measure Time To First Token (TTFT) and figure out which upstream LLM provider is causing the slowdown.
Track complex agent behaviors. The query_sessions tool isolates multi-turn graph traces, letting you debug complex workflows step by step.
Manage your instructions. get_prompt_versions tracks all changes to your core prompts, so you never lose track of which version was running when.
Understand user intent. The query_users tool tracks precise interactions, helping you identify your most active or valuable human clients.
Improve model quality. Use log_feedback to log user critiques (thumbs up/down) directly, feeding data back into your model grounding process.

Real-World Use Cases

Billing Dispute: Need to know why the bill is too high.

A Product Owner notices the AI bill is spiking. They ask their agent: 'What did we spend on GPT-4o last week?' The agent runs query_costs and identifies that a specific 'Customer Support' feature tag drove the bulk of the cost, allowing the PO to limit that usage.
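The breakdown in this scenario amounts to a group-and-sum over request records. The sketch below is a minimal illustration, not Helicone's implementation: the record fields (`model`, `user`, `feature`, `cost_usd`) and the sample values are hypothetical, and in practice the agent would get these figures back from query_costs.

```python
from collections import defaultdict

# Hypothetical request records; field names are illustrative, not Helicone's schema.
requests = [
    {"model": "gpt-4o", "user": "u1", "feature": "Customer Support", "cost_usd": 1.20},
    {"model": "gpt-4o", "user": "u2", "feature": "Customer Support", "cost_usd": 2.30},
    {"model": "gpt-4o", "user": "u1", "feature": "Search", "cost_usd": 0.50},
    {"model": "gpt-4o-mini", "user": "u3", "feature": "Search", "cost_usd": 0.10},
]

def cost_by(records, key):
    """Sum cost_usd per distinct value of `key`, highest total first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Narrow to one model first, then break the spend down by feature tag.
gpt4o = [r for r in requests if r["model"] == "gpt-4o"]
for feature, total in cost_by(gpt4o, "feature"):
    print(f"{feature}: ${total:.2f}")
```

The same helper works for any grouping key, so `cost_by(requests, "user")` would give a per-user view of the same data.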

Debugging Slowness: The agent feels laggy.

An LLM Engineer reports that the agent is slow. They ask: 'Show me the 10 slowest requests from the last hour.' The agent runs query_latency and immediately identifies that the highest TTFT came from an 'anthropic.claude-3-opus' call, pointing the fix to the specific provider.
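As a rough picture of what "the 10 slowest requests" means, the sketch below ranks records by TTFT and then finds the worst TTFT per model. The field names (`id`, `model`, `ttft_ms`) and the timings are made up for illustration; the actual shape of query_latency results is not documented here.

```python
import heapq

# Hypothetical latency records; "ttft_ms" and the other field names are assumptions.
latency_records = [
    {"id": "req-1", "model": "anthropic.claude-3-opus", "ttft_ms": 4200},
    {"id": "req-2", "model": "gpt-4o", "ttft_ms": 650},
    {"id": "req-3", "model": "anthropic.claude-3-opus", "ttft_ms": 3900},
    {"id": "req-4", "model": "gpt-4o-mini", "ttft_ms": 310},
]

def slowest_requests(records, n=10):
    """Return up to n records with the highest TTFT, slowest first."""
    return heapq.nlargest(n, records, key=lambda r: r["ttft_ms"])

def worst_ttft_by_model(records):
    """Map each model to the highest TTFT observed for it."""
    worst = {}
    for record in records:
        model = record["model"]
        worst[model] = max(worst.get(model, 0), record["ttft_ms"])
    return worst

for record in slowest_requests(latency_records, n=2):
    print(record["id"], record["model"], record["ttft_ms"])
```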

Auditing Logic: Why did the agent give the wrong answer last week?

A Data Scientist suspects the agent's instructions changed. They ask: 'List all versions for the customer service bot prompt.' The agent runs get_prompt_versions and shows that Version 3 was active for three months, but Version 5 was deployed two days ago with updated grounding rules, confirming the change point.

Fixing a Workflow: The multi-step agent keeps failing.

A DevOps engineer observes a complex agent flow failing intermittently. They ask the agent to 'Show me the full interaction graph for the last failure.' The agent runs query_sessions, isolating the exact sequence of LLM calls that led to the breakdown.

The Tradeoffs

Manual Dashboard Hunting

Logging into separate billing systems, then switching to a separate latency dashboard, and manually cross-referencing dates and IDs. This takes 30 minutes and requires exporting three different CSV files.

→ Use the agent to run query_costs and query_latency in one conversation. The agent combines the data, telling you, 'Spending was X, and the average TTFT was Y.' You get the full picture instantly.

Guessing Prompt Versions

Assuming the latest prompt version is the one causing the issue, and rolling it back without knowing the exact dates it was deployed or what rules were active.

→ Use get_prompt_versions to list all past versions. You can then pinpoint exactly when Version 4 was active and what its specific instruction text was.

Ignoring User Context

Tuning the model based on general metrics, without knowing if the poor performance was specific to a high-value user or a new feature rollout.

→ Use query_users to filter interactions by specific tags. This lets you see the LLM performance only for your most active client segment, making your improvements targeted.

When It Fits, When It Doesn't

Use this if you need to tie performance metrics (latency) directly to business outcomes (cost). You need to know not just that the model is slow, but which user is affected, which prompt caused it, and how much it cost. This server is for optimization and auditing.

Don't use it if you just need a simple counter (e.g., 'how many messages were sent'); a basic message logging tool handles that. If you only care about the prompt text and not about cost or latency, a basic logging service might suffice. But if you need to calculate cost by user and track latency by model, you need the detailed view this server provides.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Helicone. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction


Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
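Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for query_costs. The envelope follows the Model Context Protocol; the argument names (`group_by`, `period`) are hypothetical, since the real input schema comes from the server's own tool listing.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# The argument names below are hypothetical; check the server's tool schema
# for the fields query_costs actually accepts.
message = build_tool_call(1, "query_costs", {"group_by": "model", "period": "7d"})
print(json.dumps(message, indent=2))
```

In normal use you never write this by hand: the AI client generates the call from your chat request.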

Available Capabilities

get_prompt_versions, list_properties, log_feedback, query_costs, query_feedback, query_latency, query_prompts, query_requests, query_sessions, query_users

Tracking AI Usage Used to Be a Spreadsheet Nightmare.

Before this server, you had to jump between a billing dashboard, a separate logging console, and a dedicated performance monitoring tool. You’d spend hours cross-referencing dates and trying to match a high cost spike to a specific slow request. You’d end up with three different, unlinked sources of truth.

Now, you tell your agent, 'Show me the cost of the last 10 slow requests.' The agent runs `query_costs` and `query_latency` and presents you with one clean, actionable list. It connects the money spent to the performance problem, instantly.
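Conceptually, answering that question is a join between two result sets on a shared request identifier. The following sketch assumes both queries return rows carrying a `request_id`; that field, along with `ttft_ms` and `cost_usd`, is a hypothetical name chosen for the example, and the values are invented.

```python
# Hypothetical results of two separate queries, keyed by a shared request id.
latency_rows = [
    {"request_id": "a", "ttft_ms": 900},
    {"request_id": "b", "ttft_ms": 4100},
    {"request_id": "c", "ttft_ms": 2500},
]
cost_rows = [
    {"request_id": "a", "cost_usd": 0.002},
    {"request_id": "b", "cost_usd": 0.045},
    {"request_id": "c", "cost_usd": 0.030},
]

def cost_of_slowest(latency_rows, cost_rows, n=10):
    """Attach cost to the n slowest requests; cost is None when no match exists."""
    cost_by_id = {row["request_id"]: row["cost_usd"] for row in cost_rows}
    slowest = sorted(latency_rows, key=lambda r: r["ttft_ms"], reverse=True)[:n]
    return [{**row, "cost_usd": cost_by_id.get(row["request_id"])} for row in slowest]

for row in cost_of_slowest(latency_rows, cost_rows, n=2):
    print(row)
```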

Helicone (LLM Observability) MCP Server: Understand the Full Lifecycle.

Manually checking prompt versions means logging into the prompt management UI, finding the specific prompt, and trying to remember which version was live on a given date. It’s prone to human error and misses context.

With this server, you simply ask: 'What were the grounding rules for the customer service bot on June 1st?' The agent runs `get_prompt_versions` and gives you the exact instruction text, closing the knowledge gap.
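Finding the version that was live on a given date is a lookup over deployment dates: take the last version deployed on or before that day. The sketch below shows the idea with a binary search. The version history, its field names, and the year are all hypothetical; whether get_prompt_versions returns deployment dates in this form is an assumption.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history, sorted by deployment date.
versions = [
    {"version": 3, "deployed": date(2024, 3, 1), "text": "Answer from the help center only."},
    {"version": 4, "deployed": date(2024, 5, 20), "text": "Answer from the help center and cite the article."},
    {"version": 5, "deployed": date(2024, 6, 10), "text": "Cite the article and escalate refund requests."},
]

def active_version(history, on):
    """Return the last version deployed on or before `on`, or None if none was live."""
    deploy_dates = [v["deployed"] for v in history]
    index = bisect_right(deploy_dates, on)
    return history[index - 1] if index else None

live = active_version(versions, date(2024, 6, 1))
print(live["version"], live["text"])
```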

Common Questions About Helicone (LLM Observability) MCP

How do I use the `query_costs` tool in Helicone (LLM Observability)?

You ask the agent directly. It runs query_costs and lets you break down spending by model, user, or custom tag. This is the best way to manage your AI budget.

What is TTFT latency using `query_latency`?

TTFT stands for Time To First Token. Using query_latency, you measure how fast the model starts sending words. A high TTFT means the model is slow to respond, even if the average speed is okay.
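To make the distinction concrete, the sketch below computes TTFT, total latency, and streaming throughput from token arrival times. The function and its inputs are illustrative assumptions, not part of query_latency; the sample timings are invented to show a slow start followed by fast streaming.

```python
def latency_metrics(sent_ms, token_times_ms):
    """Compute TTFT, total latency, and throughput after the first token.

    sent_ms: time the request was sent, in milliseconds.
    token_times_ms: arrival time of each streamed token, in milliseconds.
    """
    if not token_times_ms:
        raise ValueError("no tokens received")
    ttft_ms = token_times_ms[0] - sent_ms
    total_ms = token_times_ms[-1] - sent_ms
    generation_ms = token_times_ms[-1] - token_times_ms[0]
    tokens_after_first = len(token_times_ms) - 1
    # Throughput is undefined when only one token arrived.
    tokens_per_s = tokens_after_first * 1000 / generation_ms if generation_ms else None
    return {"ttft_ms": ttft_ms, "total_ms": total_ms, "tokens_per_s": tokens_per_s}

# Hypothetical timings: the first token takes 1.8 s, the rest stream quickly.
metrics = latency_metrics(0, [1800, 1820, 1840, 1860, 1880])
print(metrics)
```

Here almost all of the total latency is spent waiting for the first token, which is the pattern a high TTFT with normal throughput describes.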

Can I find out which users are using the AI most often with `query_users`?

Yes. The agent runs query_users and provides a list of user history data. This helps you identify your most active or most valuable human clients.

How does `query_sessions` help debug complex agent workflows?

The query_sessions tool lists the structured rules and metrics recorded for multi-turn agent sessions. This lets you see the entire conversation path and pinpoint where the agent deviated from the expected flow.

What is the difference between `query_requests` and `query_sessions`?

query_requests lists every single API call (the raw action). query_sessions groups those calls into structured, multi-turn conversations, showing the flow between them.
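One way to picture the relationship: a session view is the flat request log grouped by a session identifier and ordered in time. The sketch below does that grouping on sample data. The fields `session_id`, `created_at`, and `step` are assumptions for illustration, not the documented output of either tool.

```python
from collections import defaultdict

# Hypothetical flat request log; "session_id" and "created_at" are assumed fields.
request_log = [
    {"id": "r3", "session_id": "s1", "created_at": 30, "step": "answer"},
    {"id": "r1", "session_id": "s1", "created_at": 10, "step": "plan"},
    {"id": "r2", "session_id": "s2", "created_at": 20, "step": "plan"},
    {"id": "r4", "session_id": "s1", "created_at": 25, "step": "search"},
]

def group_into_sessions(records):
    """Group requests by session id, each group ordered by creation time."""
    sessions = defaultdict(list)
    for record in sorted(records, key=lambda r: r["created_at"]):
        sessions[record["session_id"]].append(record)
    return dict(sessions)

sessions = group_into_sessions(request_log)
for session_id, calls in sessions.items():
    print(session_id, " -> ".join(call["step"] for call in calls))
```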

How do I use the `get_prompt_versions` tool to track changes in my instructions?

It shows a history of every prompt version. You can see when a version was deployed and by whom, which is critical for tracking changes to your AI instructions.

What can I check using the `query_feedback` tool?

You inspect logged user critiques. This lets you see thumbs up/down signals and offline Human-in-the-Loop verdicts to directly improve your model's performance.

How does `query_prompts` help me debug API calls?

This tool retrieves detailed logs related to prompt usage and limits, so you can trace how a prompt was used and debug calls that ran into a limit. For the exact prompts and outputs of an individual call, use query_requests.

Can I see the exact prompt that caused a specific error?

Yes. Use the query_requests tool to fetch direct prompts and outputs from the proxy logs. You can filter by status or custom tags to find the exact interaction that needs debugging.

How do I track costs for a specific customer ID?

Ask your agent to query_costs and include your customer identity in the filter. Helicone maps costs per model and user, allowing you to see exactly how much each client is burning in LLM tokens.

Can my agent log human feedback into Helicone?

Absolutely. Use the log_feedback tool to inject offline Human-in-the-Loop verdicts or text critiques directly into Helicone's database, helping you refine your model's grounding over time.
