Helicone Observability MCP for AI. Track LLM Costs, Latency, and Usage in Conversation
Works with every AI agent you already use
…and any MCP-compatible client








Connect to your AI in seconds.
Helicone provides deep observability into your LLM usage by connecting directly to any AI client. It lets you track every request, analyze costs broken down by user or feature, measure real-time latency spikes, and manage prompt versions without logging into a separate dashboard.
You get full visibility across all your upstream LLM calls—all from conversation with your agent.
What your AI can do
Query costs
Calculates total spending by analyzing properties that drive account charges.
Query feedback
Inspects stored user feedback data to see what users liked or disliked about the output.
Query latency
Retrieves performance metrics, showing how fast requests were processed in real-time.
Break down total LLM spending by specific models or user groups to understand your exact operational burn rate.
Identify the slowest parts of a call, measuring Time To First Token (TTFT) and pinpointing latency issues across different AI providers.
View deep proxy logs to see the exact instructions or data sent to the LLM API calls by your agent.
Isolate and analyze entire multi-turn conversation histories to debug complex, chained agentic processes.
Identify your most active human users or log specific user critiques (like thumbs up/down) to improve the core model grounding.
Ask an AI about this
Waiting for input…
Helicone (LLM Observability) with 10 Tools
These tools give your agent the raw data it needs to analyze costs, track performance metrics, inspect prompts, and monitor all LLM activity in detail.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Helicone (LLM Observability) on VinkiusQuery Costs
Calculates total spending by analyzing properties that drive account charges.
Query Feedback
Inspects stored user feedback data to see what users liked or disliked about the...
Query Latency
Retrieves performance metrics, showing how fast requests were processed in real-time.
Log Feedback
Logs user critiques or feedback directly into the system for model improvement.
Query Prompts
Pulls detailed log tracing of prompts and the associated rate limits used.
List Properties
Identifies active authentication arrays used by the gateway for access control.
Query Requests
Identifies all bounded client-server records that passed through the platform gateway.
Query Sessions
Counts and organizes structured rules related to billing and usage periods.
Query Users
Checks system history to validate which users are interacting with the platform.
Get Prompt Versions
Retrieves historical versions of a prompt, allowing you to compare changes over time.
Security and governance baked right in.
Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for HTTPS Streamable, SSE, and OAuth2—zero messy routing required.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Helicone (LLM Observability), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,100+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Helicone. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This connection provides 10 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
Sifting through logs and spreadsheets for every AI metric is exhausting.
Right now, if your agent acts weird or the bill arrives higher than expected, you're stuck. You have to jump into a dashboard, pull up the log service, cross-reference timestamps with billing reports, and maybe check an outdated Git branch for the prompt version. It takes hours of clicking and copy-pasting just to answer: 'What went wrong?'
With this MCP, you talk to your agent like it's a helpful teammate. Instead of navigating multiple services, you ask natural questions—like 'Where did we spend most on Claude last week?'—and the agent instantly aggregates all that data for you.
Better control over prompt versions using `get_prompt_versions`
Before this, if a prompt change broke something, you were manually tracing through commit history and hoping the old version was still backed up somewhere. You had no easy way to compare exactly what instructions were active last month versus today's rules.
Now, when things break or you want to prove performance improvements, you simply ask your agent to run `get_prompt_versions`. It shows you every recorded change and the exact text of past versions, letting you rollback logic without touching code.
What your AI can actually do with this
Running an AI application means managing complexity, especially around cost and performance. This MCP gives you total control over that mess. Instead of hopping between billing portals and log viewers, you just ask your agent questions about its own activity. You can find out exactly how much money the system burned yesterday, or pinpoint which LLM provider is causing a latency spike during peak hours.
It even lets you trace complex multi-step workflows to see exactly where an agent failed or slowed down. If you're already using Vinkius for other services, adding this MCP means all your AI infrastructure data lives in one place—right inside your conversation.
019d75af-3782-7271-8c2e-071c1a2f6ce4 Here's how it actually works
The bottom line is you get natural language access to your entire LLM operational dashboard.
First, subscribe to this MCP and provide your Helicone API Key.
Next, connect it to any MCP-compatible client (like Claude or Cursor).
Then, talk to the agent. It uses the tool's data to answer questions about costs, latency, or specific prompts.
Who is this actually for?
This MCP is for the engineering teams—the folks who realize that building an AI product isn't just about writing clever code; it's about managing unpredictable costs and diagnosing frustrating, intermittent failures. If you spend time arguing over why a feature slows down at 3 PM, this is for you.
Debugging prompt performance or measuring TTFT latency across multiple LLMs to fix bottlenecks.
Ensuring the reliability and availability of the entire AI gateway layer by monitoring usage trends.
Calculating precise costs per user or feature to justify spending to stakeholders.
What Changes When You Connect
Stop guessing about costs. Use query_costs to break down every dollar spent on models, making billing transparent for product owners.
Pinpoint slow spots immediately. Run query_latency to measure Time To First Token (TTFT) and figure out which LLM provider is dragging your performance down.
Improve prompts over time. Use get_prompt_versions to see every iteration of a prompt's instructions, so you never lose historical context on refinement.
Debug complex workflows easily. The agent can use query_sessions to trace entire multi-step conversations and isolate exactly where the logic broke.
Understand your audience better. Use query_users or log_feedback to track who is using the system most often, and what they actually think of the output.
See it in action
The billing surprise
A Product Owner needs to explain a sudden spike in AI costs. Instead of pulling messy spreadsheets, they ask their agent: 'Show me why our spending jumped last week.' The agent uses query_costs and immediately provides a breakdown by feature tag and user group.
The slow checkout process
An LLM Engineer notices the chat interface feels sluggish during complex queries. They ask the agent to check performance, triggering query_latency. The results show that one specific model provider is causing a 3-second delay, allowing them to switch providers.
The confusing agent failure
A Data Scientist has an agent fail in a multi-step process. They ask the agent to trace the interaction history, which executes query_sessions. The results reveal that the second LLM call was using outdated instructions, pointing them toward checking get_prompt_versions.
The flaky authentication bug
A DevOps team member suspects an auth issue. They ask the agent to check recent activity, triggering query_requests. The output shows that certain API calls are failing due to incorrect gateway permissions, directing them straight to checking system properties via list_properties.
The honest tradeoffs
Assuming all data is available
A team member assumes that just because they have logs for prompts, the cost data must be in there too. They try to manually cross-reference query_prompts with billing reports.
Don't eyeball it. Use the agent to run both query_prompts and query_costs. The MCP links these metrics together so you get a complete, auditable view in one conversation.
Debugging single-turn failures
A developer only checks the last API call that failed, missing the context of the preceding successful steps. They only run query_requests for the time window.
Always check the full scope by running query_sessions. This shows you the entire graph of calls, making it clear which early step caused the final failure.
Over-logging everything
A developer writes code that logs every single input and output to a database table for perfect auditing. The cost is astronomical.
Use this MCP's tools like query_costs first. It gives you the necessary insights without forcing massive data storage, keeping your infrastructure lean.
When It Fits, When It Doesn't
Use this MCP if managing unpredictable costs and complex performance issues is part of your job description. You need to know why an agent slowed down or who spent money on what—not just that it happened. Don't use it if you only need simple, basic logging; those dedicated tools are fine. However, if your application relies on multi-step reasoning and needs to track the evolution of its internal logic, this is essential. Always run query_latency and query_costs together first; that pair tells you everything you need to know about health.
Questions you might have
How do I check my spending using query_costs? +
You ask your agent to run query_costs. It immediately provides a structural breakdown of your current LLM expenditures, letting you see exactly which models and features are driving the most charges.
Can I use query_latency to find performance issues? +
Yes. Running query_latency measures Time To First Token (TTFT) and average speed across all calls, helping you pinpoint exactly which upstream LLM provider is slowing things down.
What does query_sessions do for debugging? +
query_sessions allows the agent to enumerate structured rules exporting active billing data. It's crucial for tracing multi-step workflows and seeing how an agent progressed through its tasks.
How do I check if a user is valid with query_users? +
You ask the agent to run query_users. This dispatches a validation check, confirming which clients have interacted with your system and ensuring you're tracking usage from all sources.
How do I use get_prompt_versions to audit a prompt's instruction text? +
It fetches the exact historical versions of your prompts. You can compare changes, see when grounding rules were updated, and pinpoint exactly what instructions the model received at any given time.
What does query_prompts retrieve about the API inputs? +
It retrieves detailed logs of every prompt sent to your LLM APIs. You can inspect these explicit prompts and outputs directly from your agent, which is key for debugging complex workflows.
How do I use log_feedback to gather user critique data? +
Using log_feedback captures user ratings like thumbs up or down. This logged data is crucial for offline Human-in-the-Loop evaluation and improving model grounding over time.
What information does query_requests provide about my API usage? +
This tool identifies bounded records of every single request made through your gateway. It gives a comprehensive view of activity, letting you monitor the total volume and context of all interactions.
Can I see the exact prompt that caused a specific error? +
Yes. Use the query_requests tool to fetch direct prompts and outputs from the proxy logs. You can filter by status or custom tags to find the exact interaction that needs debugging.
How do I track costs for a specific customer ID? +
Ask your agent to query_costs and include your customer identity in the filter. Helicone maps costs per model and user, allowing you to see exactly how much each client is burning in LLM tokens.
Can my agent log human feedback into Helicone? +
Absolutely. Use the log_feedback tool to inject offline Human-in-the-Loop verdicts or text critiques directly into Helicone's database, helping you refine your model's grounding over time.
Powerful workflows you can unlock today
Cut AI Model Costs Without Losing Quality via MCP
Your GPT-4o bill is $4,200/month and 60% of those calls could run on Groq for $0.003 , your agent finds the waste
Monitor AI Agent Performance Using MCP Servers
Your agents run in production but you cannot explain why one failed at 3am , fix that
Track LLM Cost vs Quality Using MCP Servers
Your OpenAI bill grew from $200 to $2,400 in 2 months and you have no idea which feature caused it , because you track API spend at the account level, not at the prompt level
We've already built the connector for Helicone Observability. Just plug in your AI agents and start using Vinkius.
No hosting. No infrastructure. No complex setup.
All 10 tools are live and waiting.
You're up and running in seconds.
Vinkius gives your AI agents access to the full catalog of app connectors, all fully managed, secure, and enterprise-ready. One subscription, every tool you need.
Built, hosted, and secured by Vinkius. You just connect and go.