Helicone Observability MCP for AI. Track LLM Costs, Latency, and Usage in Conversation

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Connect to your AI in seconds.

Helicone provides deep observability into your LLM usage by connecting directly to any AI client. It lets you track every request, analyze costs broken down by user or feature, measure real-time latency spikes, and manage prompt versions without logging into a separate dashboard.

You get full visibility across all your upstream LLM calls—all from conversation with your agent.

What your AI can do

Query costs

Calculates total spending by analyzing properties that drive account charges.

Query feedback

Inspects stored user feedback data to see what users liked or disliked about the output.

Query latency

Retrieves performance metrics, showing how fast requests were processed in real-time.

+ 7 more capabilities included

Analyze Spending

Break down total LLM spending by specific models or user groups to understand your exact operational burn rate.

Measure Performance

Identify the slowest parts of a call, measuring Time To First Token (TTFT) and pinpointing latency issues across different AI providers.

Inspect Prompts

View deep proxy logs to see the exact instructions or data sent to the LLM API calls by your agent.

Review Conversations

Isolate and analyze entire multi-turn conversation histories to debug complex, chained agentic processes.

Track Users and Feedback

Identify your most active human users or log specific user critiques (like thumbs up/down) to improve the core model grounding.


Helicone (LLM Observability) with 10 Tools

These tools give your agent the raw data it needs to analyze costs, track performance metrics, inspect prompts, and monitor all LLM activity in detail.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Helicone (LLM Observability) on Vinkius

Query Costs

Calculates total spending by analyzing properties that drive account charges.

Query Feedback

Inspects stored user feedback data to see what users liked or disliked about the output.

Query Latency

Retrieves performance metrics, showing how fast requests were processed in real-time.

Log Feedback

Logs user critiques or feedback directly into the system for model improvement.

Query Prompts

Pulls detailed log tracing of prompts and the associated rate limits used.

List Properties

Identifies active authentication arrays used by the gateway for access control.

Query Requests

Lists the request records that passed through the platform gateway.

Query Sessions

Retrieves sessions so that multi-step workflows can be traced from start to finish.

Query Users

Checks system history to validate which users are interacting with the platform.

Get Prompt Versions

Retrieves historical versions of a prompt, allowing you to compare changes over time.

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so no custom routing is required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Helicone Observability integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

{
  "mcp_servers": {
    "helicone-llm-observability": {
      "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
    }
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Helicone Observability tools with full Vinkius guardrails applied.
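If you'd rather not hand-edit the file, the binding could be generated by a small script. This is only a sketch: the `build_mcp_config` helper and the `VINKIUS_TOKEN` environment variable are made-up names for illustration, and only the `mcp_servers` layout and the endpoint pattern come from the snippet above.

```python
import json
import os

ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def build_mcp_config(token: str, name: str = "helicone-llm-observability") -> dict:
    """Return the mcp_servers mapping shown above for a given token."""
    if not token or "[" in token:
        # Catches an empty value or a leftover [YOUR_TOKEN_HERE] placeholder.
        raise ValueError("replace the placeholder with a real Vinkius token")
    return {"mcp_servers": {name: {"serverUrl": ENDPOINT_TEMPLATE.format(token=token)}}}


if __name__ == "__main__":
    # VINKIUS_TOKEN is a hypothetical variable name; use whatever your setup provides.
    config = build_mcp_config(os.environ.get("VINKIUS_TOKEN", "demo-token"))
    print(json.dumps(config, indent=2))
```

Keeping the token in the environment rather than in the file means the generated config can be rebuilt without the secret ever being committed.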

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

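Add the Vinkius endpoint configuration to your MCP configuration file (in current VS Code releases this is typically .vscode/mcp.json, with servers listed under a top-level "servers" key):

{
  "servers": {
    "helicone-llm-observability": { "type": "http", "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp" }
  }
}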

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

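Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius (get_tools is a coroutine, so run it in an event loop)
url = "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
client = MultiServerMCPClient({"helicone": {"url": url, "transport": "streamable_http"}})
tools = asyncio.run(client.get_tools())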

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

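# Requires the MCP extra: pip install 'crewai-tools[mcp]'
from crewai import Agent
from crewai_tools import MCPServerAdapter

# Connect securely to Vinkius over streamable HTTP
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Assign the discovered tools to an agent while the connection is open
with MCPServerAdapter(server_params) as vinkius_tools:
    researcher = Agent(
        role="Data Researcher",
        goal="Report on LLM cost and latency",  # placeholder value
        backstory="An analyst who monitors LLM usage.",  # placeholder value
        tools=vinkius_tools,
    )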

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Helicone (LLM Observability), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Helicone. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 10 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
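To make "standardized connections" a bit more concrete: when an agent runs one of these tools, its MCP client sends a JSON-RPC `tools/call` request to the server. The sketch below builds that envelope. The envelope fields follow the MCP specification, but the `arguments` keys (`start_date`, `group_by`) are invented for illustration, since this page does not document each tool's input schema.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing request ids


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "start_date" and "group_by" are made-up argument names for illustration.
request = tools_call_request("query_costs", {"start_date": "2024-01-01", "group_by": "model"})
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this for you; the sketch only shows what travels over the wire.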

Sifting through logs and spreadsheets for every AI metric is exhausting.

Right now, if your agent acts weird or the bill arrives higher than expected, you're stuck. You have to jump into a dashboard, pull up the log service, cross-reference timestamps with billing reports, and maybe check an outdated Git branch for the prompt version. It takes hours of clicking and copy-pasting just to answer: 'What went wrong?'

With this MCP, you talk to your agent like it's a helpful teammate. Instead of navigating multiple services, you ask natural questions—like 'Where did we spend most on Claude last week?'—and the agent instantly aggregates all that data for you.

Better control over prompt versions using `get_prompt_versions`

Before this, if a prompt change broke something, you were manually tracing through commit history and hoping the old version was still backed up somewhere. You had no easy way to compare exactly what instructions were active last month versus today's rules.

Now, when things break or you want to prove performance improvements, you simply ask your agent to run `get_prompt_versions`. It shows you every recorded change and the exact text of past versions, letting you roll back logic without touching code.
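Once you have the text of two versions in hand, comparing them is a standard diff. Here is a minimal sketch using Python's `difflib`. The two prompt texts are invented samples, and the sketch assumes you have already pulled the version text out of whatever `get_prompt_versions` returns, since that return shape is not documented on this page.

```python
import difflib


def diff_prompt_versions(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two prompt texts."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


# Invented sample texts standing in for two retrieved versions.
last_month = "You are a support assistant.\nAnswer in two sentences."
today = "You are a support assistant.\nAnswer in one short paragraph.\nCite the source document."

print(diff_prompt_versions(last_month, today, "last-month", "today"))
```

Lines prefixed with `-` were in the older version only, and lines prefixed with `+` are new.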

Support: 24/7 at support@vinkius.com

What your AI can actually do with this

Running an AI application means managing complexity, especially around cost and performance. This MCP gives you total control over that mess. Instead of hopping between billing portals and log viewers, you just ask your agent questions about its own activity. You can find out exactly how much money the system burned yesterday, or pinpoint which LLM provider is causing a latency spike during peak hours.

It even lets you trace complex multi-step workflows to see exactly where an agent failed or slowed down. If you're already using Vinkius for other services, adding this MCP means all your AI infrastructure data lives in one place—right inside your conversation.

Built · Hosted · Managed by Vinkius

Helicone Observability MCP - Track LLM Costs & Latency

Server ID 019d75af-3782-7271-8c2e-071c1a2f6ce4

Vinkius Inspector: Compliance Grade A+, Score 100/100

What Changes When You Connect

Stop guessing about costs. Use query_costs to break down every dollar spent on models, making billing transparent for product owners.

Pinpoint slow spots immediately. Run query_latency to measure Time To First Token (TTFT) and figure out which LLM provider is dragging your performance down.

Improve prompts over time. Use get_prompt_versions to see every iteration of a prompt's instructions, so you never lose historical context on refinement.

Debug complex workflows easily. The agent can use query_sessions to trace entire multi-step conversations and isolate exactly where the logic broke.

Understand your audience better. Use query_users or log_feedback to track who is using the system most often, and what they actually think of the output.

See it in action

01

The billing surprise

A Product Owner needs to explain a sudden spike in AI costs. Instead of pulling messy spreadsheets, they ask their agent: 'Show me why our spending jumped last week.' The agent uses query_costs and immediately provides a breakdown by feature tag and user group.
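The kind of breakdown the agent returns here amounts to grouping cost records and summing them. A rough sketch follows, with the caveat that the record fields (`feature`, `user_group`, `cost_usd`) and the numbers are hypothetical; the real `query_costs` output may be shaped differently.

```python
from collections import defaultdict

# Hypothetical records; the real query_costs output may be shaped differently.
records = [
    {"feature": "search", "user_group": "free", "cost_usd": 12.50},
    {"feature": "search", "user_group": "pro", "cost_usd": 30.00},
    {"feature": "summaries", "user_group": "pro", "cost_usd": 41.25},
    {"feature": "summaries", "user_group": "free", "cost_usd": 4.75},
]


def breakdown(rows, key):
    """Sum cost_usd per distinct value of `key`, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


by_feature = breakdown(records, "feature")
by_group = breakdown(records, "user_group")
print(by_feature)
print(by_group)
```

Sorting largest first puts the main cost driver at the top of each list, which is usually the first thing a product owner wants to see.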

02

The slow checkout process

An LLM Engineer notices the chat interface feels sluggish during complex queries. They ask the agent to check performance, triggering query_latency. The results show that one specific model provider is causing a 3-second delay, allowing them to switch providers.
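Spotting the slow provider comes down to averaging Time To First Token per provider and comparing. A minimal sketch is below. The sample rows, the provider names, and the `ttft_ms` field are assumptions for illustration, not the actual `query_latency` output.

```python
from statistics import mean

# Hypothetical per-request samples; field names are assumptions.
samples = [
    {"provider": "provider-a", "ttft_ms": 420},
    {"provider": "provider-a", "ttft_ms": 380},
    {"provider": "provider-b", "ttft_ms": 3100},
    {"provider": "provider-b", "ttft_ms": 2900},
]


def mean_ttft_by_provider(rows):
    """Return {provider: mean TTFT in ms}."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["provider"], []).append(row["ttft_ms"])
    return {provider: mean(values) for provider, values in grouped.items()}


averages = mean_ttft_by_provider(samples)
slowest = max(averages, key=averages.get)
print(averages, slowest)
```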

03

The confusing agent failure

A Data Scientist has an agent fail in a multi-step process. They ask the agent to trace the interaction history, which executes query_sessions. The results reveal that the second LLM call was using outdated instructions, pointing them toward checking get_prompt_versions.

04

The flaky authentication bug

A DevOps team member suspects an auth issue. They ask the agent to check recent activity, triggering query_requests. The output shows that certain API calls are failing due to incorrect gateway permissions, directing them straight to checking system properties via list_properties.

The honest tradeoffs

Assuming all data is available

Anti-pattern

A team member assumes that just because they have logs for prompts, the cost data must be in there too. They try to manually cross-reference query_prompts with billing reports.

The Fix

Don't eyeball it. Use the agent to run both query_prompts and query_costs. The MCP links these metrics together so you get a complete, auditable view in one conversation.

Debugging single-turn failures

Anti-pattern

A developer only checks the last API call that failed, missing the context of the preceding successful steps. They only run query_requests for the time window.

The Fix

Always check the full scope by running query_sessions. This shows you the entire graph of calls, making it clear which early step caused the final failure.
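To show why the whole session matters, here is a small sketch that walks a session's calls in order and flags the earliest one that failed or ran on a stale prompt version. The call records and their fields (`step`, `prompt_version`, `ok`) are hypothetical stand-ins; the real `query_sessions` output may look different.

```python
# Hypothetical session data; the real query_sessions output may differ.
calls = [
    {"step": 2, "name": "draft_answer", "prompt_version": 3, "ok": True},
    {"step": 1, "name": "classify_intent", "prompt_version": 7, "ok": True},
    {"step": 3, "name": "format_reply", "prompt_version": 7, "ok": False},
]


def first_suspect(session_calls, current_version):
    """Return the earliest step that failed or used a stale prompt version."""
    for call in sorted(session_calls, key=lambda c: c["step"]):
        if not call["ok"] or call["prompt_version"] != current_version:
            return call
    return None


suspect = first_suspect(calls, current_version=7)
print(suspect["name"])
```

Looking only at the failed request would point at step 3, while the session view surfaces step 2, which succeeded but ran on an older prompt version.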

Over-logging everything

Anti-pattern

A developer writes code that logs every single input and output to a database table for perfect auditing. The cost is astronomical.

The Fix

Use this MCP's tools like query_costs first. It gives you the necessary insights without forcing massive data storage, keeping your infrastructure lean.

When It Fits, When It Doesn't

Use this MCP if managing unpredictable costs and complex performance issues is part of your job description. You need to know why an agent slowed down or who spent money on what, not just that it happened. Don't use it if you only need simple, basic logging; dedicated logging tools are fine for that. However, if your application relies on multi-step reasoning and needs to track the evolution of its internal logic, this is essential. A good first step is to run query_latency and query_costs together; that pair gives you a quick read on performance and spend before you dig deeper.

Questions you might have

How do I check my spending using query_costs?

You ask your agent to run query_costs. It immediately provides a structural breakdown of your current LLM expenditures, letting you see exactly which models and features are driving the most charges.

Can I use query_latency to find performance issues?

Yes. Running query_latency measures Time To First Token (TTFT) and average speed across all calls, helping you pinpoint exactly which upstream LLM provider is slowing things down.

What does query_sessions do for debugging?

query_sessions lets the agent retrieve sessions, which group the calls made during a multi-step workflow. It's crucial for tracing those workflows and seeing how an agent progressed through its tasks.

How do I check if a user is valid with query_users?

You ask the agent to run query_users. It checks usage history to show which users have interacted with your system, so you can confirm that you're tracking usage from all sources.

How do I use get_prompt_versions to audit a prompt's instruction text?

It fetches the exact historical versions of your prompts. You can compare changes, see when grounding rules were updated, and pinpoint exactly what instructions the model received at any given time.

What does query_prompts retrieve about the API inputs?

It retrieves detailed logs of every prompt sent to your LLM APIs. You can inspect these explicit prompts and outputs directly from your agent, which is key for debugging complex workflows.

How do I use log_feedback to gather user critique data?

Using log_feedback captures user ratings like thumbs up or down. This logged data is crucial for offline Human-in-the-Loop evaluation and improving model grounding over time.

What information does query_requests provide about my API usage?

This tool returns a record of each request that passed through your gateway. It gives a comprehensive view of activity, letting you monitor the total volume and context of all interactions.

Can I see the exact prompt that caused a specific error?

Yes. Use the query_requests tool to fetch direct prompts and outputs from the proxy logs. You can filter by status or custom tags to find the exact interaction that needs debugging.
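Filtering by status or custom tags works roughly like the sketch below. Treat it as an illustration only: the log entries, the field names (`status`, `tags`, `prompt`), and the `find_requests` helper are all hypothetical, not the tool's real parameters or output.

```python
# Hypothetical request log entries; field names are assumptions.
requests_log = [
    {"id": "r1", "status": 200, "tags": {"feature": "search"}, "prompt": "Find docs about billing"},
    {"id": "r2", "status": 429, "tags": {"feature": "chat"}, "prompt": "Summarize this thread"},
    {"id": "r3", "status": 500, "tags": {"feature": "chat"}, "prompt": "Translate to French"},
]


def find_requests(rows, min_status=None, **tags):
    """Keep rows whose status is >= min_status and whose tags all match."""
    matches = []
    for row in rows:
        if min_status is not None and row["status"] < min_status:
            continue
        if all(row["tags"].get(k) == v for k, v in tags.items()):
            matches.append(row)
    return matches


failed_chat = find_requests(requests_log, min_status=400, feature="chat")
print([r["id"] for r in failed_chat])
```

Each matching entry carries its prompt text, so the interaction that needs debugging can be read directly.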

How do I track costs for a specific customer ID?

Ask your agent to query_costs and include your customer identity in the filter. Helicone maps costs per model and user, allowing you to see exactly how much each client is burning in LLM tokens.

Can my agent log human feedback into Helicone?

Absolutely. Use the log_feedback tool to inject offline Human-in-the-Loop verdicts or text critiques directly into Helicone's database, helping you refine your model's grounding over time.


