Langfuse MCP for AI. See exactly how your AI calls work.

Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Connect to your AI in seconds.

Langfuse connects your AI agent directly to deep LLM observability and evaluation data. You track API session traces, inspect token usage, manage prompt versions, and audit model accuracy metrics without leaving your chat window.

What your AI can do

Get trace

Fetches all telemetry and the nested graph for one complete LLM API session.

Get daily metrics

Generates rolled-up reports showing total USD cost and aggregated latency for the day.

Create observation

Adds a detailed event, span, or generation record into an active LLM trace.

+ 7 more capabilities included

Audit full interaction chains

Retrieve the complete history of an AI session, including all steps, timings, and token counts.

Pinpoint performance bottlenecks

Drill down into specific moments within a trace to find out exactly where latency or failures occurred.

Manage system instructions

View and query the active versions of prompt templates used by the model, checking for expected inputs.

Measure quality and cost

Attach human feedback or automated metrics to specific runs, and generate daily reports on total USD spending and average latency.

Analyze user context flow

Group together related conversations to understand multi-turn interaction boundaries over time.

Langfuse (LLM Tracing & Evals) - 10 Tools

These tools let you query every part of your LLM application—from full session traces to specific cost metrics and prompt versions.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Langfuse (LLM Tracing & Evals) on Vinkius

Get Trace

Fetches all telemetry and the nested graph for one complete LLM API session.

Get Daily Metrics

Generates rolled-up reports showing total USD cost and aggregated latency for the day.

Create Observation

Adds a detailed event, span, or generation record into an active LLM trace.

Get Observation

Retrieves context from a single specific span or generation event within a trace.

List Observations

Lists raw observation objects across multiple different traces.

List Prompts

Extracts and views all active prompt templates and their versions.

Create Score

Attaches human feedback or automated quality metrics to a specific model run.

List Scores

Lists all stored evaluation scores, mapping quality or cost algorithms used on model...

List Sessions

Retrieves high-level groups of user interactions that contain multiple related...

List Traces

Lists all recorded LLM API sessions for quick review.

Security and governance baked right in.

Pick your AI client below to get set up. Just create a Vinkius account, subscribe, and you're instantly up and running. We handle the entire backend infrastructure, delivering out-of-the-box support for Streamable HTTP, SSE, and OAuth2, with no routing to configure.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The Langfuse integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "langfuse-llm-tracing-evals": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the Langfuse tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"langfuse-llm-tracing-evals": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.
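
If you maintain several client configs, a small script can generate the entry instead of hand-editing it. The sketch below is only an illustration: the helper name `server_entry` and the `VINKIUS_TOKEN` environment variable are assumptions, not part of the Vinkius tooling. The key name and URL pattern are the ones shown above.

```python
import json
import os

ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"


def server_entry(token: str) -> dict:
    """Build the server entry shown above for a given token (hypothetical helper)."""
    # Reject an empty value or the unreplaced placeholder from the docs.
    if not token or "YOUR_TOKEN_HERE" in token:
        raise ValueError("set a real token from cloud.vinkius.com")
    return {"langfuse-llm-tracing-evals": {"url": ENDPOINT_TEMPLATE.format(token=token)}}


if __name__ == "__main__":
    # VINKIUS_TOKEN is an assumed variable name; use whatever your secrets setup provides.
    token = os.environ.get("VINKIUS_TOKEN", "")
    if token:
        print(json.dumps(server_entry(token), indent=2))
```

Reading the token from the environment keeps it out of version control; paste the printed entry into your client's config file.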

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

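Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius (use "sse" instead if your endpoint is served over SSE)
client = MultiServerMCPClient({"langfuse": {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable_http"}})
tools = asyncio.run(client.get_tools())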

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

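from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius (use "sse" if your endpoint is served over SSE)
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# Build and run your Crew inside this block so the connection stays open
with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (goal and backstory are placeholder values)
    researcher = Agent(
        role='Data Researcher',
        goal='Answer questions using Langfuse trace and cost data',
        backstory='An analyst who monitors LLM applications.',
        tools=vinkius_tools,
    )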

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Langfuse (LLM Tracing & Evals), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Langfuse. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 10 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.
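
Under the hood, an MCP client talks to the server with JSON-RPC 2.0 messages; `tools/list` and `tools/call` are the method names the protocol defines. The sketch below only builds the request body a client would send and makes no network call. The argument name `traceId` is an assumption for illustration; the schema returned by `tools/list` is the authority on real parameter names.

```python
import itertools
import json

_ids = itertools.count(1)  # each JSON-RPC request in a session needs a unique id


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request body (no network I/O)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "traceId" is a hypothetical argument name, not confirmed by the server schema.
request = tool_call("get_trace", {"traceId": "trace-123"})
print(json.dumps(request, indent=2))
```

In practice Claude, Cursor, or a framework adapter constructs these messages for you; seeing the shape mainly helps when reading logs or debugging a connection.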

The hardest part isn't building the AI; it's knowing what happened when it failed.

Today, if your agent breaks or performs slowly, you end up in a mess. You copy IDs from one dashboard, then hop to another to check token counts, and maybe jump to a third system just to see the payload. It's manual, it's slow, and it makes correlating events across multiple services nearly impossible.

With this MCP, you talk to your agent about the failure. You don't copy anything; you just ask. Your agent then pulls all that cross-referenced data—the timings, the payloads, the entire execution graph—and gives it back in a readable format. It cuts out the dashboard hopping.

Langfuse MCP: Get Quality Scores and Usage Metrics

You stop relying on guesswork for model quality. Instead of hoping the AI is good enough, you can now attach structured human feedback or automated metrics directly to specific runs using `create_score`. You also get real-time financial visibility by running `get_daily_metrics`.

This means your development cycle shifts from 'Did it work?' to 'How well did it work, and what did it cost us?' It’s a fundamental shift in how you treat AI functionality.

Support 24/7: support@vinkius.com

What your AI can actually do with this

Every time you build an application using large language models, the actual execution details get buried in logs. This MCP lets your agent connect to Langfuse, giving you full visibility into what the model is doing—and why it might fail. You can ask about specific API calls and retrieve the exact payload that caused a latency spike.

It's not just logging; it’s structured monitoring for performance and quality control. If you need to track costs or check how good the prompts are, this MCP gives your agent direct access. It integrates into your existing stack via Vinkius, letting you pull insights from complex systems simply by asking questions in natural language.

Built, hosted, and managed by Vinkius

Langfuse - LLM Tracing & Evals for AI Observability

Server ID: 019d75c4-7f86-73f7-9d96-ef98162e59dd

Vinkius Inspector: Compliance Grade A+, Score 100/100

What Changes When You Connect

You instantly see the cost breakdown. Instead of guessing, use get_daily_metrics to get aggregated reports on total USD spending and average latency for today's runs.

Debugging complex chains is faster. You can retrieve a full session graph using get_trace, letting you see every single payload that passed through the system.

Never lose track of a conversation. By calling list_sessions, your agent groups together all related user interactions, making it easier to improve long-term workflows.

Manage prompt drift easily. Use list_prompts to inspect active templates and see exactly what system instructions are currently running in production.

Validate model output quality using structured feedback. You can assign scores via create_score, attaching human judgment or automated metrics to specific runs.

Deep dive into failures. If a call breaks, you don't have to search logs; just ask your agent and use get_observation to get the context of that failure.

See it in action

01

Debugging an intermittent API error

An engineer notices a chat feature fails sometimes. They tell their agent, 'Show me the last three failed traces.' The agent uses list_traces and then pulls the specific context with get_observation, revealing that the failure only happens when a certain variable is null.
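
Once the failed observations are in hand, spotting the shared null variable is a counting exercise. A minimal sketch, assuming each observation is a dict with a `level` field and an `input` mapping; these field names and the sample data are assumptions for illustration, and the real tool output may be shaped differently.

```python
from collections import Counter


def null_inputs_in_failures(observations: list[dict]) -> Counter:
    """Count which input variables are None across failed observations."""
    counts: Counter = Counter()
    for obs in observations:
        # Skip anything that is not marked as an error (assumed "level" field).
        if obs.get("level") != "ERROR":
            continue
        for key, value in (obs.get("input") or {}).items():
            if value is None:
                counts[key] += 1
    return counts


# Hypothetical observations, as an agent might collect them via get_observation.
observations = [
    {"id": "o1", "level": "ERROR", "input": {"user_id": "u1", "locale": None}},
    {"id": "o2", "level": "DEFAULT", "input": {"user_id": "u2", "locale": "en"}},
    {"id": "o3", "level": "ERROR", "input": {"user_id": "u3", "locale": None}},
]
print(null_inputs_in_failures(observations).most_common(1))  # [('locale', 2)]
```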

02

Auditing prompt compliance

A Product Owner needs to check if developers are using the latest version of the internal 'customer support' guide. They ask their agent, and it uses list_prompts to display the system instructions and expected variables for review.

03

Calculating operational cost

The CTO needs an end-of-month report on AI spending. The agent runs get_daily_metrics and rolls the daily figures up into a total USD cost and an average latency for the month.
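
The end-of-month report in this scenario comes down to combining daily figures. The sketch below shows one way to do that, assuming a record with a date, a USD cost, a trace count, and an average latency; that record shape is hypothetical and the real get_daily_metrics output may differ.

```python
from dataclasses import dataclass


@dataclass
class DailyMetric:
    # Hypothetical record shape; the real get_daily_metrics output may differ.
    date: str
    total_cost_usd: float
    trace_count: int
    avg_latency_ms: float


def monthly_summary(days: list[DailyMetric]) -> dict:
    """Sum cost and compute a trace-weighted average latency."""
    total_cost = sum(d.total_cost_usd for d in days)
    total_traces = sum(d.trace_count for d in days)
    # Weight each day's average by its trace count so busy days count more.
    weighted = sum(d.avg_latency_ms * d.trace_count for d in days)
    avg_latency = weighted / total_traces if total_traces else 0.0
    return {"total_cost_usd": round(total_cost, 2), "avg_latency_ms": round(avg_latency, 1)}


days = [
    DailyMetric("2025-01-01", 12.50, 100, 800.0),
    DailyMetric("2025-01-02", 7.50, 300, 400.0),
]
print(monthly_summary(days))  # {'total_cost_usd': 20.0, 'avg_latency_ms': 500.0}
```

Weighting by trace count matters: a plain mean of the two daily averages would report 600 ms here, overstating latency because the quieter day was the slower one.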

04

Analyzing multi-user behavior

A data scientist wants to know if users who interact with Feature A also tend to use Feature B. The agent uses list_sessions to group correlated user activity, allowing them to pinpoint usage patterns across different features.
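
The Feature A and Feature B question can be answered once traces are grouped by session. A minimal sketch, assuming each trace carries a `sessionId` and a `feature` tag; both field names and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def features_per_session(traces: list[dict]) -> dict[str, set[str]]:
    """Group the features seen in each session (assumed field names)."""
    sessions: dict[str, set[str]] = defaultdict(set)
    for trace in traces:
        sessions[trace["sessionId"]].add(trace["feature"])
    return dict(sessions)


def co_usage_rate(sessions: dict[str, set[str]], a: str, b: str) -> float:
    """Share of sessions using feature `a` that also used feature `b`."""
    with_a = [features for features in sessions.values() if a in features]
    if not with_a:
        return 0.0
    return sum(1 for features in with_a if b in features) / len(with_a)


# Hypothetical traces, as an agent might gather them after calling list_sessions.
traces = [
    {"sessionId": "s1", "feature": "A"},
    {"sessionId": "s1", "feature": "B"},
    {"sessionId": "s2", "feature": "A"},
    {"sessionId": "s3", "feature": "B"},
]
sessions = features_per_session(traces)
print(co_usage_rate(sessions, "A", "B"))  # 0.5
```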

The honest tradeoffs

Searching for single log lines

Anti-pattern

A dev manually copies a trace ID and then navigates through three separate logging dashboards (latency, tokens, payload) to piece together what happened.

The Fix

Tell your agent to run get_trace with the specific ID. It pulls all the nested graph data—latencies, token counts, payloads—in one go.

Assuming cost is constant

Anti-pattern

A PM assumes a new feature will only cost $50/day, based on initial estimates, without tracking actual usage.

The Fix

Run get_daily_metrics to get the actual figures. This shows the total USD cost and average latency for today's runs, which you can compare directly against the $50/day estimate.

Ignoring prompt changes

Anti-pattern

A team updates a core prompt template but forgets to check if older versions are still in use, leading to unpredictable behavior.

The Fix

Use list_prompts to see every version and the current system instructions. This ensures you know exactly what context the model is operating under.
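
Checking which version is live and which inputs it expects can be sketched as follows. Langfuse prompt templates mark variables with double curly braces, so a regular expression is enough to list them. The record shape (`name`, `version`, `labels`, `prompt`) and the sample data are assumptions for illustration, not the confirmed output of list_prompts.

```python
import re
from typing import Optional

# Matches {{variable}} placeholders, allowing optional inner whitespace.
VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def expected_variables(template: str) -> list[str]:
    """Return the distinct {{variable}} names in first-seen order."""
    seen: dict[str, None] = {}
    for match in VARIABLE.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def production_version(versions: list[dict]) -> Optional[dict]:
    """Pick the version carrying a 'production' label (assumed field names)."""
    for version in versions:
        if "production" in version.get("labels", []):
            return version
    return None


# Hypothetical data in the shape list_prompts might return.
versions = [
    {"name": "customer-support", "version": 1, "labels": [],
     "prompt": "Help {{customer_name}}."},
    {"name": "customer-support", "version": 2, "labels": ["production"],
     "prompt": "Help {{customer_name}} with {{issue}}. Be brief, {{customer_name}}."},
]
live = production_version(versions)
print(live["version"], expected_variables(live["prompt"]))  # 2 ['customer_name', 'issue']
```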

Questions you might have

How do I check the total spending with Langfuse MCP?

Run get_daily_metrics. This tool provides an aggregated report on your total USD costs and average latency across all runs for the day.

What does get_trace do in Langfuse MCP?

It retrieves the complete, detailed telemetry graph for a single LLM session. This shows every internal step (span) that occurred during the API call.

I need to see what prompts are used by my agent using Langfuse MCP.

Use list_prompts. This tool extracts and displays all actively managed prompt templates, letting you inspect their system instructions and expected input variables.

How do I track multiple conversations in Langfuse MCP?

Call list_sessions to get high-level user session entities. This groups together related multi-turn interactions, helping you understand the full context.

How can I use list_observations to find a specific performance bottleneck within an LLM trace?

You get raw data points by listing observations, which lets you examine individual spans or generations. This pinpoints exactly where latency spikes or errors occurred in the chain, helping you diagnose bottlenecks without reviewing the entire session graph.
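
Ranking observations by duration is one way to surface the bottleneck. A minimal sketch, assuming each observation has a `name` plus ISO 8601 `startTime` and `endTime` strings; those field names and the sample data are assumptions for illustration.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Accept a trailing "Z", which datetime.fromisoformat rejects before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def slowest(observations: list[dict], top: int = 3) -> list[tuple[str, float]]:
    """Rank observations by duration in seconds, longest first."""
    timed = []
    for obs in observations:
        duration = (parse_ts(obs["endTime"]) - parse_ts(obs["startTime"])).total_seconds()
        timed.append((obs["name"], duration))
    return sorted(timed, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical observations in the shape list_observations might return.
observations = [
    {"name": "retrieve-docs", "startTime": "2025-01-01T10:00:00Z", "endTime": "2025-01-01T10:00:01Z"},
    {"name": "llm-generation", "startTime": "2025-01-01T10:00:01Z", "endTime": "2025-01-01T10:00:05Z"},
    {"name": "format-answer", "startTime": "2025-01-01T10:00:05Z", "endTime": "2025-01-01T10:00:05.500000Z"},
]
print(slowest(observations, top=2))  # [('llm-generation', 4.0), ('retrieve-docs', 1.0)]
```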

Should I use create_score when evaluating model grounding and accuracy?

Yes, using create_score lets you attach structured feedback or evaluation metrics to a specific trace or observation. This is critical for monitoring model performance against defined human standards or automated quality checks.

What's the difference between get_trace and get_observation when troubleshooting?

get_trace retrieves the complete, nested graph of an entire LLM API session. If you only need to check a single event or span within that trace, use get_observation for faster, more targeted context retrieval.

How do I analyze which parts of my application are consuming the most tokens using list_traces?

You can list traces to review metadata attached to each API session. This raw data allows you to quickly sort and identify sessions with unusually high token counts or excessive latencies across your various pipelines.
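
Sorting by token count can be sketched like this, assuming each trace exposes a `name` and a `totalTokens` value. Both field names and the sample data are assumptions for illustration; adapt them to what list_traces actually returns.

```python
from collections import defaultdict


def tokens_by_name(traces: list[dict]) -> list[tuple[str, int]]:
    """Total tokens per trace name, largest consumer first."""
    totals: dict[str, int] = defaultdict(int)
    for trace in traces:
        # Traces without a name are grouped under "unnamed".
        totals[trace.get("name") or "unnamed"] += trace.get("totalTokens", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical trace metadata.
traces = [
    {"name": "chat", "totalTokens": 1200},
    {"name": "summarize", "totalTokens": 5400},
    {"name": "chat", "totalTokens": 800},
]
print(tokens_by_name(traces))  # [('summarize', 5400), ('chat', 2000)]
```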

Can I see the exact system instruction for a specific prompt version?

Yes. Use the list_prompts tool to browse your managed templates. Your agent can retrieve the exact text and variables for any deployed prompt version, making it easy to audit AI logic through natural conversation.

How do I log human feedback for a specific trace?

Use the create_score tool by providing the Trace ID and a JSON payload defining the score name (e.g. 'user-satisfaction') and value. Your agent will attach this structured data directly to the Langfuse record.
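
The payload described above might look like the following. This is a sketch under assumptions: the argument names `traceId`, `name`, `value`, and `comment` are illustrative and not confirmed by the server's tool schema, and `score_arguments` is a hypothetical helper.

```python
import json


def score_arguments(trace_id: str, name: str, value: float, comment: str = "") -> dict:
    """Build arguments for a create_score call (assumed argument names)."""
    if not trace_id:
        raise ValueError("trace_id is required")
    if not name:
        raise ValueError("score name is required")
    args = {"traceId": trace_id, "name": name, "value": float(value)}
    # Only include the optional comment when one was given.
    if comment:
        args["comment"] = comment
    return args


print(json.dumps(score_arguments("trace-123", "user-satisfaction", 1, "thumbs up")))
```

Validating the trace ID and score name before the call avoids attaching a score to nothing; your agent normally fills these in from the conversation.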

Can my agent report on my LLM spending for the current day?

Absolutely. The get_daily_metrics tool retrieves aggregated USD costs and average latency metrics from Langfuse. Your agent can summarize these statistics to help you monitor your infrastructure budget in real-time.
