Langfuse MCP. See exactly how your LLM works, what it costs, and how it performs.

Just plug in your AI agents and start using Vinkius.

Langfuse (LLM Tracing & Evals) monitors your LLM apps. It lets your AI client track API calls, view detailed latencies, and manage prompt versions.

You can attach human feedback or automated metrics to specific traces. It's for seeing exactly how your AI works, from token count to dollar cost.

What your AI agents can do

Review LLM API Session Data

Use list_traces and get_trace to view the full flow, latency, and token usage for any LLM interaction.

Inspect Prompt Templates

Use list_prompts to pull up active prompt templates and see their system instructions and required input variables.

Calculate Usage Costs

Use get_daily_metrics to generate a summary of total USD spending and average latency across the defined time period.

Analyze Interaction Details

Use get_observation or list_observations to look at specific, granular events or generations within a full trace.

Score and Grade Model Output

Use create_score to attach custom human feedback (e.g., 1-5 stars) or automated metrics to a specific trace or observation.

Manage User Interactions

Use list_sessions to group multiple related traces, giving context to multi-turn user workflows.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Langfuse (LLM Tracing & Evals) MCP Server: 10 Tools for LLM Ops

Use these tools to track every LLM interaction, audit prompt templates, and analyze performance and cost metrics for your AI agents.

create_observation

Adds a new granular piece of data (like a span, event, or generation) into an existing LLM trace.

create_score

Attaches structured human feedback (e.g., 1-5 stars) or automated metrics to a specified trace or observation.

get_daily_metrics

Retrieves a summary report showing total USD costs and aggregated latency statistics for the day.

get_observation

Pulls the specific context details for a single span or generation within a trace.

get_trace

Gets the complete, nested graph of data for a single, full LLM interaction.

list_observations

Lists all raw observation objects across many different traces and sessions.

list_prompts

Extracts a list of all actively managed prompt templates and their versions.

list_scores

Lists all recorded scores, helping you track quality or cost metrics across your models.

list_sessions

Lists high-level user session groups that contain multiple related LLM interaction traces.

list_traces

Retrieves a list of every completed LLM API session that Langfuse has tracked.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Langfuse (LLM Tracing & Evals), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

Langfuse monitors your LLM apps. Your AI client tracks every API call, showing you detailed latencies, how many tokens you used, and what version of the prompt ran. You can attach human feedback or automated metrics to specific traces. This lets you see exactly how your AI works, from the token count to the dollar cost.

Reviewing LLM API Session Data

To see the full flow, latency, and token usage for any LLM interaction, start with list_traces, which returns every completed API session that Langfuse has tracked. Then use get_trace to pull the complete, nested graph of data for a single interaction. You can also use list_sessions to group multiple related traces, giving context to multi-turn user workflows.
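As a rough sketch of what happens under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The example below builds the two requests for the list-then-drill-down flow. The argument names (`limit`, `trace_id`) and the trace id are assumptions for illustration; the server's actual input schema is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)  # simple source of unique JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: list recent traces. `limit` is a hypothetical argument name.
list_request = tool_call("list_traces", {"limit": 20})

# Step 2: drill into one trace. `trace_id` and its value are placeholders.
get_request = tool_call("get_trace", {"trace_id": "trace-123"})

print(json.dumps(list_request, indent=2))
print(json.dumps(get_request, indent=2))
```

In practice your AI client builds and sends these messages for you; the sketch only shows the shape of the exchange.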

Inspecting Prompt Templates

You use list_prompts to pull up all active prompt templates and see their system instructions and required input variables.
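To illustrate what "required input variables" means, here is a minimal sketch that extracts placeholder names from a template string. It assumes double-curly-brace placeholders such as `{{customer_name}}`; the template text is made up, and the exact format that list_prompts returns is not specified on this page.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def required_variables(template: str) -> list[str]:
    """Return placeholder names in first-seen order, without duplicates."""
    seen: dict[str, None] = {}
    for name in PLACEHOLDER.findall(template):
        seen.setdefault(name, None)
    return list(seen)


# Hypothetical template text, for illustration only.
template = (
    "You are a support agent. Greet {{customer_name}} and answer: "
    "{{question}}. Thank {{ customer_name }} at the end."
)

print(required_variables(template))
```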

Calculating Usage Costs

To generate a summary of total USD spending and average latency across a defined time period, you run get_daily_metrics.
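The sketch below shows one way per-day figures could be rolled up across a time period. The record shape (`total_cost_usd`, `trace_count`, `avg_latency_s`) and the sample numbers are invented for illustration; they are not the tool's documented output. Note that the average latency is weighted by trace count, so a busy day counts for more than a quiet one.

```python
from dataclasses import dataclass


@dataclass
class DailyMetric:
    date: str
    total_cost_usd: float
    trace_count: int
    avg_latency_s: float


def summarize(days: list[DailyMetric]) -> dict:
    """Roll several daily records up into one period summary."""
    total_cost = sum(d.total_cost_usd for d in days)
    total_traces = sum(d.trace_count for d in days)
    # Weight each day's average latency by how many traces it covers.
    weighted_latency = sum(d.avg_latency_s * d.trace_count for d in days)
    avg_latency = weighted_latency / total_traces if total_traces else 0.0
    return {
        "total_cost_usd": round(total_cost, 2),
        "trace_count": total_traces,
        "avg_latency_s": round(avg_latency, 3),
    }


# Made-up sample data.
days = [
    DailyMetric("2024-05-01", 12.40, 300, 1.2),
    DailyMetric("2024-05-02", 7.60, 100, 2.0),
]
summary = summarize(days)
print(summary)
```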

Analyzing Interaction Details

For granular detail, you use list_observations to look at all raw observation objects across different traces and sessions. You can also pull the specific context details for a single span or generation within a trace using get_observation.
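Once an agent has a list of observations, a typical next step is to filter by type and rank by latency. This is a small sketch over hypothetical records; the field names (`type`, `latency_ms`, `trace_id`) are assumptions, and a real list_observations response may be shaped differently.

```python
# Hypothetical observation records, for illustration only.
observations = [
    {"id": "obs-1", "trace_id": "t-1", "type": "SPAN", "latency_ms": 120},
    {"id": "obs-2", "trace_id": "t-1", "type": "GENERATION", "latency_ms": 1850},
    {"id": "obs-3", "trace_id": "t-2", "type": "GENERATION", "latency_ms": 640},
    {"id": "obs-4", "trace_id": "t-2", "type": "EVENT", "latency_ms": 0},
]


def slowest(records: list[dict], obs_type: str, top_n: int = 3) -> list[dict]:
    """Return the slowest observations of one type, slowest first."""
    matching = [r for r in records if r["type"] == obs_type]
    return sorted(matching, key=lambda r: r["latency_ms"], reverse=True)[:top_n]


for obs in slowest(observations, "GENERATION"):
    print(obs["id"], obs["trace_id"], obs["latency_ms"])
```

The ids printed here are the kind of value you would then pass to get_observation for full detail.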

Scoring and Grading Model Output

You use create_score to attach custom human feedback, like 1-5 stars, or automated metrics to a specific trace or observation. You can also use create_observation to add a new granular piece of data, like a span, event, or generation, into an existing LLM trace.

Other Tools

To track quality or cost metrics across your models, you can use list_scores to list all recorded scores.

How Langfuse MCP Works

1. Subscribe to this server and provide your Langfuse API URL, Public Key, and Secret Key.
2. Your AI client begins sending interaction data to the MCP server, starting the monitoring process.
3. You then ask your agent to run a specific analysis, like 'What were our costs yesterday?' or 'Show me the prompt for customer support.' The agent uses the relevant tool.
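For context on why step 1 asks for both keys: the Langfuse public API authenticates with HTTP Basic auth, using the Public Key as the username and the Secret Key as the password. The sketch below builds that header. How this server stores or forwards your credentials is not described here, so treat this as background only; the key values are placeholders.

```python
import base64


def basic_auth_header(public_key: str, secret_key: str) -> dict:
    """Build an HTTP Basic auth header from a key pair."""
    raw = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; never hard-code real keys.
headers = basic_auth_header("pk-lf-example", "sk-lf-example")
print(headers["Authorization"][:12] + "...")
```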

The bottom line is you can ask your AI client to access deep, structured data about your LLM performance and costs, without leaving your chat window.

Who Is Langfuse MCP For?

This is for the LLM Engineer who needs to debug complex AI chains without manually clicking through dashboards. It’s for the Product Owner who needs to track daily AI costs and user satisfaction scores across production environments. Data Scientists use this to audit model quality and manage prompt templates efficiently.

LLM Engineer

Uses get_trace and list_observations to debug complex AI chains and measure exact token latencies in real time.

Product Owner

Uses get_daily_metrics to monitor daily AI costs and user satisfaction scores across multiple production environments.

Data Scientist

Uses list_prompts and create_score to audit model response quality and manage prompt templates to improve model grounding.

What Changes When You Connect

See full API session details by calling list_traces and get_trace. You instantly see latencies, token counts, and chained payloads for every LLM interaction.
Track your budget using get_daily_metrics. This tool gives you a rolled-up view of total USD spending and average latency, so you know exactly what your AI infrastructure costs.
Manage and audit your prompts with list_prompts. You can query all active templates to inspect system instructions and check what variables they need.
Pinpoint failures with get_observation. Instead of sifting through logs, you pull specific spans or generations to find the exact point where an LLM interaction failed.
Improve model quality by using create_score. You can attach structured human feedback or automated metrics to a trace, making your evaluation systematic.
Understand user journeys with list_sessions. This tool groups multiple related traces together, helping you see the whole picture of a multi-turn conversation.

Real-World Use Cases

Debugging a broken agent workflow

An LLM Engineer notices the agent fails on complex data extraction. They ask their agent to run list_traces to find the error. Then, they use get_trace on the failing session to view the full, nested graph of the failure, pinpointing the exact API payload that caused the issue.
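The "nested graph" in this scenario can be pictured as a tree of steps. Below is a minimal sketch that walks such a tree and reports every step marked as an error, along with its path from the root. The node shape (`name`, `level`, `status_message`, `children`) and the sample trace are hypothetical; the structure get_trace actually returns may differ.

```python
def find_errors(node: dict, path: tuple = ()) -> list[tuple[str, str]]:
    """Depth-first search for nodes whose level is ERROR.

    Returns (path, message) pairs, where path is 'root > child > ...'.
    """
    here = path + (node["name"],)
    found = []
    if node.get("level") == "ERROR":
        found.append((" > ".join(here), node.get("status_message", "")))
    for child in node.get("children", []):
        found.extend(find_errors(child, here))
    return found


# Hypothetical trace, for illustration only.
trace = {
    "name": "extract-invoice",
    "level": "DEFAULT",
    "children": [
        {"name": "retrieve-document", "level": "DEFAULT", "children": []},
        {
            "name": "parse-fields",
            "level": "DEFAULT",
            "children": [
                {
                    "name": "llm-call",
                    "level": "ERROR",
                    "status_message": "context length exceeded",
                    "children": [],
                },
            ],
        },
    ],
}

errors = find_errors(trace)
for location, message in errors:
    print(f"{location}: {message}")
```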

Checking daily operational costs

A Product Owner needs to check the budget before the end of the month. They ask their agent to run get_daily_metrics. The agent instantly reports the total USD spending and average latency, letting the PO know if they're over budget.

Auditing prompt consistency

A Data Scientist wants to ensure the 'customer-support' prompt is up-to-date. They ask their agent to run list_prompts. The agent shows the system instructions and required variables, confirming the team is using the right template.

Analyzing a multi-step user chat

A team lead needs to understand a user's multi-turn interaction. They ask their agent to run list_sessions. The agent groups all related traces, allowing the team lead to analyze the full context of the conversation, not just isolated messages.
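Conceptually, session grouping amounts to bucketing traces by a shared session identifier and ordering each bucket by time. The sketch below does this over hypothetical trace records; the field names (`session_id`, `timestamp`) and values are assumptions for illustration, not the list_sessions output format.

```python
from collections import defaultdict


def group_by_session(traces: list[dict]) -> dict[str, list[dict]]:
    """Bucket traces by session id and sort each bucket chronologically."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for trace in traces:
        key = trace.get("session_id") or "(no session)"
        sessions[key].append(trace)
    for bucket in sessions.values():
        # ISO 8601 timestamps in the same format sort correctly as strings.
        bucket.sort(key=lambda t: t["timestamp"])
    return dict(sessions)


# Made-up sample data.
traces = [
    {"id": "t-3", "session_id": "s-1", "timestamp": "2024-05-01T10:02:00Z"},
    {"id": "t-1", "session_id": "s-1", "timestamp": "2024-05-01T10:00:00Z"},
    {"id": "t-2", "session_id": "s-2", "timestamp": "2024-05-01T10:01:00Z"},
    {"id": "t-4", "session_id": None, "timestamp": "2024-05-01T10:03:00Z"},
]

grouped = group_by_session(traces)
for session_id, bucket in grouped.items():
    print(session_id, [t["id"] for t in bucket])
```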

The Tradeoffs

Assuming all data is visible

Without structured traces, you end up manually scrolling through raw logs to find the token count or latency for a specific turn. It's a nightmare of copy-pasting timestamps.

→ Instead, ask your agent to run get_trace and get_observation. This pulls the exact, structured data—like token counts and specific latencies—for one single interaction, instantly.

Treating prompts as static text

Assuming the system instruction is just the text you see, without knowing if it has different versions or required inputs.

→ Use the list_prompts tool. This shows you every active version and the exact variables—like customer_name—that the prompt requires to run correctly.

Ignoring the cost factor

Building a complex AI agent without knowing if it will exceed the monthly budget due to unexpected token usage.

→ Run get_daily_metrics. This tool gives you a daily summary of total USD costs and average latency, keeping your AI development financially accountable.

When It Fits, When It Doesn't

Use this server if your core problem is visibility: you need to know why your LLM behaved the way it did, how much it cost, and if it was accurate. This is essential for LLM Engineers debugging complex chains, and Product Owners tracking spend.

Don't use this if you just need to chat with an AI or perform a simple single-step task. If you just need to run a basic prompt, you don't need Langfuse. But if you need to measure, audit, or debug that basic prompt, this server is required. If your problem is connecting to an external service that isn't AI-related (like a CRM or accounting system), this server won't help—you need a different type of integration.

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Langfuse. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_observation, create_score, get_daily_metrics, get_observation, get_trace, list_observations, list_prompts, list_scores, list_sessions, list_traces

Debugging an LLM chain shouldn't feel like archaeology.

Today, when an AI agent fails, you're stuck clicking through dashboards, cross-referencing logs, and manually guessing where the failure happened. You copy-paste payloads and try to piece together the timing to figure out if the issue was the model, the prompt, or the chain itself.

With this MCP server, you simply ask your agent to `get_trace` on the failed session. It instantly pulls the full, nested graph, showing exactly which step failed, the payload, the token count, and the precise moment the error occurred. It’s immediate diagnosis.

Langfuse (LLM Tracing & Evals) MCP Server: Track & Score Management

You no longer have to manually check separate tools for cost, quality, and usage. You ask for a summary, and the agent uses `get_daily_metrics` to pull the rolled-up USD costs and average latency. It combines finance and performance metrics in one go.

The system is now connected. You get real-time, consolidated data on everything—from the cost of a single token to the overall quality score attached via `create_score`. It moves observability out of the dashboard and into the conversation.

Common Questions About Langfuse MCP

How do I use the Langfuse (LLM Tracing & Evals) MCP Server to check costs?

You run get_daily_metrics. This tool gives you a summary report of total USD spending and average latency for the day. It's the fastest way to track your AI infrastructure spending.

What is the best way to debug a complex LLM failure using Langfuse (LLM Tracing & Evals) MCP Server?

You should use get_trace. This tool retrieves the complete telemetry and nested graph for a single trace, letting you see the full flow, payloads, and error status at a glance.

Can I find out what variables a prompt needs using Langfuse (LLM Tracing & Evals) MCP Server?

Yes, use list_prompts. This tool extracts all active prompt templates and shows the system instructions, along with the exact input variables the prompt requires.

Is Langfuse (LLM Tracing & Evals) MCP Server good for multi-turn chats?

Yes, use list_sessions. This tool groups multiple related traces together, providing context for multi-turn user workflows, which is better than looking at individual messages.

How do I use the list_traces tool to view multiple LLM API sessions?

The list_traces tool retrieves all LLM API sessions. This lets you see a high-level view of multiple chains, including the start time, end time, and overall status for quick comparison.

Can I use the get_daily_metrics tool to track costs for specific models?

Yes, get_daily_metrics generates aggregated reports on USD costs and average latency. You can analyze spending patterns across different models used in your LLM application.

How does the create_observation tool help me debug an error?

The create_observation tool lets you attach specific spans, events, or generations directly to a trace. Recording them marks where a failure or performance bottleneck occurred in the LLM workflow, so you can find it later with get_trace or get_observation.

What information can the list_prompts tool provide about prompt templates?

The list_prompts tool extracts actively managed prompt templates and versions. You can inspect the system instructions and see what input variables each template expects.

Can I see the exact system instruction for a specific prompt version?

Yes. Use the list_prompts tool to browse your managed templates. Your agent can retrieve the exact text and variables for any deployed prompt version, making it easy to audit AI logic through natural conversation.

How do I log human feedback for a specific trace?

Use the create_score tool by providing the Trace ID and a JSON payload defining the score name (e.g. 'user-satisfaction') and value. Your agent will attach this structured data directly to the Langfuse record.
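A minimal sketch of such a payload is shown below, with a check that a star rating stays within 1 to 5. The field names (`traceId`, `name`, `value`, `comment`) follow the naming used by the Langfuse scores API as far as I know, but the exact arguments this MCP tool expects are an assumption; the trace id is a placeholder.

```python
import json
from typing import Optional


def build_score(
    trace_id: str,
    name: str,
    value: int,
    comment: Optional[str] = None,
) -> dict:
    """Build a score payload for a 1-5 star rating on one trace."""
    if not 1 <= value <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {value}")
    payload = {"traceId": trace_id, "name": name, "value": value}
    if comment:
        payload["comment"] = comment
    return payload


# Placeholder trace id and feedback text.
score = build_score(
    "trace-123",
    "user-satisfaction",
    4,
    comment="Answer was correct but slow.",
)
print(json.dumps(score, indent=2))
```

Validating the value before sending keeps out-of-range ratings from skewing later averages.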

Can my agent report on my LLM spending for the current day?

Absolutely. The get_daily_metrics tool retrieves aggregated USD costs and average latency metrics from Langfuse. Your agent can summarize these statistics to help you monitor your infrastructure budget in real-time.
