Langfuse MCP. See exactly how your LLM works, what it costs, and how it performs.
Works with every AI agent you already use
…and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Langfuse (LLM Tracing & Evals) monitors your LLM apps. It lets your AI client track API calls, view detailed latencies, and manage prompt versions.
You can attach human feedback or automated metrics to specific traces. It's for seeing exactly how your AI works, from token count to dollar cost.
What your AI agents can do
Use list_traces and get_trace to view the full flow, latency, and token usage for any LLM interaction.
Use list_prompts to pull up active prompt templates and see their system instructions and required input variables.
Use get_daily_metrics to generate a summary of total USD spending and average latency across the defined time period.
Use get_observation or list_observations to look at specific, granular events or generations within a full trace.
Use create_score to attach custom human feedback (e.g., 1-5 stars) or automated metrics to a specific trace or observation.
Use list_sessions to group multiple related traces, giving context to multi-turn user workflows.
Langfuse (LLM Tracing & Evals) MCP Server: 10 Tools for LLM Ops
Use these tools to track every LLM interaction, audit prompt templates, and analyze performance and cost metrics for your AI agents.
create_observation
Adds a new granular piece of data (like a span, event, or generation) into an existing LLM trace.
create_score
Attaches structured human feedback (e.g., 1-5 stars) or automated metrics to a specified trace or observation.
get_daily_metrics
Retrieves a summary report showing total USD costs and aggregated latency statistics for the day.
get_observation
Pulls the specific context details for a single span or generation within a trace.
get_trace
Gets the complete, nested graph of data for a single, full LLM interaction.
list_observations
Lists all raw observation objects across many different traces and sessions.
list_prompts
Extracts a list of all actively managed prompt templates and their versions.
list_scores
Lists all recorded scores, helping you track quality or cost metrics across your models.
list_sessions
Lists high-level user session groups that contain multiple related LLM interaction traces.
list_traces
Retrieves a list of every completed LLM API session that Langfuse has tracked.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built-in DLP, auth, and compliance on every call
- Real-time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Langfuse (LLM Tracing & Evals), then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 4,700+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
What you can do with this MCP connector
Langfuse monitors your LLM apps. Your AI client tracks every API call, showing you detailed latencies, how many tokens you used, and what version of the prompt ran. You can attach human feedback or automated metrics to specific traces. This lets you see exactly how your AI works, from the token count to the dollar cost.
Reviewing LLM API Session Data
To see the full flow, latency, and token usage for any LLM interaction, use list_traces to list every completed API session that Langfuse has tracked, then use get_trace to pull the complete, nested graph of data for a single interaction. You can also use list_sessions to group multiple related traces, giving context to multi-turn user workflows.
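Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below shows roughly what the list-then-inspect flow could look like on the wire. The argument names `limit` and `trace_id`, and the ID value, are assumptions for illustration; this page does not document the tools' input schemas.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: list recent traces. `limit` is a hypothetical argument name.
list_request = tool_call(1, "list_traces", {"limit": 20})

# Step 2: fetch one trace in full. `trace_id` and its value are hypothetical.
get_request = tool_call(2, "get_trace", {"trace_id": "trace-123"})

print(json.dumps(list_request, indent=2))
print(json.dumps(get_request, indent=2))
```

In practice your AI client builds these messages for you; the sketch is only meant to show that each step in the flow maps to one tool call.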
Inspecting Prompt Templates
You use list_prompts to pull up all active prompt templates and see their system instructions and required input variables.
Calculating Usage Costs
To generate a summary of total USD spending and average latency across a defined time period, you run get_daily_metrics.
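To make "total USD spending and average latency" concrete, here is a minimal sketch of that kind of roll-up over per-trace records. The record shape and the field names `cost_usd` and `latency_s` are assumptions; the actual response format of get_daily_metrics is not shown on this page.

```python
from statistics import mean


def summarize(traces):
    """Roll per-trace records up into total USD cost and average latency."""
    if not traces:
        return {"total_cost_usd": 0.0, "avg_latency_s": None, "trace_count": 0}
    return {
        "total_cost_usd": round(sum(t["cost_usd"] for t in traces), 6),
        "avg_latency_s": mean(t["latency_s"] for t in traces),
        "trace_count": len(traces),
    }


# Hypothetical sample data for one day.
sample = [
    {"cost_usd": 0.25, "latency_s": 1.0},
    {"cost_usd": 0.50, "latency_s": 2.0},
    {"cost_usd": 0.25, "latency_s": 3.0},
]
summary = summarize(sample)
print(summary)
```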
Analyzing Interaction Details
For granular detail, you use list_observations to look at all raw observation objects across different traces and sessions. You can also pull the specific context details for a single span or generation within a trace using get_observation.
Scoring and Grading Model Output
You use create_score to attach custom human feedback, like 1-5 stars, or automated metrics to a specific trace or observation. You can also use create_observation to add a new granular piece of data, like a span, event, or generation, into an existing LLM trace.
Other Tools
To track quality or cost metrics across your models, use list_scores to list all recorded scores.
How Langfuse MCP Works
1. Subscribe to this server and provide your Langfuse API URL, Public Key, and Secret Key.
2. Your AI client begins sending interaction data to the MCP server, starting the monitoring process.
3. You then ask your agent to run specific analysis, like 'What were our costs yesterday?' or 'Show me the prompt for customer support.' The agent uses the relevant tool.
The bottom line is you can ask your AI client to access deep, structured data about your LLM performance and costs, without leaving your chat window.
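For context on why the setup step asks for three values: Langfuse's public REST API authenticates with HTTP Basic auth, using the public key as the username and the secret key as the password. The sketch below shows how such a header is formed. Whether this server builds its requests exactly this way is not stated here, and the key values are placeholders.

```python
import base64


def basic_auth_header(public_key, secret_key):
    """Encode a key pair as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Placeholder values only; real keys come from your Langfuse project settings.
config = {
    "base_url": "https://cloud.langfuse.com",
    "public_key": "pk-lf-example",
    "secret_key": "sk-lf-example",
}
headers = {
    "Authorization": basic_auth_header(config["public_key"], config["secret_key"])
}
print(headers["Authorization"])
```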
Who Is Langfuse MCP For?
This is for the LLM Engineer who needs to debug complex AI chains without manually clicking through dashboards. It’s for the Product Owner who needs to track daily AI costs and user satisfaction scores across production environments. Data Scientists use this to audit model quality and manage prompt templates efficiently.
LLM Engineer: Uses get_trace and list_observations to debug complex AI chains and measure exact token latencies in real time.
Product Owner: Uses get_daily_metrics to monitor daily AI costs and user satisfaction scores across multiple production environments.
Data Scientist: Uses list_prompts and create_score to audit model response quality and manage prompt templates to improve model grounding.
What Changes When You Connect
- See full API session details by calling list_traces and get_trace. You instantly see latencies, token counts, and chained payloads for every LLM interaction.
- Track your budget using get_daily_metrics. This tool gives you a rolled-up view of total USD spending and average latency, so you know exactly what your AI infrastructure costs.
- Manage and audit your prompts with list_prompts. You can query all active templates to inspect system instructions and check what variables they need.
- Pinpoint failures with get_observation. Instead of sifting through logs, you pull specific spans or generations to find the exact point where an LLM interaction failed.
- Improve model quality by using create_score. You can attach structured human feedback or automated metrics to a trace, making your evaluation systematic.
- Understand user journeys with list_sessions. This tool groups multiple related traces together, helping you see the whole picture of a multi-turn conversation.
Real-World Use Cases
Debugging a broken agent workflow
An LLM Engineer notices the agent fails on complex data extraction. They ask their agent to run list_traces to find the error. Then, they use get_trace on the failing session to view the full, nested graph of the failure, pinpointing the exact API payload that caused the issue.
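As an illustration of what "pinpointing" means in a nested trace, the sketch below walks a tree of observations and reports the path to each failing step. The `level` values follow Langfuse's observation levels (such as DEFAULT and ERROR); the nested `children` layout and the step names are assumptions, since the shape returned by get_trace is not documented on this page.

```python
def find_errors(node, path=()):
    """Depth-first walk that yields the path to every ERROR-level node."""
    here = path + (node["name"],)
    if node.get("level") == "ERROR":
        yield " > ".join(here)
    for child in node.get("children", []):
        yield from find_errors(child, here)


# Hypothetical trace for a data-extraction agent.
trace = {
    "name": "extract-invoice",
    "level": "DEFAULT",
    "children": [
        {"name": "retrieve-docs", "level": "DEFAULT", "children": []},
        {
            "name": "parse-fields",
            "level": "DEFAULT",
            "children": [
                {"name": "llm-generation", "level": "ERROR", "children": []},
            ],
        },
    ],
}
failures = list(find_errors(trace))
print(failures)
```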
Checking daily operational costs
A Product Owner needs to check the budget before the end of the month. They ask their agent to run get_daily_metrics. The agent instantly reports the total USD spending and average latency, letting the PO know if they're over budget.
Auditing prompt consistency
A Data Scientist wants to ensure the 'customer-support' prompt is up-to-date. They ask their agent to run list_prompts. The agent shows the system instructions and required variables, confirming the team is using the right template.
Analyzing a multi-step user chat
A team lead needs to understand a user's multi-turn interaction. They ask their agent to run list_sessions. The agent groups all related traces, allowing the team lead to analyze the full context of the conversation, not just isolated messages.
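Grouping by session amounts to bucketing traces on a shared session identifier and reading them in time order. Here is a minimal sketch of that idea; the field names `session_id` and `timestamp` and the sample records are assumptions, not the server's actual output.

```python
from collections import defaultdict


def group_by_session(traces):
    """Map each session id to its traces, ordered by timestamp."""
    sessions = defaultdict(list)
    for trace in traces:
        sessions[trace["session_id"]].append(trace)
    for items in sessions.values():
        # ISO 8601 timestamps in one format sort correctly as strings.
        items.sort(key=lambda t: t["timestamp"])
    return dict(sessions)


# Hypothetical traces from two user sessions, out of order.
traces = [
    {"id": "t2", "session_id": "s1", "timestamp": "2024-05-01T10:02:00Z"},
    {"id": "t3", "session_id": "s2", "timestamp": "2024-05-01T11:00:00Z"},
    {"id": "t1", "session_id": "s1", "timestamp": "2024-05-01T10:00:00Z"},
]
grouped = group_by_session(traces)
print({sid: [t["id"] for t in items] for sid, items in grouped.items()})
```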
The Tradeoffs
Assuming all data is visible
Manually scrolling through raw logs looking for the token count or latency for a specific turn. It's a nightmare of copy-pasting timestamps.
→
Instead, ask your agent to run get_trace and get_observation. This pulls the exact, structured data—like token counts and specific latencies—for one single interaction, instantly.
Treating prompts as static text
Assuming the system instruction is just the text you see, without knowing if it has different versions or required inputs.
→
Use the list_prompts tool. This shows you every active version and the exact variables—like customer_name—that the prompt requires to run correctly.
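Langfuse prompt templates mark variables with double curly braces, for example {{customer_name}}. Assuming you already have a template's text, a short sketch like the one below can list the variables it requires and refuse to render when one is missing. The helper names and the sample template are hypothetical.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def required_variables(template):
    """Return the distinct variable names in a template, in first-seen order."""
    seen = []
    for name in VARIABLE.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template, values):
    """Fill in a template, raising if any required variable is missing."""
    missing = [v for v in required_variables(template) if v not in values]
    if missing:
        raise KeyError(f"missing variables: {missing}")
    return VARIABLE.sub(lambda m: str(values[m.group(1)]), template)


template = "Hello {{customer_name}}, your ticket {{ticket_id}} is open, {{customer_name}}."
print(required_variables(template))
print(render(template, {"customer_name": "Ada", "ticket_id": 42}))
```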
Ignoring the cost factor
Building a complex AI agent without knowing if it will exceed the monthly budget due to unexpected token usage.
→
Run get_daily_metrics. This tool gives you a real-time breakdown of total USD costs and average latency, keeping your AI development financially accountable.
When It Fits, When It Doesn't
Use this server if your core problem is visibility: you need to know why your LLM behaved the way it did, how much it cost, and if it was accurate. This is essential for LLM Engineers debugging complex chains, and Product Owners tracking spend.
Don't use this if you just need to chat with an AI or perform a simple single-step task. If you just need to run a basic prompt, you don't need Langfuse. But if you need to measure, audit, or debug that basic prompt, this server is required. If your problem is connecting to an external service that isn't AI-related (like a CRM or accounting system), this server won't help—you need a different type of integration.
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Langfuse. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
Cloud Hosted
Managed infra
V8 Isolated
Sandboxed per request
Zero-Trust Proxy
No stored credentials
DLP Enforced
Policy on every call
GDPR Compliant
EU data residency
Token Compression
~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Debugging an LLM chain shouldn't feel like archaeology.
Today, when an AI agent fails, you're stuck clicking through dashboards, cross-referencing logs, and manually guessing where the failure happened. You copy-paste payloads and try to piece together the timing to figure out if the issue was the model, the prompt, or the chain itself.
With this MCP server, you simply ask your agent to `get_trace` on the failed session. It instantly pulls the full, nested graph, showing exactly which step failed, the payload, the token count, and the precise moment the error occurred. It’s immediate diagnosis.
Langfuse (LLM Tracing & Evals) MCP Server: Track & Score Management
You no longer have to manually check separate tools for cost, quality, and usage. You ask for a summary, and the agent uses `get_daily_metrics` to pull the rolled-up USD costs and average latency. It combines finance and performance metrics in one go.
The system is now connected. You get real-time, consolidated data on everything—from the cost of a single token to the overall quality score attached via `create_score`. It moves observability out of the dashboard and into the conversation.
Common Questions About Langfuse MCP
How do I use the Langfuse (LLM Tracing & Evals) MCP Server to check costs?
You run get_daily_metrics. This tool gives you a summary report of total USD spending and average latency for the day. It's the fastest way to track your AI infrastructure spending.
What is the best way to debug a complex LLM failure using Langfuse (LLM Tracing & Evals) MCP Server?
You should use get_trace. This tool retrieves the complete telemetry and nested graph for a single trace, letting you see the full flow, payloads, and error status at a glance.
Can I find out what variables a prompt needs using Langfuse (LLM Tracing & Evals) MCP Server?
Yes, use list_prompts. This tool extracts all active prompt templates and shows the system instructions, along with the exact input variables the prompt requires.
Is Langfuse (LLM Tracing & Evals) MCP Server good for multi-turn chats?
Yes, use list_sessions. This tool groups multiple related traces together, providing context for multi-turn user workflows, which is better than looking at individual messages.
How do I use the list_traces tool to view multiple LLM API sessions?
The list_traces tool retrieves all LLM API sessions. This lets you see a high-level view of multiple chains, including the start time, end time, and overall status for quick comparison.
Can I use the get_daily_metrics tool to track costs for specific models?
Yes, get_daily_metrics generates aggregated reports on USD costs and average latency. You can analyze spending patterns across different models used in your LLM application.
How does the create_observation tool help me debug an error?
The create_observation tool lets you attach specific spans, events, or generations directly into a trace. This pinpoints exactly where a failure or performance bottleneck occurred in the LLM workflow.
What information can the list_prompts tool provide about prompt templates?
The list_prompts tool extracts actively managed prompt templates and versions. You can inspect the system instructions and see what input variables each template expects.
Can I see the exact system instruction for a specific prompt version?
Yes. Use the list_prompts tool to browse your managed templates. Your agent can retrieve the exact text and variables for any deployed prompt version, making it easy to audit AI logic through natural conversation.
How do I log human feedback for a specific trace?
Use the create_score tool by providing the Trace ID and a JSON payload defining the score name (e.g. 'user-satisfaction') and value. Your agent will attach this structured data directly to the Langfuse record.
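A minimal sketch of what such a payload could look like for a 1-5 star rating is shown below. The field names are modeled on Langfuse's public scores API (traceId, name, value, comment); the argument schema that this server's create_score tool actually accepts is not shown on this page, so treat the names, the helper, and the trace ID as assumptions.

```python
import json


def build_score(trace_id, name, value, comment=None):
    """Assemble a score payload, validating a 1-5 star rating."""
    if not 1 <= value <= 5:
        raise ValueError("star rating must be between 1 and 5")
    payload = {"traceId": trace_id, "name": name, "value": value}
    if comment:
        payload["comment"] = comment
    return payload


# Hypothetical trace ID; the score name comes from the example above.
score = build_score("trace-123", "user-satisfaction", 4, comment="Helpful answer")
print(json.dumps(score, indent=2))
```

Validating the range before sending keeps out-of-scale ratings from skewing aggregate quality metrics later.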
Can my agent report on my LLM spending for the current day?
Absolutely. The get_daily_metrics tool retrieves aggregated USD costs and average latency metrics from Langfuse. Your agent can summarize these statistics to help you monitor your infrastructure budget in real-time.
Multi-server workflows that include Langfuse (LLM Tracing & Evals) MCP
MCP Recipe for AI Inference Monitoring
Your GPT-4 API takes 4 seconds per response. Groq returns the same quality answer in 180 milliseconds, Langfuse traces every call, and Sheets shows the latency-cost comparison that makes your product feel instant.
Monitor AI Agent Performance Using MCP Servers
Your agents run in production but you cannot explain why one failed at 3am. Fix that.
Route AI Requests to the Fastest Model via MCP
You run everything on GPT-4o because choosing a model per task is hard. Your agent benchmarks Groq and Mistral against your actual workloads.
Track LLM Cost vs Quality Using MCP Servers
Your OpenAI bill grew from $200 to $2,400 in 2 months and you have no idea which feature caused it, because you track API spend at the account level, not at the prompt level.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.
More in this category
LangSmith
Observability and evaluation platform for LLM applications — monitor traces, debug agent runs, and track performance metrics across your AI stack.
Workato
Monitor automation recipes, manage job executions, and audit app connections on Workato — the leading enterprise iPaaS platform.
Chainlit
Empower your AI agents to audit chat threads, analyze model steps, and track LLM observability metrics securely.