# LangSmith MCP

> LangSmith gives you full visibility into your LLM applications. Use this MCP to track performance, debug agent runs, and see exactly where your AI workflows break down. It pulls aggregate metrics for each project and lets you drill into every step of a single run, which is essential for any engineer building complex AI systems.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** llm-observability, tracing, evaluation, performance-metrics, ai-debugging, prompt-testing

## Description

When you're running LLMs or multi-step agents, the execution path can feel like a black box. LangSmith changes that. This connector lets your agent monitor and debug those tricky workflows using the traces LangSmith records. You can check a whole project's health through metrics like median latency and total runs. When something goes wrong, you don't have to guess: you can pull the full trace for any specific run and see every input and output. Listing projects, viewing recent runs, and retrieving detailed run reports are all available through Vinkius, so your AI client can manage the complexity for you.

## Tools

### langsmith_get_run
Retrieves full execution details and inputs/outputs for a single, specific run ID.

### langsmith_list_projects
Lists all your tracing projects with key metrics like total runs, median latency, and feedback counts.

### langsmith_list_runs
Shows a list of recent traces across a project, detailing status, type (LLM/chain/tool), and token usage.
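To make the three tools' outputs concrete, here is a minimal sketch of the records they describe, modeled as Python dataclasses. The field names (`median_latency_ms`, `total_tokens`, and so on) are hypothetical; the JSON the MCP actually returns may use different keys. The sample values come from the prompt examples on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Project:
    """One row of langsmith_list_projects output (hypothetical field names)."""
    id: str
    name: str
    total_runs: int
    median_latency_ms: float
    feedback_count: Optional[int] = None

@dataclass
class Run:
    """One row of langsmith_list_runs output (hypothetical field names)."""
    id: str
    name: str
    run_type: str                       # "llm", "chain", or "tool"
    status: str                         # e.g. "success" or "error"
    total_tokens: Optional[int] = None  # may be missing, e.g. for tool runs
    latency_ms: Optional[float] = None

def sum_tokens(runs: List[Run]) -> int:
    """Sum token usage, skipping runs that report no token count."""
    return sum(r.total_tokens for r in runs if r.total_tokens is not None)

prod = Project("a1b2c3d4", "production-agent", total_runs=12450, median_latency_ms=340.0)
runs = [
    Run("r1s2t3", "agent_executor", "chain", "success", total_tokens=1250),
    Run("x7y8z9", "tool:web_search", "tool", "success", latency_ms=340),
    Run("a0b1c2", "agent_executor", "chain", "error", total_tokens=450),
]
print(prod.name, prod.total_runs, sum_tokens(runs))  # production-agent 12450 1700
```

Keeping token counts optional reflects that tool runs in the examples report latency rather than tokens.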

## Prompt Examples

**Prompt:** 
```
List all my LangSmith projects and show their metrics.
```

**Response:** 
```
Found 4 projects:

| Name | ID | Runs | Latency | Created |
|---|---|---|---|---|
| production-agent | `a1b2c3d4` | 12,450 | 340ms | 2026-01-15 |
| staging-chatbot | `e5f6g7h8` | 3,200 | 280ms | 2026-02-20 |
| research-rag | `i9j0k1l2` | 890 | 520ms | 2026-03-01 |
| test-evaluations | `m3n4o5p6` | 150 | 190ms | 2026-03-28 |
```

**Prompt:** 
```
Show me the last 5 runs in my production-agent project.
```

**Response:** 
```
Last 5 runs in 'production-agent':

| Name | ID | Type | Status | Tokens | Latency | Created |
|---|---|---|---|---|---|---|
| agent_executor | `r1s2t3` | chain | success | 1,250 |  | 2026-04-04 |
| gpt-4o | `u4v5w6` | llm | success | 890 |  | 2026-04-04 |
| tool:web_search | `x7y8z9` | tool | success |  | 340ms | 2026-04-04 |
| agent_executor | `a0b1c2` | chain | error | 450 |  | 2026-04-03 |
| gpt-4o | `d3e4f5` | llm | success | 1,100 |  | 2026-04-03 |
```

**Prompt:** 
```
Get details on the failed run a0b1c2.
```

**Response:** 
```
Run details for `a0b1c2`:

| Name | ID | Type | Status | Tokens | Created |
|---|---|---|---|---|---|
| agent_executor | `a0b1c2` | chain | error | 450 | 2026-04-03 |

Error: Tool 'web_search' timed out after 30s. The agent retried 3 times before failing.
```

## Capabilities

### View project health metrics
Get aggregate data for each group of related traces, including total runs and median latency.
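Project-level numbers are aggregates over individual runs. As a rough sketch with made-up latencies, total runs is a simple count, and median latency (the metric langsmith_list_projects reports) is the middle value, which one slow outlier shifts far less than the mean:

```python
from statistics import mean, median

# Illustrative per-run latencies in milliseconds for one project (made-up values).
latencies_ms = [250, 300, 340, 360, 5000]

total_runs = len(latencies_ms)
median_ms = median(latencies_ms)  # middle value of the sorted list
mean_ms = mean(latencies_ms)      # dragged upward by the 5000 ms outlier

print(total_runs, median_ms, mean_ms)  # 5 340 1250
```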

### List recent workflow activity
Browse the most recent runs, successful or failed, with status and token usage for quick checks.

### Deep-dive into a single trace
Retrieve every input, output, and timing detail for one specific run to pinpoint the failure point.

## Use Cases

### The user reports inconsistent answers.
An AI engineer suspects a specific tool is failing intermittently. They use langsmith_list_runs to find the failed runs, then call langsmith_get_run on those IDs. The full traces reveal the exact input data that caused the timeout, so they can fix the upstream logic.
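The list-then-inspect workflow can be sketched as follows. `list_runs` and `get_run` here are hypothetical local stubs standing in for the two MCP tools, returning canned data taken from this page's examples; in practice your AI client makes the calls over the MCP connection.

```python
from typing import Dict, List

# Hypothetical stand-ins for langsmith_list_runs and langsmith_get_run.
def list_runs(project: str) -> List[Dict]:
    # This stub ignores `project` and always returns the same recent runs.
    return [
        {"id": "r1s2t3", "name": "agent_executor", "status": "success"},
        {"id": "u4v5w6", "name": "gpt-4o", "status": "success"},
        {"id": "a0b1c2", "name": "agent_executor", "status": "error"},
    ]

def get_run(run_id: str) -> Dict:
    details = {
        "a0b1c2": {"status": "error",
                   "error": "Tool 'web_search' timed out after 30s"},
    }
    return details.get(run_id, {})

def triage(project: str) -> Dict[str, Dict]:
    """List recent runs, keep only failures, and fetch the full record for each."""
    failed = [run for run in list_runs(project) if run["status"] == "error"]
    return {run["id"]: get_run(run["id"]) for run in failed}

for run_id, detail in triage("production-agent").items():
    print(run_id, detail.get("error"))
```

Fetching details only for failed IDs keeps the number of deep-dive calls proportional to the failures, not to the total run count.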

### The team needs to compare two model versions.
An ML team wants to know whether Model B outperforms Model A. With each model version tracing to its own project, they use langsmith_list_projects to compare median latency side by side and make a data-driven decision on which one to deploy.
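Comparing two versions then comes down to comparing their project-level medians. A minimal sketch, assuming each model version traces to its own project; the project names and latency values are illustrative. Latency is only one axis; judging answer quality also needs feedback or evaluation data.

```python
# Median latency per project, as langsmith_list_projects would report it.
# Project names and values are illustrative.
median_latency_ms = {"agent-model-a": 420.0, "agent-model-b": 365.0}

a = median_latency_ms["agent-model-a"]
b = median_latency_ms["agent-model-b"]
change_pct = (b - a) / a * 100  # negative means B is faster than A
faster = min(median_latency_ms, key=median_latency_ms.get)

print(f"{faster} is faster; B vs A: {change_pct:+.1f}%")  # agent-model-b is faster; B vs A: -13.1%
```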

### A new feature adds unexpected cost.
A DevOps engineer notices an unexplained spike in monthly costs. They check the total-runs metric in langsmith_list_projects, then use langsmith_list_runs to trace the spike back to the specific agent workflow that is running too often.
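Tracing a cost spike to one workflow is mostly a grouping exercise over the run list. A minimal sketch, assuming you have each run's name and token count from langsmith_list_runs; the values and the `nightly_summarizer` workflow are hypothetical.

```python
from collections import Counter

# (run name, total tokens) pairs from a recent run list (illustrative values).
runs = [
    ("agent_executor", 1250), ("gpt-4o", 890), ("nightly_summarizer", 600),
    ("nightly_summarizer", 620), ("nightly_summarizer", 580), ("gpt-4o", 1100),
    ("nightly_summarizer", 610),
]

run_counts = Counter(name for name, _ in runs)
token_totals = Counter()
for name, tokens in runs:
    token_totals[name] += tokens

# The workflow that runs most often, and the one that consumes the most tokens.
most_frequent = run_counts.most_common(1)[0]
most_tokens = token_totals.most_common(1)[0]
print(most_frequent, most_tokens)  # ('nightly_summarizer', 4) ('nightly_summarizer', 2410)
```

Counting runs and summing tokens separately matters: a workflow can run rarely but still dominate cost if each run is token-heavy.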

### The overall AI pipeline slows down post-deployment.
An engineer notices performance dipping. The project metrics from langsmith_list_projects show median latency well above yesterday's baseline, which tells them which project to investigate; langsmith_list_runs then narrows it down to the slow step.
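The MCP reports current metrics; keeping yesterday's numbers as a baseline is up to you. A minimal sketch of a regression check, assuming you stored the previous median latencies per project. The 20% tolerance and today's values are illustrative; the baselines reuse the example project table.

```python
def latency_regressed(current_ms: float, baseline_ms: float, tolerance: float = 0.2) -> bool:
    """True if current median latency exceeds the baseline by more than `tolerance` (20% by default)."""
    return current_ms > baseline_ms * (1 + tolerance)

# Stored baseline and today's medians from langsmith_list_projects (illustrative).
baseline = {"production-agent": 340.0, "research-rag": 520.0}
today = {"production-agent": 610.0, "research-rag": 530.0}

regressed = [p for p in today if p in baseline and latency_regressed(today[p], baseline[p])]
print(regressed)  # ['production-agent']
```

A relative tolerance avoids flagging normal jitter on projects whose latencies are naturally high.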

## Benefits

- Pinpoint failures fast. Instead of just knowing a run failed, you can use langsmith_get_run to see its full trace, including which tool timed out and why.
- Track performance over time. Use langsmith_list_projects to compare aggregate metrics like median latency across different versions or environments.
- Stay ahead of regressions. Quickly list recent activity with langsmith_list_runs to spot a sudden spike in token usage or an unexpected increase in error statuses.
- Isolate model issues. By checking each run's type, you can tell whether a slowdown comes from a specific LLM call or from a complex agent action (chain).
- Improve reliability. Feedback counts from langsmith_list_projects help your team decide which parts of the workflow to debug first.
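Spotting error spikes and isolating them by run type can be sketched as a per-type summary of recent runs. The rows below mirror the `production-agent` run list in the prompt examples; the tuple layout is an assumption, not the MCP's actual output format.

```python
from collections import defaultdict

# (run type, status, total tokens); None means no token count was reported,
# as for the tool run in the example.
runs = [
    ("chain", "success", 1250),
    ("llm", "success", 890),
    ("tool", "success", None),
    ("chain", "error", 450),
    ("llm", "success", 1100),
]

stats = defaultdict(lambda: {"runs": 0, "errors": 0, "tokens": 0})
for run_type, status, tokens in runs:
    bucket = stats[run_type]
    bucket["runs"] += 1
    bucket["errors"] += int(status == "error")
    bucket["tokens"] += tokens or 0

for run_type, bucket in sorted(stats.items()):
    error_rate = bucket["errors"] / bucket["runs"]
    print(f"{run_type}: {bucket['runs']} runs, {error_rate:.0%} errors, {bucket['tokens']} tokens")
```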

## How It Works

The bottom line: your agent can now inspect complex AI execution traces without you making manual API calls.

1. Subscribe to this MCP and enter your LangSmith API key.
2. Your application keeps sending traces to LangSmith as usual; your agent uses the connection to read those recorded LLM calls and agent actions.
3. The AI client returns structured data, so you can query project metrics or specific run details.
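Under the hood, an MCP client talks to a connector with JSON-RPC 2.0 messages. The sketch below only builds and prints the messages of a typical session (handshake, tool discovery, one tool call); it opens no connection. The transport, the Vinkius endpoint, and how the API key is supplied are not covered. `"2024-11-05"` is one published MCP protocol revision, and the `project_name` and `limit` arguments are assumptions: the real input schema comes back from `tools/list`.

```python
import json
from itertools import count

_ids = count(1)

def request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request with a fresh id."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

session = [
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    }),
    # Notifications carry no id and expect no response.
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    request("tools/list", {}),
    # Hypothetical argument names; check the schema returned by tools/list.
    request("tools/call", {
        "name": "langsmith_list_runs",
        "arguments": {"project_name": "production-agent", "limit": 5},
    }),
]

for message in session:
    print(json.dumps(message))
```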

## Frequently Asked Questions

**How do I check overall performance with langsmith_list_projects?**
Call langsmith_list_projects to get a summary table. It shows aggregate metrics such as median latency and total runs for each project, so you can gauge overall health at a glance.

**What is the difference between langsmith_list_runs and langsmith_get_run?**
langsmith_list_runs gives you a list of recent runs (the 'what'). langsmith_get_run takes a specific run ID and returns the full trace, with every input and output from that run.

**Can I use LangSmith MCP for simple logging?**
No. This MCP is built for tracing complex flows. If your task is just sending a message or updating a single record, you don't need it; it is designed to handle the complexity of multi-step AI execution.

**How do I track performance across my whole app?**
Start with langsmith_list_projects. Each project groups related traces, and the tool reports aggregate metrics for each one so you can compare project health at a glance.

**How do I analyze the full details of a specific failed run using langsmith_get_run?**
Pass the run ID to langsmith_get_run and it returns a complete deep dive into that single execution: the entire trace flow, including all inputs and outputs, so you can pinpoint exactly where the agent hit an error or behaved unexpectedly.

**Can I use langsmith_list_runs to filter traces by specific types, like only 'tool' calls?**
Yes. The tool lists runs and allows filtering by type (LLM, chain, or tool). This is useful because you can isolate the performance data for just one component of your larger agent workflow.

**What does langsmith_list_projects show regarding project setup and scope?**
It gives a high-level dashboard view of all tracing projects in your account, with aggregate metrics such as total runs, median latency, and feedback counts for each group of related traces.

**How can I track token usage or specific performance timings using this MCP?**
LangSmith records these metrics for every traced run, and the MCP surfaces them: langsmith_list_runs shows each run's token usage, and langsmith_get_run gives timing details for every step, whether it's an LLM call or a complex chain execution.

**What is LangSmith and why do I need it?**
LangSmith is the 'Datadog for LLM applications'. Without observability, AI agents in production are black boxes — you can't see what they're doing, why they fail, or how much they cost. LangSmith traces every LLM call, chain execution, and tool use, giving you complete visibility into inputs, outputs, latency, token usage, and error rates.

**Does LangSmith work only with LangChain?**
No! While LangSmith is built by the LangChain team and has native LangChain/LangGraph integration, it works with any LLM application. You can trace OpenAI, Anthropic, or any LLM provider directly using the REST API. It also integrates with CrewAI, AutoGen, and other frameworks.

**How much does LangSmith cost?**
LangSmith offers a free Developer plan with 5,000 traces per month, no credit card required. The Plus plan costs $39 per seat per month and includes a larger trace allowance for teams. Enterprise plans add SSO, RBAC, dedicated support, and custom volume pricing.