LangSmith MCP for AI. Debug complex AI pipelines in natural conversation.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Connect to your AI in seconds.

LangSmith (LLM Observability & Hub) gives you full control over LLM pipelines. It lets your agent trace every model call, audit prompt templates, and track performance metrics.

You get detailed logs for debugging complex multi-step AI workflows directly through natural conversation with any MCP-compatible client.

What your AI can do

List projects

Maps out the boundaries of distinct AI pipelines, allowing you to see all active tracing projects.

List runs

Lists specific LLM invocation runs, showing the prompts sent and responses received within a project.

Get run

Gets detailed performance metrics for a single, specific LLM invocation run.

+ 3 more capabilities included

Trace entire agent workflows

See the step-by-step execution path of multi-turn agents, including every tool call and internal reasoning decision.

Analyze model performance metrics

Extract precise data points like token count, prompt latency, and error strings from any completed LLM run.

Manage prompt versions

Access the central hub to view, retrieve, and audit all managed prompt templates and their version history.

Audit human feedback queues

List active annotation queues where human reviewers assess model safety, alignment, or accuracy in generated traces.

Track evaluation datasets

View the curated 'golden' datasets used for automatically testing prompt logic and few-shot models.

LangSmith (LLM Observability & Hub) with 6 Tools

These tools let your agent connect to LangSmith's core functions. You can scope projects, get specific run metrics, and manage prompt assets through direct conversation.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using LangSmith (LLM Observability & Hub) on Vinkius

List Projects

Maps out the boundaries of distinct AI pipelines, allowing you to see all active tracing projects.

List Runs

Lists specific LLM invocation runs, showing the prompts sent and responses received...

Get Run

Gets detailed performance metrics for a single, specific LLM invocation run.

List Datasets

Retrieves a list of all evaluation and fine-tuning datasets tracked in LangSmith.

List Prompts

Extracts a directory listing of all available prompt templates hosted in the...

List Annotation Queues

Lists all active human-in-the-loop queues where people are reviewing generated model traces.

Security and governance baked right in.

Pick your AI client below to get set up. Create a Vinkius account, subscribe, and you're up and running. We handle the entire backend infrastructure, with out-of-the-box support for Streamable HTTP, SSE, and OAuth2, so no custom routing is required.

Claude AI

Open Claude Settings

Go to claude.ai, click your profile icon, then navigate to Customize → Connectors.

Add Custom Connector

Click the "+" button and select Add custom connector. Paste your Vinkius endpoint URL:

https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp

Replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com. For OAuth-protected servers, expand Advanced settings to add credentials.

Start a conversation

Open a new chat. The LangSmith integration is available immediately — no restart needed.

Antigravity

Configure Agent Environment

Open your Antigravity agent's workspace configuration or mcp-servers.json file.

Bind the Endpoint

Add the Vinkius endpoint URL to your agent's MCP connections list:

"mcp_servers": {
  "langsmith-llm-observability-hub": {
    "serverUrl": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
  }
}

Provide your secure token in place of [YOUR_TOKEN_HERE] to ensure your agent requests are authenticated.

Execute

Start your Antigravity session. The agent will autonomously discover and utilize the LangSmith tools with full Vinkius guardrails applied.

VS Code Copilot

One-Click Install (Recommended)

In your Vinkius Dashboard, simply click the Add to VS Code button for this server. We'll automatically configure your local workspace.

Or configure manually

Open MCP Settings

Open VS Code, press Ctrl/Cmd + Shift + P, and search for GitHub Copilot: MCP Servers.

Add Server Config

Add the Vinkius endpoint configuration to your mcp-servers.json file:

"langsmith-llm-observability-hub": {
  "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp"
}

Ensure you replace [YOUR_TOKEN_HERE] with your token from cloud.vinkius.com.
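If you would rather script the manual step than hand-edit the file, the entry above can be generated. This is a minimal sketch: the server key and URL pattern are the ones shown above, while the helper name and the demo token are hypothetical and only serve the illustration.

```python
import json

# URL pattern and server key taken from the setup steps above.
ENDPOINT_TEMPLATE = "https://edge.vinkius.com/{token}/mcp"
SERVER_KEY = "langsmith-llm-observability-hub"


def build_server_entry(token: str) -> dict:
    """Return the config entry for this server, keyed by its server name."""
    if not token or "[" in token:
        # Guards against pasting the literal [YOUR_TOKEN_HERE] placeholder.
        raise ValueError("set a real Vinkius token before generating the config")
    return {SERVER_KEY: {"url": ENDPOINT_TEMPLATE.format(token=token)}}


# "demo-token" is a stand-in; use the token from cloud.vinkius.com.
entry = build_server_entry("demo-token")
print(json.dumps(entry, indent=2))
```

Rejecting the unreplaced placeholder up front avoids a config that looks valid but fails at connection time.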

LangChain

Install Dependencies

Install the LangChain MCP adapters for your environment:

pip install langchain-mcp-adapters

Connect the Server

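Use MultiServerMCPClient from langchain-mcp-adapters to connect to the Vinkius managed endpoint:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Connect to Vinkius (Streamable HTTP assumed; switch transport to "sse" if your endpoint needs it)
client = MultiServerMCPClient({
    "langsmith": {"url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp", "transport": "streamable_http"}
})
tools = asyncio.run(client.get_tools())  # get_tools() is a coroutine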

CrewAI

Define the Tool

Load the Vinkius MCP tools into your CrewAI agents:

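from crewai import Agent
from crewai_tools import MCPServerAdapter  # pip install "crewai-tools[mcp]"

# Connect securely to Vinkius (Streamable HTTP transport assumed)
server_params = {
    "url": "https://edge.vinkius.com/[YOUR_TOKEN_HERE]/mcp",
    "transport": "streamable-http",
}

# The tools are only usable while the adapter connection is open
with MCPServerAdapter(server_params) as vinkius_tools:
    # Assign to Agent (goal and backstory are placeholder values)
    researcher = Agent(
        role='Data Researcher',
        goal='Diagnose failed LLM runs using LangSmith traces',
        backstory='An engineer who audits LLM pipelines.',
        tools=vinkius_tools,
    )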

Execute Task

Run your CrewAI process. The agent will autonomously route tasks to the Vinkius managed server.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with LangSmith (LLM Observability & Hub), then connect any of our 5,100+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,100+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by LangSmith. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This connection provides 6 powerful capabilities that interface natively with Claude, ChatGPT, Cursor, and other compatible AI platforms. No middleware. No custom integration required.

Debugging LLMs used to mean manually sifting through endless dashboards.

Today, when an agent fails in production, your process is a nightmare. You jump into the platform UI, click on projects, then runs, and then you're looking at metrics that feel incomplete. Finding the source of truth—the exact prompt version used or the specific token count—requires clicking through five different tabs and copying data points manually.

With this MCP, your agent handles the heavy lifting. You ask a question in natural language, and it pulls together all the necessary diagnostic details: run telemetry, prompt history, and project boundaries. It puts the entire debugging suite into one conversational output.

LangSmith (LLM Observability & Hub) gives you full-stack visibility.

You no longer have to rely on manual logging or hope that your team remembered to capture everything. You can use `list_runs` to see the raw conversation history and simultaneously call `get_run` to pull the precise token usage for that exact exchange, all in one query.

The system shows you exactly what happened—not just that it failed. This immediate diagnostic capability means less time debugging and more time building.
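Under the hood, an MCP client turns a request like the `list_runs` plus `get_run` query described above into `tools/call` messages. The sketch below shows roughly what those JSON-RPC payloads might look like. The tool names come from this page, but the argument names (`project_name`, `run_id`) and the example project are assumptions, since the tool schemas are not shown here.

```python
import itertools
import json

# JSON-RPC requests need unique ids; a simple counter is enough for a sketch.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values below are assumed, not taken from the server schema.
requests = [
    tool_call("list_runs", {"project_name": "checkout-agent"}),
    tool_call("get_run", {"run_id": "<run-id-from-list_runs>"}),
]

for request in requests:
    print(json.dumps(request))
```

In practice your MCP client builds and sends these messages for you; the sketch only makes the shape of the exchange visible.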

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What your AI can actually do with this

Debugging large language models can be a nightmare. When an agent fails, you need to know exactly why. This MCP connects your LLM application to LangSmith, giving you deep observability over every run. Instead of digging through massive UI dashboards and filtering logs manually, you talk to your agent, and it retrieves the necessary data for you.

You can ask what happened in a specific pipeline, pull precise metrics on token usage or latency, or check the full history of prompt templates used across projects. It's like having a dedicated diagnostic console built into your workflow. Because Vinkius hosts this MCP, you connect once from any client and get access to robust LLM governance for debugging and auditing.

Built · Hosted · Managed by Vinkius LangSmith MCP - LLM Observability and Debugging Hub

Server ID 019d75c4-6571-72e8-bb67-756905764333

Vinkius Inspector

Compliance Grade A+

Score 100/100

Report View Report ↗

What Changes When You Connect

Stop guessing why an agent failed. By calling get_run, you instantly pull precise metrics like token consumption and latency, pinpointing the exact moment of failure.

Manage your prompt logic centrally. Use list_prompts to see every template in the LangChain Hub and check its full version history without navigating a separate UI.

Track model safety with human oversight. The list_annotation_queues tool lets you audit where human reviewers are assessing accuracy, helping you ground your model's behavior.

Map out your entire infrastructure quickly. Running list_projects shows all active AI pipelines, letting you focus only on the systems that matter right now.

Verify testing assets with one call. Use list_datasets to enumerate 'golden' datasets, confirming the structure used for automated evaluation before deployment.
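As a sketch of what confirming a dataset's structure can mean in practice, the check below verifies that every example carries the same input and output keys. The record shape (`inputs` and `outputs` dictionaries) is an assumption about how examples are laid out, and the sample data is hypothetical; the exact output of list_datasets is not shown on this page.

```python
def inconsistent_examples(examples: list[dict]) -> list[int]:
    """Return indexes of examples whose input/output keys differ from the first example."""
    if not examples:
        return []
    reference_inputs = set(examples[0]["inputs"])
    reference_outputs = set(examples[0]["outputs"])
    return [
        index
        for index, example in enumerate(examples)
        if set(example["inputs"]) != reference_inputs
        or set(example["outputs"]) != reference_outputs
    ]


# Hypothetical 'golden' examples; the third uses a different input key.
golden = [
    {"inputs": {"question": "What is MCP?"}, "outputs": {"answer": "A protocol."}},
    {"inputs": {"question": "What is a run?"}, "outputs": {"answer": "One invocation."}},
    {"inputs": {"query": "Mismatched key"}, "outputs": {"answer": "Placeholder."}},
]

print(inconsistent_examples(golden))  # [2]
```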

See it in action

01

The agent hallucinated a key fact.

An ML Engineer notices an agent giving incorrect data. They first use list_projects to find the correct pipeline, then call list_runs for that project. Finally, they use get_run on the failing run ID to get the exact token usage and error strings needed to fix the prompt.
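The drill-down in this scenario (project, then runs, then one run) can be sketched as a small function. Everything the server returns here is faked with canned data so the example runs offline; the argument names and response fields are assumptions, not the server's documented schema.

```python
from typing import Callable, Optional


def fake_call_tool(name: str, arguments: dict):
    """Hypothetical stand-in for an MCP client call, returning canned data."""
    if name == "list_projects":
        return [{"name": "support-bot"}, {"name": "checkout-agent"}]
    if name == "list_runs":
        if arguments["project_name"] == "checkout-agent":
            return [
                {"id": "run-1", "error": None},
                {"id": "run-2", "error": "RateLimitError: 429"},
            ]
        return []
    if name == "get_run":
        return {"id": arguments["run_id"], "total_tokens": 1840,
                "error": "RateLimitError: 429"}
    raise KeyError(name)


def first_failed_run(call: Callable, project_name: str) -> Optional[dict]:
    """Scope to a project, find its first failed run, then fetch that run's details."""
    projects = [project["name"] for project in call("list_projects", {})]
    if project_name not in projects:
        raise LookupError(f"unknown project: {project_name}")
    runs = call("list_runs", {"project_name": project_name})
    failed = next((run for run in runs if run["error"]), None)
    if failed is None:
        return None
    return call("get_run", {"run_id": failed["id"]})


details = first_failed_run(fake_call_tool, "checkout-agent")
print(details)
```

Passing the call function in as a parameter keeps the workflow testable: swap `fake_call_tool` for a real client call once you know the actual tool schemas.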

02

We need a new feature-specific prompt.

An AI Developer needs a better data extraction template. They start by running list_prompts to see what's available in the Hub, verify existing templates, and then retrieve the full instruction text for versioning.

03

Our model seems unsafe on edge cases.

An LLM Analyst suspects alignment issues. They use list_annotation_queues to pull up the live queue where human reviewers are assessing safety, allowing them to report on overall model grounding immediately.

04

We need to test a new dataset against an old prompt.

A data scientist wants to benchmark. They run list_datasets to confirm the available evaluation sets and then use these identifiers when checking performance metrics via get_run.

The honest tradeoffs

Treating it like a simple log dump.

Anti-pattern

Just dumping all raw logs for a project to figure out the cause. You'll get noise, and you won't know which metrics matter or if the error was in the prompt or the model call itself.

The Fix

First, run list_projects to scope down the investigation. Then, use get_run with a specific run ID to pull only the precise telemetry (tokens, latency) you need, keeping the noise out.
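To illustrate "only the precise telemetry you need", the sketch below reduces a run record to tokens, latency, and error. The field names (`start_time`, `end_time`, `prompt_tokens`, `completion_tokens`, `error`) are assumptions about what a run record contains, and the sample values are made up.

```python
from datetime import datetime


def summarize_run(run: dict) -> dict:
    """Keep only token usage, latency in seconds, and the error string."""
    start = datetime.fromisoformat(run["start_time"])
    end = datetime.fromisoformat(run["end_time"])
    return {
        "run_id": run["id"],
        "total_tokens": run["prompt_tokens"] + run["completion_tokens"],
        "latency_s": (end - start).total_seconds(),
        "error": run.get("error"),
    }


# Hypothetical run record; the bulky inputs/outputs are dropped by the summary.
sample_run = {
    "id": "run-2",
    "start_time": "2024-05-01T12:00:00",
    "end_time": "2024-05-01T12:00:02.500000",
    "prompt_tokens": 1200,
    "completion_tokens": 640,
    "error": None,
    "inputs": {"question": "Where is my order?"},
    "outputs": {"answer": "It ships tomorrow."},
}

print(summarize_run(sample_run))
```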

Ignoring prompt version control.

Anti-pattern

Updating your agent's instructions and hoping it works. Without tracking changes, you have no idea which prompt template actually caused the regression when things break in production.

The Fix

Always use list_prompts to review a template's version history before making changes. Knowing which version was live tells you what to compare against and which version to roll back to if a regression appears.
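Once you have the text of two prompt versions in hand, a plain diff shows exactly what changed between them. A minimal sketch using Python's standard `difflib`; the two prompt texts are invented for the example, and fetching them from the Hub is left out.

```python
import difflib


def prompt_diff(old: str, new: str) -> list[str]:
    """Return a unified diff between two prompt versions, one entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="candidate",
            lineterm="",
        )
    )


# Hypothetical prompt versions.
v1 = "You are a support agent.\nAnswer in one paragraph."
v2 = "You are a support agent.\nAnswer in three bullet points."

for line in prompt_diff(v1, v2):
    print(line)
```

An empty diff means the two versions are identical, which is a quick way to rule the prompt out as the cause of a regression.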

Debugging without context.

Anti-pattern

Seeing an error message but not knowing if the model failed because of bad input data or poor prompt design. The error is meaningless without surrounding context.

The Fix

Use list_runs to isolate the raw interaction, and then use get_run on that specific run ID to get the detailed execution logs alongside the prompts sent.

When It Fits, When It Doesn't

Use this MCP if your problem is tracing execution—you need to see how far an agent got, what data it used, and exactly why a model failed. This is for debugging complexity. Don't use it if you just need basic text retrieval or simple API calls; those are better handled by direct client-side code execution. You should rely on the list_projects tool to define your scope first, then use get_run for deep dives. If all you want is a list of available templates, simply use list_prompts. This MCP adds observability and governance; it doesn't replace core functionality, but it lets you validate everything.

Questions you might have

How do I check the performance metrics for a single LLM invocation run using get_run? +

You use get_run by providing the specific run ID. This returns precise telemetry, including total tokens consumed and latency in seconds. It’s the fastest way to measure performance.

What is list_projects for in LangSmith? +

list_projects maps out all distinct AI pipelines you are currently monitoring. This tool helps scope your investigation by showing which projects have recent activity or need auditing.

Can I see what prompt templates my agent is using with list_prompts? +

Yes, list_prompts extracts all available templates from the LangChain Hub. This lets you audit which instructions are active and check their version histories.

What should I do if I need to see a list of evaluation datasets? +

To view your curated 'golden' datasets for testing, use list_datasets. This confirms the data structure you should be using when measuring model performance.

If I want to see all raw interactions in a project, should I use list_runs? +

Yes. This tool isolates every single interaction run within a specific project. You get the full history of prompts sent and responses received from the LLM model, which is critical for debugging complex failure paths.

What does list_annotation_queues do regarding human oversight? +

This tool lists active queues where human reviewers are assessing generated LLM traces. You can check if your model's outputs meet alignment or safety standards before you deploy them.

How can I use list_projects to understand my monitoring scope? +

It maps out the boundaries of every distinct AI pipeline currently running in your environment. This helps you know exactly where all your tracing data is segmented across the platform.

When using get_run, how do I find specific error messages from a failed run? +

The telemetry returned by get_run includes exact error strings. This lets you pinpoint failure modes—like API rate limits or invalid inputs—without having to guess the cause of the crash.

Can I see the token usage for a specific LLM run through my agent? +

Yes. Use the get_run tool with a specific run ID. Your agent will retrieve the exact token count (prompt + completion) and latency metrics calculated by LangSmith for that interaction.

How do I fetch a prompt template from the LangChain Hub using natural language? +

The list_prompts tool allows your agent to navigate your hosted Hub repository. You can ask your agent to find a specific prompt by name to inspect its instruction text, variables, and version history.

Can my agent check the status of human annotation queues? +

Absolutely. Use the list_annotation_queues tool to retrieve all active queues where human feedback is being collected. Your agent can report on the number of pending traces and general alignment scores established by your reviewers.
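The answers above mention using error strings to separate failure modes such as API rate limits from invalid inputs. One way to do that triage is sketched below. The patterns and labels are illustrative assumptions, not a list of the errors LangSmith actually emits, so adjust them to the strings you see in your own runs.

```python
import re
from typing import Optional

# Ordered (label, pattern) pairs; the first match wins. Patterns are illustrative.
FAILURE_PATTERNS = [
    ("rate_limit", re.compile(r"rate.?limit|\b429\b", re.IGNORECASE)),
    ("invalid_input", re.compile(r"invalid|validation|\b400\b", re.IGNORECASE)),
    ("timeout", re.compile(r"timed? ?out", re.IGNORECASE)),
]


def classify_error(error: Optional[str]) -> str:
    """Map a run's error string to a coarse failure mode."""
    if not error:
        return "ok"
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(error):
            return label
    return "unknown"


for message in [
    "RateLimitError: 429 Too Many Requests",
    "Invalid request: missing 'messages'",
    "Request timed out",
    None,
]:
    print(classify_error(message))
```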
