# Chainlit Observability MCP

> Chainlit provides observability for AI applications, letting you audit chat threads and track LLM performance metrics. This MCP surfaces traffic statistics across all of your Chainlit projects, lets you query full chronological user conversations, and traces every step inside an interaction: the prompts, tool executions, and retrieval events involved.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** llm-observability, conversational-ai, telemetry, ai-analytics, model-tracking, chat-logs

## Description

This MCP connects to your Chainlit Cloud projects and gives your agent direct access to conversation data. Rather than only seeing whether the chatbot responded, you can audit specific interactions or pull analytics covering usage across all deployed apps. Each chat session is available as an explicit record, from the full thread payload to the feedback users left on it. That matters when diagnosing failures: your agent can find exactly which internal prompts and tool calls led to a bad output. For product teams, the agent can summarize negative feedback or check new chats against compliance rules without anyone reading logs by hand. Integrating through Vinkius lets you connect this audit capability directly to your preferred AI client.

## Tools

### list_projects
Retrieves every configured Chainlit Cloud project space that is tracking an app.

### list_threads
Lists the individual conversation threads recorded within a specified project.

### get_thread
Retrieves the full payload for a single conversation thread, including its complete structure.

### list_steps
Lists the raw sequence of steps within one specific thread, including its prompts and generations.

### list_feedbacks
Lists all user reviews and feedback signals related to conversational accuracy and value across deployments.

### get_stats
Pulls analytics statistics on traffic volume and resource consumption for the designated projects.
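
For orientation, an MCP client invokes any of these tools through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `list_threads`. The argument name `project_id` and its value are assumptions for illustration, since this listing does not document the tools' input schemas.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC requests each carry their own id.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request envelope (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `project_id` is a hypothetical argument name; check the tool's input schema.
request = build_tool_call("list_threads", {"project_id": "example-project"})
print(json.dumps(request, indent=2))
```

In practice your AI client builds and sends this envelope for you; the sketch only shows what travels over the wire.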

## Prompt Examples

**Prompt:** 
```
Retrieve the analytics stats of my currently enabled Chainlit cloud project targeting traffic.
```

**Response:** 
```
Statistics collected securely via the Project endpoints. Currently, your implementation 'SupportBot-Alpha' processes approximately 389 conversations accounting for nearly 41,000 internal generative tokens in this time boundary. Anything specific to narrow down?
```

**Prompt:** 
```
Search my cloud instance for the recent recorded chat interactions (threads) to fetch what clients asked today.
```

**Response:** 
```
Execution successful on your native environment. Found 8 active raw threads in the last few windows. Chiefly, Thread #d8s1_.. tracks questions involving "Pricing plans and multi-factor capabilities". Should I dive deeper reading the full step logic of this thread?
```

**Prompt:** 
```
Gather all negative feedbacks users submitted across this AI project.
```

**Response:** 
```
Negative feedback list acquired covering the whole node tree. I detected exactly 4 'thumbs_down' signals mapped explicitly against responses generated by standard API calls. Users complained specifically stating: "The table format is badly cropped on mobile." Would you like me to identify the trace of those failed responses?
```

## Capabilities

### View overall project statistics
Retrieves global traffic data and usage figures across all configured applications.

### Inspect full conversation history
Gets the complete, chronological transcript for a specific user interaction thread.

### Trace model logic steps
Maps out the internal decision process by listing every prompt and tool execution within a single chat session.

### Analyze user ratings and feedback
Collects explicit and implicit user reviews, including thumbs up/down signals, for performance tracking.

### List available AI projects
Retrieves a list of all independently tracked Chainlit Cloud project spaces.

## Use Cases

### Identifying hallucination triggers
An agent needs to know why the model provided incorrect data for a client. It calls `list_steps` on the affected thread, which returns the exact prompt and internal tool execution that caused the failure, allowing immediate developer correction.
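
A minimal sketch of that triage, assuming `list_steps` returns an ordered list of records with an error flag. The field names `name`, `type`, and `is_error` are hypothetical, and the real payload may differ.

```python
from typing import Optional

# Hypothetical shape of a `list_steps` result; real field names may differ.
steps = [
    {"name": "user_message", "type": "user_message", "is_error": False},
    {"name": "lookup_price", "type": "tool", "is_error": True},
    {"name": "answer", "type": "llm", "is_error": False},
]


def first_failing_step(steps: list[dict]) -> Optional[dict]:
    """Return the earliest step flagged as an error, or None if none failed."""
    for step in steps:
        if step.get("is_error"):
            return step
    return None


culprit = first_failing_step(steps)
if culprit is not None:
    print(f"First failure: {culprit['name']} ({culprit['type']})")
```

Returning the earliest flagged step is a deliberate choice: later steps often fail only because they consumed the first bad result.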

### Calculating overall product value
A PM wants to know if the recent UI update improved user satisfaction. The agent calls `list_feedbacks` across all projects, grouping and summarizing negative outcomes ('thumbs down') versus positive ones.
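
The grouping step could look like the sketch below. It assumes each feedback record carries a `value` of `thumbs_up` or `thumbs_down` plus an optional `comment`; both field names are assumptions rather than the documented schema.

```python
from collections import Counter

# Hypothetical `list_feedbacks` records; the field names are assumptions.
feedbacks = [
    {"value": "thumbs_up", "comment": ""},
    {"value": "thumbs_down", "comment": "The table format is badly cropped on mobile."},
    {"value": "thumbs_up", "comment": "Fast answer"},
    {"value": "thumbs_down", "comment": ""},
]

counts = Counter(fb["value"] for fb in feedbacks)
total = sum(counts.values())
# Share of negative ratings; guard against an empty feedback list.
negative_share = counts["thumbs_down"] / total if total else 0.0
# Keep only negative ratings that include a written complaint.
complaints = [
    fb["comment"]
    for fb in feedbacks
    if fb["value"] == "thumbs_down" and fb["comment"]
]

print(f"{counts['thumbs_down']} of {total} ratings negative ({negative_share:.0%})")
for text in complaints:
    print("-", text)
```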

### Auditing compliance in chat logs
A QA team member needs to check if the bot is adhering to privacy policies. The agent calls `list_threads` for a specific project, then uses `get_thread` to pull transcripts and scan them for unmasked PII.
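
As a rough illustration of the scanning step, the sketch below checks message texts against two simple regular expressions. The thread shape (`id`, `messages`) is hypothetical, and the patterns are intentionally basic; a real compliance check would need much broader coverage.

```python
import re

# Deliberately simple patterns; real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return matches per PII kind, omitting kinds with no hits."""
    hits = {kind: pattern.findall(text) for kind, pattern in PII_PATTERNS.items()}
    return {kind: found for kind, found in hits.items() if found}


# Hypothetical `get_thread` payload reduced to its message texts.
thread = {
    "id": "thread-123",
    "messages": ["My email is jane@example.com", "What plans do you offer?"],
}

for message in thread["messages"]:
    flagged = find_pii(message)
    if flagged:
        print(thread["id"], flagged)
```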

### Debugging resource bottlenecks
An ops engineer notices intermittent slowdowns. The agent calls `get_stats` to check traffic volume and consumption rates; if usage is high, it calls `list_projects` to enumerate the projects and narrow down which one is driving the load.
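
One way to narrow this down is to normalize consumption by traffic, sketched below. The per-project rows, their field names, and the second project (`SalesBot-Beta`) are all invented for illustration; the actual `get_stats` output may be shaped differently.

```python
# Hypothetical per-project `get_stats` results; field names are assumptions.
stats = [
    {"project": "SupportBot-Alpha", "conversations": 389, "tokens": 41_000},
    {"project": "SalesBot-Beta", "conversations": 120, "tokens": 65_000},
]

# Tokens per conversation highlights projects that are expensive per chat,
# which raw traffic counts alone would hide.
for row in stats:
    row["tokens_per_conversation"] = row["tokens"] / max(row["conversations"], 1)

heaviest = max(stats, key=lambda row: row["tokens_per_conversation"])
print(f"{heaviest['project']}: {heaviest['tokens_per_conversation']:.0f} tokens/conversation")
```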

## Benefits

- Diagnose production errors quickly. Calling `list_steps` returns the full sequence of steps, so you can tell whether a bad output came from the initial prompt or from a flawed tool execution.
- Track user sentiment systematically. Use `list_feedbacks` to gather the explicit and implicit ratings across your app and get a direct measure of positive versus negative responses.
- Monitor overall application health. Calling `get_stats` returns global traffic counts and usage figures, so you can gauge the scale of activity without opening dashboards.
- Review specific conversations easily. You can use `list_threads` to find recent user interactions, then `get_thread` to pull the full payload for deep analysis on a single chat session.
- Manage multiple apps efficiently. Start by using `list_projects` to see every independent tracking space you manage before running targeted audits.

## How It Works

Your agent turns raw chat logs into actionable performance metrics by systematically querying project data and interaction histories.

1. First, subscribe to this MCP and provide your Chainlit Cloud URL along with the necessary Project API Key.
2. Next, direct your AI agent to identify the specific resource you want data from (e.g., list projects or select a thread).
3. Finally, prompt the agent using one of the available tools; it will execute the query and return structured diagnostics like traffic stats or detailed step logic.
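
The drill-down in steps 2 and 3 can be sketched as a chain of tool calls. The example uses a stub in place of a real MCP session, and the argument names (`project_id`, `thread_id`) and response shapes are assumptions made for illustration.

```python
from typing import Any, Callable

# Canned responses standing in for a real MCP session; shapes are assumptions.
FAKE_RESPONSES: dict[str, Any] = {
    "list_projects": [{"id": "proj-1", "name": "SupportBot-Alpha"}],
    "list_threads": [{"id": "thread-1"}, {"id": "thread-2"}],
    "list_steps": [{"name": "answer", "type": "llm"}],
}


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stub for an MCP tool call; ignores arguments and returns canned data."""
    return FAKE_RESPONSES[name]


def trace_first_thread(call_tool: Callable[[str, dict], Any]) -> list[dict]:
    """Walk project -> thread -> steps, mirroring the agent's drill-down."""
    project = call_tool("list_projects", {})[0]
    threads = call_tool("list_threads", {"project_id": project["id"]})
    first_thread = threads[0]
    return call_tool("list_steps", {"thread_id": first_thread["id"]})


steps = trace_first_thread(fake_call_tool)
print(steps)
```

Passing the tool-calling function in as a parameter keeps the workflow testable: the same `trace_first_thread` would work with a real client function that has the same signature.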

## Frequently Asked Questions

**How do I find out how many different AI apps I'm running with Chainlit?**
Call `list_projects`. This tool returns a list of all the independently tracked projects you have configured in your Cloud instance.

**I need to see what users talked about today. Which tool do I use?**
Use `list_threads`. It lists every conversational thread within a specific project, giving you the IDs needed for deeper inspection.

**What is the difference between `get_thread` and `list_steps`?**
`get_thread` gives you the entire conversation payload, that is, the raw chat transcript. `list_steps` breaks that conversation down into the machine logic, identifying each prompt or tool call made during the session.

**How do I check if my chatbot is popular?**
`get_stats` pulls global metrics. It reports traffic and usage figures, telling you how many conversations were processed and what resource consumption was measured over time.

**When I run `list_steps`, what does the raw programmatic interaction step actually show me?**
It reveals the exact sequence of internal steps executed in a single chat. You get explicit details on every prompt, the model's output, and which tools were executed during that specific interaction.

**How can I use `list_feedbacks` to find all the specific reasons users rated my bot poorly?**
This tool aggregates user feedback, letting you filter by sentiment (like 'thumbs down'). You can read explicit textual complaints and spot recurring issues, such as formatting problems or poor tone.

**If I know the ID, how do I use `get_thread` to pull the full payload of one specific chat?**
You provide the unique thread ID, and the tool returns the complete data structure. This is ideal for compliance audits or recreating a session without pulling in surrounding conversations.

**Before I use `list_projects`, what credentials do I need to connect my project?**
You must provide your Chainlit Cloud URL along with the associated Project API Key. These two pieces of information authenticate your agent and set the specific scope for all data queries.
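
If you keep these two values in environment variables, a small loader can fail early when either is missing. This is only a sketch: the variable names `CHAINLIT_CLOUD_URL` and `CHAINLIT_PROJECT_API_KEY` are hypothetical, so use whatever your MCP client's configuration expects.

```python
import os
from typing import Mapping

# Hypothetical variable names; use whatever your MCP client's config expects.
URL_VAR = "CHAINLIT_CLOUD_URL"
KEY_VAR = "CHAINLIT_PROJECT_API_KEY"


def load_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read both credentials, failing early if either one is missing."""
    missing = [name for name in (URL_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    # Strip a trailing slash so later URL joins stay predictable.
    return {"url": env[URL_VAR].rstrip("/"), "api_key": env[KEY_VAR]}


# Example with an explicit mapping instead of the real environment.
creds = load_credentials({URL_VAR: "https://cloud.example.com/", KEY_VAR: "sk-demo"})
print(creds["url"])
```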

**Will the AI agent be able to monitor the user interactions and evaluate chat history?**
Yes. The agent can call the `list_threads` and `get_thread` tools to retrieve comprehensive interaction logs from your deployed Chainlit apps. You can ask the agent to read past AI chats, summarize usage, or identify edge cases in user input.

**Can it track the individual thought steps and LLM prompt tokens consumed?**
Yes. Using the `list_steps` tool, your agent analyzes the programmatic trace, including specific LLM calls, function blocks, and retrieval events, which helps when investigating hallucinations or latency issues. For aggregate resource consumption, use `get_stats`.

**Is it possible to extract and analyze human feedback scores instantly?**
Yes. The `list_feedbacks` tool retrieves the explicit thumbs-up and thumbs-down ratings and the textual comments your users left on specific messages, streamlining QA.