# LlamaIndex MCP

> LlamaIndex (AI Data Framework & RAG) connects your AI agent directly to private, indexed enterprise knowledge bases. It lets you execute natural language queries against complex data pipelines, audit source files, and manage entire semantic search projects without writing boilerplate code.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** rag, semantic-search, data-framework, unstructured-data, indexing, llm-applications

## Description

Listen up. This server hooks your AI client straight into your private LlamaCloud data, giving it operational access to Retrieval-Augmented Generation (RAG) and semantic search orchestration. You don't have to write boilerplate code for this stuff; you just talk to it.

To get a picture of what you're working with, start by running `list_projects`. This shows every active, top-level LlamaCloud project in your organization, letting you manage collections of related search boundaries and pipelines. Once you know which project you’re dealing with, you can run `list_pipelines` to see all the data pipelines deployed across your account.

Need details on a specific flow? You'll use `get_pipeline`. This tool pulls up the exact configuration settings for one pipeline you name, letting you check connected sources and embedding parameters. It’s how you audit exactly what kind of data that pipe is supposed to be using.

When it comes to making queries, you run `query_pipeline` to execute a natural language query against one specific pipeline. The agent retrieves answers that cite the exact source documents, so you know where the information came from and everything stays grounded. If you want to check your semantic search boundaries, use `list_indexes`. This shows every active LlamaCloud index, so you can confirm your proprietary data is set up correctly for searching.
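
Under the hood, an MCP client invokes a tool such as `query_pipeline` by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that request so you can see its shape. The argument names `pipeline_id` and `query` are assumptions for illustration; the server's published input schema is the authority on the real names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `pipeline_id` and `query` are hypothetical argument names, not confirmed
# by this server's schema.
request = build_tool_call(
    1,
    "query_pipeline",
    {"pipeline_id": "pipe-123", "query": "multi-tenant security architecture"},
)
print(json.dumps(request, indent=2))
```

In normal use your AI client builds and sends this request for you; the helper is only there to make the wire format visible.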

For tracking raw material, you'll run `list_files`. This lists all the source files that got ingested by a specific pipeline. You can check the metadata on those files to verify document tracking status and see what ingestion limits apply. It’s crucial for knowing your audit trail is clean.
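
If you post-process the `list_files` output yourself, a small grouping step makes the audit easier to read. This is a minimal sketch that assumes each file record is a dictionary with `name` and `status` keys; the field names and the `Error` status value are hypothetical.

```python
from collections import defaultdict


def group_by_status(files):
    """Group file names by their reported ingestion status."""
    groups = defaultdict(list)
    for record in files:
        # Records without a status are collected under "unknown".
        groups[record.get("status", "unknown")].append(record["name"])
    return dict(groups)


# Illustrative records only; the real `list_files` payload may differ.
files = [
    {"name": "coding_standards.md", "status": "Ingested"},
    {"name": "deployment_workflow.pdf", "status": "Ingested"},
    {"name": "api_best_practices.txt", "status": "Error"},
]
report = group_by_status(files)
print(report)
```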

Put together, these tools let your agent cover the RAG lifecycle, from project scope management down to running live queries against a single pipeline.

## Tools

### get_pipeline
Retrieves detailed configuration settings for a single, specified data pipeline.

### list_files
Lists all raw source files that have been ingested by a given data pipeline.

### list_indexes
Retrieves a list of all active, managed LlamaCloud indexes.

### list_pipelines
Lists all currently deployed data pipelines within your account.

### list_projects
Retrieves a list of active, top-level LlamaCloud projects in your organization.

### query_pipeline
Executes a natural language query against a specific data pipeline for context retrieval.

## Prompt Examples

**Prompt:** 
```
Query the 'Product-Docs' pipeline about 'multi-tenant security architecture'
```

**Response:** 
```
Querying RAG pipeline… Based on your indexed documentation, the multi-tenant architecture uses isolated logical schemas per tenant and mandatory JWT-based attribute filtering at the gateway level. I've found 3 source documents explaining the row-level security implementation. Would you like the links?
```

**Prompt:** 
```
List all files ingested by the 'Engineering-Handbook' pipeline (ID: pipe-123)
```

**Response:** 
```
I've retrieved 15 files from the 'Engineering-Handbook' pipeline. Highlights include 'coding_standards.md', 'deployment_workflow.pdf', and 'api_best_practices.txt'. All files show a status of 'Ingested'. Would you like me to fetch the metadata for 'coding_standards.md'?
```

**Prompt:** 
```
What are the active LlamaCloud projects in our organization?
```

**Response:** 
```
I've identified 3 active LlamaCloud projects: 'Customer-Service-RAG' (ID: proj-001), 'Internal-Knowledge-Base' (ID: proj-005), and 'Market-Analysis-Tools' (ID: proj-008). Each project manages its own set of pipelines and indices. Which one would you like to explore?
```

## Capabilities

### Query Grounded Answers
Your AI client executes a natural language query against a specific data pipeline, retrieving answers that cite the exact source documents.

### Inspect Indexed Data Structures
You list and view all active LlamaCloud indexes to confirm your semantic search boundaries are properly set up and connected.

### Audit Source File Metadata
Retrieve metadata for raw source files ingested by a pipeline, allowing you to verify document tracking status and ingestion limits.

### List and Inspect Data Pipelines
You list all deployed pipelines and retrieve their detailed configurations, including the connected sources and embedding settings used.

### Manage AI Projects
Navigate through high-level LlamaCloud projects to manage collections of related data pipelines and queryable search boundaries.

## Use Cases

### Checking a New Document's Status
A data scientist uploads 50 new PDF manuals. They need to know if all of them were indexed correctly and if any failed. The agent runs `list_files` on the 'Manual-Docs' pipeline, immediately showing status confirmations for every uploaded file.

### Debugging a Bad Answer
The AI agent gives an answer that seems wrong. Before escalating, you run `get_pipeline` to confirm which sources and embedding settings the agent used. This helps isolate whether the issue is in the data source or the pipeline configuration itself.
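
One way to make this check repeatable is to compare the configuration returned by `get_pipeline` against the values you expect. The sketch below assumes the configuration arrives as a flat dictionary; the keys `embedding_model` and `sources` and their values are hypothetical placeholders.

```python
def config_mismatches(actual: dict, expected: dict) -> dict:
    """Return {key: (expected_value, actual_value)} for each differing key."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Hypothetical configuration as it might come back from `get_pipeline`.
actual_config = {"embedding_model": "model-a", "sources": ["s3-docs"]}
expected_config = {"embedding_model": "model-b", "sources": ["s3-docs"]}

diff = config_mismatches(actual_config, expected_config)
print(diff)
```

An empty result suggests the configuration matches expectations, which points the investigation toward the source data instead.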

### Mapping Organizational Knowledge
You're tasked with finding all relevant RAG systems across 5 departments. You start by running `list_projects` to map out every high-level project, giving you a clear inventory of where the knowledge bases live.

### Testing New Search Topics
You want to test if your 'Finance' pipeline can answer questions about multi-tenant security. You use `query_pipeline` with a natural language prompt, and the system returns synthesized answers citing 3 specific documents from the indexed knowledge.

## Benefits

- **Verify Source Data with `list_files`:** Instead of guessing, you list all raw source files ingested by a pipeline. This confirms exactly which documents the AI agent has access to and helps you track ingestion limits.
- **Manage Scope with `list_projects`:** You gain an overview of your entire data ecosystem. By listing active LlamaCloud projects, you know where different collections of pipelines and search boundaries reside, keeping your work organized.
- **Deep Dive into Settings with `get_pipeline`:** Need to check the embedding model or connected sources for a specific pipeline? Use `get_pipeline` to pull up detailed configurations without logging into the web dashboard.
- **Test Queries Safely with `query_pipeline`:** Run complex natural language queries against a live pipeline. The server runs the RAG process and returns answers grounded in your private knowledge, eliminating guesswork.
- **Audit Index Health with `list_indexes`:** Quickly list all active indexes to ensure that changes to pipelines or data sources have correctly updated the semantic search boundaries.

## How It Works

The bottom line is that your agent translates natural language chat requests into calls to the server's data framework tools.

1. Subscribe to the server and enter your unique LlamaCloud API Key.
2. Direct your AI client to interact with the MCP tools, asking it to list projects or pipelines.
3. The agent retrieves configuration details (e.g., pipeline settings) and passes them back to you for analysis.
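
Steps 2 and 3 can be sketched as a short inventory routine. To keep the example runnable offline, `fake_call_tool` stands in for a real MCP client session and returns canned data; the record shapes (`id`, `name`) are assumptions, not the server's documented output.

```python
from typing import Callable

# Stand-in signature for "call a tool by name with arguments".
ToolCaller = Callable[[str, dict], list]


def fake_call_tool(name: str, arguments: dict) -> list:
    """Return canned responses so the sketch runs without a server."""
    canned = {
        "list_projects": [{"id": "proj-001", "name": "Customer-Service-RAG"}],
        "list_pipelines": [{"id": "pipe-123", "name": "Engineering-Handbook"}],
    }
    return canned[name]


def inventory(call_tool: ToolCaller) -> dict:
    """List projects and pipelines, then return a summary for review."""
    projects = call_tool("list_projects", {})
    pipelines = call_tool("list_pipelines", {})
    return {
        "projects": [p["name"] for p in projects],
        "pipelines": [p["name"] for p in pipelines],
    }


summary = inventory(fake_call_tool)
print(summary)
```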

## Frequently Asked Questions

**How do I see all my different RAG systems with LlamaIndex (AI Data Framework & RAG)?**
Use `list_projects` first. This shows you high-level project containers, letting you map out the entire organizational scope before drilling down into specific pipelines.

**I want to query a pipeline but I don't know its ID; what should I do with LlamaIndex (AI Data Framework & RAG)?**
Run `list_pipelines` first. This gives you the necessary names or IDs, which you then pass to your agent so it can execute the `query_pipeline` function correctly.
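
If you script this lookup yourself, the name-to-ID step might look like the sketch below. It assumes `list_pipelines` yields records with `id` and `name` fields, which is an illustrative guess at the payload shape.

```python
def resolve_pipeline_id(pipelines, name):
    """Return the ID of the single pipeline called `name`, else raise."""
    matches = [p["id"] for p in pipelines if p["name"] == name]
    if len(matches) != 1:
        raise LookupError(f"expected one pipeline named {name!r}, found {len(matches)}")
    return matches[0]


# Illustrative records; the real payload may use different field names.
pipelines = [
    {"id": "pipe-123", "name": "Engineering-Handbook"},
    {"id": "pipe-456", "name": "Product-Docs"},
]
pipeline_id = resolve_pipeline_id(pipelines, "Product-Docs")
print(pipeline_id)
```

Raising on zero or duplicate matches is a deliberate choice here: it avoids silently querying the wrong pipeline.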

**Can I check what files were uploaded by a pipeline using LlamaIndex (AI Data Framework & RAG)?**
Yes. Use the `list_files` tool, providing the specific pipeline ID. This returns metadata for every raw source file currently ingested, helping you audit document coverage.

**What is the difference between listing indexes and listing pipelines with LlamaIndex (AI Data Framework & RAG)?**
`list_pipelines` shows the operational data flow definitions. `list_indexes` shows the resultant semantic stores—the actual, queryable data structures derived from those pipelines.

**What credentials do I need to use `list_indexes` with LlamaIndex (AI Data Framework & RAG)?**
You must provide a valid LlamaCloud API Key. This key authenticates your agent client and grants the permissions needed to list the active semantic indexes within your connected environment.

**If I run `query_pipeline`, what happens if the source documents are out of date?**
The query will execute but return a confidence score warning. The agent will inform you that it found no recent context, helping you know when your underlying data needs manual refreshing or re-ingestion.

**When using LlamaIndex (AI Data Framework & RAG), how do I narrow my search to a specific organizational project?**
Use the `list_projects` tool first. This shows all top-level projects, allowing your agent client to scope subsequent commands like `get_pipeline` only within that defined business boundary.

**Are there rate limits when I repeatedly use `query_pipeline` with LlamaIndex (AI Data Framework & RAG)?**
Yes, API quotas apply based on your subscription tier. If you exceed the limit, the system returns a 429 error code and advises waiting or upgrading your plan for higher throughput.
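
If you call the tool from your own script, a simple exponential backoff is one common way to handle a 429. In this sketch, `send` is a hypothetical callable that returns a `(status, body)` pair; how the 429 actually surfaces through your MCP client is an assumption and may differ.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `send` on HTTP 429, doubling the wait after each attempt."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt; return the last response.
    return status, body


# Simulated responses: two rate-limit hits, then success.
responses = iter([(429, None), (429, None), (200, "ok")])
delays = []  # Records the waits instead of actually sleeping.
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant and makes the retry schedule easy to inspect.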

**Can I query my indexed documents using natural language through my agent?**
Yes. Use the `query_pipeline` tool by providing the Pipeline ID and your natural language question. Your agent will trigger a real-time RAG extraction and return a synthesized answer based on the relevant source documents found in the index.

**How do I check which files have been successfully ingested into a pipeline?**
The `list_files` tool allows your agent to retrieve explicit metadata for all physical documents attached to a pipeline. This is perfect for auditing your data source boundaries and ensuring all required documents are correctly indexed.

**Can my agent manage multiple semantic indexes?**
Absolutely. Use the `list_indexes` tool to see all active semantic stores managed by LlamaCloud. Your agent will report the index names and types, making it easy to identify the correct target for your search or ingestion workflows.