# LlamaCloud MCP

> LlamaCloud (Managed RAG & Parsing) connects your AI agent directly to your enterprise document infrastructure. Manage entire Retrieval-Augmented Generation (RAG) cycles and parse messy documents using LlamaParse, all from natural conversation. You can list active projects, monitor data ingestion pipelines (`list_pipelines`), track individual parsing jobs (`list_parsing_jobs`), or upload a complex PDF for structured context extraction via `create_parsing_upload`. This tool gives your agent full control over document lifecycle management and index auditing.

## Overview
- **Category:** ai-frontier
- **Price:** Free
- **Tags:** rag, document-parsing, data-ingestion, pipeline-orchestration, vector-indices

## Description

You'll connect your AI agent directly to your company's data infrastructure. This LlamaCloud server lets you manage everything from auditing complex Retrieval-Augmented Generation (RAG) pipelines to parsing messy, multi-page documents, and you do it all just by talking to your agent. You don't have to write a line of Python code for this.

You can manage the entire document lifecycle and audit your data indices right through conversation. Your agent handles every step from raw files to the structured context your LLM needs to answer questions accurately. It's about keeping your AI grounded in *your* actual corporate knowledge, not generic internet content.

When you're setting up or auditing your system, start with a bird's-eye view of what's running. Use the `list_projects` tool to pull up every managed LlamaCloud project, the high-level containers that hold groups of related pipelines and indices. Once you know which project holds the data you need, run `list_pipelines` to get an inventory of every data pipeline in your account. To dig into one specific setup, call `get_pipeline` with the pipeline's name. It returns the full configuration, which shows exactly which sources (like S3 buckets or Google Drive folders) are feeding data and how the indices are set up.
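The drill-down described above can be sketched as a small Python helper. This is only a sketch: `call_tool` is a hypothetical stand-in for whatever MCP client your agent uses, and the `name` argument and the response fields (`name`, `sources`) are assumptions, not the server's documented schema.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def inventory(call_tool: ToolCaller) -> dict[str, Any]:
    """Walk projects -> pipelines -> per-pipeline configuration."""
    projects = call_tool("list_projects", {})
    pipelines = call_tool("list_pipelines", {})
    configs = {
        # Assumption: each listed pipeline has a "name" field, and
        # get_pipeline accepts that name under the "name" argument.
        p["name"]: call_tool("get_pipeline", {"name": p["name"]})
        for p in pipelines
    }
    return {"projects": projects, "pipelines": configs}


def fake_call_tool(tool: str, args: dict[str, Any]) -> Any:
    """In-memory stand-in so the sketch runs without a server."""
    if tool == "list_projects":
        return [{"name": "Finance"}]
    if tool == "list_pipelines":
        return [{"name": "Technical-Docs-RAG"}]
    if tool == "get_pipeline":
        return {"name": args["name"], "sources": ["s3", "google_drive"]}
    raise ValueError(f"unknown tool: {tool}")


report = inventory(fake_call_tool)
print(report["pipelines"]["Technical-Docs-RAG"]["sources"])
```

Injecting `call_tool` keeps the walk independent of any particular client library, so the same function works against a real session or the fake shown here.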

The document parsing side is where this really pays off. Instead of manually copy-pasting from a PDF, you use `create_parsing_upload` to send a specific file (say, last year's annual report or a technical manual) straight to LlamaParse. LlamaParse doesn't just read text; it works out the layout, converting tables, complex sections, and even some handwriting into clean, structured Markdown context. Once the job is submitted, don't assume it worked. Run `list_parsing_jobs` to check all active or recently finished parsing jobs so you know where your document ingestion stands. When processing finishes, call `get_parsing_result` with the specific job ID, and it returns the final, structured Markdown output that's ready for your agent to query.
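The upload, check, fetch sequence might look like the following. Treat it as a sketch under assumptions: `call_tool` is a hypothetical MCP client callable, the argument names (`file_path`, `job_id`) and job fields (`id`, `status`) are invented for illustration, and the status strings `"running"` and `"completed"` are guesses (only `'failed'` is mentioned in this document).

```python
import time
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], Any]


def parse_document(call_tool: ToolCaller, path: str,
                   poll_seconds: float = 2.0, max_polls: int = 30) -> str:
    """Upload a file, poll the job list, then fetch the Markdown result."""
    # Assumption: the upload response carries the job ID under "job_id".
    job_id = call_tool("create_parsing_upload", {"file_path": path})["job_id"]
    for _ in range(max_polls):
        jobs = call_tool("list_parsing_jobs", {})
        status = next((j["status"] for j in jobs if j["id"] == job_id), None)
        if status == "completed":
            return call_tool("get_parsing_result", {"job_id": job_id})
        if status == "failed":
            raise RuntimeError(f"parsing job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"parsing job {job_id} did not finish in time")


class FakeServer:
    """In-memory stand-in: the job completes on the second status check."""

    def __init__(self) -> None:
        self.checks = 0

    def __call__(self, tool: str, args: dict[str, Any]) -> Any:
        if tool == "create_parsing_upload":
            return {"job_id": "job-1"}
        if tool == "list_parsing_jobs":
            self.checks += 1
            status = "completed" if self.checks >= 2 else "running"
            return [{"id": "job-1", "status": status}]
        if tool == "get_parsing_result":
            return "# Annual Report\n"
        raise ValueError(f"unknown tool: {tool}")


markdown = parse_document(FakeServer(), "annual_report_2024.pdf", poll_seconds=0)
print(markdown)
```

The `max_polls` cap is a design choice so a stuck job surfaces as a timeout instead of an endless loop.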

This setup gives your agent full command over data flow: list projects, audit pipelines, check specific configurations, manage document uploads, track parsing status, and grab clean results. You're managing a sophisticated RAG system, from raw file upload through structured index creation, with nothing more than conversational commands.

## Tools

### create_parsing_upload
Sends an explicit file to LlamaParse so it can begin converting complex document layouts into structured text.

### get_parsing_result
Retrieves the final, processed Markdown or rich-text context from a previously submitted parsing job ID.

### get_pipeline
Pulls detailed configuration settings (sources and indices) for one specific data pipeline, identified by name.

### list_parsing_jobs
Lists all currently active or recently completed parsing jobs, helping you track document ingestion status.

### list_pipelines
Provides a full list of every data pipeline deployed and available in your LlamaCloud account.

### list_projects
Lists all high-level, managed projects that contain groups of related pipelines and indices within LlamaCloud.

## Prompt Examples

**Prompt:** 
```
List all active data pipelines in my LlamaCloud account
```

**Response:** 
```
I've found 3 active pipelines: 'Financial-Reports-Index' (ID: pipe-123), 'Technical-Docs-RAG' (ID: pipe-456), and 'Customer-Support-KB' (ID: pipe-789). Which one would you like to check the source configuration for?
```

**Prompt:** 
```
Parse this PDF file using LlamaParse: 'annual_report_2024.pdf'
```

**Response:** 
```
File submitted to LlamaParse. Job ID: 'job-98765'. I'm monitoring the extraction process. LlamaCloud is currently processing complex tables and charts within the report. I'll provide the Markdown result as soon as it's ready.
```

**Prompt:** 
```
Show me the configuration for the 'Technical-Docs-RAG' pipeline
```

**Response:** 
```
Pipeline 'Technical-Docs-RAG' (ID: pipe-456) is configured with 2 sources: an S3 bucket ('s3://docs-bucket') and a Google Drive folder. It uses OpenAI 'text-embedding-3-small' for indexing and is mapped to your 'production-index' in LlamaCloud.
```

## Capabilities

### List active projects
Retrieve a list of all high-level, managed LlamaCloud project containers.

### List deployed pipelines
Get an inventory and configuration details for every data pipeline running in your account.

### Get specific pipeline config
Fetch the detailed setup, including sources and index settings, for one named pipeline.

### List parsing jobs
Check the status of ongoing document parsing tasks to see which are running or finished.

### Upload file for parsing
Send a file (like an annual report PDF) directly to LlamaParse for structure extraction.

### Retrieve job results
Fetch the final, structured Markdown text output from a completed parsing job ID.

## Use Cases

### The Compliance Review (Auditing)
A data scientist needs to prove that all financial documents are processed. They first run `list_projects` to find the 'Finance' container, then use `list_pipelines` to get every pipeline ID. Finally, they check each one with `get_pipeline` to confirm it uses the required embedding model and source type.
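Once the configurations are in hand, the final check in this review is a plain filter. The sketch below assumes each `get_pipeline` result has been reduced to a dict with `name`, `embedding_model`, and `sources` keys; those field names are hypothetical, and the sample data is made up.

```python
from typing import Any


def noncompliant(configs: list[dict[str, Any]], required_model: str,
                 allowed_sources: set[str]) -> list[str]:
    """Return the names of pipelines that break either rule."""
    flagged = []
    for cfg in configs:
        wrong_model = cfg["embedding_model"] != required_model
        wrong_source = any(s not in allowed_sources for s in cfg["sources"])
        if wrong_model or wrong_source:
            flagged.append(cfg["name"])
    return flagged


# Made-up configurations standing in for real get_pipeline output.
configs = [
    {"name": "Financial-Reports-Index",
     "embedding_model": "text-embedding-3-small", "sources": ["s3"]},
    {"name": "Legacy-Finance",
     "embedding_model": "other-model", "sources": ["s3"]},
]

flagged = noncompliant(configs, "text-embedding-3-small", {"s3"})
print(flagged)
```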

### Processing an Annual Report (Parsing)
An engineer gets a massive PDF annual report. They run `create_parsing_upload` on the file. They wait, check the status with `list_parsing_jobs`, and when it's done, they call `get_parsing_result` to pull out clean Markdown tables for immediate analysis.

### Connecting New Data Sources (Onboarding)
A developer needs to index a new set of technical manuals in Google Drive. They first use `list_pipelines` to find the existing RAG pipeline, then call `get_pipeline` to verify it supports adding a new source type before updating the connection.

### Debugging Bad Answers (Troubleshooting)
The agent gives an answer that cites old data. The user runs `list_parsing_jobs` to check for failed or stale jobs, then uses `get_pipeline` on the relevant pipeline to see if the source connection needs updating.
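A first pass over the job listing could be automated like this. It is a sketch with assumed field names (`id`, `status`, `updated_at`) and an assumed definition of "stale": a job that is neither completed nor failed and has not been updated within a chosen window. The real `list_parsing_jobs` output may be shaped differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def suspect_jobs(jobs: list[dict[str, Any]], now: datetime,
                 max_age: timedelta) -> list[str]:
    """Return IDs of jobs that failed or look stuck."""
    suspects = []
    for job in jobs:
        updated = datetime.fromisoformat(job["updated_at"])
        if job["status"] == "failed":
            suspects.append(job["id"])
        elif job["status"] != "completed" and now - updated > max_age:
            # Unfinished and silent for longer than max_age: treat as stale.
            suspects.append(job["id"])
    return suspects


now = datetime(2025, 1, 2, tzinfo=timezone.utc)
jobs = [
    {"id": "job-1", "status": "completed", "updated_at": "2025-01-01T00:00:00+00:00"},
    {"id": "job-2", "status": "failed", "updated_at": "2025-01-01T23:00:00+00:00"},
    {"id": "job-3", "status": "running", "updated_at": "2025-01-01T00:00:00+00:00"},
    {"id": "job-4", "status": "running", "updated_at": "2025-01-01T23:30:00+00:00"},
]

suspects = suspect_jobs(jobs, now, timedelta(hours=6))
print(suspects)
```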

## Benefits

- Audit every stage of data ingestion. Instead of guessing if your index is current, use `list_pipelines` to find each pipeline, then `get_pipeline` to check its source connections (S3/Drive) and verify its precise settings.
- Extract structure from junk files fast. Use `create_parsing_upload` on PDFs with tables or handwriting. LlamaParse converts that messy visual data into clean Markdown context your agent can actually read.
- Track jobs without UI clicks. When a document is large, monitoring it manually sucks. Just use `list_parsing_jobs` to check status and then `get_parsing_result` when the job completes.
- See your entire knowledge base at a glance. `list_projects` lets you map out all related indices and pipelines in one go, so you know exactly where a piece of data lives.
- Verify context integrity before querying. By using `get_pipeline`, you confirm that the pipeline is mapped to the correct index (`production-index`), preventing bad answers.

## How It Works

The bottom line is: you manage the entire document lifecycle—from file upload to structured context—using simple function calls.

1. First, use `list_projects` to find the correct project container. Then, call `list_pipelines` to see all data pipelines within that scope.
2. Next, if you have a document, run `create_parsing_upload` with the file path. This initiates the job and returns a Job ID.
3. Finally, use the Job ID with `get_parsing_result`. Once processing is done, this tool hands back the final Markdown data.
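Under the hood, an MCP client sends each of these steps to the server as a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such message; the envelope follows the MCP specification, while the `job_id` argument name is an assumption about this server's input schema.

```python
import json
from itertools import count
from typing import Any

# JSON-RPC requests need a unique ID per request; a counter is enough here.
_request_ids = count(1)


def tool_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request for the given tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical argument name; the job ID comes from the prompt example above.
request = tool_call_request("get_parsing_result", {"job_id": "job-98765"})
print(request)
```

In practice your agent's MCP client builds and sends these messages for you; the point is only that each "function call" is a small, inspectable request.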

## Frequently Asked Questions

**How do I start the process to parse a new PDF file using create_parsing_upload?**
You pass the path to your document directly to `create_parsing_upload`. This kicks off the LlamaParse job and gives you a Job ID. You then look for that ID in the output of `list_parsing_jobs` to monitor its progress.

**What is the difference between list_pipelines and get_pipeline?**
`list_pipelines` shows you everything available in your account (the inventory). `get_pipeline` dives deep into one specific pipeline, giving you details like its connected sources and index settings.

**Do I need to know the project ID before calling list_pipelines?**
No. You should first call `list_projects` to see all available containers. Then, use that context when listing pipelines for accuracy.

**When can I run get_parsing_result?**
You must wait until the parsing job is complete. Use `list_parsing_jobs` first; only then will the result be available to `get_parsing_result`.

**What credentials do I need before running any command, like `list_projects`?**
You must provide a valid LlamaCloud API key. This key authorizes your agent to access and manage your entire RAG infrastructure within the system.
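One common way to supply the key is through an environment variable rather than hard-coding it. The sketch below assumes the variable is named `LLAMA_CLOUD_API_KEY`, the name LlamaIndex's own clients conventionally read; whether this MCP server reads the same variable is an assumption, so check your server's setup instructions. The key value shown is a placeholder.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    source = os.environ if env is None else env
    # Assumed variable name; adjust to whatever your server actually expects.
    key = source.get("LLAMA_CLOUD_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set LLAMA_CLOUD_API_KEY before starting the agent")
    return key


# Passing a dict keeps this demo independent of your real environment.
key = load_api_key({"LLAMA_CLOUD_API_KEY": "llx-example"})
print(f"loaded a key of length {len(key)}")
```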

**If my document parsing job fails, how do I check the error details using `list_parsing_jobs`?**
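The job status will show 'failed'. Grab the associated Job ID from that listing, since you'll need it to look up the failure. None of the six tools above is described as returning a failure trace, so pull the full trace for debugging with your diagnostic tooling outside this server.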

**Does `get_pipeline` provide details on connected sources and indexing models?**
Yes, it gives the full configuration. You'll see exactly what source is linked (S3 bucket or Google Drive) and which embedding model was used for index creation.

**How do I view all my active data ingestion strategies across different LlamaCloud projects using `list_projects`?**
Running `list_projects` shows the high-level containers for your work. After getting the project name, you then call `list_pipelines` to see the specific pipelines running within that scope.

**Can LlamaParse handle complex tables and layouts in my PDFs?**
Absolutely. LlamaParse uses AI-driven parsing to turn complex PDF layouts, nested tables, and even handwriting into structured Markdown. Use the `create_parsing_upload` tool to start the process and retrieve high-quality context for your agent.

**How do I check if my document parsing job is finished processing?**
Check the job's status with `list_parsing_jobs`, then use the `get_parsing_result` tool with your specific Job ID once the job shows as complete. Your agent can poll the status for you and report where things stand. Once the job is finished, it will retrieve the final parsed content ready for grounding.

**Can I see all data sources connected to a specific pipeline?**
Yes. The `get_pipeline` tool returns the full configuration for the pipeline you name, including all connected data sources and configured index settings, so you get a complete view of your ingestion flow.