# Docparser MCP for AI Agents

> Docparser lets your AI agent automatically pull structured data from documents, including PDFs, scans, and images. It exposes your parsing rules and tracks extraction results, so you spend far less time reading invoices and reports by hand. You can check the status of documents in the queue and retrieve specific fields, such as order numbers and line items, directly through conversation.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** data-extraction, ocr, pdf-parsing, automated-workflows, structured-data, data-processing

## Description

Manually pulling data from invoices, contracts, and reports is a major time sink. Docparser automates document extraction, and this MCP connects it to your AI client. It handles everything from complex PDFs to grainy scans, turning unstructured documents into clean, usable data points. You tell your agent what you need, such as 'the total amount' or 'all line items', and it retrieves those fields across multiple uploaded files. The whole process is visible through conversational prompts: you can list all available parsing rules, check whether a document failed extraction, and retrieve the structured results immediately. Because Vinkius hosts Docparser in its catalog, connecting your AI client to this MCP gives you access to these capabilities without managing any infrastructure.

## Tools

### get_docparser_account_metadata
Retrieves usage limits and operational metadata for your Docparser account.

### get_parser_details
Fetches the specific settings, rules, and status of a single document parser.

### quick_parser_health_audit
Pulls a high-level summary showing overall activity levels and success rates across all parsers.

### get_document_extraction_results
Retrieves the actual structured data points from a specific document file.

### list_parsed_documents
Shows all documents that have been successfully processed by one of your defined parsers.

### list_failed_document_extractions
Identifies and lists any documents that failed the parsing or extraction process, noting the error.

### list_document_parsers
Lists every document parser rule set you have configured in your account.

### list_documents_awaiting_parsing
Lists documents that are currently queued and waiting to be parsed.

### list_recent_extractions
Retrieves the most recent data extraction results across all active parsers in one view.

### search_parsed_documents
Searches for previously parsed documents using a specific filename within a particular parser rule set.
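Under the Model Context Protocol, an agent invokes each of these tools by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a message by hand only to show its shape; your AI client normally does this for you. The argument names `parser_id` and `filename` (and the value `abc123`) are hypothetical placeholders: the real input schema for each tool is advertised by the server's `tools/list` response.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: check the tool's inputSchema from tools/list
# for the actual parameter names this server expects.
request = build_tool_call(
    1,
    "search_parsed_documents",
    {"parser_id": "abc123", "filename": "W-9_Acme.pdf"},
)
print(request)
```

The same envelope works for every tool listed above; only `name` and `arguments` change.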

## Prompt Examples

**Prompt:** 
```
List all documents processed by my 'Client Contracts' parser in the last week.
```

**Response:** 
```
**📄 Contract Parsing Report**

I found 12 contracts processed this week. Here’s a quick breakdown:

*   `Contract_A_Q3.pdf`: Success. Key fields extracted.
*   `ClientX_MSA_v2.pdf`: Success. Review required: Termination Date is missing.
*   `BadScan_01.jpg`: Failure (Error: Unreadable). Needs rescan.

Do you want me to pull the full extraction results for the successful files, or the error details for the failed ones?
```

**Prompt:** 
```
Show me the extracted data from 'DOC-9988', specifically the order number and customer name.
```

**Response:** 
```
**✅ Document Data Retrieval**

Here are the structured details for `DOC-9988`:

*   **Order Number:** ORD-456
*   **Customer Name:** Tech Corp
*   **Total Amount:** $1,250.00
*   **Status:** Paid

I can also pull the full JSON output if you need to pipe this data into another system.
```

**Prompt:** 
```
Are there any documents that failed extraction today?
```

**Response:** 
```
**🚨 Extraction Failures Found**

I checked all active parsers and found 2 failures:

1.  `BadScan_01.jpg`: Error: Unreadable (Possible cause: Poor lighting).
2.  `UnknownFormat.pdf`: Error: Unsupported format.

Would you like me to pull the detailed error logs for these two files so we can figure out a workaround?
```

## Capabilities

### Retrieve account usage status
Check your account's usage limits and operational metadata on the Docparser platform.

### Inspect parser configurations
Get detailed settings and status information about any specific document parsing rule you've set up.

### Audit overall system health
Pull a quick summary showing the activity levels and success rates across all your configured parsers.
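As a rough illustration of what a success rate means here, the sketch below computes the share of successfully parsed documents per parser. The record shape (`parser` and `status` keys, with `status` set to `processed` or `failed`) is a hypothetical assumption for this example, not the actual output format of `quick_parser_health_audit`.

```python
from collections import Counter


def success_rates(documents: list[dict]) -> dict[str, float]:
    """Return the fraction of documents each parser processed successfully."""
    totals: Counter = Counter()
    successes: Counter = Counter()
    for doc in documents:
        totals[doc["parser"]] += 1
        if doc["status"] == "processed":
            successes[doc["parser"]] += 1
    # Every parser in `totals` has at least one document, so no division by zero.
    return {parser: successes[parser] / count for parser, count in totals.items()}


# Hypothetical sample records, for illustration only.
docs = [
    {"parser": "Client Contracts", "status": "processed"},
    {"parser": "Client Contracts", "status": "processed"},
    {"parser": "Client Contracts", "status": "failed"},
    {"parser": "Invoices", "status": "processed"},
]
print({parser: round(rate, 2) for parser, rate in success_rates(docs).items()})
# {'Client Contracts': 0.67, 'Invoices': 1.0}
```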

### Get structured data from documents
Retrieve the actual processed data points, including complex tables and custom fields, from a specific document.

### Review processed file lists
See a comprehensive list of all documents that have been successfully parsed by a particular rule set.

### Monitor extraction queue status
List files that are currently waiting in the system's processing pipeline to be analyzed.

### Track recent results history
See a chronological feed of the most recently extracted data across every active parser.

## Use Cases

### Processing a batch of vendor invoices
A finance specialist asks their agent: 'Get the total amount, due date, and tax rate from all invoices processed today.' The agent uses `list_recent_extractions` to pull this structured data for immediate reconciliation.
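Once the extraction results are in the conversation, reconciliation is mostly bookkeeping. The sketch below assumes a hypothetical result shape in which each document is a dict with `file`, `total_amount`, `due_date`, and `tax_rate` keys and amounts arrive as strings like `$1,250.00`; adapt the field names to whatever your parser actually returns. It sums totals with `Decimal` to avoid floating-point rounding and sets aside any invoice missing a required field for manual review.

```python
from decimal import Decimal

# Hypothetical field names; match them to your parser's output.
REQUIRED_FIELDS = ("total_amount", "due_date", "tax_rate")


def parse_amount(text: str) -> Decimal:
    """Convert a currency string such as "$1,250.00" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def reconcile(results: list[dict]) -> tuple[Decimal, list[str]]:
    """Sum invoice totals and list files missing any required field."""
    total = Decimal("0")
    incomplete = []
    for result in results:
        # Missing or empty fields send the invoice to manual review.
        if any(not result.get(field) for field in REQUIRED_FIELDS):
            incomplete.append(result["file"])
            continue
        total += parse_amount(result["total_amount"])
    return total, incomplete


# Illustrative sample data, not real extraction output.
today = [
    {"file": "INV-001.pdf", "total_amount": "$1,250.00", "due_date": "2024-07-01", "tax_rate": "8%"},
    {"file": "INV-002.pdf", "total_amount": "$310.50", "due_date": "2024-07-15", "tax_rate": "8%"},
    {"file": "INV-003.pdf", "total_amount": "$99.99", "due_date": "", "tax_rate": "8%"},
]
print(reconcile(today))  # (Decimal('1560.50'), ['INV-003.pdf'])
```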

### Auditing compliance documents
An operations manager asks: 'Show me the extraction results for every W-9 processed last month.' The agent uses `search_parsed_documents` to find the matching files by filename, then presents a clean list of the required fields to confirm the compliance data points.

### Troubleshooting failed reports
An analyst notices missing data and asks: 'What documents failed parsing today?' The agent uses `list_failed_document_extractions`, identifies the bad file, and suggests checking its format for manual correction.
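When several files fail, grouping them by error message makes the pattern obvious (for instance, a batch of poor scans versus one unsupported file). This sketch assumes each failure is a dict with hypothetical `file` and `error` keys; the real `list_failed_document_extractions` output may be structured differently. The sample file names are illustrative.

```python
from collections import defaultdict


def group_failures(failures: list[dict]) -> dict[str, list[str]]:
    """Group failed documents by their reported error message."""
    groups: dict[str, list[str]] = defaultdict(list)
    for failure in failures:
        groups[failure["error"]].append(failure["file"])
    return dict(groups)


# Hypothetical failure records, for illustration only.
failed_today = [
    {"file": "BadScan_01.jpg", "error": "Unreadable"},
    {"file": "UnknownFormat.pdf", "error": "Unsupported format"},
    {"file": "BadScan_02.jpg", "error": "Unreadable"},
]
for error, files in group_failures(failed_today).items():
    print(f"{error}: {', '.join(files)}")
```

A cluster under one error (here, two unreadable scans) usually points to a shared cause, such as the scanner settings, rather than a problem with each file.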

### Checking parser status before a run
An automation lead wants to confirm readiness and asks: 'What's the health of our main contract parser?' The agent calls `quick_parser_health_audit` for an overview of activity and success rates across all parsers, then `get_parser_details` for the contract parser's own status, before initiating a large data pull.

## Benefits

- Stop copy-pasting figures. By using the `get_document_extraction_results` tool, your agent pulls precise details like order numbers or contract values directly into the chat window.
- Keep track of everything at a glance. Instead of checking a dashboard, you can use `list_recent_extractions` to see the latest data pulled from all active parsers in one view.
- Spend less time chasing errors. Calling `list_failed_document_extractions` lets your agent find broken scans or misformatted documents and report the error recorded for each one.
- See what's coming next. Calling `list_documents_awaiting_parsing` lets your agent monitor incoming batches of documents before they are parsed.
- Review rules easily. You can call `list_document_parsers` to see all available parsing configurations and confirm your agent uses the right parser for each document type.

## How It Works

The bottom line is that you talk to your AI client about documents, and it handles the complicated reading and structuring of the raw data for you.

1. Connect your preferred AI client to this MCP and authorize access with your Docparser API key.
2. Tell your agent the task: 'Get all order totals from the documents processed by my Invoices parser.'
3. Docparser parses the documents, and the agent retrieves the structured data and returns the results directly in the conversation.

## Frequently Asked Questions

**How does Docparser MCP help with scanned images, not just PDFs?**
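Docparser applies OCR to scanned images, converting them into machine-readable text so your parsing rules can pull structured data from a picture as well as from a PDF. Scan quality still matters: very poor scans can fail extraction (for example, with an 'Unreadable' error), and `list_failed_document_extractions` surfaces those files so they can be rescanned.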

**Can Docparser MCP extract complex table data from reports?**
Yes. Beyond plain text, it can identify rows and columns in tables, such as line items on an invoice, and return that structured information for your agent to use.

**What if my documents are from different sources? Does Docparser MCP handle them all?**
Yes. The MCP gives your agent access to all of your parsers, and each parser applies its own rules, whether the document came from a vendor portal, an internal scanner, or a cloud storage bucket.

**Is Docparser MCP just for reading data? Can it track my workflow?**
Beyond extraction, it gives you visibility. You can monitor documents waiting in the queue and check which files have already been processed, giving you a complete view of your data lifecycle.

**What is the difference between listing parsed files and getting actual results?**
Listing shows *that* a file was processed. Getting the results retrieves the specific structured data, such as 'the total amount' or 'customer ID', so your agent gets the useful payload immediately.