# Parseur MCP

> Parseur automates document processing and data extraction for your AI agents. It plugs into existing pipelines, letting you upload PDFs, emails, or bulk documents and extract structured fields, such as invoice numbers, total amounts, dates, and line items, as JSON. You define the rules with templates, and Parseur's OCR engine handles the rest, turning unstructured documents into data your workflow can act on.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** ocr, data-parsing, email-automation, pdf-processing, template-extraction, structured-data

## Description

When the data you need isn't in neat tables, this MCP is the tool to use. Instead of manually opening every PDF or email attachment just to pull out an invoice number, your agent processes entire document streams automatically. It handles messy input, whether that's a scanned receipt with skewed text or a multi-page email thread. You set up mailboxes and templates that tell the system exactly which fields you need (e.g., 'invoice total' or 'date'). When documents arrive, your agent pushes them through the pipeline for parsing, and the result is clean JSON that the next step in your workflow can use immediately. If you manage document logic across multiple AI clients, Vinkius makes connecting this whole process reliable and straightforward.

## Tools

### create_mailbox
Sets up a new dedicated parsing pipeline, specifying whether the mailbox handles PDFs, emails, or attachments.

### create_template
Defines the extraction rules and field mappings needed for the system to pull structured data from incoming documents.

### get_document_data
Retrieves the full set of extracted fields, as a JSON object, for a document that has been successfully processed.

### get_document_details
Fetches only the metadata about a single parsed document, such as its ID and status, without the actual extracted data.

### get_mailbox
Provides detailed configuration information for a specific parsing mailbox to confirm its setup parameters.

### list_documents
Lists all documents in a mailbox with their IDs, current status (e.g., processed or failed), and dates.

### list_mailboxes
Retrieves a list of every existing parsing pipeline configured for the account, along with their unique IDs.

### list_templates
Shows all defined extraction templates associated with a mailbox, detailing the rules used to pull data.

### retry_document
Forces a failed or errored document back into the parsing queue so it can be matched against the latest template rules.

### upload_document
Sends a document URL to a specified mailbox, immediately adding the file to the processing queue for OCR and parsing.

## Prompt Examples

**Prompt:** 
```
List my Parseur mailboxes and their IDs.
```

**Response:** 
```
I found 2 mailboxes: 1. 'Invoices Mailbox' (ID: xyz12), which expects PDF accounting documents. 2. 'Web contact forms' (ID: wtf9). Should I list the templates associated with them?
```

**Prompt:** 
```
Get the parsed data for document doc_987.
```

**Response:** 
```
Document doc_987 was processed successfully. Extracted fields: Invoice_Number: 'A-201', Total_Amount: 1400.99, Date: '2026-04-10'.
```

**Prompt:** 
```
Upload this document to mailbox xyz12 for OCR processing.
```

**Response:** 
```
I called `upload_document`, and the document entered the 'Invoices Mailbox' parsing queue as doc_112 (status: pending). I can fetch the extracted data once processing finishes, if you'd like.
```

## Capabilities

### Create Document Pipelines
You set up dedicated parsing mailboxes for specific document types like invoices or emails.

### Define Extraction Logic
You create templates that map fields and define the precise rules needed to pull structured data from documents.

### Submit Documents for Parsing
Your agent uploads document URLs or raw payloads into a configured mailbox queue.

### Retrieve Structured Data
You pull the fully extracted JSON data from documents once they have finished processing.

### Check Document Status
You list all processed or failed documents to track a batch job’s progress and status.

### Force Pipeline Retries
If an extraction fails because of a minor error, you can push the document back into the pipeline for reprocessing.

## Use Cases

### Processing End-of-Month Vendor Invoices
An accounts payable (AP) manager needs to process 50 invoices from different vendors. Instead of manually entering each invoice number and total into a ledger, they use `list_mailboxes` to identify the 'Vendor Invoice' pipeline, have their agent call `upload_document` for all 50 files, and then retrieve the structured data with `get_document_data`.
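
A minimal Python sketch of this batch, assuming the agent runtime exposes a hypothetical `call_tool(name, arguments)` function that invokes an MCP tool and returns its decoded JSON result. The argument names (`mailbox_id`, `url`, `document_id`) and result keys (`id`, `name`) are assumptions, since the tool schemas are not documented here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: invokes an MCP tool by name and returns its decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def find_mailbox_id(call_tool: CallTool, name: str) -> str:
    """Return the ID of the mailbox called `name`."""
    # Assumption: list_mailboxes returns a list of {"id": ..., "name": ...} dicts.
    for mailbox in call_tool("list_mailboxes", {}):
        if mailbox["name"] == name:
            return mailbox["id"]
    raise LookupError(f"no mailbox named {name!r}")


def upload_batch(call_tool: CallTool, mailbox_id: str, urls: List[str]) -> List[str]:
    """Queue every URL in the mailbox and return the new document IDs."""
    doc_ids = []
    for url in urls:
        # Hypothetical argument names and result key.
        result = call_tool("upload_document", {"mailbox_id": mailbox_id, "url": url})
        doc_ids.append(result["id"])
    return doc_ids


def fetch_fields(call_tool: CallTool, doc_ids: List[str]) -> Dict[str, Any]:
    """Map each document ID to its extracted fields (call once processing is done)."""
    return {doc_id: call_tool("get_document_data", {"document_id": doc_id}) for doc_id in doc_ids}
```

Uploads return while documents are still pending, so `fetch_fields` should run only after `list_documents` or `get_document_details` reports them as processed.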

### Cleaning up Failed Scans
A batch of scanned receipts failed parsing because of a bad template rule. Once the rule is fixed, there is no need to resubmit the documents by hand: an agent uses `list_documents` to identify the failed IDs and then calls `retry_document`, forcing the system to re-parse them against the corrected template.
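
The cleanup step can be sketched in Python as follows. It assumes a hypothetical `call_tool(name, arguments)` helper in the agent runtime, hypothetical argument names, and that `list_documents` reports a `status` of `"failed"` for failed documents. Run it only after the template has been corrected; otherwise the retries are likely to fail the same way.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: invokes an MCP tool by name and returns its decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def retry_failed(call_tool: CallTool, mailbox_id: str) -> List[str]:
    """Re-queue every failed document in a mailbox and return the retried IDs."""
    # Assumption: list_documents returns dicts with "id" and "status" keys,
    # and failed documents report the status "failed".
    documents = call_tool("list_documents", {"mailbox_id": mailbox_id})
    failed_ids = [doc["id"] for doc in documents if doc["status"] == "failed"]
    for doc_id in failed_ids:
        call_tool("retry_document", {"document_id": doc_id})
    return failed_ids
```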

### Building a Multi-Source Data Stream
A developer needs to ingest both PDF contracts and email attachments. They use `list_mailboxes` to confirm two separate pipelines exist, then route documents using `upload_document` into the correct stream for parsing.
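
One way to do the routing is to pick the mailbox from the file extension. In this sketch the extension-to-mailbox table and its IDs are hypothetical (in practice they would come from `list_mailboxes`), as are the `call_tool` helper and the `upload_document` argument names.

```python
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse

# Hypothetical: invokes an MCP tool by name and returns its decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Any]

# Hypothetical mailbox IDs; in practice, look them up with list_mailboxes.
ROUTES = {".pdf": "contracts_mailbox", ".eml": "email_mailbox"}


def pick_mailbox(url: str, routes: Dict[str, str]) -> str:
    """Choose a mailbox from the file extension in the URL path."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in routes:
        raise ValueError(f"no pipeline for {suffix or 'extensionless'} file: {url}")
    return routes[suffix]


def route_and_upload(call_tool: CallTool, urls: List[str], routes: Dict[str, str]) -> Dict[str, List[str]]:
    """Upload each URL to its mailbox and return the URLs grouped by mailbox ID."""
    # Resolve every route first so one unroutable file stops the batch before any upload.
    plan = [(url, pick_mailbox(url, routes)) for url in urls]
    grouped: Dict[str, List[str]] = {}
    for url, mailbox_id in plan:
        # Hypothetical argument names.
        call_tool("upload_document", {"mailbox_id": mailbox_id, "url": url})
        grouped.setdefault(mailbox_id, []).append(url)
    return grouped
```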

### Debugging Data Flow
An integration needs to confirm that a mailbox is set up correctly before sending it documents. It first calls `get_mailbox` to check the configuration before attempting any uploads, so files are not routed into a misconfigured pipeline.

## Benefits

- Stop pulling data by hand. With `upload_document`, you route entire batches of documents into the parsing engine and get clean, structured output once processing finishes.
- Handle different document types without changing logic. You can define multiple pipelines (one for invoices, one for receipts, and so on) using separate mailboxes and templates.
- Don't get stuck on failed documents. If a parse fails due to an error, just call `retry_document` to re-run the pipeline against that specific document ID.
- Get exactly what you need. Use `get_document_data` to retrieve only the structured fields (like total amount and date) without getting bogged down in raw metadata.
- Understand your setup before sending files. Check the mailbox configuration with `get_mailbox`, and its templates with `list_templates`, to verify that the correct parsing rules are in place for a given document type.

## How It Works

In short, this MCP converts unstructured files into predictable, structured JSON objects that any application or agent can use.

1. First, list all available parsing pipelines using `list_mailboxes` or create new ones with `create_mailbox` to define what type of documents you process.
2. Next, use `create_template` to build the extraction rules and tell the system exactly which fields (e.g., total amount) you expect to find in those documents.
3. Finally, run your workflow by uploading a document URL using `upload_document`; after processing, retrieve the clean data structure with `get_document_data`.
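
Once a mailbox and template exist, step 3 can be sketched as a single Python helper. It assumes a hypothetical `call_tool(name, arguments)` function, hypothetical argument and result keys, and the status values `pending`, `processed`, and `failed` that appear elsewhere on this page. It polls `get_document_details`, which returns status without the extracted data, and only then fetches the fields with `get_document_data`.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: invokes an MCP tool by name and returns its decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def parse_document(
    call_tool: CallTool,
    mailbox_id: str,
    url: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
) -> Dict[str, Any]:
    """Upload a document, wait until it is processed, and return its fields."""
    # Hypothetical argument names and result keys ("id", "status").
    doc_id = call_tool("upload_document", {"mailbox_id": mailbox_id, "url": url})["id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Metadata only: ID and status, without the extracted fields.
        status = call_tool("get_document_details", {"document_id": doc_id})["status"]
        if status == "processed":
            return call_tool("get_document_data", {"document_id": doc_id})
        if status == "failed":
            raise RuntimeError(f"{doc_id} failed; fix the template, then call retry_document")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{doc_id} still {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Polling the lightweight details call keeps each check cheap; the full field payload is requested exactly once.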

## Frequently Asked Questions

**How do I get started with Parseur and structured data?**
You start by calling `list_mailboxes` to see what pipelines are available or creating a new one using `create_mailbox`. Then, you define the rules for that pipeline using `create_template`.

**Does Parseur handle scanned documents?**
Yes. Parseur uses OCR to read text from images and scans. Upload the document with `upload_document`, and the engine handles the rest of the parsing process.

**What is the difference between get_document_data and list_documents?**
Use `list_documents` when you only want a summary table showing which files exist and their status. Use `get_document_data` when you need the actual, fully parsed structured data from one specific file ID.

**Can I fix documents that failed parsing using Parseur?**
Absolutely. If a document fails validation, use `list_documents` to get the IDs of the failures, and then call `retry_document` to force a fresh parse run.

**What is an 'extraction template' in Parseur?**
An extraction template defines the rules—the field names, locations, and regex patterns—that tell the system exactly what data points (like tax ID or date) to pull from a messy document.
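
To make the idea concrete, here is a toy Python illustration of regex-based field rules applied to OCR'd text. It is not Parseur's template engine or template format; the field names and patterns are hypothetical and only show how a rule maps a field name to the text it captures.

```python
import re
from typing import Dict, Optional

# Hypothetical template: each field maps to a regex with one capture group.
INVOICE_TEMPLATE = {
    "Invoice_Number": r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z]-\d+)",
    "Total_Amount": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
    "Date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
}


def apply_template(template: Dict[str, str], text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when the field is absent."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields
```

Because matching is case-insensitive and takes the first hit, a line such as "Subtotal" would also satisfy the `Total` rule; anchoring patterns to the start of a line (with `re.MULTILINE`) is one way to tighten them.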

**How do I test my parsing setup before uploading files?**
Before running `upload_document`, it's smart to first use `get_mailbox` and `list_templates`. This lets you review the current configuration, ensuring your rules are set up correctly.
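
A pre-upload check along these lines might look like the Python sketch below. The `call_tool` helper, the argument names, and the `fields` key on each template are all hypothetical; the idea is simply to confirm the mailbox exists and that its templates cover the fields you expect before calling `upload_document`.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical: invokes an MCP tool by name and returns its decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def preflight(call_tool: CallTool, mailbox_id: str, required_fields: Iterable[str]) -> List[str]:
    """Return a list of setup problems; an empty list means it looks safe to upload."""
    mailbox = call_tool("get_mailbox", {"mailbox_id": mailbox_id})
    label = mailbox.get("name", mailbox_id)
    templates = call_tool("list_templates", {"mailbox_id": mailbox_id})
    problems = []
    if not templates:
        problems.append(f"{label}: no templates defined")
    # Assumption: each template lists its field names under "fields".
    covered = {field for template in templates for field in template.get("fields", [])}
    missing = sorted(set(required_fields) - covered)
    if missing:
        problems.append(f"{label}: no template extracts {', '.join(missing)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the agent report every gap at once.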