# Parsio MCP

> Parsio connects your AI client to an advanced document parsing engine. It takes unstructured data—like PDFs, images, or emails—and converts it into clean, structured JSON metadata automatically. Use custom templates that learn from your documents so you never have to manually enter invoice numbers or form details again.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** ocr, data-extraction, pdf-parsing, email-automation, json-conversion, unstructured-data

## Description

Parsio connects your AI client to a heavy-duty parsing engine, letting you take messy documents—whether they're PDFs, images, or raw emails—and turn them into clean, structured JSON metadata. You don't manually enter invoice numbers or form details anymore; the system handles it all.

**Managing Your Data Streams and Templates**

First, you need to set up your data pipelines. If you're dealing with a specific type of document flow—say, vendor invoices versus HR records—you use `create_mailbox` to set up an isolated container for that stream. You can check which containers are active across your account using `list_mailboxes`. Once you have the mailbox established, you can pull its configuration details and current status by calling `get_mailbox`. When you need to understand how data is expected to look, you'll use `list_mailbox_templates` to see every parsing template set up for that container. If you want deep technical info on one of those templates, run `get_template_details`; this shows you exactly what fields the parser is designed to capture.
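
As a rough illustration of the mailbox calls described above, the sketch below builds the JSON-RPC 2.0 `tools/call` requests that an MCP client would send for these tools. The tool names come from this page; the argument names (`name`, `mailbox_id`) and the placeholder ID are assumptions for illustration, not the server's confirmed schema.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request envelope for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Argument names and the mailbox id are hypothetical placeholders.
requests = [
    tool_call("create_mailbox", {"name": "Vendor Invoices"}),
    tool_call("list_mailboxes"),
    tool_call("get_mailbox", {"mailbox_id": "MAILBOX_ID"}),
    tool_call("list_mailbox_templates", {"mailbox_id": "MAILBOX_ID"}),
]

for request in requests:
    print(json.dumps(request))
```

In practice your AI client builds these requests for you; the sketch only shows what a tool invocation looks like on the wire.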

For connecting Parsio to other services, you manage webhooks. You list all existing connections using `list_mailbox_webhooks`, which helps you keep track of external systems that need data updates when a document arrives. If your system needs an overview of every single parsed record for auditing purposes, run `list_parsed_data_history`. This gives you a clean log of everything processed in that specific mailbox.

**Running the Extraction Jobs**

There are different ways to extract data depending on how fast you need the results. If your workflow requires immediate feedback—like validating an ID number right as the user hits send—you use synchronous extraction. For file uploads, you run `extract_data_from_file_sync`, which instantly pulls structured metadata from that document. Similarly, if you're feeding raw text or HTML directly into the chat interface and need results immediately, you trigger `extract_data_from_text_sync`. Both these methods give you instant answers so your agent doesn't stall.

But what if you've got a massive PDF, or you're integrating this via a webhook that can't wait for a chat response? Then you use the asynchronous tools. You kick off background processing for large files using `extract_data_from_file_async`. If you're passing raw text or HTML content in a chat that needs time to parse, run `extract_data_from_text_async`. These jobs run outside of your main conversation thread, so the user experience stays fast.
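
The choice between the four extraction tools can be sketched as a small routing function. This is a minimal sketch, not Parsio's own logic: the 5 MB cutoff is an assumed threshold chosen for illustration, and the helper name `choose_extraction_tool` is hypothetical.

```python
# Assumed cutoff for illustration only; not a documented Parsio limit.
SYNC_SIZE_LIMIT_BYTES = 5 * 1024 * 1024


def choose_extraction_tool(kind: str, size_bytes: int, batch: bool = False) -> str:
    """Pick a tool name from the input kind ("file" or "text") and workload size."""
    if kind not in ("file", "text"):
        raise ValueError(f"unsupported input kind: {kind!r}")
    # Batches and large inputs go to the background (async) tools.
    mode = "async" if batch or size_bytes > SYNC_SIZE_LIMIT_BYTES else "sync"
    return f"extract_data_from_{kind}_{mode}"


print(choose_extraction_tool("file", 200_000))
print(choose_extraction_tool("file", 50 * 1024 * 1024))
print(choose_extraction_tool("text", 1_000, batch=True))
```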

Once an asynchronous job finishes, you fetch its structured JSON output with `get_parsed_document_result`. Synchronous calls return their result directly, but the same tool can retrieve an already processed document again later. Every record parsed in a mailbox also remains available through `list_parsed_data_history`, so you can audit exactly what was processed.
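
Retrieving the output of a background job usually means checking until it is ready. The sketch below polls `get_parsed_document_result` through an injected `call_tool` function. The `document_id` argument and the `status` and `data` fields are assumptions about the result shape, and a fake client stands in for the real server so the example runs on its own.

```python
import time
from typing import Callable


def wait_for_result(call_tool: Callable[[str, dict], dict], document_id: str,
                    attempts: int = 5, delay_seconds: float = 1.0) -> dict:
    """Poll until the (assumed) status field reports the document as parsed."""
    for attempt in range(attempts):
        result = call_tool("get_parsed_document_result", {"document_id": document_id})
        if result.get("status") == "parsed":
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before the next check
    raise TimeoutError(f"document {document_id} not parsed after {attempts} attempts")


def make_fake_client(pending_calls: int) -> Callable[[str, dict], dict]:
    """Stand-in for a real MCP client: reports 'processing' a few times, then a result."""
    state = {"calls": 0}

    def call_tool(name: str, arguments: dict) -> dict:
        state["calls"] += 1
        if state["calls"] <= pending_calls:
            return {"status": "processing"}
        return {"status": "parsed", "data": {"invoice_number": "INV-001"}}

    return call_tool


result = wait_for_result(make_fake_client(2), "doc_123", delay_seconds=0.0)
print(result["data"])
```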

## Tools

### create_mailbox
Sets up and initializes a new, named container within Parsio to manage a specific stream of documents.

### extract_data_from_file_async
Starts the data extraction process for large files or webhooks, which runs in the background so you don't wait on the chat response.

### extract_data_from_file_sync
Immediately extracts structured metadata from a file upload; use this when you need results right away.

### extract_data_from_text_async
Starts background data extraction on raw text or HTML content provided in the chat.

### extract_data_from_text_sync
Immediately extracts structured metadata from raw text or HTML, providing results right away.

### get_mailbox
Retrieves all configuration details and status information for a specific mailbox container.

### get_parsed_document_result
Fetches the final JSON output from a document that has already been processed by an extraction job.

### get_template_details
Retrieves metadata about an existing parsing template, letting you check what fields it's designed to capture.

### list_mailbox_templates
Shows a list of all available parsing templates that have been configured for a specific mailbox.

### list_mailbox_webhooks
Lists the webhooks set up for a given mailbox, helping you manage external system connections.

### list_mailboxes
Retrieves an overview of every mailbox container currently managed within your Parsio account.

### list_parsed_data_history
Lists all historical records and documents that have been parsed for a specific mailbox, useful for audits.

## Prompt Examples

**Prompt:** 
```
List all my Parsio mailboxes.
```

**Response:** 
```
I've retrieved your mailboxes. You have 3 active containers: 'Sales Invoices', 'Customer Support', and 'Expense Receipts'. Which one would you like to list templates for?
```

**Prompt:** 
```
Show me all parsing templates I have configured and their extraction success rates.
```

**Response:** 
```
You have 6 active parsing templates, including the following. "Invoice Parser" processes 94% of documents successfully with 12 extracted fields. "Receipt Scanner" has 89% accuracy across 847 documents processed this month. "Purchase Order Template" has a 97% success rate. "Resume Parser" extracts from 91% of uploads. Your total documents processed this month: 3,421.
```

**Prompt:** 
```
Get the extracted data from the last 5 invoices processed by my Invoice Parser template.
```

**Response:** 
```
Here are the last 5 invoices processed. Invoice from Acme Corp: $4,250.00, dated May 8, PO#2847. Invoice from TechSupply: $1,890.50, dated May 7, NET30 terms. Invoice from CloudServices: $699.00, recurring monthly. Invoice from OfficeMax: $342.18, supplies category. Invoice from DataCenter Inc: $12,500.00, infrastructure. All 5 extracted with 100% field confidence.
```

## Capabilities

### Create Mailbox
Sets up a new, isolated container within the Parsio system for managing specific types of incoming documents.

### Extract Data (Large/Background)
Initiates a data extraction job for large files or when integrating with webhook workflows, handling processing outside of the main chat thread.

### Extract Data (Immediate Sync)
Pulls structured metadata instantly from a file upload, useful when immediate feedback is required by the user's workflow.

### Extract Data (Text/HTML)
Runs data extraction jobs directly on raw text or HTML content provided in the chat interface.

### Get Mailbox Details
Retrieves detailed configuration metadata for a specific, existing mailbox container.

### Retrieve Parsed Result
Fetches the final structured JSON data that resulted from a previously submitted parsing job or document upload.

### List Data History
Retrieves a list of all historical parsed records and documents for a given mailbox, allowing audit checks.

## Use Cases

### Processing Incoming Invoices
A finance analyst receives 50 invoices daily. Instead of opening each PDF and typing out the vendor name and total amount, they prompt their agent: 'Extract data from these 50 files using the Invoice Parser.' The agent submits the files with `extract_data_from_file_async`, lets them process in the background, and then collects the results with `get_parsed_document_result`, returning a structured list of the extracted JSON fields.

### Analyzing Support Tickets
An ops manager needs to see trends. They upload 100 customer support emails into a dedicated mailbox. Using `list_parsed_data_history`, they can quickly pull up the extracted metadata (e.g., product model, issue type) for all tickets in one go, allowing them to build reports without leaving the chat.

### Handling HR Forms
When a new employee submits a W-4 form (a PDF), the agent doesn't just read it. It runs `extract_data_from_file_sync` against the specific 'HR Forms' mailbox, so that critical fields like SSN and filing status are pulled out as clean JSON data immediately.

### Validating Data Pipelines
A developer needs to confirm that a webhook is working. They use `list_mailbox_webhooks` to see the current endpoints, then trigger an extraction job and use `get_parsed_document_result` to confirm that the data arrived in the expected format.
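
A format check like the one described above can be as simple as comparing the parsed JSON against the fields you expect. The sketch below is one hypothetical way to do it; the field names and sample record are invented for illustration, and in practice the expected list might come from `get_template_details`.

```python
from typing import List


def missing_fields(parsed: dict, expected_fields: List[str]) -> List[str]:
    """Return the expected fields that are absent or empty in the parsed record."""
    return [field for field in expected_fields if parsed.get(field) in (None, "")]


# Hypothetical parsed record and expected schema.
parsed = {"vendor": "Acme Corp", "total": "4250.00", "po_number": ""}
expected = ["vendor", "total", "po_number", "due_date"]

gaps = missing_fields(parsed, expected)
print(gaps)  # → ['po_number', 'due_date']
```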

## Benefits

- Stop manual data entry. By calling `extract_data_from_file_sync`, you get structured JSON metadata instantly, bypassing the need to manually copy numbers from invoices or forms.
- Manage document pipelines easily. Use `list_mailboxes` and `get_mailbox` to view your setup. You'll always know which containers are running and what their current configuration is.
- Handle massive workloads without delay. When you send a big batch of documents, use `extract_data_from_file_async`. This starts the job in the background so your chat doesn't time out while processing.
- Understand your data structure. Tools like `list_mailbox_templates` and `get_template_details` let you audit exactly how the system is interpreting a document, giving you control over the schema.
- Audit past work with precision. The `list_parsed_data_history` tool lets you pull up records from months ago for a specific mailbox, validating that data integrity hasn't drifted.

## How It Works

In short, your AI client acts as a data processing coordinator that converts unstructured documents into clean, usable JSON without manual intervention.

1. Start by subscribing to the server and providing your Parsio API Key in your AI client settings.
2. Tell your agent what needs parsing. This could be 'List all my mailboxes,' or 'Extract data from this invoice PDF.'
3. The system runs the appropriate tool (e.g., `extract_data_from_file_sync`), and you get back structured JSON metadata, ready for use in your chat conversation.
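
To make the last step concrete, the sketch below unpacks a `tools/call` response and decodes its JSON payload. The envelope (`result.content` items with a `type` and `text`, plus `isError`) follows the MCP tool result shape; that this server returns its data as JSON inside a text item is an assumption, and the sample payload is invented.

```python
import json


def parse_tool_result(response: dict) -> dict:
    """Extract and decode the first text content item of a tools/call response."""
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    for item in result.get("content", []):
        if item.get("type") == "text":
            # Assumes the text payload is a JSON document.
            return json.loads(item["text"])
    raise ValueError("no text content in tool result")


# Hypothetical response for an extraction call.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": '{"vendor": "Acme Corp", "total": "4250.00"}'}],
        "isError": False,
    },
}

fields = parse_tool_result(sample_response)
print(fields["vendor"], fields["total"])
```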

## Frequently Asked Questions

**How does `extract_data_from_file_sync` work?**
It runs data extraction on a file and returns the structured result immediately within the chat session. Use this when you need to confirm the parsed data right away, such as when checking a single receipt.

**What is the difference between `list_mailboxes` and `get_mailbox`?**
`list_mailboxes` gives you a high-level list of all containers you manage. To retrieve detailed configuration metadata for one container, call `get_mailbox` with that mailbox's ID.

**Can I process large documents with Parsio MCP Server?**
Yes. For files larger than a few megabytes, or for any batch job, prefer the async methods such as `extract_data_from_file_async`. Processing runs in the background, which helps avoid chat timeouts.

**What is `list_parsed_data_history` for?**
This tool allows you to pull up a record of past data extraction. It's your audit trail—it shows exactly what was parsed and when it happened for any given mailbox.

**What steps are involved when I run `list_mailboxes` for the first time?**
The call confirms your connection status and lists all existing data containers. It requires a valid API key, which establishes secure communication between your AI client and Parsio's backend servers.

**What is the purpose of running `list_mailbox_webhooks`?**
This tool lets you see all active webhooks associated with a mailbox. Webhooks are crucial because they notify external systems instantly when new data arrives, bypassing constant polling.

**Using `get_template_details`, what metadata can I retrieve about my current templates?**
You get full schema details for the template you request. This includes field definitions, required data types, and configuration metadata that tells your agent exactly how to structure the output JSON.

**Is there a difference between using `extract_data_from_file_sync` and `extract_data_from_text_sync`?**
Yes, they handle different inputs. Use file extraction for binary files (PDFs, images) while text extraction handles raw strings or HTML content you copy/paste directly into your chat session.

**Can my AI automatically find the parsed results for a specific invoice URL?**
Yes! Use the `extract_data_from_file_sync` tool. Provide the file URL and the Mailbox ID, and your agent will respond with the structured JSON data extracted from the document in seconds.

**How do I find my Parsio API Key?**
Log in to your Parsio account, navigate to **Account Settings** > **API**, and you will find your unique secret API key there.

**Does it support hand-written text recognition?**
Absolutely. Parsio's AI-powered OCR engine is designed to handle both printed and hand-written text from scanned images and PDFs with high accuracy.