# Airparser MCP for AI Agents

> Airparser lets your AI agent automatically pull structured data from messy documents such as PDFs, email attachments, and images. It covers the whole workflow, from auditing extraction schemas to setting up webhooks that push clean JSON directly into your other applications.

## Overview
- **Category:** artificial-intelligence
- **Price:** Free
- **Tags:** data-extraction, idp, ocr, pdf-parsing, unstructured-data, automation

## Description

Dealing with documents usually means endless copy-pasting and manual checks. Airparser lets your AI agent handle that work instead. You upload or point to PDFs, invoices, resumes, or plain emails, and it extracts the fields you need (dates, names, line items, totals) as clean, structured JSON. Your agent manages the entire pipeline: verifying the extraction schema, processing documents in the background, and sending the structured output to your CRM or database via webhooks. Through the Vinkius catalog, you connect once from any compatible client to use it. You just talk to your agent naturally, and it handles the data flow.

## Tools

### list_inboxes
Retrieves a list of all document inboxes configured within your Airparser account.

### get_inbox_details
Provides detailed metadata about a specific document inbox.

### parse_document_sync
Processes and extracts data from a document immediately, returning results in the current session.

### parse_document_async
Schedules document processing for later completion, useful for large files or batch jobs.

### list_documents
Retrieves a list of documents currently contained within a specified inbox.

### get_document_details
Fetches the final, extracted JSON data for a specific document ID.

### get_inbox_schema
Retrieves the defined extraction field definitions and rules for an inbox.

### list_webhooks
Lists all existing automated webhooks set up for a given inbox.

### create_webhook
Adds a new automated webhook to push data out of an inbox upon processing completion.

### delete_webhook
Removes an existing webhook from an inbox.

## Prompt Examples

**Prompt:** 
```
I have a stack of old receipts and invoices here, can you parse them all?
```

**Response:** 
```
**Invoice Batch Summary**

| Document ID | Status | Key Data Extracted |
| :---: | :---: | :---: |
| doc_a9b7c | Completed | Vendor: Global Corp; Total: $450.21; Date: 2024-06-15 |
| doc_d3e2f | Failed | Error: Missing schema for Line Items. Check required fields. |

I'll re-queue the failed document as soon as the schema issue is fixed.
```

**Prompt:** 
```
Show me what data I can pull out of my HR resumes.
```

**Response:** 
```
**Resume Schema Audit**

*   **Candidate Name:** (String, required)
*   **Years Experience:** (Integer, optional)
*   **Key Skills:** (Array[String], required)
*   **Education Institution:** (String, required)
*   **Highest Degree:** (Enum: BS/MS/PhD)

This schema will ensure every candidate record is consistent.
```

**Prompt:** 
```
I need to make sure the webhooks are set up for new payments.
```

**Response:** 
```
**Webhook Status Check**

You have 2 active inboxes. Here's what's running:

1. **Invoices_US:** Active webhook sends data to `https://api.crm.com/invoices` on completion.
2. **Leads_EU:** No webhook configured. Run `create_webhook` to add one.
```

## Capabilities

### Audit and organize document sources
List and check all your Airparser inboxes to understand what types of documents are flowing into your system.

### Process single files or batches
Parse a document instantly with `parse_document_sync`, or queue it for background processing with `parse_document_async`.

### Define and verify data structure requirements
Retrieve the specific extraction schemas to confirm that the parsed output matches exactly what your database needs.
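One quick way to run this check is to diff the schema's field names against your table's columns. The sketch below is a minimal illustration, assuming (hypothetically) that `get_inbox_schema` returns an object shaped like `{"fields": [{"name": ...}]}`; the real response shape may differ, and `missing_columns` is a made-up helper name.

```python
from typing import Any


def missing_columns(schema: dict[str, Any], db_columns: set[str]) -> set[str]:
    """Return database columns that no extraction field in the schema fills.

    Assumes a hypothetical schema shape: {"fields": [{"name": ..., ...}, ...]}.
    """
    field_names = {field["name"] for field in schema.get("fields", [])}
    return db_columns - field_names


# Example schema shaped like the hypothetical get_inbox_schema response.
schema = {
    "fields": [
        {"name": "vendor", "type": "string", "required": True},
        {"name": "total", "type": "number", "required": True},
        {"name": "date", "type": "date", "required": False},
    ]
}
print(sorted(missing_columns(schema, {"vendor", "total", "date", "currency"})))  # ['currency']
```

Any column in the result would stay empty after import, so it is worth adding a matching field to the inbox before parsing.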

### Check document status and results
Monitor processing jobs, list documents in an inbox, or grab the final extracted JSON data for a given file ID.

### Automate external data transfers
Manage automated webhooks to push parsed JSON records directly into your external business applications.

## Use Cases

### Processing high-volume accounts payable
An operations manager needs to process 50 invoices from different vendors. They ask their agent to queue the batch with `parse_document_async`, check each document's status by its ID, and then call `get_document_details` for every file that parsed successfully.
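The queue-then-poll pattern in this use case can be sketched as follows. This is a minimal sketch, not Airparser's documented client: `call_tool` stands in for whatever function your MCP client uses to invoke a tool, and the argument names (`inbox_id`, `file_path`, `document_id`), the `status` values, and the response shapes are all hypothetical.

```python
import time
from typing import Any, Callable

# Hypothetical: a function that forwards one tool call to the MCP server.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def parse_batch(call_tool: ToolCaller, inbox_id: str, files: list[str],
                poll_seconds: float = 2.0, max_polls: int = 30) -> dict[str, dict[str, Any]]:
    """Queue every file, then poll until each document finishes or polls run out."""
    # Queue the whole batch up front instead of parsing files one at a time.
    pending: dict[str, str] = {}
    for path in files:
        queued = call_tool("parse_document_async", {"inbox_id": inbox_id, "file_path": path})
        pending[queued["document_id"]] = path  # Hypothetical response field.

    finished: dict[str, dict[str, Any]] = {}
    for _ in range(max_polls):
        for doc_id in list(pending):
            details = call_tool("get_document_details", {"document_id": doc_id})
            if details["status"] in ("completed", "failed"):  # Hypothetical status values.
                finished[doc_id] = details
                del pending[doc_id]
        if not pending:
            break
        time.sleep(poll_seconds)
    return finished


# In-memory stand-in for a real MCP connection, for demonstration only.
def fake_call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    if name == "parse_document_async":
        return {"document_id": "doc_" + args["file_path"].split(".")[0]}
    if name == "get_document_details":
        return {"status": "completed", "data": {"source": args["document_id"]}}
    raise ValueError(f"unexpected tool: {name}")


done = parse_batch(fake_call_tool, "inbox_ap", ["inv1.pdf", "inv2.pdf"], poll_seconds=0)
completed = {doc_id: d for doc_id, d in done.items() if d["status"] == "completed"}
print(sorted(completed))  # ['doc_inv1', 'doc_inv2']
```

Documents still pending after `max_polls` are left out of the result, so the caller can report them or retry later instead of blocking indefinitely.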

### Building an automated HR candidate pipeline
A recruiter receives a folder of diverse resumes. They ask the agent to list all inboxes (`list_inboxes`), check the required schema (`get_inbox_schema`) for resume parsing, and then create a webhook (`create_webhook`) so that every parsed JSON record lands directly into their ATS.
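When an agent repeats this setup, it helps to check `list_webhooks` before calling `create_webhook` so the same endpoint is not registered twice. A minimal sketch, assuming a hypothetical `call_tool` function, hypothetical argument names (`inbox_id`, `url`), and a hypothetical `url` field on each listed webhook:

```python
from typing import Any, Callable

# Hypothetical: a function that forwards one tool call to the MCP server.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def ensure_webhook(call_tool: ToolCaller, inbox_id: str, url: str) -> bool:
    """Create a webhook for `url` unless the inbox already has one; return True if created."""
    existing = call_tool("list_webhooks", {"inbox_id": inbox_id})
    if any(hook["url"] == url for hook in existing):  # Hypothetical "url" field.
        return False  # Already registered, so don't add a duplicate endpoint.
    call_tool("create_webhook", {"inbox_id": inbox_id, "url": url})
    return True


# In-memory stand-in for a real MCP connection, for demonstration only.
hooks: dict[str, list[dict[str, str]]] = {"resumes": []}


def fake_call_tool(name: str, args: dict[str, Any]) -> Any:
    if name == "list_webhooks":
        return hooks[args["inbox_id"]]
    if name == "create_webhook":
        hooks[args["inbox_id"]].append({"url": args["url"]})
        return {"ok": True}
    raise ValueError(f"unexpected tool: {name}")


first = ensure_webhook(fake_call_tool, "resumes", "https://ats.example.com/hook")
second = ensure_webhook(fake_call_tool, "resumes", "https://ats.example.com/hook")
print(first, second)  # True False
```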

### Integrating document data into legacy systems
A developer needs to capture data from PDFs and send it to another system via API. They list the available inboxes, test extraction quickly with `parse_document_sync`, point a temporary webhook at their endpoint with `create_webhook`, and remove it with `delete_webhook` once testing is done.
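A `try`/`finally` block is one way to guarantee the temporary webhook is cleaned up even if the test parse fails. This is a sketch only: `call_tool`, the argument names, and the `webhook_id` field are hypothetical stand-ins, not documented Airparser parameters.

```python
from typing import Any, Callable

# Hypothetical: a function that forwards one tool call to the MCP server.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_parse_test(call_tool: ToolCaller, inbox_id: str,
                   sample_file: str, test_url: str) -> dict[str, Any]:
    """Parse one sample synchronously while a temporary webhook exists, then remove it."""
    hook = call_tool("create_webhook", {"inbox_id": inbox_id, "url": test_url})
    try:
        # Synchronous parse: the extracted data comes back from this same call.
        return call_tool("parse_document_sync", {"inbox_id": inbox_id, "file_path": sample_file})
    finally:
        # Runs even if parsing raises, so the test webhook never lingers.
        call_tool("delete_webhook", {"webhook_id": hook["webhook_id"]})  # Hypothetical ID field.


# In-memory stand-in for a real MCP connection, for demonstration only.
calls: list[str] = []


def fake_call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    calls.append(name)
    if name == "create_webhook":
        return {"webhook_id": "wh_1"}
    if name == "parse_document_sync":
        return {"status": "completed", "data": {"total": 450.21}}
    if name == "delete_webhook":
        return {"ok": True}
    raise ValueError(f"unexpected tool: {name}")


result = run_parse_test(fake_call_tool, "legacy_pdfs", "sample.pdf", "https://dev.example.com/hook")
print(calls)  # ['create_webhook', 'parse_document_sync', 'delete_webhook']
```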

### Auditing existing data flows
A data analyst needs to confirm whether their automated pipeline is working. They use `list_webhooks` to audit all active export endpoints, then check the status of recent files with `list_documents` and `get_document_details`.
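The webhook part of this audit boils down to one loop: list the inboxes, then list the webhooks for each and flag any inbox with none. A minimal sketch; `call_tool`, the `inbox_id` argument, and the `id`/`name` fields are hypothetical, and the sample data mirrors the webhook status example earlier on this page.

```python
from typing import Any, Callable

# Hypothetical: a function that forwards one tool call to the MCP server.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def inboxes_without_webhooks(call_tool: ToolCaller) -> list[str]:
    """Return names of inboxes that have no webhook configured."""
    missing: list[str] = []
    for inbox in call_tool("list_inboxes", {}):
        hooks = call_tool("list_webhooks", {"inbox_id": inbox["id"]})  # Hypothetical fields.
        if not hooks:
            missing.append(inbox["name"])
    return missing


# In-memory stand-in for a real MCP connection, for demonstration only.
INBOXES = [{"id": "1", "name": "Invoices_US"}, {"id": "2", "name": "Leads_EU"}]
WEBHOOKS = {"1": [{"url": "https://api.crm.com/invoices"}], "2": []}


def fake_call_tool(name: str, args: dict[str, Any]) -> Any:
    if name == "list_inboxes":
        return INBOXES
    if name == "list_webhooks":
        return WEBHOOKS[args["inbox_id"]]
    raise ValueError(f"unexpected tool: {name}")


print(inboxes_without_webhooks(fake_call_tool))  # ['Leads_EU']
```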

## Benefits

- You get immediate access to the full document processing lifecycle. Need to check the schema before parsing? Use `get_inbox_schema` to verify your field definitions first.
- Manage complex workflows by automating data exports. By calling `create_webhook`, you ensure that every successfully parsed record automatically pushes JSON data into your target system.
- Handling large volumes is simple. Instead of blocking the conversation, use `parse_document_async` to queue up dozens of documents and check on their status later with `list_documents`.
- Your agent doesn't just read; it audits. Use `get_inbox_details` to understand exactly what kind of files an inbox is expecting before you start processing anything.
- The data retrieval is precise. Once a document finishes, call `get_document_details` to grab the clean JSON output, ready for immediate use by your application.
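Pairing `get_inbox_schema` with `get_document_details` lets an agent check each parsed record before it goes anywhere. The sketch below assumes a hypothetical schema shape and type names (`string`, `integer`, `array`); the field names echo the resume schema in the examples above but are illustrative, not Airparser's actual output.

```python
from typing import Any

# Hypothetical mapping from schema type names to Python types.
PY_TYPES: dict[str, type] = {"string": str, "integer": int, "array": list}


def validate_record(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """List the problems found when checking a parsed record against its schema."""
    problems: list[str] = []
    for field in schema["fields"]:
        name = field["name"]
        if record.get(name) is None:
            if field.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        expected = PY_TYPES.get(field["type"])
        if expected is not None and not isinstance(record[name], expected):
            problems.append(f"{name}: expected {field['type']}")
    return problems


# Shaped like the resume schema in the examples above (field names are hypothetical).
schema = {"fields": [
    {"name": "candidate_name", "type": "string", "required": True},
    {"name": "years_experience", "type": "integer", "required": False},
    {"name": "key_skills", "type": "array", "required": True},
]}
record = {"candidate_name": "A. Rivera", "years_experience": "7"}
print(validate_record(schema, record))
# ['years_experience: expected integer', 'missing required field: key_skills']
```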

## How It Works

In short, you give instructions to your AI client, and it uses this MCP to run document processing workflows.

1. Subscribe to this MCP and enter your Airparser API key. This gives your AI client the connection credentials.
2. Tell your agent what you need done—for example, 'Parse all invoices from last month' or 'Show me the schema for HR documents'.
3. Your agent calls the necessary tools through Airparser to handle the parsing, status checks, and data retrieval, returning clean JSON results directly in the chat.

## Frequently Asked Questions

**How does the Airparser MCP help me move data from PDFs into my database?**
It parses the PDF and outputs clean JSON. You then use the webhook tools to automatically push that structured record directly into your target system, bypassing manual data entry entirely.
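On the receiving side, the target system only needs an endpoint that accepts the pushed JSON and stores it. This is a minimal standard-library sketch; the payload shape `{"doc_id": ..., "data": {...}}` is a hypothetical assumption, so check what Airparser actually sends before relying on these field names.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler

# Assumed (hypothetical) payload shape: {"doc_id": "...", "data": {...extracted fields...}}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, data TEXT NOT NULL)")


def store_payload(db: sqlite3.Connection, body: bytes) -> str:
    """Parse one webhook body, upsert its extracted JSON, and return the doc ID."""
    payload = json.loads(body)
    db.execute(
        "INSERT OR REPLACE INTO documents (doc_id, data) VALUES (?, ?)",
        (payload["doc_id"], json.dumps(payload["data"])),
    )
    db.commit()
    return payload["doc_id"]


class WebhookHandler(BaseHTTPRequestHandler):
    """Minimal receiver: store the POSTed JSON and reply 204 No Content."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        store_payload(conn, self.rfile.read(length))
        self.send_response(204)
        self.end_headers()


store_payload(conn, b'{"doc_id": "doc_a9b7c", "data": {"vendor": "Global Corp", "total": 450.21}}')
print(conn.execute("SELECT data FROM documents WHERE doc_id = 'doc_a9b7c'").fetchone()[0])
```

To serve it, pass `WebhookHandler` to `http.server.HTTPServer` and call `serve_forever()`. The plain `HTTPServer` handles one request at a time, which suits the single SQLite connection; `INSERT OR REPLACE` means a redelivered document overwrites its row instead of duplicating it.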

**I have a mixed batch of files—some emails, some scans. Can Airparser MCP handle it?**
Yes. It processes multiple formats like EML/HTML and images. Your agent just needs to know which inbox or file type you want to process next.
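If each format has its own inbox, the agent can group a mixed batch by file extension before queuing anything. A small sketch; the inbox names in `ROUTES` are hypothetical placeholders for whatever `list_inboxes` returns in your account.

```python
from pathlib import Path

# Hypothetical inbox names; map them to the inboxes list_inboxes returns for you.
ROUTES = {
    ".eml": "emails",
    ".html": "emails",
    ".png": "scans",
    ".jpg": "scans",
    ".jpeg": "scans",
    ".pdf": "documents",
}


def route_files(paths: list[str]) -> dict[str, list[str]]:
    """Group files by target inbox; unknown extensions go to 'unrouted'."""
    groups: dict[str, list[str]] = {}
    for path in paths:
        inbox = ROUTES.get(Path(path).suffix.lower(), "unrouted")
        groups.setdefault(inbox, []).append(path)
    return groups


print(route_files(["a.EML", "scan1.jpg", "report.pdf", "notes.txt"]))
# {'emails': ['a.EML'], 'scans': ['scan1.jpg'], 'documents': ['report.pdf'], 'unrouted': ['notes.txt']}
```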

**If I change my data requirements, how do I update the parsing rules with Airparser MCP?**
None of this MCP's tools edits a schema. Use `get_inbox_schema` to review the current field definitions, make your changes in your Airparser account, then call it again to confirm the update before you parse more documents.

**Is Airparser MCP better than just reading text via a general AI client?**
Yes, because it's specialized for structure. A general client reads text; this MCP extracts *meaning* and puts it into strict, predictable fields (JSON). It knows the difference between an address line and a total amount.