Parseur MCP. Turn Invoices, Receipts, and Emails Into JSON.

See Vinkius in Action

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Parseur connects document extraction pipelines directly into your AI agent. It lets you list mailboxes, upload PDFs or emails, and pull structured data from invoices and receipts using powerful OCR templates.

You get clean JSON outputs ready for databases or workflows, without needing manual copy-pasting.

What your AI agents can do

Extracting structured fields

Retrieve specific, mapped JSON properties (like total amount or dates) from documents already processed by the OCR engine.

Managing parsing pipelines

Create and verify dedicated mailboxes and templates that define how different document types should be parsed.

Processing raw files

Upload documents (PDFs, emails) directly to a mailbox, initiating the entire parsing workflow for automated extraction.

Monitoring job status

List all documents in a mailbox and check their status—whether they are pending, failed, or fully processed.

Fixing extraction errors

Force the re-run of parsing on an existing document using retry_document after you've fixed the underlying template rules.

Supported MCP Clients

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

+ other MCP clients

Free for Subscribers

Parseur MCP Server: 10 Tools for Document Parsing

These ten tools let your AI client manage the entire document lifecycle—from setting up mailboxes and templates to uploading files and extracting final data.

create_mailbox

Creates a new document intake pipeline, specifying if it handles PDFs, emails, or attachments for automatic parsing.

create_template

Defines the precise field mappings and extraction rules that Parseur uses to structure data from incoming documents.

get_document_data

Retrieves the final, structured JSON dictionary containing all extracted fields from a document with status 'processed'.

get_document_details

Gets metadata about a single parsed document (like when it arrived or who sent it), but not the actual data.

get_mailbox

Retrieves the detailed configuration of a specific mailbox, letting you verify its current setup before sending files.

list_documents

Lists all documents within a mailbox, showing their ID, status (processed, failed, pending), and basic metadata.

list_mailboxes

Retrieves a list of every active parsing pipeline available in the system. You must use these IDs for subsequent operations.

list_templates

Lists all defined extraction templates tied to a specific mailbox, showing what data fields each template is designed to pull out.

retry_document

Forces Parseur to re-process an existing document using the latest template rules after it has previously failed or errored.

upload_document

Sends a file URL directly into a specified mailbox, starting the automated parsing process and returning a tracking ID.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Make Your AI Do More

Start with Parseur, then connect any of our 4,700+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 4,700+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

What you can do with this MCP connector

You're building an AI agent that needs clean data from messy documents—invoices, receipts, emails. You don't want your agent slogging through raw PDFs or unstructured text. This server lets you connect document extraction pipelines straight into your workflow. It handles the gnarly OCR and templating stuff; all you gotta do is feed it files and get structured JSON back.

Setting Up Your Document Intake Pipeline

First, you need a place for the documents to go. You can start by listing all active pipelines using list_mailboxes so you know what's running. To create a new intake point—say, one just for invoices or another just for web forms—you use create_mailbox. This tells the system exactly what kind of files it should handle: PDFs, emails, or attachments.

You can check out the full setup details later using get_mailbox if you gotta verify how things are configured.

Next up is telling the server what to look for. Defining rules means creating a template that maps specific fields from any incoming document. When you use create_template, you're defining precisely what data points you expect—like the total amount, an invoice number, or a date range. You can list all existing templates tied to a mailbox using list_templates to make sure your rules are set up right.

Feeding the Documents and Running the Job

Once the pipeline is ready and the template is locked down, you feed it the goods. Sending a file URL directly into a specified mailbox starts the whole automated parsing process. You use upload_document for this; the document hits a queue, and the OCR engine gets to work.

You gotta keep an eye on what's happening. To see every document that's been added, you run list_documents. This gives you the ID, basic metadata, and most importantly, the status: Is it pending? Did it fail? Or is it fully processed?

If a job fails or throws an error, don't rewrite the whole thing. You use retry_document to force Parseur to re-run the parsing on that existing document ID. This works best after you’ve tweaked the underlying template rules.
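Keeping an eye on a job usually means polling. The sketch below polls `list_documents` until one document reaches 'processed' or 'failed'. It assumes a hypothetical `call_tool` callable, a `mailbox_id` argument, and a response made of dicts with `id` and `status` keys; none of these names are confirmed by this page. The `sleep` parameter is injectable so the example runs instantly against a stand-in.

```python
import time
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def wait_for_document(
    call_tool: ToolCaller,
    mailbox_id: str,
    document_id: str,
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll list_documents until the document is 'processed' or 'failed'."""
    for _ in range(attempts):
        # Assumed argument name and response shape.
        documents = call_tool("list_documents", {"mailbox_id": mailbox_id})
        status = next(
            (d.get("status") for d in documents if str(d.get("id")) == document_id),
            None,
        )
        if status in ("processed", "failed"):
            return status
        sleep(delay_seconds)
    return "pending"  # still not finished after the last attempt


# Stand-in server: reports 'pending' twice, then 'processed'.
_answers = iter(["pending", "pending", "processed"])


def fake_call_tool(tool: str, args: dict[str, Any]) -> Any:
    return [{"id": "doc-1", "status": next(_answers)}]


final_status = wait_for_document(
    fake_call_tool, "mb-1", "doc-1", sleep=lambda seconds: None
)
```

A bounded number of attempts keeps the agent from waiting forever on a document that never leaves the queue.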

Retrieving and Using the Data

When a document status is 'processed,' you can pull out the clean data. You use get_document_data to retrieve a final, structured JSON dictionary containing every mapped field—you'll get things like { "invoice_number": "A123", "total_amount": 500.00 }. This is the raw material you plug right into your database or workflow engine.
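Before plugging that payload into a database, it can help to load it into a typed record. The sketch below uses the two fields from the example above; the `InvoiceRecord` class and `parse_invoice` helper are illustrative names, and it assumes the payload arrives as a JSON string (if your client already hands you a dict, skip the `json.loads` step).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceRecord:
    invoice_number: str
    total_amount: float


def parse_invoice(payload: str) -> InvoiceRecord:
    """Turn a JSON payload into a typed record, coercing the field types."""
    data = json.loads(payload)
    return InvoiceRecord(
        invoice_number=str(data["invoice_number"]),
        total_amount=float(data["total_amount"]),
    )


record = parse_invoice('{ "invoice_number": "A123", "total_amount": 500.00 }')
```

A missing field raises `KeyError` here, which surfaces a template gap early instead of writing an incomplete row.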

If you just need to know when the document arrived or who sent it, but don't care about the actual extracted data yet, get_document_details grabs that metadata for you. It keeps the process clean and targeted.

This server handles the whole loop: from creating a dedicated intake point (create_mailbox) to defining strict rules (create_template), pushing files in (upload_document), monitoring job status (list_documents), fixing errors (retry_document), and finally, spitting out perfectly structured JSON data (get_document_data). You get clean properties without ever having to manually copy-paste anything.

How Parseur MCP Works

1. First, use list_mailboxes and list_templates to confirm your required pipelines and extraction schemas are set up correctly.
2. Next, send the document using upload_document. The file is added to a mailbox queue and awaits OCR processing. You get back a new document ID for tracking.
3. Finally, use get_document_data with the document ID to pull the clean JSON payload containing all the structured fields you defined.

The bottom line is: your agent sends the file, waits for Parseur to parse it using your rules, and then pulls the clean data out of the resulting record.
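The three steps above can be sketched end to end. This is an illustration under stated assumptions: `call_tool` is a hypothetical callable for invoking a tool, and the argument names (`mailbox_id`, `url`, `document_id`) and response keys are guesses rather than the server's documented schema. The stand-in server returns data immediately; against the real service you would wait for the 'processed' status between steps 2 and 3.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def run_pipeline(call_tool: ToolCaller, mailbox_id: str, file_url: str) -> dict[str, Any]:
    # Step 1: confirm the mailbox exists and has at least one template.
    mailbox_ids = {str(m["id"]) for m in call_tool("list_mailboxes", {})}
    if mailbox_id not in mailbox_ids:
        raise LookupError(f"unknown mailbox: {mailbox_id}")
    if not call_tool("list_templates", {"mailbox_id": mailbox_id}):
        raise RuntimeError(f"mailbox {mailbox_id} has no templates")
    # Step 2: upload the file and keep the tracking ID.
    uploaded = call_tool("upload_document", {"mailbox_id": mailbox_id, "url": file_url})
    document_id = str(uploaded["document_id"])
    # Step 3: pull the structured fields for that document.
    return call_tool("get_document_data", {"document_id": document_id})


def fake_call_tool(tool: str, args: dict[str, Any]) -> Any:
    """In-memory stand-in for the server, for illustration only."""
    if tool == "list_mailboxes":
        return [{"id": "mb-1", "name": "Invoices"}]
    if tool == "list_templates":
        return [{"id": "tpl-1", "fields": ["invoice_number", "total_amount"]}]
    if tool == "upload_document":
        return {"document_id": "doc-1"}
    if tool == "get_document_data":
        return {"invoice_number": "A123", "total_amount": 500.0}
    raise ValueError(f"unknown tool: {tool}")


result = run_pipeline(fake_call_tool, "mb-1", "https://example.com/invoice.pdf")
```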

Who Is Parseur MCP For?

This toolset is essential for finance teams who manually process invoices or receipts. It's also crucial for ops engineers running automated reporting pipelines that ingest forms from varied sources (web, email). If your job involves turning paper-based data into digital records, you need this.

Accounts Payable Specialist

Processes batches of vendor invoices and receipts by sending them to the server. Then, they retrieve the structured totals and dates via get_document_data for ledger entry.

Workflow Automation Engineer

Builds reliable document ingestion pipelines that monitor multiple mailboxes (list_mailboxes) and automatically retry failed documents using retry_document.

Financial Analyst

Needs to extract specific, missing data points (like a PO number or tax ID) from diverse sources before running financial reports. They use list_templates to ensure the schema is correct.

What Changes When You Connect

Structured Output: You get clean JSON data directly from get_document_data. This means no messy string manipulation for your agent; the fields are ready to use in databases or code logic.
Workflow Reliability: If a document fails parsing, you don't restart. Use list_documents and then call retry_document to force a re-run with updated rules—it handles the failure loop for you.
Pipeline Management: You control the entire process by first running list_mailboxes. This ensures your agent is always pointing at a valid, configured document intake pipeline before uploading anything.
Controlled Setup: Using create_template forces you to define exactly which fields you expect. Your AI client doesn't guess; it uses the schema you built, making the extraction predictable and reliable.
Source Agnostic: The system handles multiple inputs—PDFs, emails, and attachments—all through one process flow managed by upload_document.

Real-World Use Cases

Processing a batch of receipts from a vendor portal

An accounting agent needs to ingest 50 PDFs. First, it calls list_mailboxes to find the 'Receipts' mailbox ID. Then, it loops through all 50 URLs, running upload_document for each one. Finally, once list_documents shows a document as 'processed', it runs get_document_data on that document ID, collecting the vendor name and total amount from each into a single array.

Handling failed or corrupted invoices

A developer uploads an invoice that fails parsing due to a minor formatting change. Instead of giving up, the agent calls list_documents to check the status, sees that it's 'failed', and, once the template has been adjusted for the new format, runs retry_document. This forces Parseur to re-evaluate the document against the current rules.

Building a webhook listener for incoming emails

The agent needs to monitor an external inbox. It uses create_mailbox with the 'email' type and defines a template expecting sender ID and date. When a new email arrives, it calls upload_document, which handles parsing the attachments automatically.

Validating extraction rules before production

A developer needs to test if their complex PO number regex works. They use list_templates to see what schemas exist, then call get_mailbox to verify the target mailbox is configured correctly, preventing accidental data routing.

The Tradeoffs

Trying to read a document without defining rules

The agent just calls upload_document and expects the data. This fails because Parseur has no template mapping, so it can't know what an 'invoice number' looks like or where it lives.

→ Always start by calling list_templates to ensure your extraction rules are set up. Then, use create_template to explicitly map the field names and patterns before running any document uploads.

Assuming all documents will parse on first try

Running a script that blindly calls get_document_data on every uploaded ID. If one fails, your entire batch processing stops, wasting time and requiring manual intervention.

→ Always check the document status first. Use list_documents to filter for documents where the status is 'processed'. Only then should you attempt data retrieval with get_document_data.
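One way to apply that advice is to split a mailbox listing by status and fetch data only for processed documents, so a single failure cannot stop the batch. The sketch assumes a hypothetical `call_tool` callable and guessed argument names (`mailbox_id`, `document_id`); the stand-in server deliberately raises if data is requested for an unprocessed document.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def collect_processed(
    call_tool: ToolCaller, mailbox_id: str
) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (extracted rows, IDs skipped because they are not processed)."""
    rows: list[dict[str, Any]] = []
    skipped: list[str] = []
    for doc in call_tool("list_documents", {"mailbox_id": mailbox_id}):
        if doc.get("status") == "processed":
            rows.append(call_tool("get_document_data", {"document_id": doc["id"]}))
        else:
            skipped.append(str(doc["id"]))
    return rows, skipped


# In-memory stand-in for the server, for illustration only.
_documents = {
    "doc-1": ("processed", {"vendor": "Acme", "total_amount": 120.5}),
    "doc-2": ("failed", None),
    "doc-3": ("pending", None),
}


def fake_call_tool(tool: str, args: dict[str, Any]) -> Any:
    if tool == "list_documents":
        return [{"id": i, "status": s} for i, (s, _) in _documents.items()]
    if tool == "get_document_data":
        status, data = _documents[args["document_id"]]
        if status != "processed":
            raise RuntimeError("document has no data yet")
        return data
    raise ValueError(f"unknown tool: {tool}")


rows, skipped = collect_processed(fake_call_tool, "mb-1")
```

The skipped IDs give you a worklist for later: failed ones can be retried, pending ones checked again.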

Bypassing mailbox validation

Sending a large batch of files via upload_document without first checking if the target mailbox exists or is configured for PDFs, leading to API errors.

→ Before uploading anything, run list_mailboxes and confirm the correct ID. You can double-check the setup using get_mailbox before proceeding.
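That check can be written as a small preflight function that returns a list of problems instead of uploading blindly. This is a sketch only: `call_tool` is a hypothetical callable, and the `mailbox_id` argument and the `type` field in the get_mailbox response are assumptions about the schema.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def preflight(call_tool: ToolCaller, mailbox_id: str, expected_type: str) -> list[str]:
    """Return a list of problems; an empty list means it is safe to upload."""
    known_ids = {str(m.get("id")) for m in call_tool("list_mailboxes", {})}
    if mailbox_id not in known_ids:
        return [f"mailbox {mailbox_id} does not exist"]
    config = call_tool("get_mailbox", {"mailbox_id": mailbox_id})
    actual_type = config.get("type")  # assumed field name
    if actual_type != expected_type:
        return [f"mailbox {mailbox_id} handles {actual_type}, not {expected_type}"]
    return []


def fake_call_tool(tool: str, args: dict[str, Any]) -> Any:
    """In-memory stand-in for the server, for illustration only."""
    if tool == "list_mailboxes":
        return [{"id": "mb-1"}, {"id": "mb-2"}]
    if tool == "get_mailbox":
        kind = "email" if args["mailbox_id"] == "mb-2" else "pdf"
        return {"id": args["mailbox_id"], "type": kind}
    raise ValueError(f"unknown tool: {tool}")


ok = preflight(fake_call_tool, "mb-1", "pdf")
wrong_type = preflight(fake_call_tool, "mb-2", "pdf")
missing = preflight(fake_call_tool, "mb-9", "pdf")
```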

When It Fits, When It Doesn't

Use this server if your core problem is converting unstructured visual data (scanned receipts, PDF invoices) into structured, queryable JSON records. This isn't just about text extraction; it's about schema enforcement. If you only need to read the raw text from a document and pass it to an LLM for summarization, basic file readers will work. But if you need to guarantee that total_amount is always returned as a float and invoice_number follows a specific regex pattern—you need Parseur's template system and tools like create_template. Don't use this if your data source is already clean JSON; use it only when the source requires heavy OCR/templating work.
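To make the schema-enforcement point concrete, here is a client-side check of the two guarantees mentioned above. It is a sketch of what "always a float" and "follows a specific regex pattern" mean for your own code, not a description of how Parseur enforces templates; the pattern (one uppercase letter followed by at least three digits) is an invented example.

```python
import re
from typing import Any

# Illustrative pattern only; use whatever format your invoices really follow.
INVOICE_NUMBER = re.compile(r"[A-Z]\d{3,}")


def validate(fields: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors: list[str] = []
    # Strict on purpose: an int or a numeric string is reported, not coerced.
    if not isinstance(fields.get("total_amount"), float):
        errors.append("total_amount must be a float")
    number = fields.get("invoice_number")
    if not isinstance(number, str) or not INVOICE_NUMBER.fullmatch(number):
        errors.append("invoice_number does not match the expected pattern")
    return errors


good = validate({"invoice_number": "A123", "total_amount": 500.0})
bad = validate({"invoice_number": "123", "total_amount": "500"})
```

Running a check like this after retrieval catches a drifting template before bad values reach your ledger.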

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Parseur. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted: Managed infra
V8 Isolated: Sandboxed per request
Zero-Trust Proxy: No stored credentials
DLP Enforced: Policy on every call
GDPR Compliant: EU data residency
Token Compression: ~60% cost reduction

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 10 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Available Capabilities

create_mailbox create_template get_document_data get_document_details get_mailbox list_documents list_mailboxes list_templates retry_document upload_document

Manual invoice processing is slow, tedious, and generates too much spreadsheet clutter.

Right now, getting financial data means opening a PDF, scrolling to the top corner for the date, finding the total amount, and then copying all those distinct pieces of information into three different columns in your accounting spreadsheet. Repeat that 50 times, and you're looking at hours lost and manual entry errors.

With Parseur MCP Server, you upload the entire batch through `upload_document`. The system handles the OCR and mapping using templates defined by `create_template`. When it's done, your agent pulls all the totals into one clean JSON object via `get_document_data`. You skip the copy-paste step entirely.

Parseur MCP Server: Use `retry_document` to fix failed data extracts.

Sometimes, an invoice is slightly misaligned or a template rule gets tripped up by poor image quality. Your agent might see the document status as 'failed' and stop. You then have to manually check the rules and try again, restarting the whole process.

Now, you use `list_documents` to spot the failure. Then you call `retry_document`. This forces the parser to re-run that specific file against your latest template logic without you having to upload or manage any files—it's instant recovery.

Common Questions About Parseur MCP

How do I start a document pipeline with Parseur using `create_mailbox`?

You call create_mailbox, providing the desired type (like 'pdf' or 'email'). This sets up the dedicated intake channel. After creation, you must use list_mailboxes to get the ID for subsequent steps.

What is the difference between `get_document_details` and `get_document_data`?

get_document_data pulls the structured payload (the actual extracted fields like amounts). get_document_details only gives you metadata, such as the document's unique ID or when it was received.

Can I process emails using Parseur? Which tool handles this?

Yes. You use create_mailbox and set the type to 'email'. This allows you to define templates that pull data from attachments or body text within incoming messages.

What happens if my template rules are wrong? How do I fix it?

First, review your schema using list_templates. Then, after correcting the underlying logic, you use retry_document to push the document through again, forcing a re-parse with the new rules.

I ran `list_mailboxes` and see several IDs; how do I confirm which mailbox is intended for invoices?

You check the returned metadata fields to match the ID with its documented purpose. The listing provides a name or description that specifies the document type (e.g., 'Client Invoices'). This prevents sending data to the wrong parsing pipeline.

After running `upload_document`, I need to know if the job is still running; how can I check its real-time status?

You must use list_documents to see all entries within that mailbox. Look at the 'status' field: it will show 'pending', 'processed', or 'failed'. This gives you an immediate view of the job's current state.

I fixed a template rule and need to re-run a document that failed previously; what is the best way using `retry_document`?

You simply pass the original document ID to retry_document. The system ignores the previous failure reason and forces the parsing engine to match the document against your current, corrected template rules.

Does `create_template` allow me to define different extraction patterns for various field types (e.g., dates vs. IDs)?

Yes, you pass a JSON configuration string that defines multiple field mappings and their respective regex patterns. This lets the template handle structured data while accommodating variations in how fields appear on physical documents.
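A configuration string of that kind could be assembled as follows. This is a sketch: the key names (`name`, `fields`, `pattern`) and the field patterns are invented for illustration, since the schema create_template expects is not shown on this page. The patterns are compiled with Python's `re` module as a sanity check, and the service's regex dialect may differ.

```python
import json
import re

# Illustrative fields and patterns, not taken from any real template.
FIELD_PATTERNS = {
    "invoice_number": r"INV-\d{6}",
    "invoice_date": r"\d{4}-\d{2}-\d{2}",
    "total_amount": r"\d+\.\d{2}",
}


def build_template_config(name: str, patterns: dict[str, str]) -> str:
    """Serialize field-to-pattern mappings into one JSON configuration string."""
    for pattern in patterns.values():
        re.compile(pattern)  # raises re.error if a pattern is malformed
    config = {
        "name": name,  # hypothetical key
        "fields": [  # hypothetical key
            {"name": field, "pattern": pattern}
            for field, pattern in patterns.items()
        ],
    }
    return json.dumps(config)


template_config = build_template_config("Client Invoices", FIELD_PATTERNS)
```

Compiling each pattern before sending it means a typo in a regex fails locally rather than producing a template that silently matches nothing.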

Does this tool parse the document directly or use the cloud engine?

It uses the cloud engine. The tools call endpoints backed by the Parseur Cloud Engine, which does the OCR and parsing. The AI agent only organizes mailboxes, lists templates, and fetches the final results, so no heavy OCR work runs locally.

Can I upload a raw file string to be parsed?

Yes. The agent passes the raw content, along with information identifying its format, to upload_document, which delivers it straight to the target mailbox ID.

Will I see missing required fields if extraction fails?

Yes. get_document_details reports the document's processing status. If a template expects InvoiceTotal and the field is not found, the document is flagged with a processing issue that you can trace from this output.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK (Python)

Google ADK (Python)

Pydantic AI (Python)

Vercel AI SDK (TypeScript)