# DocSumo MCP for AI Agents

> DocSumo equips your AI agent to handle complex document data extraction. It automates processing for invoices, bank statements, and ID cards, giving you structured results right out of the box. Use it to manage end-to-end Intelligent Document Processing (IDP) workflows and audit every file processed.

## Overview
- **Category:** artificial-intelligence
- **Price:** Free
- **Tags:** intelligent-document-processing, data-extraction, idp, invoice-processing, automated-pipelines, data-audit

## Description

DocSumo connects advanced document intelligence directly into your AI workflow. You can now automate pulling complex data from invoices, bank statements, or even government ID cards. Instead of manually reviewing files, your agent handles the heavy lifting, structuring messy information for you to use immediately.

If your job involves processing high volumes of varied documents, this MCP lets you manage everything from start to finish. You can monitor entire document pipelines, see exactly which fields were extracted, and check a record's confidence score to decide whether it needs human review. Vinkius simplifies the connection: you connect once to access DocSumo’s full suite of features directly through your preferred AI client.

This means finance teams can pull invoice details instantly for bookkeeping, compliance officers can audit IDs and bank statements for verification status, and operations leads can keep a real-time view of overall document processing health.

## Tools

### get_docsumo_account_metadata
Retrieves usage limits and general status information for your DocSumo account.

### list_failed_doc_extractions
Lists documents that the system could not process.

### get_document_extraction_data
Pulls the structured data points extracted from one specific document you name.

### quick_idp_health_audit
Provides a fast, high-level summary of how well your entire document processing pipeline is running.

### list_successfully_parsed_docs
Gives you a list of all documents that finished the process and passed verification checks.

### list_docsumo_document_types
Lists every document format (like invoices or bank statements) that is currently configured in DocSumo.

### list_processed_documents
Shows a list of all documents processed by the system, which you can filter by type.

### list_latest_extraction_results
Identifies and lists the most recently processed document results across all categories.

### list_documents_awaiting_review
Finds documents that need human review because the AI wasn't confident enough in the extracted data.

### search_documents_by_filename
Searches for processed files using a specific keyword found in the document name.

## Prompt Examples

**Prompt:** 
```
Pull all the bank statements processed last month and list the top 5 vendors.
```

**Response:** 
```
**Summary of Processed Bank Statements (Last Month)**

| Vendor Name | Total Amount Pulled | Date Verified |
| :--- | :--- | :--- |
| Chase Bank | $12,450.00 | 2023-11-28 |
| Wells Fargo | $9,120.55 | 2023-11-27 |
| UtilityCorp | $350.00 | 2023-11-26 |

*Need the detailed line items for Chase Bank? Let me know.*
```

**Prompt:** 
```
Check if there are any ID cards that need manual review.
```

**Response:** 
```
**🚨 Action Required: Manual Review List 🚨**

We found **3** documents needing human verification due to low confidence scores. Please check these immediately:

1.  `ID_Scan_009.jpg`: Low match on name/address.
2.  `Passport_BETA.pdf`: Poor image quality, date unreadable.
3.  `Utility_Record_78.png`: Discrepancy found in serial number.

Reviewing these files ensures our compliance record is spotless.
```

**Prompt:** 
```
What are the most recently processed documents and their total value?
```

**Response:** 
```
**Latest Document Extraction Results**

*   **Document:** `Invoice_ABC-123.pdf` (Processed: 5 minutes ago)
    *   Vendor: Global Supplies Inc.
    *   Grand Total: **$4,500.00** (Confidence: 99%)
*   **Document:** `Statement_XYZ.pdf` (Processed: 3 hours ago)
    *   Period: Oct 1 - Oct 31
    *   Total Amount Pulled: **$22,500.00** (Confidence: 98%)
```

## Capabilities

### Pull structured data from documents
Your agent pulls specific details—like invoice numbers or total amounts—from complex file types.

### Audit and track failed extractions
You find out exactly which files didn't get processed correctly, along with error messages.

### Review documents needing manual checks
The system flags records where the AI wasn't confident enough and needs human review.
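
As a rough illustration of confidence-based flagging, the sketch below separates out extracted fields that fall under a threshold. The field names, the record layout, and the 0.90 cut-off are assumptions made for this example; none of them come from DocSumo's API.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the threshold DocSumo actually applies is not stated here.
REVIEW_THRESHOLD = 0.90


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Return the fields whose confidence falls below the threshold."""
    return [f for f in fields if f.confidence < threshold]


# Illustrative data only.
sample = [
    ExtractedField("vendor_name", "Global Supplies Inc.", 0.99),
    ExtractedField("invoice_date", "2023-11-2?", 0.62),
    ExtractedField("grand_total", "4500.00", 0.97),
]

flagged = fields_needing_review(sample)
print([f.name for f in flagged])  # ['invoice_date']
```

A document with at least one flagged field would be a natural candidate for the manual review list.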

### Check overall processing health
You get a quick, high-level summary of how many documents were processed successfully in total.
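
One way to picture such a summary is as a count of documents per status plus a success rate. The sketch below is only an assumption about what a health summary could contain; the status labels and the output shape are hypothetical and do not describe the actual output of `quick_idp_health_audit`.

```python
from collections import Counter


def summarize_pipeline(statuses):
    """Count documents per status and compute the share that succeeded."""
    counts = Counter(statuses)
    total = sum(counts.values())
    # Guard against an empty pipeline to avoid dividing by zero.
    success_rate = counts["processed"] / total if total else 0.0
    return {"total": total, "by_status": dict(counts), "success_rate": success_rate}


# Illustrative statuses; the labels are hypothetical.
statuses = ["processed", "processed", "failed", "review_required", "processed"]
summary = summarize_pipeline(statuses)
print(summary["total"], summary["success_rate"])  # 5 0.6
```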

### List all document types available
Your agent shows you every type of document (invoices, bank statements) configured within the system.

## Use Cases

### Auditing Compliance Documents
A compliance officer needs to verify 50 ID cards against current policy. They ask their agent to run `list_documents_awaiting_review` and get a list of all flagged IDs, which they then review in bulk using the MCP.

### Quickly Reconciling Bank Statements
A finance analyst needs to reconcile last month's expenses. They ask their agent to run `list_latest_extraction_results` for bank statements, pulling all transaction dates and amounts into a single chat summary.

### Finding Specific Missing Invoices
An operations lead is looking for an invoice from 'XYZ Corp' that was processed last week. They ask the agent to run `search_documents_by_filename` and immediately pull up the correct document record.

### Verifying a Specific Record
A user needs to validate the grand total on a specific invoice ID. They ask their agent, which then uses `get_document_extraction_data` to provide the structured number and its confidence score.

## Benefits

- Stop manual data entry. Instead of copy-pasting numbers from PDFs into spreadsheets, your agent extracts the exact figures directly.
- Maintain compliance control by using tools like `list_documents_awaiting_review` to ensure no sensitive file is missed or unchecked.
- Get a clear view of operational status with `quick_idp_health_audit`, letting you troubleshoot processing failures instantly rather than waiting for end-of-day reports.
- Speed up bookkeeping by using `get_document_extraction_data` to pull verified invoice totals and vendor names on demand, without opening the file.
- Keep track of your assets with `list_processed_documents`, giving you a full audit trail of everything that passed through the pipeline.

## How It Works

In short, the MCP turns unstructured files into clean, usable data points through simple chat commands.

1. Connect DocSumo’s MCP to your AI client and authorize it using your API key.
2. Tell your agent what kind of data you need, like 'all outstanding invoices from last month' or 'any documents needing review'.
3. The system processes the request, pulls the structured data, and presents actionable results in a conversation.
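
Behind the conversation in step 3, an MCP client sends tool requests as JSON-RPC 2.0 messages using the protocol's `tools/call` method. The sketch below builds such a message for `get_document_extraction_data`. The `document_id` argument and its value are hypothetical; the real argument names are defined by the tool's input schema, which the client can discover with `tools/list`. In practice your AI client constructs these messages for you.

```python
import json
from itertools import count

# Simple incrementing request IDs, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name, arguments=None):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# `document_id` is a hypothetical argument name used for illustration only.
request = build_tool_call("get_document_extraction_data", {"document_id": "doc_123"})
print(json.dumps(request, indent=2))
```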

## Frequently Asked Questions

**How does the DocSumo MCP help with invoice processing?**
It automates the difficult process of pulling structured data from messy invoices. Instead of manually typing out the vendor name, invoice number, and total amount, your agent pulls these verified fields directly into your chat conversation.

**Is DocSumo good for compliance auditing?**
Yes. It helps you maintain audit trails by surfacing documents that need human review through the `list_documents_awaiting_review` tool. This helps ensure no critical ID or document is overlooked.

**What if I have different types of files, like receipts and bank statements?**
The MCP handles various formats. You can list all available document types to see what it supports—from basic receipts to complex bank statements—and process them all through one consistent workflow.

**Can I check the status of a whole batch of documents?**
Absolutely. You can use `quick_idp_health_audit` to get a single summary report on your entire pipeline's success rate, so you know immediately if there are bottlenecks.

**I need to search for an old document by name; how does the DocSumo MCP help?**
You can ask the agent to run `search_documents_by_filename`. Give it a keyword, and it finds all matching processed documents, giving you direct access to their extracted data.

**Is this better than using a separate dedicated document management tool?**
This MCP integrates the intelligence layer directly into your existing workflow. It doesn't just store files; it extracts and structures the usable data, making the information actionable immediately within your agent conversation.