Sensible MCP. Extract structured JSON from PDFs, images, and files.

Claude

ChatGPT

Cursor

Gemini

Windsurf

VS Code

JetBrains

Vercel

Works with every AI agent you already use

…and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

Sensible handles structured data extraction from any document type—PDFs, images, Word files, etc. It turns messy, unstructured documents into clean, predictable JSON records using a robust parsing engine.

You classify documents first, then extract specific fields (like invoice numbers or tax IDs) whether the file is local, remote via URL, or part of a portfolio batch.

What your AI agents can do

Classify async

Classifies a document asynchronously by determining its type (e.g., invoice, W-2).

Classify sync

Classifies a document synchronously and returns the document type immediately.

Create configuration

Creates a new rule set (configuration) used to guide data extraction for a specific document type.

+ 34 more capabilities included

Classify Documents

Determines what kind of document you have (e.g., invoice, tax form) using synchronous or asynchronous classification tools.

Extract Data from Local Files

Runs an extraction job instantly on a file provided as a Base64 string (extract_sync).

Process Documents via URL

Starts background processing for documents hosted online, which is necessary for large volumes of files or external sources.

Handle Document Portfolios

Extracts data from a group (portfolio) of related documents at a specific URL using extract_portfolio_from_url.

Manage Data Schemas and Types

Allows you to create, update, and manage the rules (create_configuration) that dictate exactly what data points should be extracted from a given document type.

Supported MCP Clients

OAuth 2.0 Compatible

Claude

ChatGPT

Cursor

Gemini

VS Code

JetBrains

Vercel

Zendesk

+ other MCP clients

Included with Plan

Sensible MCP Server: 37 Tools for Document Parsing

These tools allow your agent to manage the entire document lifecycle—from classification and configuration setup to synchronous extraction and generating final CSV/Excel reports.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using Sensible on Vinkius

classify async

Classifies a document asynchronously by determining its type (e.g., invoice, W-2).

classify sync

Classifies a document synchronously and returns the document type immediately.

create configuration

Creates a new rule set (configuration) used to guide data extraction for a specific document type.

create document type

Establishes an entirely new category or type of document within the system.

create golden

Creates a 'Golden' reference document, which serves as the primary source for defining data structure and quality control.

delete configuration

Removes an existing data extraction configuration rule set.

delete configuration version

Deletes a draft or unpublished version of a saved configuration.

delete document type

Removes an entire document type definition from the system.

delete golden

Deletes a reference document used for setting standards or templates.

extract from url with config

Extracts data from a remote URL using specific rules defined by a configuration ID asynchronously.

extract from url

Extracts data from any document hosted online, starting an asynchronous job.

extract portfolio from url

Collects and extracts data from multiple related documents located at a single URL endpoint asynchronously.

extract sync with config

Performs synchronous extraction on a local file (Base64) using an explicitly defined configuration ID.

extract sync

Extracts structured data instantly when you provide the document as a Base64 encoded string.

extract text from golden

Pulls all text lines and their exact coordinates from the master reference document for inspection.

generate csv

Compiles multiple JSON extraction results into a standard CSV spreadsheet file format.

generate excel

Compiles multiple JSON extraction results into a formatted Microsoft Excel workbook.

generate portfolio upload url

Generates a secure, temporary URL for uploading an entire portfolio of documents for batch processing.

generate upload url with config

Generates an upload URL specifically for asynchronous processing that must use a defined configuration set.

generate upload url

Creates a pre-signed upload URL required to start any asynchronous document extraction process.

get auth tokens

Creates temporary authorization credentials, allowing external reviewers to access the data securely.

get configuration

Retrieves details for a specific document extraction configuration rule set by its ID.

get configuration version

Gets data about a particular saved version of an existing configuration.

get document type

Retrieves all metadata and details about a specific document type definition.

get document

Fetches the final extraction results for a document using its unique ID.

get extraction statistics

Returns metrics showing how much data has been extracted over recent days.

get golden

Retrieves metadata about the current reference document used for standardization.

list configuration versions

Shows all historical versions and drafts of a configuration rule set.

list configurations

Lists all available configurations that apply to a specific document type.

list document types

Provides an overview of every defined document type in the server system.

list extractions

Retrieves a paginated list of past extraction jobs, allowing you to track history and status.

list goldens

Shows all available reference documents defined for a specific document type.

publish configuration

Makes a specific version of a configuration active and usable by the agent in production environments.

unassociate golden

Removes a reference document from its current functional link to a specific configuration.

update configuration

Modifies an existing data extraction configuration rule set, adjusting the parsing logic.

update document type

Changes the general metadata or rules for a document type definition.

update golden

Updates the metadata associated with a reference document without changing its core content.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built in DLP, auth, and compliance on every call
Real time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with Sensible, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,000+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Sensible. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 37 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.

Manually pulling data from invoices is a total waste of time.

Right now, if you need the total amount or the vendor name off an invoice PDF, you have to open it. Then you find the specific text block and copy/paste the data into your spreadsheet. You repeat that 50 times for a single week's worth of bills. It’s tedious, error-prone, and takes hours.

With Sensible, you just point your agent at the folder or the URL. The server runs `extract_from_url_with_config`. It reads every PDF in that batch and automatically pulls only the total amount and vendor name into clean JSON. You get a structured record for 50 invoices in minutes.

Using Sensible MCP Server: Structured Data Extraction

Forget having to write custom parsers for every single document template—whether it's a tax form or a utility bill. You define the data point once using `create_configuration` and you can instantly apply that rule set across thousands of different files.

The difference is control. Instead of hoping your agent guesses what you need, Sensible forces the output structure you demand. It’s predictable, reliable, and ready for integration.

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What you can do with this MCP connector

You're done copy-pasting data out of PDFs and invoices. This server takes messy, unstructured documents—whether they’re PDFs, images, or Word files—and converts them into clean, predictable JSON records using a robust parsing engine.

The system starts by letting you define what kind of document you're dealing with. You establish an entirely new category or type of document using create_document_type, and then you guide the specific data extraction for that type by creating a rule set with create_configuration. If you need to standardize your process, you can create a 'Golden' reference document via create_golden, which serves as the primary source for defining both data structure and quality control.

You manage these rules using functions like update_configuration or get_configuration, and if something goes wrong, you can delete configurations with delete_configuration or remove entire types with delete_document_type.

When you receive a document, the first step is figuring out what it is. You classify documents using classify_sync to get the type immediately, or you run an async job with classify_async if classification takes time. Once classified, you have several ways to extract data. If you've got a local file encoded as Base64, extract_sync pulls structured data instantly.

For files already hosted online, extract_from_url starts a background extraction job. For files you need to upload first, generate_upload_url creates the pre-signed URL that starts the asynchronous process. You can even target multiple related documents at one spot by running extract_portfolio_from_url. If your async job needs specific rules, use generate_upload_url_with_config or extract_from_url_with_config; for immediate local pulls using a configuration ID, run extract_sync_with_config.
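The choice among these extraction tools can be sketched as a small lookup. This is only an illustration, not part of the server: `pick_extraction_tool` and its `source` labels are hypothetical, and it assumes upload URLs are meant for files that are not already hosted online. Only the tool names come from the descriptions above.

```python
from typing import Optional


def pick_extraction_tool(source: str, config_id: Optional[str] = None) -> str:
    """Return the tool name that matches the kind of input.

    source: "base64" for a local file already encoded as Base64,
            "url" for a document hosted online,
            "upload" for a local file that must be uploaded first.
    """
    # Each entry maps to (tool without config, tool with config).
    tools = {
        "base64": ("extract_sync", "extract_sync_with_config"),
        "url": ("extract_from_url", "extract_from_url_with_config"),
        "upload": ("generate_upload_url", "generate_upload_url_with_config"),
    }
    if source not in tools:
        raise ValueError(f"unknown source kind: {source!r}")
    plain, with_config = tools[source]
    return with_config if config_id else plain
```

Keeping the mapping in one place makes it easy for an agent prompt or wrapper script to stay consistent about which tool handles which input.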

To keep track of all this background work, you can list past jobs with list_extractions. You'll also need to manage your reference materials; get_golden retrieves metadata on the current standard document, and you can inspect every text line and its coordinates from that master source using extract_text_from_golden.

When you’ve pulled a batch of data, you don't want JSON blobs. You compile multiple results into usable formats: generate_csv creates a standard CSV spreadsheet, while generate_excel builds a formatted Microsoft Excel workbook.
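To make the JSON-to-spreadsheet step concrete, here is a local sketch of the kind of flattening involved. It does not call generate_csv and is not a description of how the server builds its files; it assumes each extraction result is a flat dictionary, and the field names in `sample` are made up for illustration.

```python
import csv
import io


def results_to_csv(results: list[dict]) -> str:
    """Flatten a list of flat extraction records into CSV text."""
    # Collect column names in first-seen order so the header is stable.
    columns: list[str] = []
    for record in results:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks a column.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Hypothetical extraction results for two invoices.
sample = [
    {"vendor_name": "Acme Corp", "total_amount": "120.50"},
    {"vendor_name": "Globex", "total_amount": "89.00", "invoice_number": "INV-7"},
]
csv_text = results_to_csv(sample)
```

Records that are missing a field simply get an empty cell, which is usually what you want when invoices from different vendors expose slightly different fields.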

For large-scale operations, the server helps you manage access and history. Use get_auth_tokens to create temporary credentials for external reviewers accessing data securely. You can list all available configurations with list_configurations, or check historical versions and drafts of any rule set using list_configuration_versions. If you're auditing your process, get_extraction_statistics returns metrics on how much data has been extracted recently, and you can see every defined document type overview by calling list_document_types.

The system also lets you review all available reference documents for a specific type using list_goldens, or view data about a particular saved version of a configuration with get_configuration_version.

The server handles everything from setup to final delivery. It allows you to pull data instantly on local files, run complex background jobs against remote URLs, and aggregate those results into spreadsheets ready for your team to use.

Built · Hosted · Managed by Vinkius Sensible-MCP Server - Structured Data Extraction from PDFs Server ID 019ea606-0705-71f1-b6fc-8b6213d9d1eb

Vinkius Inspector

Compliance Grade A+

Score 98.33/100

Report View Report ↗

How Sensible MCP Works

1. First, use list_document_types or get_document_type to verify the required schema. If needed, create a rule set with create_configuration and establish a 'Golden' reference document via create_golden.
2. Next, your agent calls an extraction tool (extract_from_url_with_config for remote files or extract_sync_with_config for local data), passing the file and the target configuration ID.
3. Finally, Sensible returns JSON data. If you need a spreadsheet, call generate_csv or generate_excel to compile the results.
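The three steps above can be sketched as one function. This is a rough outline under stated assumptions: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and every argument name (`document_type`, `config_id`, `document_url`, `extractions`) is hypothetical, since the real input schemas are not shown here.

```python
from typing import Any, Callable

# Hypothetical signature: (tool name, arguments) -> tool result.
ToolCaller = Callable[[str, dict], Any]


def run_extraction(call_tool: ToolCaller, document_type: str,
                   config_id: str, document_urls: list[str]) -> Any:
    """Verify the type, extract each remote file, then compile a CSV."""
    # Step 1: confirm the document type exists before extracting.
    call_tool("get_document_type", {"document_type": document_type})

    # Step 2: one extraction call per remote file, using the chosen config.
    extractions = [
        call_tool("extract_from_url_with_config", {
            "document_type": document_type,
            "config_id": config_id,
            "document_url": url,
        })
        for url in document_urls
    ]

    # Step 3: compile the collected results into a spreadsheet.
    return call_tool("generate_csv", {"extractions": extractions})
```

Passing `call_tool` in as a parameter keeps the sketch independent of any particular client library and makes it easy to dry-run with a stub.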

The bottom line is: You set up the rules once, point your AI agent at the messy file, and get clean, structured data back every time.

What Changes When You Connect

Stop manual data entry. Whether you process a single invoice or 10,000 tax forms, Sensible handles the extraction into predictable JSON records using extract_sync or extract_from_url.
Build reliable pipelines by managing your schemas first. Use tools like create_configuration and establish 'Golden' reference documents via create_golden to ensure consistent data mapping every time.
Handle scale with ease. Instead of running a job repeatedly, use the asynchronous URL methods (extract_from_url, generate_upload_url) for massive batch processing without timing out your agent.
Turn raw data into usable assets instantly. After extraction, call generate_excel or generate_csv. Your JSON output immediately becomes a spreadsheet ready for analysis in Excel or Google Sheets.
Manage complex document groups. If you have multiple related documents (like a Statement and an Appendix), use the portfolio tools—specifically extract_portfolio_from_url—to process them as one unit.

Real-World Use Cases

Processing Incoming Vendor Invoices

The Ops Engineer gets a batch of 50 PDF invoices attached to an email. Instead of downloading and opening each file, they have their agent call generate_upload_url_with_config and upload the files. The extraction then runs against the chosen configuration, returning structured data for all 50 invoices in one go.

Cleaning Historical Scanned Records

The Data Analyst has a folder of old, scanned tax forms (images). They run the agent to classify them first using classify_async to confirm the document type. Then they process the batch via an upload URL and use generate_csv to convert all records into a single CSV file for BI tools.

Building a Contract Reviewer Tool

The Developer builds a workflow that first checks if a document is a contract using classify_sync. If it matches, the agent proceeds to call extract_sync_with_config to pull out specific clauses like 'Termination Date' and 'Governing Law', making the data ready for immediate use.
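A minimal sketch of that classify-then-extract gate might look like the following. The `call_tool` callable, the argument names, and the assumption that classify_sync returns a mapping with a `document_type` key are all hypothetical; check the actual tool schemas before relying on them.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: (tool name, arguments) -> tool result.
ToolCaller = Callable[[str, dict], Any]


def review_if_contract(call_tool: ToolCaller, document_b64: str,
                       config_id: str) -> Optional[Any]:
    """Classify first; extract only when the document is a contract."""
    # Assumption: classify_sync returns a mapping with a "document_type" key.
    classification = call_tool("classify_sync", {"document": document_b64})
    if classification.get("document_type") != "contract":
        return None  # Not a contract, so skip extraction entirely.

    # Argument names below are illustrative, not the server's schema.
    return call_tool("extract_sync_with_config", {
        "document_type": "contract",
        "config_id": config_id,
        "document": document_b64,
    })
```

Returning `None` for non-contracts lets the calling workflow decide what to do with everything else, rather than burying that decision inside the extraction step.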

Standardizing Financial Data Feeds

A finance team needs to ensure all vendor invoices conform to one standard. They define a master schema using create_golden and then update their extraction tools with that configuration ID, guaranteeing the output structure is always correct.

The Tradeoffs

Treating every document like an image.

Passing a raw PDF file to an agent without first defining its type or schema returns random, unusable text blocks instead of structured data.

→ Never extract blindly. Always run the classification tool (classify_sync) first. Then, define your expected output using create_configuration before calling any extraction method.

Using synchronous methods for scale.

Running extract_sync in a loop over hundreds of files causes the job to time out or hit rate limits, because the system can't process everything at once.

→ For anything more than a handful of documents, use asynchronous processing. Start by generating an upload URL (generate_upload_url) and then queue up your batch jobs.

Bypassing the Golden Record.

Writing ad-hoc extraction code that pulls fields directly, without referencing a master schema, causes field names to change when source documents update.

→ Use create_golden as your single source of truth. When you need to guarantee data structure integrity, always anchor your configuration against the Golden document.

When It Fits, When It Doesn't

You should use Sensible if your core problem is converting messy, unstructured documents (PDFs, scanned images, etc.) into clean JSON or spreadsheet data. This server handles the complex parsing logic so your agent doesn't have to.

Don't use this if: 1) Your input data is already in a structured format (like XML or clean CSV); you need a standard database connector instead. 2) You only need basic text extraction without knowing where the field lives on the page; then, simple OCR tools might suffice.

Use it when you need to manage the entire lifecycle: classification (classify_async), configuration definition (create_configuration), input handling (Base64 vs URL), and final output formatting (generate_excel).

Common Questions About Sensible MCP

How do I process a PDF file that's already attached to an email? +

You should use extract_sync if the file is small enough to pass as Base64. If it's part of a large batch, generating an upload URL with generate_upload_url and having your agent process the attachment through that secure endpoint works better.

What's the difference between classify_async and classify_sync? +

classify_sync returns the document type immediately, which is great for quick validation checks. classify_async is better if you are dealing with a massive batch of files and want to run classification in the background without blocking your workflow.

Can I extract data from multiple different types of documents at once? +

You can use portfolio tools like extract_portfolio_from_url. This lets you process related files together, ensuring all necessary structured fields are extracted in one go.

Which tool should I use to turn my JSON output into an Excel sheet? +

After the extraction is complete and you have the resulting JSON data, call generate_excel. It compiles your records directly into a usable spreadsheet format that's ready for sharing.

Why do I need to use create_golden before extracting? +

The Golden record establishes the single source of truth and the expected schema for your data points. Anchoring your configuration to it keeps output consistent, so your extraction rules are more likely to stay accurate even if a vendor changes their invoice layout slightly.

How can I check the status or retrieve results using `get_document` after an asynchronous extraction job? +

You call get_document(id) to pull specific extraction results. This is critical for confirming successful processing, especially if an async job was slow or failed initially.
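If the job is still running when you first check, a simple polling loop with backoff is one way to wait for it. This sketch assumes the result carries a `status` field and that "COMPLETE" and "FAILED" are the terminal values; both are assumptions, so adjust them to whatever get_document actually returns. The `get_document` parameter is a stand-in for your client's call to the tool.

```python
import time
from typing import Callable


def wait_for_document(get_document: Callable[[str], dict], document_id: str,
                      attempts: int = 5, delay_seconds: float = 2.0,
                      done_states: tuple = ("COMPLETE", "FAILED"),
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_document until the job reaches a terminal state."""
    for attempt in range(attempts):
        result = get_document(document_id)
        # Assumed field name: "status". Stop as soon as the job is finished.
        if result.get("status") in done_states:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: 2s, 4s, 8s, ... with the defaults.
            sleep(delay_seconds * (2 ** attempt))
    raise TimeoutError(f"document {document_id} not finished after {attempts} checks")
```

A failed job is returned rather than raised, so the caller can inspect the result and decide whether to retry the extraction.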

If I need temporary read-only access for external reviewers, what does the `get_auth_tokens` tool provide? +

It generates temporary authorization tokens. You can use these to give reviewers limited viewing access without handing over your main API key.

I refined my extraction rules; how do I apply those changes using `update_configuration`? +

You send the parameters via update_configuration(id). This lets you revise an existing rule set without having to rebuild the entire configuration from scratch.

Can I extract data from a document instantly if I have its Base64 representation? +

Yes! Use the extract_sync tool. Provide the document type and the Base64-encoded document bytes, and your agent will return the structured extraction results synchronously.
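For reference, producing the Base64 string from raw file bytes takes one standard-library call. The sketch below builds an arguments dictionary for extract_sync; the keys `document_type` and `document` are hypothetical names chosen for illustration, not the tool's confirmed schema.

```python
import base64


def build_extract_sync_args(document_type: str, data: bytes) -> dict:
    """Encode raw file bytes as Base64 text for a synchronous extraction."""
    return {
        # Hypothetical argument names; check the tool's input schema.
        "document_type": document_type,
        "document": base64.b64encode(data).decode("ascii"),
    }


# In practice `data` would come from open(path, "rb").read().
args = build_extract_sync_args("invoice", b"%PDF-1.7 sample")
```

Base64 inflates the payload by roughly a third, which is one reason the synchronous route suits small files better than large ones.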

How do I extract data from a document hosted at a public URL? +

You can use the extract_from_url tool. Simply provide the document type, the document URL, and the content type (e.g., application/pdf) to trigger an asynchronous extraction.
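If you don't want to hard-code the content type, Python's `mimetypes` module can usually infer it from the file extension. The sketch below assembles the three inputs mentioned above; the key names are hypothetical, and the example URL is a placeholder.

```python
import mimetypes
from urllib.parse import urlparse


def build_extract_from_url_args(document_type: str, document_url: str) -> dict:
    """Assemble arguments for an asynchronous URL extraction."""
    # Guess from the path only, so query strings don't confuse the lookup.
    path = urlparse(document_url).path
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None:
        raise ValueError(f"cannot infer content type from {document_url!r}")
    return {
        # Hypothetical argument names; check the tool's input schema.
        "document_type": document_type,
        "document_url": document_url,
        "content_type": content_type,
    }


url_args = build_extract_from_url_args(
    "invoice", "https://example.com/invoices/march.pdf?download=1"
)
```

Raising on an unknown extension is deliberate: sending a guessed or empty content type would likely produce a less obvious failure later in the job.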

Can I specify a custom configuration layout when extracting? +

Yes, you can target specific configurations by using the extract_sync_with_config or extract_from_url_with_config tools, which allow you to define the exact configuration name to use for parsing.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python

Pydantic AI sdk-python

Vercel AI SDK sdk-typescript