Sensible MCP. Extract structured JSON from PDFs, images, and files.
Works with every AI agent you already use, and any MCP-compatible client
Just plug in your AI agents and start using Vinkius.
Sensible handles structured data extraction from any document type—PDFs, images, Word files, etc. It turns messy, unstructured documents into clean, predictable JSON records using a robust parsing engine.
You classify documents first, then extract specific fields (like invoice numbers or tax IDs) whether the file is local, remote via URL, or part of a portfolio batch.
What your AI agents can do
Determines what kind of document you have (e.g., invoice, tax form) using synchronous or asynchronous classification tools.
Runs an extraction job instantly on a file provided as a Base64 string (extract_sync).
Starts background processing for documents hosted online, which is necessary for large volumes of files or external sources.
Extracts data from a group (portfolio) of related documents at a specific URL using extract_portfolio_from_url.
Allows you to create, update, and manage the rules (create_configuration) that dictate exactly what data points should be extracted from a given document type.
OAuth 2.0 Compatible
Sensible MCP Server: 37 Tools for Document Parsing
These tools allow your agent to manage the entire document lifecycle—from classification and configuration setup to synchronous extraction and generating final CSV/Excel reports.
Make your AI actually useful.
Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.
Start using Sensible on Vinkius
classify async
Classifies a document asynchronously by determining its type (e.g., invoice, W-2).
classify sync
Classifies a document synchronously and returns the document type immediately.
create configuration
Creates a new rule set (configuration) used to guide data extraction for a specific document type.
create document type
Establishes an entirely new category or type of document within the system.
create golden
Creates a 'Golden' reference document, which serves as the primary source for defining data structure and quality control.
delete configuration
Removes an existing data extraction configuration rule set.
delete configuration version
Deletes a draft or unpublished version of a saved configuration.
delete document type
Removes an entire document type definition from the system.
delete golden
Deletes a reference document used for setting standards or templates.
extract from url with config
Extracts data from a remote URL using specific rules defined by a configuration ID asynchronously.
extract from url
Extracts data from any document hosted online, starting an asynchronous job.
extract portfolio from url
Collects and extracts data from multiple related documents located at a single URL endpoint asynchronously.
extract sync with config
Performs synchronous extraction on a local file (Base64) using an explicitly defined configuration ID.
extract sync
Extracts structured data instantly when you provide the document as a Base64 encoded string.
extract text from golden
Pulls all text lines and their exact coordinates from the master reference document for inspection.
generate csv
Compiles multiple JSON extraction results into a standard CSV spreadsheet file format.
generate excel
Compiles multiple JSON extraction results into a formatted Microsoft Excel workbook.
generate portfolio upload url
Generates a secure, temporary URL for uploading an entire portfolio of documents for batch processing.
generate upload url with config
Generates an upload URL specifically for asynchronous processing that must use a defined configuration set.
generate upload url
Creates a pre-signed upload URL required to start any asynchronous document extraction process.
get auth tokens
Creates temporary authorization credentials, allowing external reviewers to access the data securely.
get configuration
Retrieves details for a specific document extraction configuration rule set by its ID.
get configuration version
Gets data about a particular saved version of an existing configuration.
get document type
Retrieves all metadata and details about a specific document type definition.
get document
Fetches the final extraction results for a document using its unique ID.
get extraction statistics
Returns metrics showing how much data has been extracted over recent days.
get golden
Retrieves metadata about the current reference document used for standardization.
list configuration versions
Shows all historical versions and drafts of a configuration rule set.
list configurations
Lists all available configurations that apply to a specific document type.
list document types
Provides an overview of every defined document type in the server system.
list extractions
Retrieves a paginated list of past extraction jobs, allowing you to track history and status.
list goldens
Shows all available reference documents defined for a specific document type.
publish configuration
Makes a specific version of a configuration active and usable by the agent in production environments.
unassociate golden
Removes a reference document from its current functional link to a specific configuration.
update configuration
Modifies an existing data extraction configuration rule set, adjusting the parsing logic.
update document type
Changes the general metadata or rules for a document type definition.
update golden
Updates the metadata associated with a reference document without changing its core content.
Choose How to Get Started
Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.
Build Your Own
Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.
- Import from OpenAPI, Swagger, or YAML specs
- Create Agent Skills with progressive disclosure
- Deploy to edge with MCPFusion framework
- Built in DLP, auth, and compliance on every call
- Real time usage dashboard and cost metering
- Publish to catalog or keep private
Make Your AI Do More
Start with Sensible, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.
- Use this MCP plus 5,000+ others, all in one place
- Add new capabilities to your AI anytime you want
- Every connection is secured and compliant automatically
- Track usage and costs across all your servers
- Works with Claude, ChatGPT, Cursor, and more
- New servers added to the catalog every week
Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by Sensible. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.
VINKIUS INFRASTRUCTURE
- Cloud Hosted: Managed infra
- V8 Isolated: Sandboxed per request
- Zero-Trust Proxy: No stored credentials
- DLP Enforced: Policy on every call
- GDPR Compliant: EU data residency
- Token Compression: ~60% cost reduction
Works with Claude, ChatGPT, Cursor, and more
The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.
This server provides 37 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
Manually pulling data from invoices is a total waste of time.
Right now, if you need the total amount or the vendor name off an invoice PDF, you have to open it. Then you find the specific text block and copy/paste the data into your spreadsheet. You repeat that 50 times for a single week's worth of bills. It’s tedious, error-prone, and takes hours.
With Sensible, you just point your agent at the folder or the URL. The server runs `extract_from_url_with_config`. It reads every PDF in that batch and automatically pulls only the total amount and vendor name into clean JSON. You get a structured record for 50 invoices in minutes.
Using Sensible MCP Server: Structured Data Extraction
Forget having to write custom parsers for every single document template—whether it's a tax form or a utility bill. You define the data point once using `create_configuration` and you can instantly apply that rule set across thousands of different files.
The difference is control. Instead of hoping your agent guesses what you need, Sensible forces the output structure you demand. It’s predictable, reliable, and ready for integration.
What you can do with this MCP connector
You're done copy-pasting data out of PDFs and invoices. This server takes messy, unstructured documents—whether they’re PDFs, images, or Word files—and converts them into clean, predictable JSON records using a robust parsing engine.
The system starts by letting you define what kind of document you're dealing with. You establish an entirely new category or type of document using create_document_type, and then you guide the specific data extraction for that type by creating a rule set with create_configuration. If you need to standardize your process, you can create a 'Golden' reference document via create_golden, which serves as the primary source for defining both data structure and quality control.
You manage these rules using functions like update_configuration or get_configuration, and if something goes wrong, you can delete configurations with delete_configuration or remove entire types with delete_document_type.
When you receive a document, the first step is figuring out what it is. You classify documents using classify_sync to get the type immediately, or you run an async job with classify_async if classification takes time. Once classified, you have several ways to extract data. If you've got a local file encoded as Base64, extract_sync pulls structured data instantly.
For files hosted online, you start a background process using generate_upload_url, then run the extraction with extract_from_url. You can even target multiple related documents at one spot by running extract_portfolio_from_url. If your async job needs specific rules, use generate_upload_url_with_config and execute the extraction via extract_from_url_with_config, or for immediate local pulls using a configuration ID, run extract_sync_with_config.
To keep track of all this background work, you can list past jobs with list_extractions. You'll also need to manage your reference materials; get_golden retrieves metadata on the current standard document, and you can inspect every text line and its coordinates from that master source using extract_text_from_golden.
When you’ve pulled a batch of data, you don't want JSON blobs. You compile multiple results into usable formats: generate_csv creates a standard CSV spreadsheet, while generate_excel builds a formatted Microsoft Excel workbook.
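To make the JSON-to-spreadsheet step concrete, here is a minimal local sketch of what flattening extraction results into CSV rows can look like. It does not call generate_csv and is not the server's implementation; the record shape (`vendor_name`, `total_amount`, `invoice_number`) is assumed purely for illustration.

```python
import csv
import io


def records_to_csv(records: list[dict]) -> str:
    """Flatten a list of flat JSON-like records into CSV text."""
    # Build the header from the union of keys, in first-seen order.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks a field.
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical extraction results for two invoices.
sample = [
    {"vendor_name": "Acme Corp", "total_amount": "1250.00"},
    {"vendor_name": "Globex", "total_amount": "89.90", "invoice_number": "INV-7"},
]
csv_text = records_to_csv(sample)
print(csv_text)
```

Records that lack a field simply get an empty cell, which is one common way to keep every row aligned to the same header.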
For large-scale operations, the server helps you manage access and history. Use get_auth_tokens to create temporary credentials for external reviewers accessing data securely. You can list all available configurations with list_configurations, or check historical versions and drafts of any rule set using list_configuration_versions. If you're auditing your process, get_extraction_statistics returns metrics on how much data has been extracted recently, and you can see every defined document type overview by calling list_document_types.
The system also lets you review all available reference documents for a specific type using list_goldens, or view the metadata about any given configuration with get_configuration_version.
The server handles everything from setup to final delivery. It allows you to pull data instantly on local files, run complex background jobs against remote URLs, and aggregate those results into spreadsheets ready for your team to use.
How Sensible MCP Works
1. First, use `list_document_types` or `get_document_type` to verify the required schema. If needed, use tools like `create_configuration` and establish a 'Golden' reference document via `create_golden`.
2. Next, your agent calls an extraction tool, such as `extract_from_url_with_config` for remote files or `extract_sync_with_config` for local data, passing the file and the target configuration ID.
3. Finally, Sensible returns JSON data. If you need a spreadsheet, call `generate_csv` or `generate_excel` to compile the results.
The bottom line is: You set up the rules once, point your AI agent at the messy file, and get clean, structured data back every time.
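The three steps above can be sketched as a small orchestration function. This is an illustration only: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and the argument names (`document_type`, `config_id`, `document_url`, `extraction_ids`) and the response shapes are assumptions, not the server's documented schema.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_pipeline(
    call_tool: ToolCaller, document_type: str, config_id: str, document_url: str
) -> dict[str, Any]:
    # Step 1: confirm the document type exists before extracting.
    listing = call_tool("list_document_types", {})
    known = {entry["name"] for entry in listing.get("document_types", [])}
    if document_type not in known:
        raise ValueError(f"Unknown document type: {document_type}")

    # Step 2: start the extraction against the chosen configuration.
    extraction = call_tool(
        "extract_from_url_with_config",
        {
            "document_type": document_type,
            "config_id": config_id,
            "document_url": document_url,
        },
    )

    # Step 3: compile the result into a spreadsheet.
    csv_result = call_tool("generate_csv", {"extraction_ids": [extraction["id"]]})
    return {"extraction": extraction, "csv": csv_result}


# Stub caller with canned responses so the sketch runs without a server.
calls: list[str] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    calls.append(name)
    canned = {
        "list_document_types": {"document_types": [{"name": "invoices"}]},
        "extract_from_url_with_config": {"id": "ext-1"},
        "generate_csv": {"url": "https://example.com/ext-1.csv"},
    }
    return canned[name]


result = run_pipeline(
    fake_call_tool, "invoices", "cfg-1", "https://example.com/invoice.pdf"
)
```

Injecting `call_tool` keeps the flow testable: swap the stub for a real client call once you know the actual tool schemas.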
Who Is Sensible MCP For?
The Ops Engineer who spends all day manually pulling data from PDFs into spreadsheets. The Data Analyst drowning in document repositories that need to become clean datasets. Developers building automated ingestion pipelines that can't rely on human intervention.
Ops Engineer: Uses the server to extract key fields (like PO numbers or payment dates) from batches of uploaded invoices and contracts without manual data entry.
Data Analyst: Connects the server to pull historical data from thousands of scanned documents, converting unstructured archives into queryable CSV datasets.
Developer: Integrates document parsing directly into an application's workflow, allowing the user agent to ingest files and process them programmatically via Base64 strings.
What Changes When You Connect
- Stop manual data entry. Whether you process a single invoice or 10,000 tax forms, Sensible handles the extraction into predictable JSON records using `extract_sync` or `extract_from_url`.
- Build reliable pipelines by managing your schemas first. Use tools like `create_configuration` and establish 'Golden' reference documents via `create_golden` to ensure consistent data mapping every time.
- Handle scale with ease. Instead of running a job repeatedly, use the asynchronous URL methods (`extract_from_url`, `generate_upload_url`) for massive batch processing without timing out your agent.
- Turn raw data into usable assets instantly. After extraction, call `generate_excel` or `generate_csv`. Your JSON output immediately becomes a spreadsheet ready for analysis in Excel or Google Sheets.
- Manage complex document groups. If you have multiple related documents (like a Statement and an Appendix), use the portfolio tools, specifically `extract_portfolio_from_url`, to process them as one unit.
Real-World Use Cases
Processing Incoming Vendor Invoices
The Ops Engineer gets a batch of 50 PDF invoices attached to an email. Instead of downloading and opening each file, they use their agent to call generate_upload_url and then upload the files. The system runs extract_from_url_with_config, returning structured data for all 50 invoices in one go.
Cleaning Historical Scanned Records
The Data Analyst has a folder of old, scanned tax forms (images). They run the agent to classify them first using classify_async to confirm the document type. Then they process the batch via an upload URL and use generate_csv to convert all records into a single CSV file for BI tools.
Building a Contract Reviewer Tool
The Developer builds a workflow that first checks if a document is a contract using classify_sync. If it matches, the agent proceeds to call extract_sync_with_config to pull out specific clauses like 'Termination Date' and 'Governing Law', making the data ready for immediate use.
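A rough sketch of that classify-then-extract gate follows. The `call_tool` callable, the argument names (`document`, `document_type`, `config_id`), the `"contract"` type label, and the response fields are all hypothetical placeholders; check the real tool schemas before relying on any of them.

```python
import base64
from typing import Any, Callable, Optional

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def review_if_contract(
    call_tool: ToolCaller, document_bytes: bytes, config_id: str
) -> Optional[dict[str, Any]]:
    """Extract contract fields only when classification says 'contract'."""
    encoded = base64.b64encode(document_bytes).decode("ascii")

    classification = call_tool("classify_sync", {"document": encoded})
    if classification.get("document_type") != "contract":
        return None  # Not a contract: skip extraction entirely.

    return call_tool(
        "extract_sync_with_config",
        {"document_type": "contract", "config_id": config_id, "document": encoded},
    )


def make_fake_caller(detected_type: str) -> ToolCaller:
    """Build a stub caller that pretends the document has the given type."""

    def fake(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if name == "classify_sync":
            return {"document_type": detected_type}
        return {
            "fields": {"termination_date": "2026-01-31", "governing_law": "Delaware"}
        }

    return fake


contract_result = review_if_contract(make_fake_caller("contract"), b"sample", "cfg-contracts")
invoice_result = review_if_contract(make_fake_caller("invoice"), b"sample", "cfg-contracts")
```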
Standardizing Financial Data Feeds
A finance team needs to ensure all vendor invoices conform to one standard. They define a master schema using create_golden and then update their extraction tools with that configuration ID, guaranteeing the output structure is always correct.
The Tradeoffs
Treating every document like an image.
Simply passing a raw PDF file to an agent without first defining its type or schema. The system gets random, unusable text blocks instead of structured data.
→ Never extract blindly. Always run the classification tool (classify_sync) first. Then, define your expected output using create_configuration before calling any extraction method.
Using synchronous methods for scale.
Running extract_sync in a loop over hundreds of files. The job times out or hits rate limits because the system can't process it all at once.
→ For anything more than a handful of documents, use asynchronous processing. Start by generating an upload URL (generate_upload_url) and then queue up your batch jobs.
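One way to encode that rule of thumb is a small planner that picks the tool per batch size. The cutoff of 5 is an arbitrary stand-in for "a handful", not a limit documented by the server, and the plan format is invented for this sketch.

```python
# Assumption: what counts as "a handful". Tune this for your own rate limits.
SYNC_BATCH_LIMIT = 5


def plan_batch(
    file_names: list[str], sync_limit: int = SYNC_BATCH_LIMIT
) -> list[tuple[str, str]]:
    """Return (tool_name, file_name) steps for a batch of documents."""
    if len(file_names) <= sync_limit:
        # Small batch: synchronous extraction is simple and immediate.
        return [("extract_sync", name) for name in file_names]
    # Larger batch: request an upload URL per file and let jobs run in the background.
    return [("generate_upload_url", name) for name in file_names]


small_plan = plan_batch(["a.pdf", "b.pdf", "c.pdf"])
large_plan = plan_batch([f"invoice-{i}.pdf" for i in range(50)])
```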
Bypassing the Golden Record.
Writing ad-hoc extraction code that pulls fields directly without referencing a master schema, causing field names to change when source documents update.
→ Use create_golden as your single source of truth. When you need to guarantee data structure integrity, always anchor your configuration against the Golden document.
When It Fits, When It Doesn't
You should use Sensible if your core problem is converting messy, unstructured documents (PDFs, scanned images, etc.) into clean JSON or spreadsheet data. This server handles the complex parsing logic so your agent doesn't have to.
Don't use this if: 1) Your input data is already in a structured format (like XML or clean CSV); you need a standard database connector instead. 2) You only need basic text extraction without knowing where the field lives on the page; then, simple OCR tools might suffice.
Use it when you need to manage the entire lifecycle: classification (classify_async), configuration definition (create_configuration), input handling (Base64 vs URL), and final output formatting (generate_excel).
Common Questions About Sensible MCP
How do I process a PDF file that's already attached to an email?
You should use extract_sync if the file is small enough to pass as Base64. If it's part of a large batch, generating an upload URL with generate_upload_url and having your agent process the attachment through that secure endpoint works better.
What's the difference between classify_async and classify_sync?
classify_sync returns the document type immediately, which is great for quick validation checks. classify_async is better if you are dealing with a massive batch of files and want to run classification in the background without blocking your workflow.
Can I extract data from multiple different types of documents at once?
You can use portfolio tools like extract_portfolio_from_url. This lets you process related files together, ensuring all necessary structured fields are extracted in one go.
Which tool should I use to turn my JSON output into an Excel sheet?
After the extraction is complete and you have the resulting JSON data, call generate_excel. It compiles your records directly into a usable spreadsheet format that's ready for sharing.
Why do I need to use create_golden before extracting?
The Golden record establishes the single source of truth and the optimal schema for your data points. Using it guarantees consistency, so even if a vendor changes their invoice layout slightly, your extraction rules stay accurate.
How can I check the status or retrieve results using `get_document` after an asynchronous extraction job?
You call get_document(id) to pull specific extraction results. This is critical for confirming successful processing, especially if an async job was slow or failed initially.
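A polling loop around get_document might look like the sketch below. The `status` field and its values (`"PROCESSING"`, `"COMPLETE"`, `"FAILED"`) are assumptions about the response shape, and `get_document` here is an injected callable rather than a real client binding.

```python
import time
from typing import Any, Callable


def wait_for_document(
    get_document: Callable[[str], dict[str, Any]],
    document_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the extraction finishes, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = get_document(document_id)
        status = result.get("status")  # assumed field name
        if status == "COMPLETE":
            return result
        if status == "FAILED":
            raise RuntimeError(f"Extraction {document_id} failed")
        sleep(interval_s)
    raise TimeoutError(f"Extraction {document_id} not ready after {max_attempts} polls")


# Stub that reports PROCESSING twice, then COMPLETE.
responses = iter(
    [
        {"status": "PROCESSING"},
        {"status": "PROCESSING"},
        {"status": "COMPLETE", "parsed_document": {"total_amount": "1250.00"}},
    ]
)
poll_count = 0


def fake_get_document(document_id: str) -> dict[str, Any]:
    global poll_count
    poll_count += 1
    return next(responses)


# Pass a no-op sleep so the example finishes instantly.
final = wait_for_document(fake_get_document, "ext-1", sleep=lambda _seconds: None)
```

Bounding the number of attempts keeps a stuck job from blocking the agent indefinitely.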
If I need temporary read-only access for external reviewers, what does the `get_auth_tokens` tool provide?
It generates temporary authorization tokens. You can use these to give reviewers limited viewing access without handing over your main API key credentials.
I refined my extraction rules; how do I apply those changes using `update_configuration`?
You send the parameters via update_configuration(id). This lets you revise an existing rule set without having to rebuild the entire configuration from scratch.
Can I extract data from a document instantly if I have its Base64 representation?
Yes! Use the extract_sync tool. Provide the document type and the Base64-encoded document bytes, and your agent will return the structured extraction results synchronously.
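For reference, here is roughly what such a call looks like on the wire. MCP clients invoke tools with a JSON-RPC `tools/call` request; the argument names `document_type` and `document` below are assumptions, since the answer above names the inputs but not their exact keys.

```python
import base64
import json
from typing import Any


def build_extract_sync_request(
    request_id: int, document_type: str, document_bytes: bytes
) -> dict[str, Any]:
    """Build an MCP tools/call request for the extract_sync tool."""
    # Base64-encode the raw bytes so they can travel inside JSON.
    encoded = base64.b64encode(document_bytes).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_sync",
            # Hypothetical argument keys; check the tool's input schema.
            "arguments": {"document_type": document_type, "document": encoded},
        },
    }


request = build_extract_sync_request(1, "invoices", b"%PDF-1.7 sample bytes")
print(json.dumps(request, indent=2))
```

Most MCP clients build this envelope for you, so in practice you only supply the tool name and arguments.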
How do I extract data from a document hosted at a public URL?
You can use the extract_from_url tool. Simply provide the document type, the document URL, and the content type (e.g., application/pdf) to trigger an asynchronous extraction.
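A small helper can assemble those three inputs and infer the content type from the file extension. The argument keys (`document_type`, `document_url`, `content_type`) are assumed names for illustration; only the three inputs themselves come from the answer above.

```python
import mimetypes
from urllib.parse import urlparse


def build_extract_from_url_args(document_type: str, document_url: str) -> dict[str, str]:
    """Assemble arguments for extract_from_url, guessing the content type."""
    parsed = urlparse(document_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Expected an http(s) URL, got: {document_url}")

    # Guess from the path only, so query strings do not confuse the lookup.
    content_type, _encoding = mimetypes.guess_type(parsed.path)
    if content_type is None:
        raise ValueError(f"Cannot infer content type for: {document_url}")

    return {
        "document_type": document_type,
        "document_url": document_url,
        "content_type": content_type,
    }


args = build_extract_from_url_args(
    "invoices", "https://example.com/invoices/inv-001.pdf?download=1"
)
```

Guessing from the extension is a convenience, not a guarantee; if the host serves files without extensions, pass the content type explicitly instead.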
Can I specify a custom configuration layout when extracting?
Yes, you can target specific configurations by using the extract_sync_with_config or extract_from_url_with_config tools, which allow you to define the exact configuration name to use for parsing.
Use it with your favorite AI tools
Connect this server to Cursor, Claude, VS Code, and more.