PDF.co MCP. Extract structured data and convert documents from chat.


Works with every AI agent you already use

Claude, ChatGPT, Cursor, Gemini, Windsurf, VS Code, JetBrains, Vercel, and any MCP-compatible client

Just plug in your AI agents and start using Vinkius.

PDF.co lets your AI client handle all document processing—parsing, converting, merging, and securing PDFs right in the chat window. You use it to extract structured data like tables into JSON or CSV formats, perform OCR on scanned images, or combine multiple reports into one file.

It’s a full suite of tools for turning messy documents into clean, actionable data pipelines without ever leaving your conversation.

What your AI agents can do


Convert Documents to Structured Data

Transform PDFs and images into specific formats like JSON, CSV, XML, or plain text using tools such as pdf_to_json or pdf_to_csv.

Extract Specific Data Types

Pull metadata from a PDF with extract_pdf_meta, extract tables into structured formats via pdf_to_json, or perform OCR on images using ocr_image.

Combine and Divide Files

Use merge_pdfs to combine multiple PDFs into a single file, or use split_pdf to break one large document into smaller parts.

Manage Document Security

Apply password protection with protect_pdf, or remove existing passwords using unprotect_pdf on PDF files.

Monitor Job Status and Account Info

Check the progress of background processing jobs with check_job_status, and view your service credit balance via get_account_info.


Supported MCP Clients

OAuth 2.0 Compatible

Claude

ChatGPT

Cursor

Gemini

VS Code

JetBrains

Vercel

Zendesk

+ other MCP clients

Included with Plan


PDF.co MCP Server: 12 Tools for Document Processing

This server gives your AI client everything it needs to handle PDFs—from converting data formats and reading scans to merging files and controlling security settings.

Make your AI actually useful.

Add this MCP to Claude, Cursor, or Windsurf and your AI stops guessing. It gets real tools to look things up, take action, and handle the stuff you keep doing by hand.

Start using PDF.co on Vinkius

check_job_status

Checks the status of any document processing job that was run asynchronously.

pdf_to_csv

Converts data presented in PDF tables directly into a Comma Separated Values (CSV) file.

pdf_to_json

Extracts and structures the entire content of a PDF document into a standardized JSON object.

pdf_to_text

Converts an entire PDF file into simple, clean plain text format.

pdf_to_xml

Converts a PDF document's content and structure into an XML file.

extract_pdf_meta

Extracts general metadata (like creation date, author, and title) from a PDF file.

get_account_info

Retrieves your current account usage metrics and service credit balance.

merge_pdfs

Combines two or more separate PDF documents into a single output file.

ocr_image

Runs Optical Character Recognition on an uploaded image to extract text, even if the original document was scanned.

protect_pdf

Adds password protection to a PDF, restricting access or editing capabilities.

split_pdf

Cuts one large PDF document into multiple smaller PDFs based on page numbers or ranges.

unprotect_pdf

Removes existing password protection from a locked PDF file.

Choose How to Get Started

Build a custom MCP for your own tools, or connect a ready-made integration from our catalog.

Build Your Own

Turn any API into an MCP. Import a spec, define Agent Skills, or deploy with MCPFusion.

Import from OpenAPI, Swagger, or YAML specs
Create Agent Skills with progressive disclosure
Deploy to edge with MCPFusion framework
Built-in DLP, auth, and compliance on every call
Real-time usage dashboard and cost metering
Publish to catalog or keep private

Start building

Make Your AI Do More

Start with PDF.co, then connect any of our 5,000+ other servers whenever your AI needs more. One click, no limits.

Use this MCP plus 5,000+ others, all in one place
Add new capabilities to your AI anytime you want
Every connection is secured and compliant automatically
Track usage and costs across all your servers
Works with Claude, ChatGPT, Cursor, and more
New servers added to the catalog every week

Independent Platform Disclaimer: Vinkius is an independent platform and is not affiliated with, endorsed by, sponsored by, verified by, or otherwise authorized by PDF.co. All third-party trademarks, logos, and brand names are the property of their respective owners. Their use on this website is strictly for informational purposes to identify service compatibility and interoperability.

VINKIUS INFRASTRUCTURE

Cloud Hosted

Managed infra

V8 Isolated

Sandboxed per request

Zero-Trust Proxy

No stored credentials

DLP Enforced

Policy on every call

GDPR Compliant

EU data residency

Token Compression

~60% cost reduction

Your data is protected. See how we built it.

Works with Claude, ChatGPT, Cursor, and more

The Model Context Protocol standardizes how applications expose capabilities to LLMs. Instead of operating in isolation, your AI gains direct access to external platforms, live data, and real-world actions through secure, standardized connections.

This server provides 12 capabilities that interface natively with Claude, ChatGPT, Cursor, and any MCP client. No middleware. No custom integration required.
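For readers curious what a client sends under the hood, here is a minimal sketch of the JSON-RPC message that MCP defines for invoking a tool (`tools/call`). The tool name comes from this server's list; the `url` argument and the sample link are assumptions for illustration, since this page does not document each tool's input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids should be unique per session


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument: the real input schema is advertised by the server
# in its `tools/list` response and may use different field names.
request = build_tool_call("pdf_to_json", {"url": "https://example.com/report.pdf"})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds and sends this message for you; the sketch only shows what "calling a tool" means on the wire.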

Manually extracting data from PDFs feels like detective work.

Think about it: you get a quarterly report. You open the PDF, then you have to manually spot the revenue number, copy it into a spreadsheet cell, find the tax percentage on page 8, and paste that too. Then you repeat this process for ten different reports just because they're in PDF format.

With the PDF.co MCP Server, you tell your agent exactly what you need—like 'Give me the total revenue from all attached PDFs.' It runs the necessary tools (like `pdf_to_json`) and returns a clean, structured data block instantly. You get the answer, not just a file.

Use PDF.co MCP Server for predictable JSON output.

Before this server, if you needed to process tables from PDFs, you were stuck using `pdf_to_text` or relying on visual guesswork. The text conversion often loses the relationship between columns and rows, making the resulting data useless for automation.

Now, when you call `pdf_to_json`, the AI sees the document's internal layout—the tables, the headers, the fields—and maps them to a predictable JSON schema. You get reliable data that your code can actually use.
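To make the contrast concrete, here is a small sketch of consuming each kind of output. Both sample payloads are invented for illustration; this page does not publish the actual JSON schema that `pdf_to_json` returns, so treat the `rows`, `label`, and `amount` keys as assumptions.

```python
import json
import re

# Invented sample outputs for the same two-row table.
text_output = "Item Amount\nRevenue 1,250.00\nTax 98.50\n"
json_output = json.dumps(
    {"rows": [{"label": "Revenue", "amount": "1,250.00"},
              {"label": "Tax", "amount": "98.50"}]}
)


def amount_from_text(text: str, label: str) -> float:
    """Brittle: relies on the label and number sharing one line."""
    match = re.search(rf"^{re.escape(label)}\s+([\d,]+\.\d+)$", text, re.MULTILINE)
    if match is None:
        raise KeyError(label)
    return float(match.group(1).replace(",", ""))


def amount_from_json(payload: str, label: str) -> float:
    """Sturdier: looks the value up by key instead of by position."""
    for row in json.loads(payload)["rows"]:
        if row["label"] == label:
            return float(row["amount"].replace(",", ""))
    raise KeyError(label)


print(amount_from_text(text_output, "Revenue"))
print(amount_from_json(json_output, "Revenue"))
```

The text version breaks as soon as a label wraps onto a second line or a column shifts; the keyed lookup does not care about layout.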

Support 24/7 support@vinkius.com ↗

Security Vinkius Trust Center ↗

SLA Service Level Agreement ↗

Report Listing Send Report ↗

What you can do with this MCP connector

Listen up. This isn't some basic converter you use when you're bored. The PDF.co server gives your AI client a full suite of tools to handle documents—it’s for parsing, converting, merging, and locking down PDFs right in the chat window. You'll use it anytime you need your agent to deal with messy files and turn them into clean, actionable data without you having to copy-paste a single thing.

Converting Documents to Structured Data

You can make your AI client read every kind of PDF structure using dedicated conversion tools. If the document has tables, use pdf_to_csv and it'll spit out a perfect Comma Separated Values file you can use in any spreadsheet program. For deeper data analysis, run pdf_to_json to extract and structure all the content into a standardized JSON object that your code can actually read.

If you need something more rigid, pdf_to_xml converts the entire document's structure into an XML file format. Even if all you need is raw reading material, pdf_to_text handles converting the whole PDF down to simple, clean plain text.

Extracting Specific Data and Handling Scans

Sometimes you don't want the whole thing; you just need pieces of info. You can pull basic document information (like who created it, what the title is, or when it was made) by running extract_pdf_meta. If you've got a scanned invoice or some old paperwork that isn't digital text, don't sweat it.

Use ocr_image to run Optical Character Recognition on an uploaded image; it extracts usable text even if the original document was just ink on paper. For massive PDFs, if you only need sections three through five, use split_pdf, and it’ll cut that one big file into several smaller parts for you.

Combining and Managing Files

When your workflow requires multiple inputs, this server handles the heavy lifting. You can run merge_pdfs to take two or more separate PDF documents (say, quarterly reports from different departments) and combine them into a single output file. On the flip side of organization, you might need to mess with security.

If a document is locked down and you need access, use unprotect_pdf to strip away existing passwords so your agent can work on it. Conversely, if you're sending something sensitive, you can run protect_pdf to add password protection, restricting who can view or edit the file.

Utility and Monitoring

Your AI client keeps track of everything running in the background. When a big job, like converting 50 files, is queued up, use check_job_status to see exactly where that document processing is at. Plus, you can keep an eye on your usage with get_account_info, which pulls up your current service credit balance and account metrics so you know what's left.

Basically, it gives you the whole damn toolbox for making PDFs into usable data.

Built · Hosted · Managed by Vinkius

PDF.co MCP Server - Convert & Extract Data from PDFs

Server ID 019dd138-7b3c-705b-a69d-4f1fc09c6f6d

Vinkius Inspector

Compliance Grade A+

Score 98.33/100

Report View Report ↗

Who Is PDF.co MCP For?

This tool’s best users are data-heavy roles that spend too much time switching between document readers, spreadsheet programs, and databases. Think Data Analysts, Accountants, or Operations Managers who deal with invoices, reports, and legal filings daily. If your job involves extracting numbers from a PDF report before you can use it in a dashboard, this is for you.

Data Analyst

Uses pdf_to_json to pull structured data points (like revenue or line items) from complex PDFs so they can be fed into BI tools.

Operations Manager

Automates the collection of documents by using merge_pdfs on multiple quarterly reports, then potentially securing them with protect_pdf for distribution.

Accountant

Processes batches of scanned invoices by running ocr_image to pull the text out of each scan, making it easier to reconcile accounts without manual entry.

What Changes When You Connect

Stop losing time on manual extraction. Use pdf_to_json or pdf_to_csv to turn complex tables directly into machine-readable data, eliminating spreadsheet copy/paste errors.
Handle mixed media inputs instantly. Run ocr_image on scanned invoices and handwriting samples; it extracts text that simple PDF readers miss entirely.
Simplify document management workflows. Need to combine three quarterly reports? Use merge_pdfs; the server handles stitching them together into one file, keeping all pages sequential.
Maintain data integrity across systems. Convert files using pdf_to_xml or pdf_to_json, ensuring your downstream application gets a clean, predictable schema every time.
Control document access right from chat. Apply security locks with protect_pdf immediately after processing sensitive client documents.

Real-World Use Cases

Processing a Batch of Client Invoices

An accountant gets 50 scanned invoices (JPEGs). Instead of manually typing in the vendor name, invoice number, and total for each one, they ask their agent to run ocr_image on all 50 files. The server extracts the text from every image, and the agent picks out the vendor, invoice number, and total so they can compile a master spreadsheet without typing each value by hand.
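The step from OCR text to spreadsheet rows can be sketched as follows. The OCR text below is an invented sample, and the field patterns are assumptions about one invoice layout; real invoices vary, so the patterns would need tuning per vendor.

```python
import csv
import io
import re

# Invented OCR output for one invoice; real text depends on the scan.
ocr_text = """ACME Supplies Ltd.
Invoice No: INV-2041
Date: 2024-03-15
Total: $1,482.30
"""

# Assumed patterns for this sample layout only.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "total": re.compile(r"Total:\s*\$?([\d,]+\.\d{2})"),
}


def extract_fields(text: str) -> dict:
    """Return the matched fields, with None for anything not found."""
    # Naive assumption: the vendor name is the first non-empty line.
    fields = {"vendor": text.strip().splitlines()[0] if text.strip() else None}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def to_csv(rows: list[dict]) -> str:
    """Serialize the extracted rows as CSV for a spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vendor", "invoice_number", "total"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = extract_fields(ocr_text)
print(to_csv([row]))
```

Fields that fail to match come back as `None`, which makes it easy to flag the invoices that still need a human look.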

Building an Annual Compliance Binder

An operations manager needs to combine the four quarterly reports (Q1 through Q4) into one annual file and ensure it is secure. They first use merge_pdfs to compile the 4 reports into one, then run protect_pdf on the final file before uploading it to the archive.

Converting Raw Report Data for a Database

A data analyst has a PDF report full of financial tables. They use pdf_to_json, which pulls out all column headers and values into a structured JSON object. The agent then passes this clean, predictable data directly to the database API.

Splitting Master Legal Documents

A legal team receives one massive 300-page agreement PDF. Instead of reading it all at once, they ask their agent to run split_pdf to separate the 'Definitions' section (pages 1-25) from the 'Exhibit A' section (pages 280-300), giving them smaller, manageable files.
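Page selections like the one above are often written as a compact string. The sketch below parses and validates such a string before it is handed to a tool; the `"1-25,280-300"` syntax is an assumption for illustration, not the documented input format of `split_pdf`.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[tuple[int, int]]:
    """Parse '1-25,280-300' into inclusive (start, end) pairs, 1-indexed."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start  # '7' means a single page
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        ranges.append((start, end))
    return ranges


print(parse_page_ranges("1-25,280-300", page_count=300))
```

Validating against the page count up front catches a typo such as `290-310` before any processing credits are spent.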

The Tradeoffs

Treating all PDFs as simple text.

Copying and pasting the output of pdf_to_text into a database field when you actually need specific columns like 'Total' or 'Date'. You lose structure, making data useless for analysis.

→ Don't use pdf_to_text. Use pdf_to_json instead. It preserves the document's inherent structure—tables and fields—so your downstream system gets clean keys and values.

Ignoring job processing delays.

Asking the agent to process a massive 500-page PDF conversion, and then immediately asking 'What is the result?' without waiting. The request fails because the server hasn't finished running the background task.

→ After initiating a large task, always use check_job_status. This confirms the job is done before you ask for the final output.
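One way to make that waiting explicit is a polling loop with backoff. This is a sketch only: `fetch_status` stands in for however your client invokes `check_job_status`, and the status strings `"working"` and `"success"` are assumed values, not ones confirmed by this page.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    timeout: float = 300.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves the 'working' state or the timeout expires."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status != "working":  # assumed in-progress marker
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Stand-in for the real tool call: reports 'working' twice, then 'success'.
_responses = iter(["working", "working", "success"])
result = wait_for_job(lambda _job_id: next(_responses), "job-123", sleep=lambda _s: None)
print(result)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays, as the stand-in at the bottom does.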

Sharing sensitive documents unsecured.

Generating a PDF containing client payroll data and then sending it out via email without protection. Anyone with access can view or copy the raw information.

→ Always run protect_pdf on any document that contains PII or proprietary data immediately after you've finished assembling it.

When It Fits, When It Doesn't

Use this server if your primary bottleneck is getting clean, structured data out of unstructured documents (scans, reports, invoices). You need a tool that can convert PDF tables into JSON objects, or reliably extract metadata. The tools work best in sequence: for example, run ocr_image first to get the text from a scan, then have your agent shape the result with a structured conversion tool such as pdf_to_json.

Don't use this if all you need is simple viewing or editing; those are PDF reader functions. Also, don't try to build an entire document management system on this. It handles processing; you still need a separate storage solution for the files themselves.

Common Questions About PDF.co MCP

How do I convert PDF tables into structured data using pdf_to_csv?

You simply tell the agent to 'Convert this document's tables to CSV.' The tool handles identifying all tabular content and outputs it in a standard, delimited format ready for import.
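Once the CSV comes back, downstream code can read it with any standard parser. A minimal sketch, assuming an invented two-column table; the actual column names depend on the source PDF.

```python
import csv
import io
from decimal import Decimal

# Invented CSV text standing in for a pdf_to_csv result.
csv_output = "Quarter,Revenue\nQ1,1200.50\nQ2,1350.00\nQ3,1410.25\n"


def column_total(csv_text: str, column: str) -> Decimal:
    """Sum one numeric column, using Decimal to avoid float rounding."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum((Decimal(row[column]) for row in reader), Decimal("0"))


print(column_total(csv_output, "Revenue"))
```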

What is the difference between pdf_to_text and pdf_to_json?

The key difference is structure. pdf_to_text gives you one big block of raw text, losing all formatting. pdf_to_json analyzes the document's layout and organizes the content into labeled fields, keeping context.

Can I use ocr_image to read handwritten notes in a PDF?

Yes. You pass the image through ocr_image. It runs Optical Character Recognition specifically designed for scanned or handwritten documents, extracting text that standard digital readers can't see.

How do I combine several PDFs into one using merge_pdfs?

Just upload the files and tell your agent to 'Merge these three reports.' The merge_pdfs tool combines them sequentially into a single, cohesive PDF document for you.

How do I use `protect_pdf` to add password security to a document?

The tool encrypts your PDF file. You provide the document and the desired credentials, which locks it down so only authorized users can view or edit the content.

What is the purpose of `check_job_status` after a conversion task?

It lets you track long-running processes. Complex conversions take time; use this tool to monitor if your document job completed successfully or if it ran into an error.

How can I use `extract_pdf_meta` to get information about the PDF itself?

It pulls out hidden document properties. This function reads key metadata like the author, creation date, and title embedded deep within the file structure.

If I only need specific pages, how does `split_pdf` work?

You can break a large PDF into smaller parts. Just specify the exact page range or individual pages you want to extract and create new, separated documents.

Can my AI automatically find and extract a specific table from a PDF?

Yes! Use the pdf_to_csv or pdf_to_json tools. Your agent will respond with the structured tabular data from the document, ready for analysis.

How do I find my PDF.co API Key?

Log in to your PDF.co account and open the main dashboard; your secret API key is shown there.

Does this support handwritten text recognition?

Absolutely. PDF.co's high-fidelity OCR engine is designed to handle both printed and handwritten text with high accuracy across multiple languages.

Use it with your favorite AI tools

Connect this server to Cursor, Claude, VS Code, and more.

OpenAI Agents SDK sdk-python

Google ADK sdk-python