# Pdfcrowd MCP

> Pdfcrowd converts web pages, raw HTML, and documents into usable formats like PDF, images, or plain text. It also lets you generate professional business records—like invoices and receipts—directly from structured JSON data using your AI agent.

## Overview
- **Category:** productivity
- **Price:** Free
- **Tags:** pdf-generation, html-to-pdf, web-to-pdf, invoice-generator, document-conversion

## Description

This server handles everything from turning raw web data into polished documents to generating official paperwork for your AI client. You can feed it anything—a URL, an HTML string, or an existing PDF—and it spits out usable content.

Need a complete document? Use `convert_html_to_pdf` and pass in an HTML string or a full web address; the tool returns the finished PDF as base64 encoded data. If you just need a visual snapshot of what that page looks like, run `convert_html_to_image` to capture it as a base64 encoded image (PNG, JPG, or WebP).

Pulling information out of existing PDFs is straightforward. To strip the document down to raw, searchable text for data analysis, use `convert_pdf_to_text`. If you need more than plain words, for example when your agent has to analyze structural tags and formatting, run `convert_pdf_to_html` to transform the PDF's content into editable HTML. Going the other way, from clean HTML or a live web page to a document format, is what `convert_html_to_pdf` and `convert_html_to_image` are for.

When it comes to business records, forget building invoices from scratch. Just send structured JSON data—like item lists, total amounts, or dates—to `generate_business_document`, and the tool returns a polished, professional PDF invoice or receipt that you can use right away.

## Tools

### convert_html_to_image
Captures a visual screenshot of an HTML page or URL, returning the image data in base64 format.

### convert_html_to_pdf
Converts any given web page or raw HTML string into a complete PDF document, returned as base64 encoded data.

### convert_pdf_to_html
Transforms the content of an existing PDF file back into editable HTML format, retaining structural tags.

### convert_pdf_to_text
Extracts only the plain text from a PDF document, discarding all formatting and keeping it clean for data analysis.

### generate_business_document
Creates professional business records (invoices or receipts) by taking structured JSON input and returning a final PDF file.

## Prompt Examples

**Prompt:** 
```
Convert the web page https://example.com to a PDF in landscape orientation.
```

**Response:** 
```
I've initiated the conversion for https://example.com. I've set the orientation to landscape as requested. Your PDF is ready for download.
```

**Prompt:** 
```
Take a PNG screenshot of https://news.ycombinator.com with a width of 1280px.
```

**Response:** 
```
Capturing the screenshot... I've generated a PNG image of Hacker News at a 1280px viewport width. You can view the image now.
```

**Prompt:** 
```
Generate a modern invoice for $1200 USD for 'Software Consulting' with 1 item.
```

**Response:** 
```
I'm generating your modern invoice. I've added 'Software Consulting' as the line item for a total of $1200 USD. The PDF document has been created.
```

## Capabilities

### Convert web pages/HTML to PDF
Pass an HTML string or URL, and the tool returns base64 encoded data for a complete PDF document.

### Capture screenshots from web content
Takes a visual snapshot of a webpage or raw HTML structure, returning it as a base64 encoded image (PNG, JPG, WebP).

### Convert PDFs into structured HTML
Converts an existing PDF file into HTML, retaining structural tags such as headings and tables so the agent can analyze the document's structure.

### Get plain text from PDFs
Processes a PDF document and returns only the raw, searchable text content, stripping away all formatting.

### Generate professional business documents
Takes structured JSON data (like item lists, totals, dates) and generates a polished, ready-to-use PDF invoice or receipt.

## Use Cases

### Archiving an old website
The content manager needs to save 50 pages from a site that might go offline. Instead of manually printing each page, they ask their agent to run `convert_html_to_pdf` on the list of URLs. They end up with 50 standardized PDFs ready for documentation.

### Processing financial statements
The analyst gets a PDF bank statement and needs key figures. Instead of copy-pasting messy text, they ask their agent to use `convert_pdf_to_text`. This returns clean plain text, from which the agent can pull the numbers and dates needed for calculation.

### Creating automated invoices
The sales system generates a list of completed tasks in JSON format. The finance team tells the agent to run `generate_business_document`, which spits out a perfectly formatted, ready-to-send PDF invoice without touching any templates.

### Building developer reports
The dev needs to show a live web feature in a report. Instead of taking a quick, blurry screenshot, they ask the agent to use `convert_html_to_image` with specific dimensions (e.g., 1280px width) for pixel-perfect inclusion.

## Benefits

- **Instant Report Generation:** Don't manually screenshot reports. Use `convert_html_to_pdf` to turn a live URL or complex HTML into one standardized, high-quality PDF file in seconds.
- **Structured Data Capture:** Need the underlying code structure? Run `convert_pdf_to_html`. This keeps tables and headings intact when you send the data back to your agent for processing.
- **Accurate Text Extraction:** When all you care about is the text, use `convert_pdf_to_text`. It discards formatting and gives you clean text ready for database entry.
- **Professional Billing Records:** Skip template filling. Use `generate_business_document` to feed JSON (line items, totals) directly into a modern invoice PDF.
- **Visual Archiving:** Need proof of what a page looked like? `convert_html_to_image` captures precise screenshots at specific viewports and dimensions.

## How It Works

The bottom line is your AI client handles all the API calls; you just give it the source material and tell it what format you need at the end.

1. Subscribe to the server and provide your Pdfcrowd Username and API Key.
2. Tell your AI agent what you need—for example: 'Convert this URL to a PDF in landscape orientation.'
3. The agent calls the appropriate tool (e.g., `convert_html_to_pdf`) and receives the base64 encoded file data, which it passes back to you.
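
The call in step 3 can be sketched as the JSON-RPC message an MCP client sends for a tool invocation. This is a minimal illustration rather than the server's documented schema: the `tools/call` method and the `name`/`arguments` envelope come from the MCP specification, `url` is the parameter named in the FAQ, and `orientation` is an assumed parameter name.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# `orientation` is an assumed argument name, used here for illustration only.
request = build_tool_call(
    1,
    "convert_html_to_pdf",
    {"url": "https://example.com", "orientation": "landscape"},
)
print(request)
```

In practice your AI client builds and sends this message for you; the sketch only shows what travels over the wire.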

## Frequently Asked Questions

**How do I convert a URL to PDF using convert_html_to_pdf?**
You pass the full URL or raw HTML string to `convert_html_to_pdf`. The tool returns base64 encoded data, meaning your agent gets the complete PDF file ready for download.
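
Turning that base64 data into a file on disk can be sketched as follows. The helper name, the output path, and the placeholder payload are all illustrative; a real call would supply the base64 string taken from the tool's response.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_file(encoded: str, destination: Path) -> int:
    """Decode a base64 string, write the bytes to disk, return the byte count."""
    data = base64.b64decode(encoded, validate=True)
    destination.write_bytes(data)
    return len(data)


# Stand-in for the tool result: a tiny placeholder, not a real PDF.
raw = b"%PDF-1.7\n% placeholder\n"
fake_result = base64.b64encode(raw).decode("ascii")

target = Path(tempfile.gettempdir()) / "pdfcrowd_example.pdf"
written = save_base64_file(fake_result, target)
print(f"wrote {written} bytes to {target}")
```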

**Is convert_pdf_to_text better than converting to HTML?**
It depends on what you need. `convert_pdf_to_text` strips everything down to raw text, which is best if you just want to search or analyze content word-for-word. If you need tables and layout for analysis, use `convert_pdf_to_html`.

**Can I generate an invoice with multiple line items using generate_business_document?**
Yes. You provide the structured JSON data containing all your line items, quantities, and rates. The `generate_business_document` tool compiles this into a single, professional PDF record.

**How do I capture a specific area of a web page with convert_html_to_image?**
Pass the URL or HTML and specify the viewport dimensions you want. The `convert_html_to_image` tool captures that visual area as a PNG, JPG, or WebP image.
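
To check that a returned PNG really has the dimensions you asked for, you can read them from the image header. The sketch below relies only on the PNG file format (an 8-byte signature followed by the `IHDR` chunk); the function name is illustrative, and it assumes you already hold the base64 string returned by the tool.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(encoded: str) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk of a base64 encoded PNG."""
    data = base64.b64decode(encoded)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type "IHDR",
    # 16-19: width, 20-23: height (both big-endian unsigned integers).
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```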

**What input formats can I use with convert_pdf_to_html?**
You must provide a PDF document file or base64 encoded data. The tool reads the binary structure of the PDF and converts it into standard HTML markup. Complex visual elements, such as non-text backgrounds, are lost in the conversion.

**Does convert_html_to_pdf accept raw HTML strings, or must I use a full URL?**
You can use either. If you provide a URL, the service fetches and renders the page first. If you pass a raw HTML string, it renders that exact markup directly instead of fetching a page.

**Are there specific parameters I need to set for layout when using convert_html_to_pdf?**
Yes, you control the output layout by passing parameters alongside the source. You can specify orientation (portrait or landscape), page size, and margins so the PDF matches the layout you expect.

**What happens if I hit a rate limit when using convert_html_to_image?**
If you make too many requests in a short time, the API will return an HTTP 429 error. You'll need to implement exponential backoff or simply pause your calls until the cooldown period expires.
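
A minimal sketch of exponential backoff, assuming your own calling code raises an exception when it sees the 429 response. `RateLimited` and `call_with_backoff` are hypothetical names, not part of the Pdfcrowd API or the MCP server.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller when it sees HTTP 429."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `operation` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Waits of base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` argument is injectable so the retry schedule can be tested without actually waiting.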

**Can I convert a raw HTML string instead of a URL?**
Yes! Use the `convert_html_to_pdf` or `convert_html_to_image` tools and provide your HTML code in the `text` parameter instead of using the `url` parameter.
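
If your own code has to decide which of the two parameters to fill, one possible approach is to treat anything that parses as an `http` or `https` address as a URL and everything else as markup. The helper name below is illustrative.

```python
from urllib.parse import urlparse


def source_arguments(source: str) -> dict:
    """Return {'url': ...} for http(s) addresses, otherwise {'text': ...}."""
    candidate = source.strip()
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return {"url": candidate}
    # Anything else is treated as raw HTML and passed through unchanged.
    return {"text": source}


print(source_arguments("https://example.com"))
print(source_arguments("<h1>Hello</h1>"))
```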

**How do I generate a professional invoice from my data?**
Use the `generate_business_document` tool. Provide the `document_type` as 'invoice', and include your line items, total, and currency in the JSON payload to get a styled PDF.
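
As a rough sketch of what such a payload might look like: only `document_type` is named above, so every other key here (`currency`, `items`, `description`, `quantity`, `rate`, `total`) is an assumed name that may differ from the server's actual schema.

```python
import json
from decimal import Decimal


def build_invoice_payload(items, currency="USD"):
    """Assemble an invoice payload; keys other than `document_type` are assumed."""
    # Decimal avoids floating-point rounding surprises in money arithmetic.
    total = sum(
        Decimal(str(item["quantity"])) * Decimal(str(item["rate"]))
        for item in items
    )
    return {
        "document_type": "invoice",
        "currency": currency,
        "items": items,
        "total": str(total),
    }


payload = build_invoice_payload(
    [{"description": "Software Consulting", "quantity": 1, "rate": "1200.00"}]
)
print(json.dumps(payload, indent=2))
```

Computing the total from the line items, rather than typing it separately, keeps the two from drifting apart.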

**Is it possible to extract plain text from a PDF file?**
Absolutely. Use the `convert_pdf_to_text` tool with the URL of your PDF. You can also enable `no_layout` if you want the text in reading order without layout preservation.