# Scrapfly MCP

> Scrapfly lets your AI agent scrape web data at scale through a managed API connection. It handles proxies, browser rendering, and anti-bot bypassing automatically. You can run complex extraction jobs—from raw HTML to structured JSON—and capture specific element screenshots directly in conversation. No need to manage headless browsers or worry about IP rotation; just talk to your agent.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** scrapfly, web-scraping, data-extraction, anti-bot-bypass, residential-proxies, ai-extraction, js-rendering, screenshots-api, mcp

## Description


### Data Collection & Extraction

The `web_scrape` tool retrieves the full HTML source of any URL you point it at and handles rendering along the way, so you get the page content rather than an empty shell. The `ai_data_extraction` tool runs AI models over the page and returns structured JSON that is ready to load into a database. The `capture_screenshot` tool takes an image of either the whole page or one specific element and returns it right in your chat thread. To see which screenshot options are supported, run `get_screenshot_capabilities`; to see which models are available for data structuring, run `list_extraction_models`.
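Under the hood, an MCP client invokes each of these tools with a JSON-RPC `tools/call` request. The sketch below shows roughly what such requests could look like for `web_scrape` and `ai_data_extraction`. The argument names (`url`, `prompt`) are assumptions made for illustration; the server's actual input schema may use different names, and your AI client normally builds these messages for you.

```python
import json
from itertools import count

# Simple incrementing request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names ("url", "prompt") are assumed, not taken from the server schema.
scrape_request = build_tool_call(
    "web_scrape", {"url": "https://news.ycombinator.com"}
)
extract_request = build_tool_call(
    "ai_data_extraction",
    {
        "url": "https://news.ycombinator.com",
        "prompt": "List the story titles and links.",
    },
)

print(scrape_request)
print(extract_request)
```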

### Advanced Scraping & Geo-Targeting

You don't have to worry about IP rotation or bot detection; the platform manages both automatically. Run `get_scraping_capabilities` to confirm which advanced features, such as proxy support and anti-bot bypass, are available. The agent connects to millions of residential proxies spread across 50+ countries, so you can target data collection at specific regions; run `list_proxy_regions` to see the exact regions and proxy types available. With those checks done, use `web_scrape` for raw HTML or `ai_data_extraction` for structured data. The system uses browser rendering to scrape content even from sites protected by major bot mitigation systems.
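Before a geo-specific job, it can help to compare the countries you need against what `list_proxy_regions` reports. The following is a minimal sketch of that check. The response shape (a list of objects with `country` and `proxy_type` fields) is hypothetical; adapt the field names to whatever the tool actually returns.

```python
from typing import Iterable


def missing_countries(
    available_regions: Iterable[dict], wanted: Iterable[str]
) -> list[str]:
    """Return the wanted country codes that no listed proxy region covers."""
    available = {region["country"].lower() for region in available_regions}
    return sorted(
        code.lower() for code in set(wanted) if code.lower() not in available
    )


# Hypothetical output of `list_proxy_regions`; the real shape may differ.
regions = [
    {"country": "us", "proxy_type": "residential"},
    {"country": "de", "proxy_type": "residential"},
    {"country": "jp", "proxy_type": "datacenter"},
]

print(missing_countries(regions, ["US", "fr", "de"]))  # prints ['fr']
```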

### Account Monitoring & Configuration

A set of account tools lets you keep tabs on the whole operation. To check that your API credentials are valid and active, run `test_scrapfly_auth`. For an overall snapshot of your account, use `get_api_status`; for the metadata and configuration of a specific project, use `get_project_details`. If you're tracking costs, `check_credit_usage` reports how many API credits you've consumed. For setup management, `list_api_webhooks` lists the webhooks configured on your account for notifications, such as alerts when data is ready. As noted above, `get_scraping_capabilities` shows which scraping features are available.

### How it Works in Practice

Your AI client coordinates everything. First, you tell the agent to scrape a URL using `web_scrape`. If you need structured data rather than raw HTML, the agent calls `ai_data_extraction`, which outputs usable JSON. You can also ask the agent to capture an image of the page via `capture_screenshot`. To keep results region-specific, requests can be routed through proxies in the regions reported by `list_proxy_regions`. When you're done collecting data for a project, run `get_project_details` to review its metadata and configuration. You never have to manage headless browsers or worry about IP changes; just talk to your agent.
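The sequence above can be sketched as a small orchestration function. This is only an illustration of the ordering: `call_tool` is a hypothetical stand-in for whatever your MCP client uses to invoke a tool, and the argument names (`url`, `prompt`) are assumptions rather than the server's documented schema.

```python
from typing import Any, Callable

# Hypothetical signature for "invoke an MCP tool by name with arguments".
ToolCaller = Callable[[str, dict], Any]


def scrape_and_extract(
    call_tool: ToolCaller, url: str, prompt: str, with_screenshot: bool = False
) -> dict:
    """Run scrape, then extraction, then an optional screenshot, in that order."""
    result = {
        "html": call_tool("web_scrape", {"url": url}),
        "data": call_tool("ai_data_extraction", {"url": url, "prompt": prompt}),
    }
    if with_screenshot:
        result["screenshot"] = call_tool("capture_screenshot", {"url": url})
    return result


# Stand-in for a real client: records each call and returns a placeholder.
calls: list[str] = []


def fake_call_tool(name: str, arguments: dict) -> str:
    calls.append(name)
    return f"<{name} result for {arguments['url']}>"


outcome = scrape_and_extract(
    fake_call_tool, "https://example.com", "Extract product names", with_screenshot=True
)
print(calls)  # prints ['web_scrape', 'ai_data_extraction', 'capture_screenshot']
```

Passing the caller in as a parameter keeps the sketch independent of any particular MCP client library.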

## Tools

### ai_data_extraction
Extracts structured data from a web page using AI models, converting the content into JSON.

### get_api_status
Retrieves high-level information about your Scrapfly account status.

### list_extraction_models
Retrieves a list of available AI models used for data structuring and extraction.

### get_project_details
Gets metadata and specific configuration details for a defined scraping project.

### get_scraping_capabilities
Checks which advanced scraping features, like proxy support or anti-bot bypass, are available.

### get_screenshot_capabilities
Determines the specific screenshot functionality and parameters that can be used.

### check_credit_usage
Checks your current API credit consumption and usage statistics.

### list_proxy_regions
Shows the geographical regions and types of proxies available for scraping.

### list_api_webhooks
Lists all webhooks that have been configured on your Scrapfly account for notifications.

### web_scrape
Scrapes the raw HTML content of any specified URL, handling complex rendering issues.

### capture_screenshot
Takes an image of a website or specific element on a webpage for visual record-keeping.

### test_scrapfly_auth
Verifies that your provided API credentials are valid and active with Scrapfly.

## Prompt Examples

**Prompt:** 
```
Scrape the homepage of 'https://news.ycombinator.com' and return the HTML.
```

**Response:** 
```
Retrieving website content... I've successfully scraped Hacker News. Should I extract the top story titles and links for you?
```

**Prompt:** 
```
Scrape the product listings from the first 3 pages of an e-commerce category with pricing data.
```

**Response:** 
```
Scraping completed across 3 pages. Total products extracted: 72 (24 per page). Data fields: product name, price, original price, discount %, rating, review count, availability, SKU. Price range: $12.99 - $299.99. Average price: $67.40. 18 products on sale (25% of listings). 5 products out of stock. Anti-bot protection bypassed successfully. JavaScript rendering used for dynamic content. Total API credits used: 6. Data exported as JSON (234 KB).
```

**Prompt:** 
```
Take a full-page screenshot of our competitor's pricing page and extract the plan details.
```

**Response:** 
```
Full-page screenshot captured: competitor_pricing_may2025.png (2400x8600px). Plan details extracted: Starter ($29/mo, 1 user, 5GB storage), Professional ($79/mo, 5 users, 50GB, API access), Enterprise ($199/mo, unlimited users, 500GB, priority support, SSO). Annual discount: 20% across all plans. Free trial: 14 days. Compared to your pricing: you are 15% lower on Starter, comparable on Professional, 10% higher on Enterprise. New feature since last check: AI assistant added to Professional tier.
```

## Capabilities

### Scrape raw web content
The agent pulls the full HTML source code from any specified website.

### Extract structured JSON records
The agent uses AI models to read a complex webpage and output data in clean, usable JSON format.

### Capture page screenshots
The agent takes images of full web pages or specific elements on the page.

### Manage proxy locations
The agent connects to millions of residential proxies across 50+ countries for localized scraping.

### Check API usage and status
The agent reads your account metrics, like credit consumption or project details.

## Use Cases

### Monitoring Competitor Pricing
The Market Researcher needs competitor pricing from three different regional sites. They ask their agent to run `web_scrape` across all URLs, ensuring they use proxies listed by `list_proxy_regions`. The agent pulls the raw HTML, then uses `ai_data_extraction` to isolate and standardize the price points into a single JSON file.

### Building Visual Audits
The Growth Engineer needs to compare two competitors' checkout flows. They ask the agent to take full-page screenshots (`capture_screenshot`) of the critical steps and to note any missing elements or dark mode issues, producing an immediate visual audit report.

### Extracting Data from JS-Heavy Portals
The Data Scientist hits a portal that only loads data via JavaScript. Instead of failing, they ask the agent to run `web_scrape`, which uses headless rendering. The agent successfully gets the dynamic content and then pipes it into `ai_data_extraction` for clean JSON output.

### Checking API Health Before a Run
Before running a massive data job, the operations team member asks the agent to run `get_api_status` to confirm the account is in good standing, `test_scrapfly_auth` to confirm the credentials are valid, and `check_credit_usage` to check credit consumption, preventing costly failures mid-job.
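A pre-run check of this kind could be sketched as follows, using `test_scrapfly_auth` for credentials and `check_credit_usage` for credits. The `call_tool` parameter is a hypothetical stand-in for your MCP client, and the response fields (`valid`, `remaining`) are invented for illustration; the real tools may report this information differently.

```python
from typing import Any, Callable


def preflight(
    call_tool: Callable[[str, dict], Any], credits_needed: int
) -> tuple[bool, str]:
    """Check credentials first, then remaining credits, before a large job."""
    auth = call_tool("test_scrapfly_auth", {})
    if not auth.get("valid", False):
        return False, "API credentials were rejected"
    usage = call_tool("check_credit_usage", {})
    remaining = usage.get("remaining", 0)
    if remaining < credits_needed:
        return False, f"only {remaining} credits left, {credits_needed} needed"
    return True, "ready"


# Canned responses standing in for the real tools; field names are hypothetical.
def canned_call_tool(name: str, arguments: dict) -> dict:
    return {
        "test_scrapfly_auth": {"valid": True},
        "check_credit_usage": {"remaining": 40},
    }[name]


print(preflight(canned_call_tool, credits_needed=100))
# prints (False, 'only 40 credits left, 100 needed')
```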

## Benefits

- Bypass anti-bot systems and Cloudflare blocks. The `web_scrape` tool handles programmatic retrieval of clean HTML, even when sites are protected.
- Stop cleaning spreadsheets manually. Use `ai_data_extraction` to turn complex web pages into structured JSON records, ready for immediate use.
- Get visual confirmation every time. You can take element-specific or full-page screenshots using `capture_screenshot`, which is perfect for audits.
- Stay localized and reliable. The system draws on millions of residential proxies across 50+ countries, and `list_proxy_regions` shows which regions are available for geo-specific data capture.
- Keep operations clean. Use tools like `check_credit_usage` to monitor your spending and `get_project_details` to review project metadata directly through the agent.

## How It Works

The bottom line is that your AI acts like a dedicated web scraping engineer, handling all the boilerplate tech while you just talk to it.

1. Subscribe to the Scrapfly server and enter your API key into your AI client.
2. Tell your agent what you need—for example: 'Scrape X website and extract Y data.'
3. Your agent sends the request, and Scrapfly handles proxy rotation, rendering, and anti-bot bypass. You get back clean JSON or a screenshot file.

## Frequently Asked Questions

**How does Scrapfly MCP Server handle Cloudflare anti-bot bypass?**
The server handles this automatically. When you use the `web_scrape` tool, Scrapfly applies proxy rotation and browser rendering so that your request can get through the bot protection layers.

**Can I just scrape raw HTML with Scrapfly MCP Server?**
Yes, you can. Use `web_scrape` if you need the full source code. But remember, for usable data, follow up by using `ai_data_extraction` to structure it.

**What tools do I use to check my usage?**
You'll use `check_credit_usage`. This tool lets your agent read your current consumption stats, so you always know how much API credit is left for your job.

**Does Scrapfly MCP Server support multiple countries for proxies?**
Yep. The system supports millions of residential proxies across 50+ countries. You can check the available locations using `list_proxy_regions` before starting a geo-specific scrape.

**When I use `ai_data_extraction`, can it handle complex web layouts to generate structured JSON?**
Yep, it transforms complicated page content into clean, machine-readable JSON based on your prompts. You define the desired data schema (like a list of objects), and the AI fills in the blanks automatically.
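Because the output depends on an AI model, it can be worth checking the returned records against the schema you asked for before loading them anywhere. Below is a minimal sketch of such a check in plain Python. The schema, field names, and sample records are made up for illustration and are not actual tool output.

```python
def conforms(record: dict, schema: dict[str, type]) -> bool:
    """True when the record has every schema field with the expected type."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in schema.items()
    )


# Hypothetical schema and extraction output for a product listing.
product_schema = {"name": str, "price": float, "in_stock": bool}
extracted = [
    {"name": "Desk lamp", "price": 12.99, "in_stock": True},
    # The price came back as text here, so this record fails the check.
    {"name": "Office chair", "price": "299.99", "in_stock": False},
]

valid = [r for r in extracted if conforms(r, product_schema)]
rejected = [r for r in extracted if not conforms(r, product_schema)]
print(len(valid), len(rejected))  # prints 1 1
```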

**Can I use `capture_screenshot` to focus on specific elements instead of capturing the whole page?**
Yes, you can provide CSS selectors or element IDs when calling `capture_screenshot`. This lets you pinpoint and capture only small sections of a webpage without wasting space on surrounding junk.

**What is the best way to verify my API key setup before running a large scrape with Scrapfly?**
Run the `test_scrapfly_auth` tool. This confirms your connection status and validates that your API key is properly linked to your account, saving you credits on failed jobs.

**How much history does the `get_project_details` tool provide for my scraping work?**
It gives a full overview of your project's run parameters and status. You can check total credits consumed, the last successful run date, and any stored metadata used during the job.

**Can my AI automatically extract structured JSON from a web page using Scrapfly?**
Yes! Use the `ai_data_extraction` tool. Provide the URL and optionally a model or prompt, and your agent will return the parsed data in structured JSON format.

**How do I use residential proxies to bypass anti-bot systems?**
Simply ask the agent to run the `web_scrape` tool. Scrapfly handles Anti Scraping Protection (ASP) bypass and premium proxy rotation automatically based on the site's security level.

**How do I find my Scrapfly API Key?**
Log in to your Scrapfly account, navigate to the **Dashboard**, and you will find your unique secret API key prominently displayed.