# ScrapingBee MCP

> ScrapingBee manages web data extraction by handling anti-bot systems, rotating proxies, and JavaScript rendering automatically. You connect it to your AI agent and run complex scraping jobs using natural language prompts. It bypasses typical site blocks so you can reliably get raw HTML, structured JSON, or screenshots from any website.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** scrapingbee, web-scraping, data-extraction, headless-browser, proxy-rotation, captcha-solving, ai-extraction, stealth-scraping, mcp

## Description

This isn't just another API you plug in. The whole setup lets your agent handle web data extraction like a pro, getting past the anti-bot defenses that usually kill a scraping job.

When you use `extract_data`, your AI client pulls general structured data off a webpage based on natural language instructions. If you need something tighter, run `extract_data_with_ai`, which has the agent return the fields you describe as clean, formatted JSON. But if you know precisely where the data lives, forget the AI guesswork: use `extract_structured_data` to target specific fields with CSS or XPath selectors, so you get exactly those fields as structured JSON no matter how messy the surrounding text is, as long as the selectors still match the page.
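The choice between the three extraction tools can be sketched as a small decision function. This is an illustration only: `pick_extraction_tool` and its `selectors` and `strict_json` parameters are hypothetical names, not part of the server.

```python
from typing import Optional


def pick_extraction_tool(selectors: Optional[dict] = None,
                         strict_json: bool = False) -> str:
    """Return the tool name that fits the situation described above.

    Hypothetical helper: the server does not ship this function.
    """
    if selectors:
        # You already know where the data lives: target it directly.
        return "extract_structured_data"
    if strict_json:
        # No selectors, but the output must be tightly formatted JSON.
        return "extract_data_with_ai"
    # Default: general AI-guided extraction from natural language.
    return "extract_data"


print(pick_extraction_tool({"price": ".price"}))
print(pick_extraction_tool(strict_json=True))
print(pick_extraction_tool())
```

In practice your AI client makes this choice from the prompt; the function only spells out the rule of thumb.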

For scraping full pages, it's all about controlling the render. You can run `scrape_webpage`, which scrapes the whole page while automatically handling JavaScript rendering, rotating proxies, and anti-bot measures for full browser simulation. If you only care about the dynamic stuff (the content that only loads after a script runs), you hit up `scrape_with_js`. Need to get around geo-blocks or IP bans? Use `scrape_with_proxy` to run the scrape using premium proxy rotation, which makes it look like traffic is coming from different places. And if you want to be sneaky, use `scrape_with_stealth`; this runs the page in stealth mode to mimic how a real human browses, which helps it slip past advanced bot detection systems.
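To make the difference between the scrape tools concrete, here is a minimal sketch of how each one might translate into a ScrapingBee HTTP request. The endpoint and the `render_js`, `premium_proxy`, and `stealth_proxy` parameter names are taken from ScrapingBee's public HTTP API as best understood here, so verify them against the official docs. The tool-to-parameter mapping and the `build_scrape_url` helper are assumptions; this listing does not document how the server wires them up.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint of the ScrapingBee HTTP API.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"

# Assumed mapping from MCP tool name to query parameters (illustrative only).
TOOL_PARAMS = {
    "scrape_webpage": {"render_js": "true"},
    "scrape_with_js": {"render_js": "true"},
    "scrape_with_proxy": {"premium_proxy": "true"},
    "scrape_with_stealth": {"stealth_proxy": "true"},
}


def build_scrape_url(tool: str, target_url: str, api_key: str) -> str:
    """Build the request URL a given scrape tool might send (hypothetical)."""
    if tool not in TOOL_PARAMS:
        raise ValueError(f"unknown scrape tool: {tool}")
    params = {"api_key": api_key, "url": target_url, **TOOL_PARAMS[tool]}
    # urlencode percent-encodes the target URL so it survives as one parameter.
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_scrape_url("scrape_with_proxy",
                       "https://example.com/computers",
                       "YOUR_API_KEY")
print(url)
print(parse_qs(urlparse(url).query)["premium_proxy"])
```

No request is sent here; the sketch only shows which switch each tool is assumed to flip.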

Sometimes, you just need proof that the page loaded right. Running `take_screenshot` captures a visual image of the target URL; it handles all the necessary browser rendering so you get exactly what the user sees in their browser window. If you're pulling data from modern Single Page Applications (SPAs) or anything dynamic, your agent runs a headless browser to render JavaScript first, letting you capture that content.

When you need to manage the logistics, you've got tools for that too. To check if you can afford another scrape, use `get_api_usage` to see your current usage status and available credits against your ScrapingBee API key. For a full breakdown of how much juice you’re burning and what your remaining credit limits are, just run `get_usage`. It's that simple.
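A budget check on top of the usage tools could look like the sketch below. The `max_api_credit` and `used_api_credit` field names are an assumption about the shape of the usage response, and both helper functions are hypothetical.

```python
def remaining_credits(usage: dict) -> int:
    """Credits left, given a usage payload (field names are assumed)."""
    return usage["max_api_credit"] - usage["used_api_credit"]


def can_afford(usage: dict, cost_per_call: int, calls: int = 1) -> bool:
    """True if the remaining credits cover `calls` scrapes at `cost_per_call`."""
    return remaining_credits(usage) >= cost_per_call * calls


# Made-up numbers for illustration, not real account data.
sample_usage = {"max_api_credit": 1000, "used_api_credit": 940}

print(remaining_credits(sample_usage))                      # 60
print(can_afford(sample_usage, cost_per_call=5, calls=10))  # True
print(can_afford(sample_usage, cost_per_call=25, calls=3))  # False
```

Credit cost per call varies with the options you enable, so treat `cost_per_call` as something you look up rather than a constant.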

Basically, you let your AI client do all the heavy lifting. You tell it *what* data to get—whether it's general text blocks, specific selectors, or just a visual screenshot—and this system handles the technical nightmare of getting past rate limits, proxy bans, and JavaScript rendering issues. You don't worry about complex infrastructure; you just prompt your agent with natural language instructions.

## Tools

### extract_data
Pulls generalized structured data out of a given web page using AI instructions.

### get_usage
Retrieves a detailed breakdown of current API consumption and remaining credit limits.

### extract_data_with_ai
Instructs the agent to extract and return data as formatted JSON based on natural language descriptions.

### extract_structured_data
Extracts specific, structured JSON data by defining precise CSS or XPath selectors against a web page.

### scrape_webpage
Scrapes an entire webpage while automatically handling JavaScript, proxy rotation, and anti-bot measures for full browser rendering.

### take_screenshot
Captures a visual screenshot of the requested website URL, automatically handling necessary browser rendering.

### get_api_usage
Checks the current usage status and available credits for your ScrapingBee API key.

### scrape_with_js
Scrapes a page specifically by enabling full JavaScript rendering to capture dynamic content.

### scrape_with_proxy
Scrapes a page using premium proxy rotation, which helps bypass geo-restrictions and IP bans.

### scrape_with_stealth
Scrapes a page in stealth mode to mimic human behavior and bypass advanced bot detection systems.

## Prompt Examples

**Prompt:** 
```
Extract the names and prices of all laptops from 'https://example.com/computers' using AI.
```

**Response:** 
```
Processing AI extraction... I've retrieved 8 laptops. 'Model X' is $1,200 and 'Model Y' is $950. Should I format this data into a comparison table?
```

**Prompt:** 
```
Scrape a Google search results page for the query 'best project management tools 2025' and extract the top 10 results.
```

**Response:** 
```
Google SERP scraped successfully. Query: "best project management tools 2025". Top 10 organic results extracted. #1: "15 Best Project Management Software (2025)" - Forbes. #2: "Top PM Tools Compared" - G2. #3: "Project Management Tool Guide" - Gartner. #4-10: TechRadar, PCMag, Capterra, GetApp, Software Advice, Business News Daily, Zapier. Featured snippet: "Monday.com, Asana, and Jira lead in 2025 rankings". People Also Ask: 4 related questions captured. Ads detected: 3 sponsored results. Credits used: 5.
```

**Prompt:** 
```
Extract structured product data from an e-commerce product page using CSS selectors.
```

**Response:** 
```
Data extraction completed. URL: example-store.com/product/wireless-headphones. Selectors used: 5. Results: Product name: "Pro Wireless Headphones X3". Price: $149.99. Original price: $199.99. Discount: 25% off. Rating: 4.7/5 (2,340 reviews). Availability: In Stock. Description: 280 characters extracted. Specifications: 8 key-value pairs (battery life, driver size, frequency response, etc). Images: 6 URLs captured. Related products: 4 items extracted. API credits used: 1.
```

## Capabilities

### Parse complex web pages into structured JSON
The agent uses natural language to identify and structure data points from a given webpage into clean, usable JSON format.

### Extract specific fields using selectors
You target data by providing CSS or XPath selectors, so the extraction returns exactly the fields you select regardless of surrounding text.
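To show what selector-based extraction means, here is a local stand-in that maps XPath selectors to JSON fields using only Python's standard library. It is not how ScrapingBee runs selectors internally; the sample markup, the `SELECTORS` map, and `extract_fields` are all made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Tiny well-formed sample page (illustrative data only).
PAGE = """<html><body>
  <h1 class="title">Pro Wireless Headphones X3</h1>
  <p>Limited offer, ships worldwide.</p>
  <span class="price">$149.99</span>
</body></html>"""

# Field name -> XPath selector (the limited XPath subset ElementTree supports).
SELECTORS = {
    "name": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
}


def extract_fields(markup: str, selectors: dict) -> dict:
    """Return one value per selector; None when a selector matches nothing."""
    root = ET.fromstring(markup)
    result = {}
    for field, xpath in selectors.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        result[field] = node.text.strip() if has_text else None
    return result


print(json.dumps(extract_fields(PAGE, SELECTORS)))
# {"name": "Pro Wireless Headphones X3", "price": "$149.99"}
```

The surrounding paragraph is ignored because no selector points at it, which is the whole appeal of this approach. The flip side is that a selector that stops matching yields `None` rather than a guess.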

### Capture dynamic website content
The system runs a headless browser to render JavaScript, allowing you to scrape data from modern SPAs that load content dynamically.

### Bypass detection and IP blocks
It manages proxy rotation and implements stealth mode protocols, making scraping difficult for anti-bot systems to detect or block.

### Capture visual proof of a page
The tool takes a screenshot of the target URL, capturing what the user sees in their browser.

## Use Cases

### Competitive pricing data from dynamic e-commerce sites
A market analyst needs product specs and prices. Instead of using basic scraping that fails when the site loads JavaScript, they run `scrape_with_js`. This captures all the necessary JS-rendered content, allowing them to then use `extract_data` to pull out the structured names and prices.
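The two-step flow in this use case can be sketched as follows. The `call_tool` callable stands in for whatever your MCP client uses to invoke a tool, and the argument names (`url`, `instructions`) are assumptions; the fake client returns canned data so the sketch runs offline.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Any]

CALL_LOG: List[str] = []  # records the order in which tools are invoked


def product_prices(call_tool: ToolCaller, url: str) -> Any:
    """Render the page with JavaScript first, then extract names and prices."""
    rendered = call_tool("scrape_with_js", {"url": url})
    if not rendered:
        raise RuntimeError("scrape_with_js returned no content")
    return call_tool("extract_data", {
        "url": url,
        "instructions": "Return the name and price of every product.",
    })


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Offline stand-in for a real MCP client (canned responses)."""
    CALL_LOG.append(name)
    if name == "scrape_with_js":
        return "<html>rendered markup</html>"
    return [{"name": "Model X", "price": "$1,200"}]


print(product_prices(fake_call_tool, "https://example.com/computers"))
print(CALL_LOG)
```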

### Collecting lead contacts from a protected portal
A growth engineer needs multiple email addresses from a platform that blocks basic requests. They set up a loop using `scrape_with_proxy`, cycling through different IPs to scrape the user list, then use `extract_data_with_ai` on each page pull to get clean JSON records.

### Debugging a complex web flow
A developer is testing an anti-bot feature. They scrape the target with both proxy rotation and stealth mode, using `scrape_with_proxy` and `scrape_with_stealth`. If the data comes back, they know their access method works for high-security environments.

### Generating a report on site layout issues
A QA tester needs to prove that an element is missing. They use `take_screenshot` first to capture the current view. If the data extraction fails, they can share the screenshot alongside the failure log to pinpoint exactly what went wrong.

## Benefits

- Stop dealing with broken scrapers. By using `scrape_webpage`, you get full browser rendering, meaning dynamic content (like pricing loaded by JS) actually gets pulled out—no manual workarounds required.
- Don't guess how to structure data. If you need clean JSON, use `extract_data_with_ai` and just describe the fields in plain English; the AI handles the schema mapping for you.
- Hitting a security wall? Use `scrape_with_proxy`. It rotates your IP address across residential proxies, letting you scrape high-security sites without triggering blocks or limits.
- Need data that's absolutely specific? Skip the fuzzy extraction and use `extract_structured_data` with CSS/XPath selectors. This keeps mission-critical fields mapped to the exact elements you chose.
- Want to see what the user sees? The `take_screenshot` tool lets you capture visual proof of a page, which is great for debugging or reporting on site layouts.

## How It Works

The bottom line is you don't manage proxies or browser clusters; your AI agent just calls the right function and gets clean data back.

1. Subscribe to the ScrapingBee server and provide your API key from the dashboard.
2. Your AI client sends the request—specifying the URL, desired data structure, and necessary scraping methods (e.g., JS rendering or proxy use).
3. The tool executes the scrape, returning raw HTML, structured JSON, a screenshot, or an error code to your chat interface.
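Under the hood, step 2 is an MCP tool call. A minimal sketch of the JSON-RPC message an MCP client sends is shown below; the `tools/call` method and the `name`/`arguments` structure follow the Model Context Protocol, while the `url` argument is an assumption about this server's input schema.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need unique ids


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The "url" argument name is hypothetical; check the tool's input schema.
request = tool_call_request("scrape_webpage",
                            {"url": "https://example.com/computers"})
print(json.dumps(request, indent=2))
```

You normally never write this by hand: the AI client builds the message from your prompt and the tool's published schema.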

## Frequently Asked Questions

**How do I scrape pages that use JavaScript when using the ScrapingBee MCP Server?**
You must use either `scrape_with_js` or the general `scrape_webpage` tool. These tools activate a headless browser, which executes all the site's JavaScript before pulling the content. This is essential for modern Single Page Applications (SPAs).

**Is ScrapingBee MCP Server safe for scraping high-security sites?**
It is built for them. For high-security or restricted sites, use `scrape_with_proxy`. This tool manages rotating residential proxies, which keeps your own IP address hidden and reduces the chance of rate limiting.

**What's the difference between `extract_data` and `extract_structured_data`?**
`extract_data` uses natural language to guide the AI on what data you want. `extract_structured_data` is more precise; it requires you to provide specific CSS or XPath selectors, which guarantees the exact element you need.

**How do I check if my scraping budget is okay with ScrapingBee MCP Server?**
Use either `get_usage` or `get_api_usage`. Both tools connect to your dashboard and provide real-time information on how many credits you've used and how much you have left.

**Can I just get a picture of the webpage using ScrapingBee MCP Server?**
Yep, that's what `take_screenshot` does. It captures an image file of the URL, automatically handling any necessary browser rendering to make sure the screenshot is accurate.

**How does `scrape_with_proxy` handle getting blocked or rate-limited by a website?**
It handles blocks automatically. The tool uses premium proxy rotation, so requests go out from different IP addresses and one flagged IP does not stop the job. This helps keep your scraping session running despite IP bans.

**If the target site redesigns its layout, should I use `extract_structured_data` or `extract_data_with_ai`?**
Use `extract_data_with_ai`. The AI reads content contextually. If the underlying CSS selectors change (which they often do), `extract_structured_data` can stop matching, while the AI can still find and extract the correct data based on natural language understanding.
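One way to combine the two approaches is to try selectors first and fall back to the AI tool when fields come back empty. The sketch below is an assumption about how you might wire that up, not a feature of the server; `call_tool`, the argument names, and the stub client are hypothetical.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def extract_with_fallback(call_tool: ToolCaller, url: str,
                          selectors: Dict[str, str],
                          description: str) -> Dict[str, Any]:
    """Prefer selector extraction; use AI extraction if any field is empty."""
    fields = call_tool("extract_structured_data",
                       {"url": url, "selectors": selectors}) or {}
    missing = [name for name in selectors if not fields.get(name)]
    if not missing:
        return {"source": "selectors", "data": fields}
    # Selectors no longer match (for example after a redesign): ask the AI.
    ai_data = call_tool("extract_data_with_ai",
                        {"url": url, "description": description})
    return {"source": "ai", "data": ai_data}


def redesigned_site(name: str, arguments: Dict[str, Any]) -> Any:
    """Stub client simulating a site whose old selectors match nothing."""
    if name == "extract_structured_data":
        return {field: None for field in arguments["selectors"]}
    return {"name": "Model X", "price": "$1,200"}


result = extract_with_fallback(
    redesigned_site,
    "https://example.com/computers",
    {"name": "h2.title", "price": "span.price"},
    "name and price of each laptop",
)
print(result["source"])  # ai
```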

**When I run `scrape_webpage`, can I get the raw HTML payload instead of just structured JSON?**
Yes, you retrieve the complete source. The tool captures the full, rendered HTML content. This lets your agent process the page entirely later—great for deep analysis or archiving.

**Can my AI automatically extract structured JSON from a web page using ScrapingBee?**
Yes! Use `extract_data` or `extract_data_with_ai` and describe the fields you need in natural language, and the AI identifies and parses them for you. If you already have extraction rules, pass them as CSS or XPath selectors to `extract_structured_data` instead.

**How do I use premium or residential proxies for high-security sites?**
Use the `scrape_with_proxy` tool, which corresponds to the `premium_proxy` option of the ScrapingBee API. This routes your request through residential IPs, making it much harder for anti-bot systems to detect and block.

**How do I find my ScrapingBee API Key?**
Log in to your ScrapingBee dashboard, and your API Key will be clearly visible in the **Credentials** section on the main page.