# ScrapingBee MCP

> ScrapingBee handles complex web data extraction, bypassing anti-bot measures like Cloudflare. Use it directly with your AI agent to scrape dynamic JavaScript content, pull structured Amazon product details, or search Google and Walmart—all without writing a single line of crawler code.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** scraping, proxy, headless-browser, data-extraction, ai-scraping

## Description

You need clean data from the web. This server lets your AI agent grab it without you writing a single line of crawler code. ScrapingBee handles the headaches: headless browsing, proxy rotation, and sites that depend on JavaScript. You just point the tool at what you want.

* **General Web Extraction (`scrape_html`)**
When you need full-fidelity source code from any website, `scrape_html` is what you use. It fetches content even if the site relies on JavaScript to display it, and it rotates proxies so you are less likely to get blocked. You can set custom wait times or enable ad blocking, which means the HTML you pull back is usable, not junk.

* **Major Retail & Search Sites**
For specific e-commerce data, dedicated tools save you time. To grab structured product details from Amazon, just call `get_amazon_product` and supply an ASIN code; it returns clean pricing, ratings, and core info right away. You can also scrape search results for other big players: use `search_walmart` to pull specific data straight from Walmart's site, or run `search_youtube` to get structured data pulled directly from YouTube searches.

* **Advanced Search Capabilities**
When you need general web intelligence, your options are solid. If you want to know what people are talking about, use `search_google`. This tool gives you organized JSON results for various search types (web pages, news articles, maps, and images) without needing manual parsing on your end. For a quick hit of info, `fast_search` runs a fast, general-purpose SERP query.

* **AI Content Generation & Specialized Scraping**
The server also includes `ask_chatgpt`, letting your agent query the ChatGPT API for general text generation tasks whenever you need it. For sites that have no dedicated tool, `scrape_html` is the general-purpose fallback for fetching content.

* **How Your Agent Uses It**
Your AI client just calls a tool—say, `search_google` with specific parameters for news results. The server executes the scrape or query, cleans the data, and sends the result back to your agent's context window. You get it as raw HTML, clean Markdown text, or structured JSON data, ready to use immediately.
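
As a rough illustration of that tool call, the sketch below builds the JSON-RPC 2.0 `tools/call` message an MCP client sends when it invokes a tool. The argument names `query` and `search_type` are assumptions made for this example; the server's published input schema is the authority on the real names.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names: `query` and `search_type` are not confirmed
# by this page and may differ in the server's actual input schema.
request = build_tool_call(
    "search_google",
    {"query": "tech news today", "search_type": "news"},
)
print(json.dumps(request, indent=2))
```

In practice your MCP client library builds and sends this message for you; the sketch only shows what travels over the wire.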

## Tools

### get_amazon_product
Scrapes and returns structured details for any specific Amazon product using its ASIN code.

### ask_chatgpt
Queries the ChatGPT API through ScrapingBee for general text generation tasks.

### fast_search
Performs a quick search engine result page (SERP) query.

### search_google
Scrapes and returns structured JSON results from Google Search for various result types (web, news, maps, images).

### scrape_html
Fetches the content of a web page, supporting JS rendering, proxy management, and data extraction.

### search_walmart
Scrapes search results specifically from Walmart's website.

### search_youtube
Scrapes and returns structured data from YouTube search results.

## Prompt Examples

**Prompt:** 
```
Scrape https://news.ycombinator.com and return the content as clean Markdown.
```

**Response:** 
```
I've scraped Hacker News for you. Here is the content converted to Markdown: [Markdown content follows...]
```

**Prompt:** 
```
Search Google for 'best MCP servers 2024' and give me the top 3 results.
```

**Response:** 
```
I found the top 3 results for your search: 1. Introduction to MCP... 2. Top MCP Servers List... 3. GitHub MCP Awesome...
```

**Prompt:** 
```
Get the price and rating for Amazon product B08N5WRWNW.
```

**Response:** 
```
For the Amazon product (ASIN: B08N5WRWNW), the current price is $999.00 with a rating of 4.8 stars based on 12,450 reviews.
```

## Capabilities

### Extracting dynamic website content
Use `scrape_html` to pull full-fidelity web page source code that runs JavaScript and bypasses common anti-bot defenses.

### Running structured Google searches
Invoke `search_google` to retrieve organized JSON data from various search results (web, news, maps, images) without manual parsing.

### Pulling Amazon product details
Call `get_amazon_product` with an ASIN code to get structured pricing, ratings, and core product information directly.

### Scraping major retail sites
Access specific search results from Walmart or YouTube using dedicated tools like `search_walmart` and `search_youtube`.

## Use Cases

### Monitoring competitor pricing shifts
A market researcher needs to know if a rival lowered their price. They ask their agent to use `get_amazon_product` and `search_google`, comparing the current listing's price against historical data points, saving hours of manual comparison.
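
The comparison step in this use case could look something like the sketch below. It assumes you already store earlier prices yourself; the function name `price_change` and the numbers are made up for illustration and are not part of the server.

```python
from statistics import mean


def price_change(current: float, history: list[float]) -> dict:
    """Compare a current price with the average of earlier observations."""
    if not history:
        raise ValueError("history must contain at least one price")
    baseline = mean(history)
    change_pct = (current - baseline) / baseline * 100
    return {
        "baseline": baseline,
        "change_pct": round(change_pct, 2),
        "dropped": current < baseline,
    }


# Made-up numbers for illustration; real values would come from the
# `get_amazon_product` result and your own stored price history.
report = price_change(current=90.0, history=[100.0, 100.0])
print(report)
```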

### Building an automated news aggregator
A developer wants a feed of top stories. They instruct their agent to run `search_google` for 'tech news today' and then pipe the resulting URLs into `scrape_html`, so the feed contains article snippets, not just links.
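
A minimal sketch of that search-then-scrape pipeline follows. The result shapes (`results`, `title`, `url`, `content`) and the argument names are assumptions, and `fake_call_tool` is a stand-in for a real MCP client so the example runs offline.

```python
from typing import Callable

# A tool caller takes (tool_name, arguments) and returns the decoded result.
ToolCaller = Callable[[str, dict], dict]


def collect_articles(call_tool: ToolCaller, query: str, limit: int = 3) -> list[dict]:
    """Search for news, then scrape each hit so the feed holds text, not just links."""
    # Assumed result shape: {"results": [{"title": ..., "url": ...}, ...]}
    search = call_tool("search_google", {"query": query, "search_type": "news"})
    articles = []
    for hit in search.get("results", [])[:limit]:
        # Assumed result shape: {"content": "<page text>"}
        page = call_tool("scrape_html", {"url": hit["url"]})
        articles.append({
            "title": hit["title"],
            "url": hit["url"],
            "snippet": page.get("content", "")[:200],
        })
    return articles


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client so the sketch runs offline."""
    if name == "search_google":
        return {"results": [{"title": "Example story", "url": "https://example.com/a"}]}
    return {"content": f"Body of {arguments['url']}"}


feed = collect_articles(fake_call_tool, "tech news today")
print(feed)
```

Passing the tool caller in as a parameter keeps the pipeline logic separate from the client, which also makes it easy to test without spending API credits.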

### Analyzing viral video trends
A content strategist wants to track trending topics. They use `search_youtube` to get a list of popular videos and then cross-reference those titles with general web scraping using `scrape_html` for context.

### Gathering diverse search data
An analyst needs to compare information across platforms. They run `search_google` for a topic, then use `search_walmart` and `search_youtube` to get three different structured views of the same subject.

## Benefits

- Bypass anti-bot walls. The server handles proxy rotation and rendering issues, so protections like Cloudflare are far less likely to block your scrape.
- Structured output means less cleanup work for you. You get Amazon product details or Google results as clean JSON, ready for immediate use in your workflow.
- Handle JavaScript sites effortlessly. Tools like `scrape_html` fully render dynamic content—you don't have to write custom Selenium code just to see the page.

## How It Works

The bottom line is: you tell your AI agent what to scrape, and the server does the hard work of getting it cleanly.

1. Subscribe to this server and input your unique ScrapingBee API Key.
2. Your AI agent executes a tool call (e.g., `scrape_html`) providing the URL, parameters, and desired output format.
3. The server handles all proxy management, rendering, and scraping logic, returning clean data directly into your client's context.
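
On the client side, the data returned in step 3 arrives as an MCP tool result. The sketch below pulls the text parts out of such a result, assuming the standard result layout of a `content` list plus an `isError` flag. The sample payload is invented for illustration.

```python
def extract_text(result: dict) -> str:
    """Join the text parts of an MCP `tools/call` result into one string."""
    if result.get("isError"):
        raise RuntimeError("tool call reported an error")
    parts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts)


# Illustrative result only; the actual payload depends on the tool and
# on the output format you asked for.
sample_result = {
    "content": [{"type": "text", "text": "# Example Domain"}],
    "isError": False,
}
print(extract_text(sample_result))
```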

## Frequently Asked Questions

**How do I scrape dynamic JavaScript sites with scrape_html?**
You simply pass the URL to `scrape_html`. The server supports full JS rendering, meaning it executes the page's scripts before capturing the HTML. This is how you get content that scripts inject after the initial page load.

**Can I use search_google to find structured data from news sites?**
Yes. `search_google` provides structured JSON results for various result types, including news and web links. It doesn't just give you a list of URLs; it gives context.

**What is the difference between scrape_html and get_amazon_product?**
The difference is specificity. `scrape_html` pulls everything from an arbitrary URL, giving raw web content. `get_amazon_product` targets a single Amazon product by its ASIN and returns only structured product data (like ASIN, price, rating) in a reliable format.

**Do I need separate tools for Walmart and Google?**
Yes. Using `search_walmart` gives you results tailored specifically to the Walmart site structure, whereas using `search_google` gives broader search engine results that might include other retail sites.

**What happens if my requests exceed standard rate limits when using scrape_html?**
The server manages rate limiting automatically. It handles proxy rotation and request throttling so your scraping process doesn't get blocked by IP bans or excessive calls.

**Can scrape_html output content as something other than raw HTML?**
Yes, you can specify the desired format in your query. You can receive the extracted data as clean Markdown, plain text, or standard raw HTML, depending on what your agent needs.

**Does the server handle advanced anti-bot measures like Cloudflare blocks?**
It does. The underlying infrastructure bypasses these challenges automatically. Your AI client simply sends the request, and the server handles the complexity of CAPTCHAs and bot detection.

**What specific identifiers do I need when calling get_amazon_product?**
You must provide the product's unique ASIN code. Using this identifier ensures the scraper targets a single, precise listing to pull accurate details like current price and star rating.
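
If your agent collects ASINs from user input, a quick format check before the call can save a wasted request. This sketch relies on the fact that ASINs are 10-character alphanumeric codes; the helper name `normalize_asin` is hypothetical and not part of the server.

```python
import re

# ASINs are 10 characters long, made up of uppercase letters and digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def normalize_asin(raw: str) -> str:
    """Trim and uppercase an ASIN, rejecting anything with the wrong shape."""
    cleaned = raw.strip().upper()
    if not ASIN_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a valid ASIN: {raw!r}")
    return cleaned


print(normalize_asin(" b08n5wrwnw "))  # prints B08N5WRWNW
```

A passing check only confirms the format; whether the listing exists is still decided by the `get_amazon_product` call itself.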

**Can I extract specific data from a page using natural language instead of CSS selectors?**
Yes! Use the `scrape_html` tool and provide your request in the `ai_query` parameter. The server will use ScrapingBee's AI capabilities to parse the HTML and return exactly what you asked for.

**How do I handle websites that require JavaScript to load content?**
The `scrape_html` tool has `render_js` enabled by default. You can also use `wait` or `wait_for` parameters to ensure the page is fully loaded before the data is captured.
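
One way to assemble those arguments is sketched below. It assumes `wait` takes a delay in milliseconds and `wait_for` takes a CSS selector, as in ScrapingBee's HTML API, and that the URL argument is named `url`; check the tool's input schema before relying on either. The helper name `build_scrape_arguments` is hypothetical.

```python
from typing import Optional


def build_scrape_arguments(
    url: str,
    wait_ms: Optional[int] = None,
    wait_for_selector: Optional[str] = None,
) -> dict:
    """Assemble `scrape_html` arguments, adding wait options only when set."""
    # `render_js` is on by default per this page; it is set here to be explicit.
    arguments = {"url": url, "render_js": True}
    if wait_ms is not None:
        if wait_ms < 0:
            raise ValueError("wait_ms must be non-negative")
        arguments["wait"] = wait_ms  # assumed: fixed delay in milliseconds
    if wait_for_selector:
        arguments["wait_for"] = wait_for_selector  # assumed: CSS selector
    return arguments


# Wait until a hypothetical `#content` element appears before capturing.
scrape_args = build_scrape_arguments(
    "https://example.com", wait_for_selector="#content"
)
print(scrape_args)
```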

**Can I get structured results from Google Search directly?**
Absolutely. Use the `search_google` tool with your query. It returns structured JSON containing organic results, ads, and related searches, saving you from parsing search result pages manually.