# ScraperAPI MCP

> ScraperAPI equips your AI agent with professional web scraping capabilities, letting it get past IP bans and CAPTCHAs to extract data at scale. It uses proxy rotation and headless browsers to reliably pull HTML and structured data from difficult sites like Amazon or Google SERPs, even when they rely on JavaScript rendering or aggressive anti-bot systems.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** proxy-rotation, headless-browser, captcha-solving, data-extraction, html-parsing, web-automation

## Description

You need your AI agent to do more than read text; you need it to treat the web as a data source. This MCP connects your agent to a managed scraping layer that handles the messy infrastructure: proxies, CAPTCHAs, and anti-bot systems. Instead of getting blocked on major sites, your agent can pull structured product details from Amazon or full Google search result pages as clean JSON.

It's built for scale. Whether you're an analyst pulling competitor pricing across dozens of ASINs or a developer who needs data from a Single Page Application (SPA), your agent does the heavy lifting without you writing complex networking code. Connect it through Vinkius, keep giving clear instructions in your preferred AI client, and let the MCP handle the scraping.

## Tools

### create_async_job
Starts a scrape job that runs in the background and returns a tracking ID.

### custom_scrape
Performs a general-purpose scrape using advanced request parameters that you specify.

### get_account_stats
Retrieves current usage statistics for your scraping account.

### get_async_job
Checks the status and retrieves the final result of a previously started background job.

### get_screenshot_link
Generates a temporary URL pointing to a full-page screenshot of the target page.

### scrape_amazon
Retrieves structured product details, pricing, and metadata specifically from Amazon listings.

### scrape_google_serp
Fetches the full structure of a Google search results page for analysis.

### scrape_html
Scrapes standard static HTML content from any given URL while automatically rotating IPs.

### scrape_js_rendered
Renders the page in a headless browser so that content loaded by JavaScript is present before scraping.

### scrape_premium
Scrapes a URL using high-quality residential proxies for maximum bypass capability.

## Prompt Examples

**Prompt:** 
```
Scrape an Amazon product page for this ASIN: B08J5F3G18 and list its price.
```

**Response:** 
```
I've fetched the Amazon structured data. The ASIN B08J5F3G18 is the 'Apple iMac Desktop'. The current Buy Box price is $1,299.00 and it holds a rating of 4.7 with over 4,000 reviews. It's 'In Stock'.
```

**Prompt:** 
```
Run a Google SERP check for the keyword 'best LLM orchestration frameworks'.
```

**Response:** 
```
I've scraped the Google search results for you. LangChain occupies the first organic spot. LlamaIndex holds the featured snippet, and emerging frameworks like Vurb and Semantic Kernel are dominating the 'People Also Ask' sections. Would you like me to dump the full organic ranking array?
```

**Prompt:** 
```
Take a screenshot of https://netflix.com homepage so I can check its layout.
```

**Response:** 
```
I've invoked the screenshot endpoint. You can view the full rendered layout capture generated by ScraperAPI here: `https://api.scraperapi.com/v1/screenshots?...`. It bypassed the initial bot checks and saved a high-resolution PNG.
```

## Capabilities

### Extract structured e-commerce data
The agent retrieves formatted product details, pricing, and reviews directly from Amazon listings.

### Capture Google search results (SERPs)
You get the full structure of a Google search page, including featured snippets and organic rankings, in JSON format.

### Scrape dynamic web pages
The agent fetches data from modern sites built with JavaScript frameworks like React or Vue.

### Bypass anti-bot systems
Requests rotate through proxies automatically, and `scrape_premium` routes them through residential IP pools, which helps avoid blocks on major websites.

### Run background scraping jobs
You can start large data pulls that run in the background, keeping your conversation thread clean while you wait for results.

## Use Cases

### Tracking competitor price changes daily
A growth analyst needs to know if a rival changed their Amazon Buy Box pricing. They tell their agent: 'Run `scrape_amazon` for ASIN X.' The system returns structured data, letting the analyst immediately compare it to yesterday's record.
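
A minimal sketch of the comparison step, assuming the structured Amazon output exposes the Buy Box price as a string such as `"$1,299.00"` under a key called `pricing`. That key name and both helper functions are hypothetical; check the real `scrape_amazon` output before relying on them.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


def parse_price(raw: Any) -> Optional[Decimal]:
    """Convert a price like "$1,299.00" or 1299.0 into a Decimal; None if unparseable."""
    if raw is None:
        return None
    text = str(raw).replace("$", "").replace(",", "").strip()
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def price_delta(yesterday: dict, today: dict, field: str = "pricing") -> Optional[Decimal]:
    """Return today's price minus yesterday's, or None if either price is missing.

    `field` is a hypothetical key name; adjust it to the actual scrape_amazon output.
    """
    old = parse_price(yesterday.get(field))
    new = parse_price(today.get(field))
    if old is None or new is None:
        return None
    return new - old
```

Using `Decimal` instead of `float` avoids rounding noise when comparing currency amounts from two snapshots.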

### Analyzing keyword trends across regions
An SEO specialist wants to see how Google ranks a term in Japan vs. Germany. They prompt their agent to use `scrape_google_serp` for both locations, getting two separate JSON outputs to compare organic positioning.
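
A minimal sketch of that comparison, assuming each `scrape_google_serp` response contains an `organic_results` list whose entries carry `position` and `link` fields. Those field names and the helpers below are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def rank_map(serp: dict) -> Dict[str, int]:
    """Map each organic result's link to its position (field names are assumed)."""
    return {r["link"]: r["position"] for r in serp.get("organic_results", [])}


def compare_rankings(
    serp_a: dict, serp_b: dict
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (link, position_in_a, position_in_b) rows; None means not ranked there."""
    a, b = rank_map(serp_a), rank_map(serp_b)

    def best_position(link: str) -> Tuple[int, str]:
        # Order rows by the best position the link reached in either region, then by link.
        positions = [p for p in (a.get(link), b.get(link)) if p is not None]
        return min(positions), link

    links = sorted(set(a) | set(b), key=best_position)
    return [(link, a.get(link), b.get(link)) for link in links]
```

Each row shows where a URL ranks in each region side by side, so gaps (a `None` on one side) stand out immediately.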

### Capturing complex website layouts
A developer needs a visual reference of a site's current design before building a scraper. They use `get_screenshot_link` to pull a high-resolution PNG capture without having to open the site in a browser themselves.

### Pulling data from script-heavy sites
An engineer needs to scrape a dashboard that only loads its content after complex scripts run. They use `scrape_js_rendered` through the agent, so the AI sees the fully loaded page rather than an empty shell.

## Benefits

- You get reliable data extraction even from heavily protected sites. Use the `scrape_premium` tool to pull content through residential proxies, which improves your odds against aggressive anti-bot protection such as Cloudflare.
- Stop struggling with dynamic sites. If a site uses JavaScript (like React), use `scrape_js_rendered` to render the page in a headless browser before pulling data, so you get the full picture.
- Handle massive data sets without clogging your conversation history. Use `create_async_job` to kick off long-running scrapes in the background and check status later with `get_async_job`.
- Get specialized results for key platforms. Running `scrape_amazon` or `scrape_google_serp` gives you structured JSON output tailored specifically for e-commerce or SEO analysis, respectively.
- Improve baseline reliability with the basic `scrape_html` tool, which rotates proxies automatically so your requests don't all come from a single, easily blocked IP address.

## How It Works

The bottom line: you give a simple prompt, and the system returns clean, reliable data, even from complicated websites.

1. Subscribe to this MCP and enter your ScraperAPI API key.
2. Tell your AI agent which website or data point to scrape (e.g., 'Get the price for ASIN X'), and it calls the matching tool.
3. The MCP executes the request, handles all proxy rotation and rendering, and delivers the clean, structured data back to your chat.
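
Under the hood, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what the agent sends when it calls a tool, assuming `scrape_amazon` takes an `asin` argument; that argument name is hypothetical, and the tool's input schema as reported by the server's `tools/list` response is the authoritative source.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "asin" is a hypothetical argument name used for illustration only.
print(build_tool_call(1, "scrape_amazon", {"asin": "B08J5F3G18"}))
```

Your AI client builds and sends this message for you; the sketch only makes the hand-off between steps 2 and 3 concrete.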

## Frequently Asked Questions

**How do I use ScraperAPI MCP to scrape JavaScript sites?**
Use the `scrape_js_rendered` tool. It renders the page in a headless browser before scraping, so content loaded by JavaScript is captured.
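
If you aren't sure whether a page needs rendering, one rough heuristic is to fetch it with `scrape_html` first and check whether the static HTML is mostly an empty shell plus scripts. This sketch uses Python's standard `html.parser`; the heuristic itself and the 200-character threshold are assumptions, not a ScraperAPI rule.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>
        self.text_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Only count text that would be visible, not script or style bodies.
        if not self._hidden_depth:
            self.text_chars += len(data.strip())


def probably_needs_rendering(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present but little visible text suggests a JS-rendered shell."""
    parser = _VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.script_tags > 0 and parser.text_chars < min_text_chars
```

A `True` result is a hint to retry with `scrape_js_rendered`; it can misfire on pages that are genuinely short.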

**Is ScraperAPI MCP better than basic web scraping tools?**
For difficult targets, usually yes. Basic tools often fail when websites detect bots or rely on modern JavaScript frameworks. This MCP uses proxy rotation and headless browsers to keep data extraction reliable at scale.

**How do I scrape multiple pages without interrupting my chat flow using ScraperAPI MCP?**
Start by calling `create_async_job`. This spins up the scraping task in the background. You then use `get_async_job` later to retrieve results without waiting live.
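
A minimal polling sketch of that pattern, with the two tool calls passed in as plain functions so the loop can run offline. It assumes the job record has a `status` field that ends as `"finished"` or `"failed"`; those values, the poll interval, and the timeout are assumptions rather than documented behavior of this MCP.

```python
import time
from typing import Any, Callable, Dict


def run_async_job(
    create_job: Callable[[str], str],
    get_job: Callable[[str], Dict[str, Any]],
    url: str,
    poll_seconds: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a background job, then poll until it finishes, fails, or times out."""
    job_id = create_job(url)  # stands in for create_async_job
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)  # stands in for get_async_job
        status = job.get("status")
        if status == "finished":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(poll_seconds)
```

Injecting `sleep` keeps the loop testable; in a chat, the agent typically does the equivalent by calling `get_async_job` again on a later turn.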

**Can I get structured data from Google Search using ScraperAPI MCP?**
Absolutely. Use the dedicated `scrape_google_serp` tool. It pulls search results and structures them into JSON, giving you much more than just a raw list of links.

**What is the difference between scrape_html and scrape_premium?**
`scrape_html` handles standard scraping with basic proxy rotation. `scrape_premium`, however, uses high-quality residential proxies, offering a much higher chance of success when hitting heavily protected targets.
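
One way to combine these tools is to start with the simplest and escalate only when a request fails: `scrape_html`, then `scrape_js_rendered`, then `scrape_premium`. This sketch assumes each tool is exposed as a function that returns HTML, or `None`/an empty string on failure, and that rendered and premium requests cost more, which is why they come later. Both assumptions are illustrative, not confirmed by this page.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Simplest first, most robust last (this ordering is an assumption, not a documented rule).
ESCALATION_ORDER: Sequence[str] = ("scrape_html", "scrape_js_rendered", "scrape_premium")


def scrape_with_escalation(
    url: str,
    tools: Dict[str, Callable[[str], Optional[str]]],
    order: Sequence[str] = ESCALATION_ORDER,
) -> Tuple[str, str]:
    """Try each tool in order; return (tool_name, html) from the first non-empty result."""
    for name in order:
        html = tools[name](url)
        if html:  # None or "" is treated as blocked or failed
            return name, html
    raise RuntimeError(f"all tools failed for {url}")
```

Returning the tool name alongside the HTML lets you record which level each site needed, so later requests can start there directly.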