# ScrapingAnt MCP

> ScrapingAnt connects your AI client to a high-performance web data extraction engine. It handles JavaScript rendering, IP rotation via proxies, and CAPTCHA solving automatically. Use it to get raw HTML, convert pages to clean Markdown, or extract complex JSON structures directly from any website.

## Overview
- **Category:** industry-titans
- **Price:** Free
- **Tags:** scrapingant, web-scraping, data-extraction, headless-browser, proxy-rotation, markdown-converter, anti-bot-bypass, ai-scraping, mcp

## Description

Listen up. This server plugs your AI client straight into a heavy-duty web data extraction engine. You don't manage proxies or anti-bot measures by hand; it handles all of that automatically, so you just get the data you need.

When you use `scrape_webpage`, the page is rendered in a headless browser, which gets you past JavaScript barriers and anti-bot defenses. If a site builds its content with client-side code (modern shopping carts, for example), you capture the fully loaded page, not just the skeleton markup.

Need specific info from that messy page? Use `extract_structured_data`. Your agent processes the web content and returns exactly the fields you asked for as a clean JSON object, so the output is machine-readable right away.

If you've got an entire article or blog post, don't just grab raw HTML. Run it through `scrape_to_markdown`. This tool scrapes the whole page and strips out the junk (navigation bars, ads, side widgets), leaving you with clean, readable Markdown that's a good fit for RAG knowledge bases.

For deep technical dives, use `scrape_extended_data`. It goes further than a standard scrape, returning network logs, full HTTP headers, and browser cookies alongside the main content. It's what you need when you have to analyze how the page loads under the hood.

You can also check your account status anytime with `get_api_usage`, which reports your remaining credit balance against your monthly limit.

## Tools

### scrape_extended_data
Scrapes a page and retrieves network logs, cookies, and full HTTP headers for deep technical analysis.

### extract_structured_data
Uses an AI model to pull specific pieces of information from a page and format them as clean JSON data.

### scrape_to_markdown
Converts an entire webpage into Markdown format, stripping out navigation bars and clutter to keep the core content clean.

### scrape_webpage
Scrapes a page using headless browser rendering, automatically bypassing JavaScript barriers and anti-bot defenses.

### get_api_usage
Checks your current API credit balance against your monthly usage limits.

## Prompt Examples

**Prompt:** 
```
Extract the latest product prices from 'https://example.com/shop' using AI.
```

**Response:** 
```
Processing extraction... I've retrieved 5 products from the shop. 'Item A' is $19.99 and 'Item B' is $25.00. Would you like the full JSON results?
```

**Prompt:** 
```
Convert the page 'https://example.com/blog/post-1' to Markdown.
```

**Response:** 
```
Retrieving page content... I've successfully converted the blog post to Markdown. It includes the main heading, 3 subheaders, and all body text. Should I save this content for you?
```

**Prompt:** 
```
Check my current API credit balance in ScrapingAnt.
```

**Response:** 
```
Fetching usage stats... You have 4,500 credits remaining out of your 5,000 monthly limit. Your plan resets in 12 days.
```

## Capabilities

### Extract structured JSON data
The agent processes web content and outputs specific, predictable fields as a machine-readable JSON object.

### Scrape dynamic JavaScript pages
It renders complex websites that rely on JavaScript (like modern shopping carts) and captures the fully loaded content.

### Convert web articles to Markdown
The tool scrapes an entire page and cleans it up, removing navigation clutter to leave only clean, readable Markdown text.

### Capture network logs and cookies
It performs a deep scrape, returning network logs, HTTP headers, and browser cookies alongside the main content for advanced analysis.

### Monitor API usage credits
The agent checks your remaining credit balance against your monthly limit.

## Use Cases

### Competitive Price Monitoring
A growth hacker needs to track product pricing across 50 competitor pages. Running a `scrape_webpage` job first gets the rendered content; they then call `extract_structured_data` to pull only the item name and price into a JSON array for comparison.
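
Once two runs of extracted JSON are in hand, the comparison step could look something like the sketch below. The record shape (`name` and `price` keys) and the `price_changes` helper are hypothetical, chosen only to mirror the "item name and price" fields mentioned above; the actual output depends on the schema you ask for.

```python
from decimal import Decimal


def price_changes(previous: list[dict], current: list[dict]) -> dict[str, tuple[Decimal, Decimal]]:
    """Return {name: (old_price, new_price)} for items whose price differs between runs.

    Assumes each record looks like {"name": ..., "price": ...} (a hypothetical shape).
    """
    # Decimal avoids float rounding surprises when comparing currency values.
    old = {item["name"]: Decimal(str(item["price"])) for item in previous}
    changes = {}
    for item in current:
        name = item["name"]
        new_price = Decimal(str(item["price"]))
        # Items that are new in this run are ignored; only changed prices are reported.
        if name in old and old[name] != new_price:
            changes[name] = (old[name], new_price)
    return changes
```

Items that appear only in the newer run are skipped here; tracking additions and removals would be a separate check.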

### Migrating Academic Archives
A researcher needs thousands of academic articles. They use `scrape_to_markdown` on bulk URLs. This ensures that every article, regardless of how it was originally formatted (HTML/JS), is converted into clean Markdown for easy ingestion into a database.

### Deep Web Content Analysis
A data scientist needs to know not just what text is on a page, but *how* the browser got there. They run `scrape_extended_data` to get network logs and cookies, which helps them debug why certain dynamic content isn't appearing.
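
As a rough illustration of that kind of debugging, the sketch below summarizes `Set-Cookie` headers using Python's standard `http.cookies` module. The input shape (a list of header name/value pairs) is an assumption; the real structure returned by `scrape_extended_data` may differ, so adapt the loop to whatever the tool actually returns.

```python
from http.cookies import SimpleCookie


def summarize_set_cookie(headers: list[tuple[str, str]]) -> dict[str, dict]:
    """Summarize cookies found in Set-Cookie headers.

    `headers` is assumed to be a list of (name, value) pairs; this shape is
    hypothetical and not taken from the ScrapingAnt response format.
    """
    summary = {}
    for name, value in headers:
        if name.lower() != "set-cookie":
            continue
        jar = SimpleCookie()
        jar.load(value)  # parses one Set-Cookie header value
        for key, morsel in jar.items():
            summary[key] = {
                "value": morsel.value,
                "path": morsel["path"],
                "secure": bool(morsel["secure"]),
                "http_only": bool(morsel["httponly"]),
            }
    return summary
```

A missing session cookie, or one scoped to an unexpected path, is often a quick clue as to why dynamic content fails to appear.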

### Testing Schema Reliability
A developer wants to validate if a specific website always reports the manufacturer ID correctly. They use `extract_structured_data` with a strict schema and run it repeatedly against different pages to confirm data integrity before deployment.
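
A minimal sketch of the checking side of that loop is shown below. The `REQUIRED_FIELDS` schema and the `schema_errors` helper are hypothetical; they illustrate one way to verify each extracted record locally, not a feature of the tool itself.

```python
# Hypothetical strict schema: field name -> expected Python type.
REQUIRED_FIELDS = {"product_name": str, "manufacturer_id": str}


def schema_errors(record: dict, schema: dict[str, type] = REQUIRED_FIELDS) -> list[str]:
    """Return a list of problems found in one extracted record (empty if it conforms)."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # A strict schema also rejects fields that were never asked for.
    for name in sorted(set(record) - set(schema)):
        errors.append(f"unexpected field: {name}")
    return errors
```

Running this over the results from many pages gives a simple pass/fail count before the extraction is trusted in production.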

## Benefits

- **Stop being blocked.** The `scrape_webpage` tool handles IP rotation and anti-bot bypass, letting you scrape complex sites without constantly running into rate limits or needing a proxy pool manager.
- **Get clean content for AI.** Use `scrape_to_markdown`. Instead of dumping raw HTML that includes footers and sidebars, this tool cleans the text so your RAG system only sees the article body. It's huge for knowledge bases.
- **Structure complex data instantly.** Skip the manual cleanup. With `extract_structured_data`, you give a prompt (e.g., 'Give me all product names and prices'), and it returns the matching fields as JSON.
- **Analyze the full request lifecycle.** Need to know *how* the page loaded? `scrape_extended_data` captures network logs and cookies, giving you the deep technical data that standard scraping misses.
- **Keep track of your budget.** Use `get_api_usage`. Before running a massive job, check your credits. It saves time (and money) knowing exactly how much capacity you have left.
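
The pre-flight budget check from the last point can be sketched as simple arithmetic. The per-request credit cost is an assumption here (the source does not state what each tool call costs), so `credits_per_request` is a parameter you would fill in from your own plan details.

```python
def can_run_job(remaining_credits: int, url_count: int, credits_per_request: int) -> bool:
    """Return True if the whole job fits within the remaining credit balance.

    `credits_per_request` is a hypothetical figure; look up the real cost for your plan.
    """
    if url_count < 0 or credits_per_request <= 0:
        raise ValueError("url_count must be >= 0 and credits_per_request must be > 0")
    return url_count * credits_per_request <= remaining_credits


def max_affordable_requests(remaining_credits: int, credits_per_request: int) -> int:
    """Return how many requests the remaining balance covers."""
    if credits_per_request <= 0:
        raise ValueError("credits_per_request must be > 0")
    return max(remaining_credits, 0) // credits_per_request
```

For example, with the 4,500 remaining credits from the sample response above and an assumed cost of 10 credits per request, a 50-URL job fits but a 500-URL job does not.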

## How It Works

The bottom line is that your AI client acts as a dedicated web researcher, managing all the messy technical details of scraping behind the scenes.

1. Subscribe to the ScrapingAnt server and input your unique API key into your AI client.
2. Ask your agent to target a specific URL, specifying if you need raw data (HTML), structured fields (JSON), or clean text (Markdown).
3. The system executes the necessary scrape, handling proxies, anti-bot measures, and rendering, and returns the data in the format you specified.
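
Under the hood, step 2 amounts to your client picking a tool and sending an MCP `tools/call` request. The sketch below builds such a request for each output format. The tool names come from the Tools section; the `url` argument name is an assumption, since the source does not document the tools' input schemas, and `build_tool_call` is a hypothetical helper.

```python
from itertools import count

# Tool names are from the Tools section; the mapping itself is illustrative.
FORMAT_TO_TOOL = {
    "html": "scrape_webpage",
    "json": "extract_structured_data",
    "markdown": "scrape_to_markdown",
}

_request_ids = count(1)  # JSON-RPC requests need a unique id


def build_tool_call(url: str, output_format: str) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for the requested output format."""
    try:
        tool = FORMAT_TO_TOOL[output_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {output_format!r}") from None
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        # "url" is an assumed argument name, not confirmed by the source.
        "params": {"name": tool, "arguments": {"url": url}},
    }
```

In practice the AI client constructs and sends this message for you; the sketch is only meant to show which tool each output format maps to.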

## Frequently Asked Questions

**How do I scrape a page that requires JavaScript to load the data using ScrapingAnt?**
You use the `scrape_webpage` tool. This handles JavaScript rendering, meaning it waits for all dynamic content—like product carousels or interactive widgets—to fully load before capturing the final HTML.

**I need to pull only names and prices from a website; which tool should I use? Is `extract_structured_data` best?**
Yes, `extract_structured_data` is what you want. You give it the URL and tell your agent exactly what data points (names/prices) and the schema you expect in JSON format. It handles the extraction logic for you.

**What's the difference between `scrape_webpage` and `scrape_extended_data`?**
`scrape_webpage` gives you the rendered content, which is usually enough. `scrape_extended_data` goes deeper—it captures network logs, cookies, and headers. Use this when you need technical debugging info alongside the content.

**Can I use ScrapingAnt to check my remaining API credits?**
Yes, run `get_api_usage`. This tool checks your current credit balance against your account's monthly limit, so you can avoid starting jobs when you're out of quota.

**When should I use `scrape_extended_data` instead of `scrape_webpage`?**
`scrape_extended_data` provides a deeper technical view than just rendered content. It captures network logs and cookies alongside the page data, which is crucial for debugging scraping issues or analyzing session state. If you only need the rendered page content, stick with `scrape_webpage`.

**What if I need to feed scraped content into a RAG pipeline? Is `scrape_to_markdown` the right choice?**
Yes, use `scrape_to_markdown`. This tool automatically converts web pages directly into clean Markdown format. It's built for LLM consumption because it preserves structural elements like headings and lists while stripping out messy HTML.
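
Because headings survive the conversion, one common next step is to split the Markdown into heading-based chunks before embedding. The sketch below is one simple way to do that; `chunk_by_heading` is a hypothetical helper, and it assumes ATX-style headings (`#` through `######`) and does not special-case `#` lines inside fenced code blocks.

```python
import re

# Matches ATX headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, keeping the heading text as the title."""
    chunks = []
    title, lines = None, []

    def flush() -> None:
        # Skip the empty preamble before the first heading, keep everything else.
        if title is not None or any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading as a title, which can be stored as metadata alongside the embedded text.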

**How does ScrapingAnt handle repeated scraping attempts or IP blocks?**
The service manages anti-bot defenses using rotating proxies. It automatically handles both datacenter and residential IPs, which significantly boosts your success rate when running large, persistent data extraction jobs.

**How do I define the schema when I use `extract_structured_data`, and how does it cope with layout differences?**
You define the schema using natural language or a simple JSON prompt. The AI handles mapping that required structure to the source data, even if the website's layout changes slightly between pages.

**Can my AI automatically convert a web page into Markdown format?**
Yes! Use the `scrape_to_markdown` tool. Provide the URL, and your agent will return the page content cleanly formatted in Markdown.

**How do I use AI to extract specific data like prices or stock from a site?**
Simply ask the agent to run the `extract_structured_data` tool. Provide the URL and a prompt or schema of what you need, and ScrapingAnt's AI models will parse the page for you.

**How do I find my ScrapingAnt API Key?**
Log in to your ScrapingAnt dashboard, and you will find your unique API Key prominently displayed on the main page.