# Spider MCP

> Spider provides high-performance web scraping and crawling via an open MCP Server connection. It lets your AI agent search, scrape single pages, or map entire websites using a Rust engine that runs at extreme speeds (>100K pages/second). Built-in anti-bot protection and proxy rotation handle the hard stuff, giving you clean data in Markdown, HTML, or plain text format.

## Overview
- **Category:** ship-it
- **Price:** Free
- **Tags:** web-crawling, data-extraction, headless-browser, anti-bot, rust-engine, html-parsing

## Description

Spider hooks your AI agent into one of the fastest web scraping engines out there. It's built on a Rust core, so it runs at very high speeds: over 100K pages per second when you need it. Your agent uses this server to search, scrape individual pages, or map whole websites. The system handles the tough stuff itself. Proxy rotation and anti-bot protection are built in, so you just get clean data.

### Scraping Specific Content with `spider_scrape`
If you need the full text from a single URL, use `spider_scrape`. This tool pulls clean content and markup from that page. It handles JavaScript rendering automatically, so if a site loads its actual text with JS, your agent still gets it. The system manages anti-bot measures and rotates proxies, so the scrape is less likely to be blocked partway through. You get the data in Markdown (the default), HTML, or plain text.
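
As a rough illustration, here is a minimal sketch of the MCP `tools/call` request an agent might send for `spider_scrape`. The JSON-RPC envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification. The argument names `url` and `return_format`, and the format strings `markdown`, `html`, and `text`, are assumptions, so check the server's published tool schema before relying on them.

```python
import json

# Hypothetical format values; the server's tool schema may spell them differently.
SCRAPE_FORMATS = {"markdown", "html", "text"}


def build_scrape_call(request_id: int, url: str, return_format: str = "markdown") -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for spider_scrape (sketch)."""
    if return_format not in SCRAPE_FORMATS:
        raise ValueError(f"unsupported return_format: {return_format!r}")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "spider_scrape",
            # "url" and "return_format" are assumed argument names.
            "arguments": {"url": url, "return_format": return_format},
        },
    }


if __name__ == "__main__":
    call = build_scrape_call(1, "https://spider.cloud", return_format="text")
    print(json.dumps(call, indent=2))
```

Validating the format on the client side means a typo fails fast instead of costing a round trip to the server.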

### Mapping Entire Websites with `spider_crawl`
When you need more than one page, run `spider_crawl`. This tool maps a website by following its internal links. Your agent can recursively follow those links to extract structured data across dozens of pages, up to the depth and page limits you set. In effect, it builds a map of the site's content structure for you.
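
To make "following internal links up to a depth and page limit" concrete, the sketch below does a generic breadth-first traversal over an in-memory link map. This is not Spider's engine. The `links` dict is a hypothetical stand-in for pages that would really be fetched, and `max_depth` and `limit` are illustrative parameter names.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(start: str, links: dict[str, list[str]], max_depth: int, limit: int) -> list[str]:
    """Breadth-first walk of internal links (illustrative, not Spider's engine).

    `links` maps a page URL to the hrefs found on it; it stands in for real fetches.
    """
    host = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < limit:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for href in links.get(page, []):
            # Resolve relative hrefs and drop #fragments.
            target = urljoin(page, href).split("#")[0]
            # Only follow links on the same host (internal links).
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


if __name__ == "__main__":
    site = {
        "https://example.com/": ["/docs", "/blog", "https://other.org/"],
        "https://example.com/docs": ["/docs/intro", "/"],
        "https://example.com/blog": ["/blog/post-1"],
    }
    print(crawl_order("https://example.com/", site, max_depth=2, limit=10))
```

Breadth-first order means a small page limit keeps the pages closest to the start URL, which is usually what you want when sampling a site.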

### Combining Search and Extraction with `spider_search`
Need both discovery and actual content? Use `spider_search`. This tool combines web search with scraping in a single request. Your agent gets the top results *and* the text content of those pages at once, which saves a round of separate scrape calls.
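
A minimal sketch of a `spider_search` call, using the same MCP `tools/call` envelope. The argument names `query` and `limit` are hypothetical and stand in for whatever the server's tool schema actually defines.

```python
import json


def build_search_call(request_id: int, query: str, num_results: int = 3) -> dict:
    """Sketch of a tools/call request for spider_search (argument names assumed)."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    if num_results < 1:
        raise ValueError("num_results must be at least 1")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "spider_search",
            # Hypothetical argument names: "query" and "limit".
            "arguments": {"query": query, "limit": num_results},
        },
    }


if __name__ == "__main__":
    call = build_search_call(1, "machine learning frameworks comparison 2026", 3)
    print(json.dumps(call, indent=2))
```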

### How It Works Under the Hood
When you connect your AI client to the MCP Server endpoint and call a tool (`spider_crawl`, `spider_scrape`, or `spider_search`), the Rust engine takes over. The server processes requests at high speed and returns data formatted for immediate use. You're not dealing with complicated infrastructure. You tell your agent what you need, and the system delivers clean, structured content. Output is available as **Markdown**, **HTML**, or plain text, so whether you're feeding the data into a database, passing it to another process, or just reading it, the format is ready to go.
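
On the client side, a tool's reply arrives as an MCP result. In the MCP specification, `result.content` is a list of typed blocks and `result.isError` flags a tool-level failure. The sketch below joins the text blocks and surfaces both kinds of error. The sample payload is made up, and whether Spider puts a page's Markdown in one block or several is an assumption.

```python
import json


def extract_text(response: dict) -> str:
    """Join the text blocks of an MCP tools/call response (sketch).

    Follows the MCP result shape: result.content is a list of typed blocks,
    and result.isError flags a tool-level failure.
    """
    if "error" in response:
        # Protocol-level JSON-RPC error (e.g. unknown tool, bad params).
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    # Keep only text blocks; images or other block types are skipped here.
    texts = [b["text"] for b in result.get("content", []) if b.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error: " + " ".join(texts))
    return "\n".join(texts)


if __name__ == "__main__":
    # Made-up sample reply, for illustration only.
    raw = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "# Spider"}]}}'
    print(extract_text(json.loads(raw)))
```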

## Tools

### spider_crawl
Maps a website by following internal links and scraping content across multiple pages at high speed.

### spider_scrape
Scrapes the clean, full content of one specific URL, handling JS rendering and anti-bot measures automatically.

### spider_search
Combines web search query processing with scraping, delivering results and their content in a single call.

## Prompt Examples

**Prompt:** 
```
Scrape the homepage of spider.cloud and show me what they offer.
```

**Response:** 
```
Scraped spider.cloud (1,200 words):

Spider.cloud is the fastest web crawler, built in Rust. Core features: Scrape, Crawl, Search, AI-powered extraction, and Browser Cloud for full CDP access. Supports markdown, HTML, JSON output formats with built-in anti-bot protection and proxy rotation.
```

**Prompt:** 
```
Crawl docs.python.org and get the first 5 pages.
```

**Response:** 
```
Crawled 5 pages from docs.python.org:

1. **Python Documentation** (/) — 3,200 words
2. **Tutorial** (/tutorial) — 4,100 words
3. **Library Reference** (/library) — 2,800 words
4. **FAQ** (/faq) — 1,900 words
5. **What's New** (/whatsnew) — 5,400 words
```

**Prompt:** 
```
Search for 'machine learning frameworks comparison 2026' and scrape the top 3 results.
```

**Response:** 
```
Found and scraped 3 results:

1. **ML Framework Comparison 2026** (towardsdatascience.com) — 4,500 words
2. **PyTorch vs TensorFlow vs JAX** (paperswithcode.com) — 3,200 words
3. **Best ML Frameworks Review** (neptune.ai) — 2,900 words
```

## Capabilities

### Map entire websites
The `spider_crawl` tool follows internal links across a whole site and returns content from each page it reaches, following the site's link structure.

### Extract content from a single URL
The `spider_scrape` tool pulls clean text and markup from one page while automatically managing JavaScript rendering and anti-bot measures.

### Search web results and extract data
The `spider_search` tool searches the web and scrapes the content of the top results in a single request, combining discovery with extraction.

## Use Cases

### Conducting a full competitive audit
A market researcher needs to understand the content depth of three competitors. Instead of manual checks, they use `spider_crawl` on each site to map all internal pages and extract structured data, giving them a complete picture of the opposition's published material.

### Gathering academic source material
A student is writing a literature review. They run `spider_search` for 'quantum computing breakthroughs 2025.' The agent finds and scrapes the top three articles in one go, saving hours of manual copy-pasting from different sources.

### Extracting product data from single pages
An e-commerce analyst only needs the content of one specific product page. They use `spider_scrape` and request Markdown output. This gives them clean, formatted text without having to deal with messy HTML tags or with content that goes missing because it is rendered by JavaScript.

### Building a niche knowledge base
A developer wants to index all documentation from an open-source project. They run `spider_crawl` on the docs domain first, then feed the resulting pages into their AI client for indexing. Crawling the domain, rather than listing URLs by hand, helps ensure that every page reachable through internal links is covered.

## Benefits

- Speed and Stealth: The Rust engine provides speeds exceeding 100K pages/second while built-in stealth mode handles fingerprint rotation and residential proxies. You get massive throughput without hitting roadblocks.
- Multi-Format Output: Don't just get text. `spider_scrape` lets you choose the output format—Markdown (default), HTML, or plain text—so your data is ready for whatever pipeline you use next.
- Full Site Mapping: Use `spider_crawl` to recursively map entire domains. This ensures you gather structured content from every internal link, which is critical for complete site audits.
- Discovery + Extraction: The `spider_search` tool removes friction by combining web search and scraping into one request, so you get both the results and their content at once.
- JS Rendering Handled: Don't lose content just because a page uses JavaScript. Both `spider_scrape` and `spider_crawl` handle JS rendering automatically, so text rendered client-side is still captured.

## How It Works

In short, you pass the target URL or query to the right tool, and Spider handles the fetching, anti-bot handling, and data formatting for you.

1. Subscribe to the server and enter your Spider API key. Your agent uses this key to authenticate.
2. Your AI client invokes one of the three specialized tools: `spider_scrape` for a single page, `spider_crawl` for a site map, or `spider_search` for web results.
3. The tool executes the request on Spider's Rust engine and returns the content in the requested format (Markdown, HTML, or text) directly to your agent.
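
Steps 1 and 2 can be sketched as an authenticated HTTP POST carrying a `tools/call` payload. This is a hypothetical sketch with several assumptions. `MCP_ENDPOINT` is a placeholder, not Spider's real URL. Passing the key as a Bearer token and reading it from a `SPIDER_API_KEY` environment variable are also guesses. The request is only built, not sent, and a real MCP client would first perform the protocol's `initialize` handshake, which is omitted here.

```python
import json
import os
import urllib.request

# Placeholder endpoint; use the MCP server URL from your subscription.
MCP_ENDPOINT = "https://mcp.example.com/spider"
TOOLS = {"spider_scrape", "spider_crawl", "spider_search"}


def prepare_request(api_key: str, tool: str, arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Wrap a tools/call payload in an authenticated HTTP POST (built, not sent)."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }).encode("utf-8")
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=body,
        method="POST",
        headers={
            # Bearer-token auth is an assumption about how the key is passed.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    key = os.environ.get("SPIDER_API_KEY", "demo-key")  # hypothetical variable name
    req = prepare_request(key, "spider_scrape", {"url": "https://spider.cloud"})
    print(req.get_method(), req.full_url)
```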

## Frequently Asked Questions

**How do I use spider_scrape for a single product page?**
You call `spider_scrape` and pass the direct URL. This tool handles JavaScript rendering automatically, so you get clean content regardless of how the site loads its text.

**Is spider_crawl better than just scraping a list of URLs?**
Yes, when you want site-wide coverage. `spider_crawl` follows internal links (sitemap-style discovery), so it finds pages you might not know exist and your data set covers everything reachable from the start URL.

**Can spider_search scrape the content of search results?**
Yes. That's exactly what `spider_search` does. It combines finding relevant web links with scraping their actual content in one efficient API call, saving you multiple steps.

**What is the performance difference between Spider and other scrapers?**
Spider uses a Rust engine built for speed. According to the listing, it can crawl at over 100K pages/second, which Spider positions as far faster than scrapers built on Node.js.

**What content formats can I get when using spider_scrape?**
The tool supports Markdown, HTML, and plain text outputs. You specify your desired format in the request parameters. This lets you choose the best structure for parsing or saving.

**How do I limit the scope when using spider_crawl?**
You configure both the maximum depth and the total page count in the API call. This keeps your crawl focused, preventing unnecessary processing of entire websites.
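
A minimal sketch of a bounded crawl request, mirroring the "first 5 pages of docs.python.org" prompt above. The argument names `limit` (total pages) and `depth` (link hops from the start URL) are assumptions, not the documented tool schema.

```python
import json


def build_crawl_call(request_id: int, url: str, limit: int = 5, depth: int = 2) -> dict:
    """Sketch of a bounded spider_crawl request.

    `limit` caps the total pages and `depth` caps link hops from the start URL;
    both argument names are assumptions, not the documented tool schema.
    """
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    if not isinstance(depth, int) or depth < 0:
        raise ValueError("depth must be a non-negative integer")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "spider_crawl",
            "arguments": {"url": url, "limit": limit, "depth": depth},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_crawl_call(1, "https://docs.python.org", limit=5), indent=2))
```

Setting both bounds is the safer default: a page limit alone can still wander deep into one branch, and a depth limit alone can still fan out across thousands of pages.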

**Does spider_scrape handle modern websites that rely on JavaScript?**
Yes, it handles JS rendering automatically. The service includes built-in support for anti-bot measures and proxy rotation so your requests appear legitimate.

**What does using spider_search combine into one call?**
It combines web searching with content extraction in a single, high-performance API request. This saves time and improves efficiency by eliminating the need for two separate calls.

**How is Spider different from Firecrawl?**
Spider is built in Rust and optimized for raw speed and volume. It can crawl 100K+ pages per second, which Spider cites as 10-20x faster than Firecrawl for large-scale operations. Spider also offers lower per-page costs at high volume, built-in stealth mode with fingerprint rotation, and multiple request modes (HTTP, Smart, Chrome). Firecrawl excels at simplicity and LLM-specific features like JSON extraction.

**What output formats does Spider support?**
Spider supports Markdown, HTML, raw HTML, plain text, JSON (structured extraction), screenshots, and PDF output. You can specify the desired format via the `return_format` parameter in each request.

**How does Spider pricing work?**
Spider offers 500 free credits to get started (no credit card required). Paid plans are usage-based with credits consumed per page scraped. The Starter plan begins at $15/month with 12,000 credits. Enterprise plans offer custom pricing with dedicated infrastructure and unlimited concurrency.
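
With the Starter numbers above, the effective price is $15 / 12,000 = $0.00125 per credit. The number of credits a single page consumes isn't stated here and may vary by request, so the sketch below takes it as an input. The per-page values used in the example are illustrative only, not published rates.

```python
import math

STARTER_PRICE_USD = 15.0  # Starter plan monthly price, from the FAQ above
STARTER_CREDITS = 12_000  # credits included in the Starter plan
FREE_CREDITS = 500        # free credits for new accounts


def cost_per_credit(price: float, credits: int) -> float:
    """Effective price of one credit on a plan."""
    return price / credits


def pages_for_credits(credits: int, credits_per_page: float) -> int:
    """Whole pages a credit balance covers at a given (assumed) per-page cost."""
    if credits_per_page <= 0:
        raise ValueError("credits_per_page must be positive")
    return math.floor(credits / credits_per_page)


if __name__ == "__main__":
    print(f"${cost_per_credit(STARTER_PRICE_USD, STARTER_CREDITS):.5f} per credit")
    for cpp in (1, 2.5):  # illustrative per-page costs, not published rates
        print(cpp, pages_for_credits(STARTER_CREDITS, cpp), pages_for_credits(FREE_CREDITS, cpp))
```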