# Firecrawl MCP

> Firecrawl turns any website into clean, LLM-ready Markdown with a single API call. This server lets your AI agent scrape single pages, crawl entire sites, map site structures, and search the live web, turning the results into structured data for processing. Stop dealing with messy HTML and start feeding clean content directly to your models.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** markdown-conversion, llm-data, web-crawling, data-extraction, structured-data, dynamic-content

## Description

This server lets your AI agent pull clean, LLM-ready Markdown from any website with one API call, so you no longer have to deal with messy HTML. You can scrape single pages, crawl whole sites, map site structures, and search the live web, turning the results into structured data your models can use.

**Scraping Single Pages**
When you run `scrape_page` on a URL, it hands you the clean Markdown text from that single web page. Use it for articles, documentation, or product pages.

**Searching Live Results**
Running `search_web` on a topic returns scraped content from the top web search results, which is useful for quick fact-checking or targeted research.

**Mapping Site Structures**
Run `map_site` on a domain to get a list of every link it discovers, without pulling any page content. Use this to understand a site's full structure before you start pulling data.

**Indexing Entire Websites**
To index a whole website, run `crawl_site` on a domain. It returns a job ID that you use to track the crawl while it scrapes multiple internal pages.

## Tools

### crawl_site
Crawl an entire website and extract content from multiple pages. Use this for large-scale indexing; it returns a job ID for tracking the crawl.
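
Because `crawl_site` hands back a job ID rather than the content itself, an agent typically polls until the job finishes. The sketch below shows that loop under stated assumptions: `get_status` is a hypothetical callable standing in for whatever status check your client exposes, and the `"scraping"`, `"completed"`, and `"failed"` status strings are assumptions, so check the values your server actually reports.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    job_id: str,
    get_status: Callable[[str], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a crawl job until it completes, fails, or exceeds `timeout` seconds.

    `get_status` is hypothetical: it takes a job ID and returns a dict with a
    "status" key. The status strings checked below are assumptions.
    """
    waited = 0.0  # counts sleep time only, not time spent inside get_status
    while True:
        result = get_status(job_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"crawl {job_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"crawl {job_id} still {status!r} after {timeout}s")
        sleep(interval)  # wait before asking again
        waited += interval


# Usage with a stubbed status lookup, so no network access is needed:
responses = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": ["page-1", "page-2"]},
])
final = wait_for_crawl("job-123", lambda _id: next(responses), sleep=lambda s: None)
print(final["data"])  # ['page-1', 'page-2']
```

Injecting `sleep` keeps the loop testable; in real use, leave it as `time.sleep`.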

### map_site
Discover all URLs on a website without scraping content. Use this to understand a site's full structure before scraping anything.

### scrape_page
Scrape a single web page and extract its content as clean Markdown. Use this for reliable article or documentation extraction.

### search_web
Search the web and return scraped content from the top results. Use this for quick fact-checking or targeted research.

## Prompt Examples

**Prompt:** 
```
Scrape the main page of docs.firecrawl.dev and give me a summary of what Firecrawl offers.
```

**Response:** 
```
Scraped docs.firecrawl.dev (2,450 words):

Firecrawl offers four core capabilities: Scrape (single page → Markdown), Crawl (recursive site traversal), Map (URL discovery), and Search (web search + scraping). It supports multiple output formats and handles JavaScript rendering, anti-bot protection, and dynamic content automatically.
```

**Prompt:** 
```
Search the web for 'best practices for RAG pipelines 2026' and return the top 3 results with content.
```

**Response:** 
```
Found 3 results for 'best practices for RAG pipelines 2026':

1. **Building Production RAG Systems in 2026** (langchain.dev) — 3,200 words
   Full guide covering chunking strategies, embedding models, and retrieval optimization...

2. **RAG Pipeline Architecture** (docs.llamaindex.ai) — 2,800 words
   Comprehensive overview of modern RAG architectures...

3. **Advanced RAG Techniques** (arxiv.org) — 4,100 words
   Research paper on hybrid search, re-ranking, and query transformation...
```

**Prompt:** 
```
Map all pages on example.com to see the site structure.
```

**Response:** 
```
Site Map: 47 URLs found on example.com

/
/about
/products
/products/alpha
/products/beta
/blog
/blog/post-1
/blog/post-2
/docs
/docs/getting-started
/docs/api-reference
... and 36 more URLs
```

## Capabilities

### Extract single-page content
You run `scrape_page` on a URL, and it returns the clean Markdown text from that single web page.

### Search and scrape live results
You run `search_web` on a topic, and it returns scraped content from the top web search results.

### Discover all URLs on a site
You run `map_site` on a domain, and it returns a list of all discovered links without pulling any content.

### Index an entire website
You run `crawl_site` on a domain, and it returns a job ID to track the process of scraping multiple internal pages.

## Use Cases

### Competitive analysis of a new product
A market researcher needs to compare three competitor websites. They first run `map_site` to understand the scope of all three domains. Then, they use `crawl_site` on each site to pull all core documentation. Finally, they run `search_web` to find the latest press releases, which completes the comparison dataset.

### Building a company knowledge base
An internal documentation team needs to index their company wiki. They run `crawl_site` on the wiki's domain, which gathers its blog posts and API reference pages into one corpus and gives the agent a comprehensive, searchable knowledge base.

### Fact-checking a niche claim
A student needs to verify a complex scientific claim. Instead of opening search results one by one, they run `search_web` for the claim. The agent returns content from the top three results, such as academic sources, so the student can analyze the evidence immediately.

### Updating product documentation
A product manager is updating a product page. They use `scrape_page` on the old page URL to grab the current content, then run `map_site` to see if any deep-linked supporting pages were missed, ensuring nothing is left out of the rewrite.
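
The last step of that workflow, spotting supporting pages the old page never links to, amounts to a set difference between the URLs `map_site` discovers and the links in the page's Markdown. A rough sketch, assuming `scrape_page` returns Markdown with inline `[text](url)` links and that you already have the `map_site` results as a list of absolute URLs; `missed_pages` is a hypothetical helper, and its regex is deliberately simple.

```python
import re
from typing import Iterable, Set
from urllib.parse import urljoin

# Captures the target of inline Markdown links such as [Alpha](/products/alpha).
# Deliberately simple: it skips reference-style links and also matches images.
LINK_RE = re.compile(r"\[[^\]]*\]\(\s*([^)\s]+)")


def missed_pages(page_url: str, page_markdown: str, mapped_urls: Iterable[str]) -> Set[str]:
    """Return mapped URLs that the scraped page neither is nor links to."""
    # Resolve relative links against the page URL so they compare with mapped URLs.
    linked = {urljoin(page_url, href) for href in LINK_RE.findall(page_markdown)}
    return set(mapped_urls) - linked - {page_url}


page = "https://example.com/products"
markdown = "See [Alpha](/products/alpha) and the [docs](https://example.com/docs)."
mapped = [
    "https://example.com/products",
    "https://example.com/products/alpha",
    "https://example.com/products/beta",
]
print(missed_pages(page, markdown, mapped))  # {'https://example.com/products/beta'}
```

Because the comparison is exact string matching, you may want to normalize trailing slashes and strip `#fragments` from both sides first.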

## Benefits

- **Scrape single pages reliably.** Need the text from a single article? `scrape_page` handles JavaScript, anti-bot measures, and cookie banners automatically. You get clean Markdown, period.
- **Index whole sites efficiently.** Don't manually list URLs. Use `crawl_site` to recursively traverse an entire domain, perfect for indexing large documentation or product catalogs.
- **Find the structure first.** Before running a full crawl, use `map_site` to list every URL it can discover on the site. This saves compute time and helps you scope the job.
- **Gather info fast.** Instead of clicking through Google results, use `search_web`. It runs a search and extracts the content from the top results in one step.
- **Reduce data prep time.** Because web content arrives as clean Markdown, your agent skips the messy HTML parsing step and feeds models clean text.
- **Build robust pipelines.** The combination of `map_site` (structure) and `crawl_site` (content) lets you build reliable data pipelines for any site.

## How It Works

Your agent gets a reliable way to read and structure data from any website, regardless of how complex the underlying web code is.

1. Subscribe to the server and provide your Firecrawl API key.
2. Your agent invokes a specific tool (e.g., `scrape_page`) with the target URL.
3. The server executes the request, handles rendering, and returns the clean, structured Markdown content.
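
Under the hood, step 2 is an MCP `tools/call` request: a JSON-RPC 2.0 message that names the tool and its arguments. Your MCP client normally builds and sends this for you; the sketch below only illustrates the message's shape. The `url` argument name is an assumption, so check the input schema the server reports via `tools/list` for the real parameter names.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "url" is an assumed argument name for scrape_page.
print(build_tool_call(1, "scrape_page", {"url": "https://docs.firecrawl.dev"}))
```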

## Frequently Asked Questions

**How do I use `scrape_page` with Firecrawl?**
Just pass the specific URL to `scrape_page`. It automatically extracts the clean Markdown from that page, ignoring scripts and banners.

**Is `crawl_site` better than `map_site`?**
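Neither is better; they do different jobs. `map_site` only returns links (the map), while `crawl_site` scrapes the content from those links. Use `map_site` when you only need the site structure and `crawl_site` when you need the data. Running `map_site` first also helps you decide which parts of a site are worth crawling.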
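
One way to act on a `map_site` result before crawling is to keep only the section you care about. A minimal sketch, assuming `map_site` gives you absolute URLs like those in the example site map above; `scope_urls` is a hypothetical helper, and how you then hand the scope to `crawl_site` depends on that tool's schema, which this sketch does not assume.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def scope_urls(urls: Iterable[str], path_prefix: str) -> List[str]:
    """Keep URLs whose path equals `path_prefix` or sits beneath it, deduplicated and sorted."""
    prefix = path_prefix.rstrip("/")  # "/docs/" and "/docs" behave the same; "/" keeps everything
    kept = set()
    for url in urls:
        path = urlparse(url).path.rstrip("/") or "/"
        # Match whole path segments so "/docsearch" is not treated as part of "/docs".
        if path == prefix or path.startswith(prefix + "/"):
            kept.add(url)
    return sorted(kept)


mapped = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/docs",
    "https://example.com/docs/getting-started",
    "https://example.com/docs/api-reference",
    "https://example.com/docsearch",  # not under /docs
]
print(len(scope_urls(mapped, "/docs")))  # 3
```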

**Can I scrape a whole domain with Firecrawl?**
Yes, use `crawl_site`. This tool recursively crawls the site, extracting content from multiple pages. It's designed for full site indexing.

**What is the best way to find information on a topic using Firecrawl?**
Use `search_web`. It combines a Google-like search with automatic content extraction, giving you the full text from the top results, not just snippets.

**How do I handle rate limits when I use `crawl_site`?**
The server retries rate-limited requests automatically. If you still exceed the limit, the API returns an HTTP 429 (Too Many Requests) error; your agent should then retry with exponential backoff so the workflow continues without manual intervention.
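
The backoff strategy can be sketched as follows. `RateLimitError` is a hypothetical exception standing in for however your client surfaces an HTTP 429. The delay doubles on each attempt (1 s, 2 s, 4 s, ...) up to a cap, with optional random "full jitter" so parallel agents do not retry in lockstep. If a 429 response carries a `Retry-After` header, honoring it is a sensible refinement this sketch omits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception your client raises when the API answers HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the 429 to the caller
            # Wait base_delay * 2^attempt seconds, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            if jitter:
                delay = random.uniform(0, delay)  # "full jitter" spreads retries out
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Usage with a stub that is rate-limited twice before succeeding:
attempts = {"n": 0}


def flaky_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "crawl started"


print(call_with_backoff(flaky_request, sleep=lambda s: None))  # crawl started
```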

**Does `scrape_page` handle complex JavaScript rendering?**
Yes. Firecrawl executes the page's JavaScript to get the fully rendered content before converting it to clean Markdown, so you get the real content rather than just the initial HTML.

**What is the difference between `map_site` and `crawl_site`?**
They serve different purposes. `map_site` returns a sitemap of all discovered links without scraping content. You use this first to understand the site structure; then, you run `crawl_site` to actually extract the content from those links.

**How does `search_web` extract content from search results?**
The tool performs a Google-like search and then scrapes the full content from the top results. It doesn't just give you links; it gives your agent the actual text needed for analysis, making fact-checking easy.

**How does Firecrawl pricing work?**
Firecrawl uses a credit-based system. You get 500 free lifetime credits to start (no credit card required). Base cost is 1 credit per page scraped. Advanced features like JSON extraction (+4 credits) or enhanced mode (+4 credits) consume additional credits per page. Paid plans start at $16/month with 3,000 monthly credits.
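
As a worked example of the per-page arithmetic above, the sketch below estimates a job's credit cost from the rates stated in this answer (1 base credit per page, +4 for JSON extraction, +4 for enhanced mode). `estimate_credits` is a hypothetical helper; it assumes the two surcharges simply add when combined, and rates may change, so treat the result as a rough planning figure and confirm it against Firecrawl's current pricing.

```python
def estimate_credits(pages: int, json_extraction: bool = False, enhanced: bool = False) -> int:
    """Estimate credits for `pages` pages at the per-page rates stated above.

    Assumes surcharges stack additively when both options are enabled.
    """
    per_page = 1  # base cost per page
    if json_extraction:
        per_page += 4
    if enhanced:
        per_page += 4
    return pages * per_page


# A 100-page job with JSON extraction costs 100 * 5 credits,
# exactly the 500 free lifetime credits.
print(estimate_credits(100, json_extraction=True))  # 500

# Pages a 3,000-credit month covers with JSON extraction on every page:
print(3000 // estimate_credits(1, json_extraction=True))  # 600
```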

**Can Firecrawl handle JavaScript-heavy websites?**
Yes! Firecrawl renders pages in a full browser environment before extracting content — this means it handles React, Next.js, Angular, and any other JavaScript framework. It also automatically bypasses common anti-bot protections, removes cookie consent banners, and waits for dynamic content to load before extraction.

**What formats does Firecrawl return?**
Firecrawl can return content in multiple formats: Markdown (default and most popular for LLM consumption), HTML, raw HTML, structured JSON (with LLM-powered extraction), screenshots, links, and page metadata. You can request multiple formats in a single call.