# Firecrawl MCP

> Firecrawl scrapes and crawls entire websites into clean, structured markdown with a single API call. It handles JavaScript rendering, automatically excludes boilerplate content like headers and footers, and lets your AI agent programmatically discover and ingest full knowledge bases from any root URL.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** data-extraction, markdown-conversion, rag-pipelines, recursive-crawling, llm-ready, web-data

## Description

Firecrawl lets your AI agent scrape and crawl entire websites and get back clean, structured markdown. You'll use this server to ingest full knowledge bases from any root URL; it handles JavaScript rendering and strips out boilerplate like headers and footers.

To grab a single URL, your agent sends the address and gets the content back as structured markdown with the boilerplate removed. You can also map a site's internal structure by giving it a root domain; the server lists every reachable link without downloading the content. If you want to crawl a whole site, your agent inputs a root URL and a depth limit, and the server starts a background job that systematically discovers and scrapes all linked pages. To track a crawl job, your agent sends a job ID, and the server returns the current status: running, completed, or failed. You can also check your API usage; the server returns your remaining Firecrawl credits and usage history. If a crawl job gets out of hand, your agent can provide the active job ID, and the server sends a signal to terminate the process immediately.

## Tools

### cancel_active_crawl
Stops a crawl job that is currently running using a provided job ID.

### get_api_usage
Checks and reports your current Firecrawl credit usage and remaining limits.

### get_crawl_status
Retrieves the current status (running, completed, failed) for a specific crawl job ID.

### map_website_structure
Discovers and lists all accessible URLs on a domain without downloading the content of the pages.

### scrape_url
Converts the content of a single specified URL into clean, structured markdown format.

### start_crawl
Initiates a recursive crawling job on a root URL, returning a job ID for tracking.

## Prompt Examples

**Prompt:** 
```
Turn 'https://stripe.com/docs/api' into clean Markdown.
```

**Response:** 
```
Scraping in progress... I've successfully converted the Stripe API documentation into high-fidelity Markdown, excluding the navigation and footers. Would you like me to summarize the main endpoints for you?
```

**Prompt:** 
```
Crawl 'https://docs.firecrawl.dev' recursively with a limit of 10 pages.
```

**Response:** 
```
Crawl job started! I've initiated a recursive crawl of the Firecrawl documentation (ID: crl_123). I'll monitor the progress for you and notify you as soon as the 10 pages are indexed.
```

**Prompt:** 
```
Map all internal links for 'https://github.com/vinkius'.
```

**Response:** 
```
Mapping site structure... I've identified all reachable links for the requested domain. I found 25 internal URLs, including various repository and profile paths. Would you like the full list of mapped URLs?
```

## Capabilities

### Scrape a single URL to Markdown
The agent sends a URL, and the server returns the content as clean, structured markdown, automatically cleaning up boilerplate elements.

### Map a website's internal structure
The agent inputs a root domain, and the server lists every reachable link on that site without downloading the content.

### Recursively crawl a site
The agent inputs a root URL and a depth limit, and the server initiates a background job that systematically discovers and scrapes all linked pages.

### Check the status of a crawl job
The agent sends a job ID, and the server returns the current status of the crawl, letting you know if it's running, completed, or failed.

### Get current API usage
The agent asks for usage details, and the server returns your remaining Firecrawl credits and usage history.

### Stop a running crawl
The agent provides an active job ID, and the server sends a signal to terminate the crawling process immediately.

## Use Cases

### Building a RAG Knowledge Base
A developer needs to index all documentation from a vendor site. Instead of manually scraping 50 pages, they ask their agent to use `map_website_structure` first, then `start_crawl` on the root URL. They track the returned job ID with `get_crawl_status`, and once the job completes, the scraped Markdown feeds into their RAG pipeline as a complete knowledge base.

### Competitive Analysis
A market researcher needs to track competitor pricing across 10 product pages. They use `scrape_url` on each URL, so every page comes back as clean Markdown that is easy to compare and analyze side by side.

### Archiving an Article
A content creator finds a great article and needs to save it for later. They ask their agent to use `scrape_url`, which extracts the main text and automatically strips out the distracting sidebars and ads. They also capture a full-page screenshot for context.

### Auditing a Website's Links
A site auditor needs to know every possible internal link on a large corporate site. They use `map_website_structure` to get a complete list of all reachable URLs, without wasting time or credits on content extraction.

## Benefits

- **Structured Data Output:** Forget messy HTML. `scrape_url` converts complex web pages into clean, structured Markdown, making the data instantly usable by your LLM.
- **Deep Site Discovery:** Need more than just one page? `start_crawl` handles recursive crawling, systematically finding and scraping every linked subpage to build a full knowledge base.
- **Planning Before Execution:** Don't guess what's on the site. Use `map_website_structure` first to get a full list of reachable URLs, letting you plan your data gathering before spending credits.
- **Full Process Visibility:** Use `get_crawl_status` and `get_api_usage` to monitor jobs in real time. You always know whether a crawl is running, completed, or failed, and whether you're running low on credits.
- **Visual and Textual Record:** Capture a full-page screenshot alongside the structured text using the agent's visual capture capability. You get both the raw visual context and the clean text data.
- **Stop and Control:** If a crawl goes wrong or you change your mind, `cancel_active_crawl` lets you shut down the job immediately.

## How It Works

In short, you tell your AI agent what web data you need, and it uses Firecrawl to run the scrape or crawl and hand you clean, structured output.

1. Subscribe to the Firecrawl server and retrieve your API Key from the Firecrawl dashboard.
2. Give your AI agent the target URL and the desired action (e.g., 'Scrape this page' or 'Crawl this site').
3. The agent executes the tool call, and the server returns the structured data (Markdown, job ID, or usage metrics) to your agent for processing.
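
For readers curious about what "the agent executes the tool call" looks like on the wire, here is a minimal sketch. MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests; the `url` argument name used below is an assumption for illustration, so check the tool schema the server actually advertises.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumption: scrape_url accepts a "url" argument; the real schema may differ.
message = build_tool_call("scrape_url", {"url": "https://docs.firecrawl.dev"})
print(message)
```

In practice your AI client builds and sends this message for you; the sketch only shows the shape of the request.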

## Frequently Asked Questions

**How do I use Firecrawl with my AI client to scrape a single page?**
You use the `scrape_url` tool. Just give your agent the URL you want. It handles the JavaScript rendering and outputs clean Markdown, making the content ready for analysis right away.

**Can Firecrawl crawl an entire website recursively?**
Yes. You use the `start_crawl` tool. It initiates a background job and returns a job ID. You then use `get_crawl_status` to monitor the progress until the entire site is indexed.
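
The start-then-monitor pattern could be sketched roughly as follows. This is an illustration under stated assumptions, not the server's documented interface: `call_tool` is a hypothetical stand-in for whatever your MCP client exposes, and the `job_id` argument and `status` response key are guessed names.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_crawl(
    call_tool: Callable[[str, dict], dict],
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll get_crawl_status until the job reaches a terminal state."""
    for _ in range(max_polls):
        # Assumption: the tool takes "job_id" and returns a dict with "status".
        result = call_tool("get_crawl_status", {"job_id": job_id})
        status = result["status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {job_id} still running after {max_polls} polls")


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client: 'running' twice, then 'completed'."""
    fake_call_tool.calls += 1
    return {"status": "running" if fake_call_tool.calls < 3 else "completed"}


fake_call_tool.calls = 0
final_status = wait_for_crawl(fake_call_tool, "crl_123", poll_seconds=0)
print(final_status)
```

Capping the number of polls keeps a stuck job from blocking the agent indefinitely.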

**What is the difference between `map_website_structure` and `start_crawl`?**
`map_website_structure` only discovers links; it doesn't download content. `start_crawl` executes the crawl and downloads the actual page content. Use mapping to plan, and crawling to execute.
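
As one way to "use mapping to plan", the sketch below filters a mapped URL list down to the section you actually want before spending credits on a crawl. The `plan_crawl` helper and the sample URLs are hypothetical; the real output format of `map_website_structure` may differ.

```python
from urllib.parse import urlparse


def plan_crawl(mapped_urls: list[str], root_url: str, path_prefix: str = "/") -> list[str]:
    """Keep same-host URLs under path_prefix, deduplicated and sorted."""
    root_host = urlparse(root_url).netloc
    selected = set()
    for url in mapped_urls:
        parts = urlparse(url)
        if parts.netloc == root_host and parts.path.startswith(path_prefix):
            # Drop query strings and fragments so page variants count once.
            selected.add(f"{parts.scheme}://{parts.netloc}{parts.path}")
    return sorted(selected)


# Hypothetical mapping result, used only to exercise the helper.
mapped = [
    "https://example.com/docs/intro",
    "https://example.com/docs/intro#setup",
    "https://example.com/blog/launch",
    "https://other.example.org/docs/intro",
]
targets = plan_crawl(mapped, "https://example.com", path_prefix="/docs/")
print(len(targets), targets)
```

The count of filtered targets gives a rough idea of how many pages a crawl would touch.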

**How do I check if my Firecrawl API usage is over my limit?**
Run the `get_api_usage` tool. This tells you your remaining credits and helps you manage your budget before you run out of data extraction capacity.

**Can Firecrawl capture screenshots while crawling?**
Yes, the agent can capture full-page screenshots of any URL. This adds a visual record to your data set, giving you context beyond just the text.

**How do I manage an ongoing crawl using the `cancel_active_crawl` tool?**
You call `cancel_active_crawl` with the job ID. This immediately stops the crawl job, preventing further processing and saving credits. You can't restart a canceled job; you'll need to initiate a new crawl.
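
A simple guard that cancels a crawl once it exceeds a time budget might look like the sketch below. As before, `call_tool` stands in for your MCP client and the `job_id` argument name is an assumption, not a documented parameter.

```python
from typing import Callable


def cancel_if_overdue(
    call_tool: Callable[[str, dict], dict],
    job_id: str,
    elapsed_seconds: float,
    max_seconds: float,
) -> bool:
    """Cancel the crawl once it exceeds its time budget; return True if cancelled."""
    if elapsed_seconds <= max_seconds:
        return False
    # Assumption: cancel_active_crawl takes the job ID as "job_id".
    call_tool("cancel_active_crawl", {"job_id": job_id})
    return True


issued = []  # records tool calls in place of a real MCP client


def recording_call_tool(name: str, arguments: dict) -> dict:
    issued.append((name, arguments))
    return {}


within_budget = cancel_if_overdue(recording_call_tool, "crl_123", elapsed_seconds=30, max_seconds=120)
over_budget = cancel_if_overdue(recording_call_tool, "crl_123", elapsed_seconds=300, max_seconds=120)
print(within_budget, over_budget, issued)
```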

**What is the difference between `scrape_url` and `map_website_structure`?**
`scrape_url` converts a single URL into clean Markdown. `map_website_structure` simply discovers all reachable links on a site without extracting any content. Use `map_website_structure` on its own if you only need a site map.

**How do I monitor my job progress after running `start_crawl`?**
You use `get_crawl_status` with the job ID returned by `start_crawl`. This tells you if the crawl is running, completed, or failed. It's the only way to track the real-time status of a background job.

**How do I find my Firecrawl API Key?**
Log in to your [**Firecrawl dashboard**](https://www.firecrawl.dev/app/dashboard), and navigate to the **API Keys** section to copy your unique token.

**Can I scrape content excluding headers and footers?**
Yes! The `scrape_url` tool includes an `onlyMainContent` parameter. When set to true, Firecrawl returns only the core article or page content, leaving out headers, navigation, and footers.

**How long does a recursive crawl take?**
Crawl time depends on the site size and depth. Use the `get_crawl_status` tool to monitor progress and retrieve results once the job is complete.