# Firecrawl MCP

> Firecrawl crawls entire websites, turning complex web pages into clean, structured Markdown ready for your AI agent. It handles JavaScript rendering and recursive crawling automatically. You can map site structures, scrape individual URLs, or monitor large data jobs—all through natural conversation with your preferred AI client.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** data-extraction, markdown-conversion, rag-pipelines, recursive-crawling, llm-ready, web-data

## Description

Need to build a knowledge base from the web? Firecrawl lets you treat websites like structured databases. Instead of wrestling with messy HTML tags or writing complex scraping scripts, you just tell your agent what site you need. It handles everything else: rendering JavaScript, filtering out headers and footers, and converting deep structures into clean Markdown. If you connect this MCP via Vinkius, your AI client manages the whole process—from mapping a domain's entire link structure to crawling hundreds of pages recursively. You maintain control over data acquisition using natural language prompts, making web scraping feel like talking to an expert data engineer who never gets tired.

## Tools

### cancel_active_crawl
Stops an ongoing website crawl job immediately, identified by its job ID.

### get_crawl_status
Retrieves the current status of a specific crawl job ID.

### start_crawl
Initiates a recursive crawl of a website and returns a unique job ID for tracking.

### map_website_structure
Discovers and lists every reachable link on a domain without scraping any content.

### scrape_url
Extracts the full, cleaned Markdown content from a single URL.

### get_api_usage
Checks how many API credits you have left and tracks your usage.

## Prompt Examples

**Prompt:** 
```
Turn 'https://stripe.com/docs/api' into clean Markdown.
```

**Response:** 
```
Scraping in progress... I've successfully converted the Stripe API documentation into high-fidelity Markdown, excluding the navigation and footers. Would you like me to summarize the main endpoints for you?
```

**Prompt:** 
```
Crawl 'https://docs.firecrawl.dev' recursively with a limit of 10 pages.
```

**Response:** 
```
Crawl job started! I've initiated a recursive crawl of the Firecrawl documentation (ID: crl_123). I'll monitor the progress for you and notify you as soon as the 10 pages are indexed.
```

**Prompt:** 
```
Map all internal links for 'https://github.com/vinkius'.
```

**Response:** 
```
Mapping site structure... I've identified all reachable links for the requested domain. I found 25 internal URLs, including various repository and profile paths. Would you like the full list of mapped URLs?
```

## Capabilities

### Scrape a single page
Turn any specific URL into clean Markdown text in one go.

### Crawl entire websites
Start a job that discovers and extracts content from every subpage, building deep knowledge bases.

### Map site structure
Find all the internal links on a domain without actually downloading any full page content.

### Check job status
Monitor ongoing crawls to see exactly where the process stands.

### Manage usage and limits
Track your remaining API credits and current usage in real time.

## Use Cases

### Building a competitor analysis database
A market analyst needs to track 15 competitors' documentation sites. They use `map_website_structure` first to confirm all relevant subdomains, then run `start_crawl` on each one to build an indexed knowledge base for comparison.

### Creating a site index for a manual
A developer needs to document every single page of their internal wiki. They use `map_website_structure` to get the full list, and then loop through that list using `scrape_url` to pull clean Markdown from each link.
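
The map-then-scrape loop in this use case can be sketched as follows. This is a minimal illustration rather than Firecrawl's client API: `map_links` and `scrape` are hypothetical callables standing in for however your client invokes `map_website_structure` and `scrape_url`, and their return shapes (a list of URLs, a Markdown string) are assumptions.

```python
from typing import Callable, Dict, List


def build_site_index(
    root_url: str,
    map_links: Callable[[str], List[str]],
    scrape: Callable[[str], str],
) -> Dict[str, str]:
    """Map a site, then scrape every discovered URL once.

    map_links and scrape are hypothetical stand-ins for the
    map_website_structure and scrape_url tools.
    """
    index: Dict[str, str] = {}
    for url in map_links(root_url):
        if url in index:  # the mapped link list may contain duplicates
            continue
        index[url] = scrape(url)  # Markdown for this page
    return index
```

Skipping URLs that are already in the index avoids spending credits on the same page twice.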

### Researching a niche topic quickly
A researcher needs data on a specific university department. They use `start_crawl` on the main departmental URL, letting it recursively gather content and index all pages for later review.

### Checking credits before a large crawl
Before committing to a large crawl job, a user calls `get_api_usage`. This confirms they have enough credits and can start the process with confidence.
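
As a rough sketch of that pre-flight check, the comparison might look like this. The `remaining` value would come from `get_api_usage`; the cost of one credit per page is an assumption made for illustration, so check your plan for the actual rate.

```python
def has_enough_credits(
    remaining: int,
    page_limit: int,
    credits_per_page: int = 1,  # assumed rate, not confirmed by this listing
) -> bool:
    """Return True if the remaining balance covers a crawl of page_limit pages."""
    if page_limit < 0 or credits_per_page < 0:
        raise ValueError("page_limit and credits_per_page must be non-negative")
    return remaining >= page_limit * credits_per_page
```

For example, with 500 credits left and a 100-page limit the check passes, while 50 credits for the same crawl does not.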

## Benefits

- Stop copy-pasting messy HTML. Use `scrape_url` to instantly convert any webpage into high-fidelity Markdown that your AI agent can read cleanly.
- Build massive knowledge bases by running a recursive crawl using `start_crawl`. This tool maps and extracts content from an entire site, not just one page.
- Need to know what links exist before scraping? Use `map_website_structure` first. It gives you the full blueprint of a website without wasting credits on unnecessary downloads.
- Maintain total control over your process using `get_crawl_status`. You can monitor long jobs and use `cancel_active_crawl` if something goes wrong or takes too long.
- Know your limits before running big jobs. The `get_api_usage` tool lets you check your credit count so you never run out mid-project.

## How It Works

You talk to your agent, and it handles the whole web scraping pipeline for you.

1. Subscribe to this MCP, then grab your API key from the Firecrawl dashboard.
2. You ask your AI client in natural language to start a task, such as mapping links or starting a crawl job.
3. The system returns structured data (Markdown or status updates) that your agent can use immediately.

## Frequently Asked Questions

**How do I scrape just one page using Firecrawl?**
Call `scrape_url` and provide the exact URL. This tool is designed for single-page extraction, giving you clean Markdown without starting a full crawl job.

**What's the difference between map_website_structure and start_crawl?**
`map_website_structure` only finds links (the blueprint). `start_crawl` actually visits those links and extracts content to build your knowledge base.

**Can I stop a crawl job if it fails or takes too long?**
Yes. If you initiate a crawl via `start_crawl` and need to halt the process, use `cancel_active_crawl` with the job ID.

**How do I check if my credits are okay before crawling?**
Use `get_api_usage`. This tool immediately reports your remaining credit balance and usage history, letting you manage costs upfront.

**Using `get_crawl_status`, how do I confirm that a recursive job has completed indexing all pages?**
Poll `get_crawl_status` with the job ID until it returns 'completed' or 'failed'. A 'completed' status confirms the crawl finished processing, not just that it started; 'failed' means the job stopped before finishing.
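
A polling loop of the kind described here might look like the sketch below. `get_status` is a hypothetical callable wrapping `get_crawl_status` and is assumed to return the status as a plain string; the interval and poll cap are illustrative defaults, not Firecrawl recommendations.

```python
import time
from typing import Callable

# Terminal states named in this FAQ; any other value is treated as "still running".
TERMINAL_STATES = {"completed", "failed"}


def wait_for_crawl(
    job_id: str,
    get_status: Callable[[str], str],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a crawl job until it reaches a terminal state or the poll cap is hit."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"crawl {job_id} still running after {max_polls} polls")
```

Capping the number of polls keeps a stalled job from blocking the caller indefinitely, and the injectable `sleep` makes the loop easy to test without real delays.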

**If a recursive job initiated by `start_crawl` encounters an error, how do I debug the issue?**
The system captures detailed error logs associated with the specific job ID. Check these logs using your agent to see the exact page or link that caused the failure. You can then retry only the problematic segment.

**How does `map_website_structure` handle links pointing outside of the main domain?**
The tool is designed to understand internal site architecture, so it only discovers and lists reachable URLs within the specified root domain. It ignores external links entirely.

**When managing multiple large jobs, do I need a specific job ID for `cancel_active_crawl`?**
Yes, you must provide the unique Job ID for each crawl instance you intend to stop. The tool operates on IDs; it can't cancel an entire category of running jobs.
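
Because cancellation works per job ID, stopping several crawls means one call per ID. A minimal sketch, assuming a hypothetical `cancel` callable that wraps `cancel_active_crawl` and raises an exception on failure:

```python
from typing import Callable, Dict, Iterable


def cancel_jobs(
    job_ids: Iterable[str],
    cancel: Callable[[str], None],
) -> Dict[str, bool]:
    """Cancel each job individually and record which cancellations succeeded."""
    results: Dict[str, bool] = {}
    for job_id in job_ids:
        try:
            cancel(job_id)  # hypothetical wrapper around cancel_active_crawl
            results[job_id] = True
        except Exception:
            # One failed cancellation should not stop the remaining ones.
            results[job_id] = False
    return results
```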

**How do I find my Firecrawl API Key?**
Log in to your [**Firecrawl dashboard**](https://www.firecrawl.dev/app/dashboard), and navigate to the **API Keys** section to copy your unique token.

**Can I scrape content excluding headers and footers?**
Yes! The `scrape_url` tool includes an `onlyMainContent` parameter. When set to true, Firecrawl uses AI to extract only the core article or page content.
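
For illustration, this is roughly what a `scrape_url` call with that parameter could look like at the MCP protocol level, where tools are invoked through a JSON-RPC `tools/call` request. The `onlyMainContent` name comes from the answer above; the `url` argument name is an assumption, so confirm it against the tool's published input schema.

```python
import json
from typing import Any, Dict


def scrape_request(
    url: str,
    only_main_content: bool = True,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build an MCP tools/call request for the scrape_url tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_url",
            "arguments": {
                "url": url,  # assumed argument name
                "onlyMainContent": only_main_content,
            },
        },
    }


print(json.dumps(scrape_request("https://stripe.com/docs/api"), indent=2))
```

In practice your AI client builds this request for you from a natural language prompt; the sketch only shows where the parameter ends up.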

**How long does a recursive crawl take?**
Crawl time depends on the site size and depth. Use the `get_crawl_status` tool to monitor progress and retrieve results once the job is complete.