# Firecrawl MCP

> Firecrawl turns websites into clean, structured markdown data for your AI agent. It lets you scrape single pages or run multi-page crawls across entire sites without touching a dashboard. You can map out a site's full structure and monitor job progress using this MCP.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** data-extraction, markdown-conversion, web-crawling, llm-ready, data-pipeline

## Description

Web content is messy. When you pull data off a website, it rarely arrives in a format your AI agent can use immediately; the useful text is usually buried in HTML tags, navigation menus, and scripts. This MCP lets your agent treat a complex website like a clean source file. You can direct it to scrape specific pages for immediate results or launch large crawl jobs across multiple site sections. If you need to understand a site's architecture before pulling data, you can map its hierarchy first. Because this MCP connects through Vinkius, your agent gains access to a web curation engine that keeps your extracted information structured and ready for whatever task comes next.

## Tools

### crawl_url
Initiates a multi-page crawl job for an entire website.

### delete_crawl_job
Removes a specified crawl job that has completed or failed from the system log.

### get_crawl_status
Reports the current progress of an ongoing crawl job.

### list_crawl_jobs
Retrieves a list detailing all past and present crawl jobs managed by the MCP.

### map_website
Generates a structural map showing how a website's various pages are linked together.

### scrape_url
Retrieves and cleans the main content from a single specified URL, returning markdown text.
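
MCP clients typically invoke tools like these through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `scrape_url` and `crawl_url`. The argument names (`url`, `limit`) and the `build_tool_call` helper are assumptions for illustration; the tool schema your client reports is the authority on the real parameter names.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are assumed, not taken from the actual tool schema.
scrape_request = build_tool_call("scrape_url", {"url": "https://vinkius.com/about"})
crawl_request = build_tool_call(
    "crawl_url", {"url": "https://docs.firecrawl.dev", "limit": 10}, request_id=2
)

print(json.dumps(scrape_request, indent=2))
```

In practice your agent or MCP client library builds this payload for you; the sketch only shows what a tool invocation looks like on the wire.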

## Prompt Examples

**Prompt:** 
```
Scrape https://vinkius.com/about and give me the markdown content.
```

**Response:** 
```
I've scraped the page! I've retrieved the cleaned markdown which details Vinkius's mission and team. Would you like a summary of the core sections?
```

**Prompt:** 
```
Start a crawl job for https://docs.firecrawl.dev with a limit of 10 pages.
```

**Response:** 
```
Crawl job started! The Job ID is `job_12345`. I'll monitor the status for you. So far, it has identified the main documentation categories. Would you like me to notify you when it's done?
```

**Prompt:** 
```
Map the website structure of https://github.com.
```

**Response:** 
```
I've mapped out the hierarchy for GitHub. It includes major sections like /features, /pricing, and /trending. Would you like the full list of identified subdirectories?
```

## Capabilities

### Extract Clean Web Content
Scrape any single URL and return the main text content as clean, machine-readable markdown.

### Map Site Structure
Generate a map of a website's key pages and how they connect within its hierarchy.

### Run Multi-Page Crawls
Start large, multi-page crawling jobs across an entire site and monitor them until completion.

### Manage Job Lifecycle
List all crawl jobs (active or finished) and delete old jobs to keep your job log clean.

## Use Cases

### Monitoring a Competitor’s Blog
A marketing manager wants to audit five competitor blogs. Instead of opening five tabs and copying content, they ask their agent to run `crawl_url` on each blog's main feed URL. The MCP runs the jobs and returns clean markdown that the agent can summarize for direct comparison.

### Building a Knowledge Base
A data scientist needs structured documentation from 50 pages of internal guides. They first run `map_website` to ensure full coverage, then use `crawl_url` to start the multi-page job, monitoring it with `get_crawl_status` until every page is collected.
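
The coverage check in this workflow can be sketched as a set comparison between the URLs reported by `map_website` and the URLs the crawl actually collected. This assumes both tools yield plain lists of URL strings, which this listing does not confirm; `normalize` and `missing_pages` are hypothetical helpers, and the URLs are made-up sample data.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase the host and drop trailing slashes, query strings, and fragments."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


def missing_pages(mapped_urls: list[str], crawled_urls: list[str]) -> list[str]:
    """Return mapped URLs that the crawl did not collect, sorted alphabetically."""
    crawled = {normalize(u) for u in crawled_urls}
    return sorted({normalize(u) for u in mapped_urls} - crawled)


# Sample data standing in for map_website and crawl_url results.
mapped = [
    "https://example.com/guides/",
    "https://example.com/guides/setup",
    "https://example.com/faq",
]
crawled = ["https://example.com/guides", "https://example.com/guides/setup/"]

print(missing_pages(mapped, crawled))
```

Any URL this returns is a candidate for a follow-up `scrape_url` call.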

### Analyzing Site Architecture
An SEO specialist needs to know whether a client's site has deep-linking issues. They ask their agent to run `map_website`. The MCP returns the site structure, which lets them spot missing or orphaned directories without crawling every page.

### Cleaning Up Old Data Runs
After several weeks of testing, a team has dozens of old crawl jobs cluttering their history. They use `list_crawl_jobs` to see everything and then execute `delete_crawl_job` on the irrelevant entries.
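
A cleanup pass like this one might look as follows. This is a minimal sketch: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the job record shape (`id`, `status`) and the status values are assumptions, not the documented response format.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], Any]


def delete_finished_jobs(call_tool: ToolCaller, keep_ids: frozenset[str] = frozenset()) -> list[str]:
    """Delete completed or failed jobs, skipping any ID listed in keep_ids."""
    deleted = []
    for job in call_tool("list_crawl_jobs", {}):
        # Only finished jobs are removed; running jobs are left alone.
        if job["status"] in {"completed", "failed"} and job["id"] not in keep_ids:
            call_tool("delete_crawl_job", {"id": job["id"]})
            deleted.append(job["id"])
    return deleted


# In-memory stand-in for the MCP server, used only to make the sketch runnable.
jobs = {"job_1": "completed", "job_2": "scraping", "job_3": "failed"}


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    if name == "list_crawl_jobs":
        return [{"id": job_id, "status": status} for job_id, status in jobs.items()]
    if name == "delete_crawl_job":
        jobs.pop(arguments["id"])
        return {"deleted": arguments["id"]}
    raise ValueError(f"unknown tool: {name}")


deleted = delete_finished_jobs(fake_call_tool, keep_ids=frozenset({"job_3"}))
print(deleted)
```

Passing `keep_ids` protects records you still want to review, such as a failed job you are still debugging.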

## Benefits

- Stop wrestling with messy HTML. Instead, the `scrape_url` tool pulls out only the core text, giving you clean markdown that's ready to feed into a prompt.
- Don't guess how big a site is. Use `map_website` to instantly see the entire page hierarchy and identify missing or critical sections before you start crawling.
- Need data from dozens of pages? The MCP lets you initiate large jobs using `crawl_url`. You then use `get_crawl_status` to monitor progress without manually checking a dashboard.
- Keep your workspace tidy. After a large crawl job is done, use `list_crawl_jobs` and `delete_crawl_job` to remove records you no longer need, preventing log clutter.
- The MCP handles all the web complexity so you don't have to. You focus on the data; we handle the scraping.

## How It Works

You tell your agent what web content you need, and this MCP handles the extraction for you.

1. Subscribe to this MCP and enter the required API key.
2. Instruct your AI agent to run a specific action, like scraping a URL or mapping a site structure.
3. The MCP executes the task and returns structured data, such as clean markdown text or a crawl job status.

## Frequently Asked Questions

**How do I start a crawl job using Firecrawl MCP?**
You initiate this using `crawl_url`. You provide the starting URL and any necessary parameters, and the MCP handles launching the background collection process.

**Can I check if my crawl job is done with Firecrawl MCP?**
Yes. Use `get_crawl_status` to check the current progress of a running or paused crawl job and see whether it has finished.
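
If you script this check rather than asking the agent each time, a polling loop is one common pattern. The sketch below is illustrative only: `call_tool` is a hypothetical stand-in for your MCP client, and the `id` argument, the `status` field, and the values `scraping`, `completed`, and `failed` are assumptions rather than the documented response format.

```python
import time
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_crawl(
    call_tool: ToolCaller,
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> dict[str, Any]:
    """Poll get_crawl_status until the job reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        status = call_tool("get_crawl_status", {"id": job_id})
        # "completed" and "failed" are assumed terminal values.
        if status.get("status") in {"completed", "failed"}:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl job {job_id} did not finish after {max_polls} polls")


# Stand-in for a real MCP client: reports "scraping" twice, then "completed".
_responses = iter(["scraping", "scraping", "completed"])


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    return {"id": arguments["id"], "status": next(_responses)}


final_status = wait_for_crawl(fake_call_tool, "job_12345", poll_seconds=0)
print(final_status["status"])
```

The `max_polls` cap keeps a stuck job from blocking the script indefinitely.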

**What's the difference between scrape_url and crawl_url in Firecrawl MCP?**
`scrape_url` handles one URL and returns its content immediately. `crawl_url` handles multiple URLs across a site by launching a background, multi-page job.

**How do I delete old jobs using Firecrawl MCP?**
First, use `list_crawl_jobs` to get the Job ID. Then, pass that specific ID into the `delete_crawl_job` tool to remove it from your history.

**What does the `map_website` function do, and how can it help me plan a crawl?**
It generates an immediate map of a website's structure. Instead of scraping pages, you get a clear hierarchy showing site depth and page distribution. This is perfect for planning exactly which areas to target next.
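
To turn a map into a crawl plan, you could summarize how deep its pages sit. The sketch below assumes `map_website` yields a flat list of URL strings, which this listing does not confirm; `path_depth` and `depth_distribution` are hypothetical helpers, and the URLs are sample data.

```python
from collections import Counter
from urllib.parse import urlsplit


def path_depth(url: str) -> int:
    """Count non-empty path segments: '/' is depth 0, '/docs/api' is depth 2."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])


def depth_distribution(urls: list[str]) -> dict[int, int]:
    """Map each path depth to the number of URLs found at that depth."""
    counts = Counter(path_depth(u) for u in urls)
    return dict(sorted(counts.items()))


# Sample data standing in for a map_website result.
site_map = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/docs",
    "https://example.com/docs/api",
]

print(depth_distribution(site_map))
```

A distribution that is heavy at high depths suggests starting the crawl from a section URL rather than the site root.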

**I want to audit my past work; how do I use `list_crawl_jobs`?**
The function returns a comprehensive list of all your crawl jobs, whether they succeeded or failed. You can review this history to track data collection over time and ensure you haven't missed any key sites.

**When I use `scrape_url`, can I control the output format?**
Yes. You tell the MCP whether you need markdown or HTML. Choosing the right format keeps the content structured for whichever LLM client or data pipeline you're feeding it to.

**If a crawl job fails using `crawl_url`, how do I remove its record?**
You first use `list_crawl_jobs` to find the specific faulty Job ID. Then, you execute `delete_crawl_job` with that ID. This removes junk records and keeps your MCP logs clean.

**How do I find my Firecrawl API Key?**
Log in to your [**Firecrawl.dev dashboard**](https://www.firecrawl.dev/app/settings/api-keys), where you will find your API key under the settings. Copy it and enter it when you subscribe to this MCP.

**Can the agent crawl multiple pages at once?**
Yes. Call the `crawl_url` tool with the base URL. Firecrawl starts a job to extract the subpages, and you can monitor its status with `get_crawl_status`.

**Is it possible to see the website structure before scraping?**
Yes. The `map_website` tool allows your agent to retrieve a hierarchy of the site, giving you an audit of the structure before performing a full scrape or crawl.