# Firecrawl MCP

> Firecrawl turns any website into clean, structured Markdown for your AI agent. It handles all the junk you usually fight: JavaScript rendering, cookie banners, and anti-bot measures. Need to read a whole site? Crawl it recursively. Just want to know what pages exist? Map them first. Firecrawl gives your agent full web access in one API call.

## Overview
- **Category:** friends-mcp
- **Price:** Free
- **Tags:** markdown-conversion, llm-data, web-crawling, data-extraction, structured-data, dynamic-content

## Description

Your AI client needs more than just conversation; it needs context from the real world. This connector gives your agent direct access to the web, treating every URL like a readable document. You can scrape specific product pages for deep analysis. Or maybe you need to index an entire documentation site: the system crawls through all internal links until it's done. It even gives you Google-like search results, but with the full text extracted right there. Connecting this via Vinkius means your agent gets one gateway to read and understand web content, turning complex web data into simple Markdown for immediate use.

## Tools

### crawl_site
It crawls an entire website, extracting content from multiple linked pages in batches.

### map_site
This tool discovers all URLs on a website so you can plan exactly what needs to be scraped later.

### scrape_page
It pulls clean Markdown from any single web page, handling anti-bot protections automatically.

### search_web
It searches the web for a topic and returns scraped content from the resulting top pages.
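
MCP clients invoke tools through JSON-RPC `tools/call` requests. The sketch below shows roughly what such requests could look like for the four tools above. The argument names (`url`, `query`, `limit`) are assumptions for illustration only; this listing does not document the tool schemas, so check what your client reports before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names here are hypothetical, not taken from the tool schemas.
requests = [
    tool_call("scrape_page", {"url": "https://docs.firecrawl.dev"}),
    tool_call("map_site", {"url": "https://example.com"}),
    tool_call("crawl_site", {"url": "https://example.com", "limit": 50}),
    tool_call("search_web", {"query": "best practices for RAG pipelines 2026", "limit": 3}),
]

for request in requests:
    print(json.dumps(request))
```

In practice your AI client builds these requests for you; the sketch only makes the shape of a tool invocation visible.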

## Prompt Examples

**Prompt:** 
```
Scrape the main page of docs.firecrawl.dev and give me a summary of what Firecrawl offers.
```

**Response:** 
```
Scraped docs.firecrawl.dev (2,450 words):

Firecrawl offers four core capabilities: Scrape (single page → Markdown), Crawl (recursive site traversal), Map (URL discovery), and Search (web search + scraping). It supports multiple output formats and handles JavaScript rendering, anti-bot protection, and dynamic content automatically.
```

**Prompt:** 
```
Search the web for 'best practices for RAG pipelines 2026' and return the top 3 results with content.
```

**Response:** 
```
Found 3 results for 'best practices for RAG pipelines 2026':

1. **Building Production RAG Systems in 2026** (langchain.dev) — 3,200 words
   Full guide covering chunking strategies, embedding models, and retrieval optimization...

2. **RAG Pipeline Architecture** (docs.llamaindex.ai) — 2,800 words
   Comprehensive overview of modern RAG architectures...

3. **Advanced RAG Techniques** (arxiv.org) — 4,100 words
   Research paper on hybrid search, re-ranking, and query transformation...
```

**Prompt:** 
```
Map all pages on example.com to see the site structure.
```

**Response:** 
```
Site Map: 47 URLs found on example.com

/
/about
/products
/products/alpha
/products/beta
/blog
/blog/post-1
/blog/post-2
/docs
/docs/getting-started
/docs/api-reference
... and 36 more URLs
```

## Capabilities

### Extracting single page content
You give it a URL, and you get clean Markdown text ready for analysis.

### Mapping site structure
It returns a list of the URLs it discovers on a domain without downloading the page content itself.

### Crawling entire websites
The system follows internal links across a whole domain, returning the collected pages in batches.
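
To make "follows internal links and returns pages in batches" concrete, here is a toy breadth-first crawl over an in-memory site. It is a sketch of the general technique, not Firecrawl's implementation; the `SITE` data, the `crawl` function, and the batch size are all made up for illustration.

```python
from collections import deque
from typing import Iterator

# Toy in-memory "site": path -> (markdown content, internal links). Illustrative only.
SITE = {
    "/": ("# Home", ["/about", "/docs"]),
    "/about": ("# About", ["/"]),
    "/docs": ("# Docs", ["/docs/getting-started", "/docs/api-reference"]),
    "/docs/getting-started": ("# Getting started", ["/docs"]),
    "/docs/api-reference": ("# API reference", ["/docs"]),
}


def crawl(start: str, batch_size: int = 2) -> Iterator[list[tuple[str, str]]]:
    """Visit pages breadth-first via internal links, yielding (path, content) batches."""
    seen = {start}
    queue = deque([start])
    batch = []
    while queue:
        path = queue.popleft()
        content, links = SITE[path]
        batch.append((path, content))
        for link in links:
            # Only follow links that stay on the site and have not been queued yet.
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


batches = list(crawl("/"))
for number, batch in enumerate(batches, start=1):
    print(number, [path for path, _ in batch])
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, as `/about` does to `/` here.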

### Searching and extracting web data
It performs a search query and then scrapes the top results, including their full article text.

## Use Cases

### A research team needs a competitive analysis.
Instead of manually visiting competitor sites, the agent first uses `map_site` to list all product sections on three target domains. Then, it runs `crawl_site` across those domains to collect and compare full feature set descriptions.

### You need a quick summary of a new white paper.
The agent uses `search_web` to look up the title of the white paper. Once found, it scrapes the main page using `scrape_page`, which gives it enough context to summarize the key takeaways without you reading the paper manually.

### A data team is building a knowledge base from an internal wiki.
The engineer uses `crawl_site` on the wiki domain. The MCP processes all the linked pages, turning thousands of articles into clean Markdown chunks ready to be indexed.

### You need to validate if a site has documentation.
Before committing resources, you run `map_site` on the suspected domain. If it returns hundreds of links under a 'docs' directory, you know where to focus your scraping efforts.
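
A minimal sketch of that check, assuming the mapped URLs arrive as a plain list of paths like the sample `map_site` response earlier in this document. The real response shape may differ, and `section_of` is a hypothetical helper.

```python
from collections import Counter

# Paths copied from the sample map_site response; a real response may be shaped differently.
mapped_paths = [
    "/", "/about", "/products", "/products/alpha", "/products/beta",
    "/blog", "/blog/post-1", "/blog/post-2",
    "/docs", "/docs/getting-started", "/docs/api-reference",
]


def section_of(path: str) -> str:
    """Return the first path segment, or '(root)' for the home page."""
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else "(root)"


# Count pages per top-level section, then pull out the docs pages to scrape.
counts = Counter(section_of(path) for path in mapped_paths)
docs_paths = [path for path in mapped_paths if section_of(path) == "docs"]

print(counts.most_common())
print(docs_paths)
```

Grouping by the first path segment is a deliberately simple heuristic; sites that host documentation on a subdomain would need a check on the hostname instead.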

## Benefits

- Scrape single articles perfectly. Use `scrape_page` to grab clean Markdown from any URL, ignoring cookie banners or JS rendering issues so the text is always usable.
- Index entire sites reliably. Running `crawl_site` lets your agent recursively visit every page on a domain, perfect for building knowledge bases from documentation sets.
- Understand the scope first. Before you write code to grab data, run `map_site`. This shows you all available URLs so you know exactly how big the target site is.
- Fact-check with depth. Don't just search; use `search_web` to find information and then get the full article content from the top results for verification.
- Save time on data prep. You don't have to worry about messy HTML or JavaScript execution; the MCP handles all the dirty work so you get structured text immediately.

## How It Works

In short, you point your AI client at the data source, and the MCP handles the messy plumbing required to turn that data into readable Markdown.

1. Subscribe to this MCP and provide your API key.
2. Your agent sends a request specifying the target URL or domain scope.
3. The tool executes the web operation (e.g., scraping, crawling) and returns the structured content or a job ID for tracking.

## Frequently Asked Questions

**How does Firecrawl MCP handle JavaScript content?**
It automatically handles JavaScript rendering. This means if the website loads its data using JS (like many modern blogs do), your agent still gets to read it, not just the initial source code.

**Can I use Firecrawl MCP to index an entire company wiki?**
Yes. Run `map_site` first to see the internal links and gauge the scope, then run `crawl_site` on the wiki domain. The crawl collects the full text from every linked page, ready for you to index.

**Is Firecrawl MCP better than just scraping one URL?**
Yes. While `scrape_page` works for single URLs, this MCP also gives you site-level tools like `map_site`, which help you understand the whole structure before you start extracting content.

**What is the difference between crawl_site and map_site?**
Mapping only finds links; it returns a sitemap of URLs. Crawling, however, actually goes to those links, scrapes their content, and returns the text for every single page.

**When I run `search_web`, does it provide full article content, or just snippets?**
It provides the extracted, full content from top search results. The tool combines web searching with automatic extraction, meaning you get more than just links; you get the actual text needed for your agent's context.

**Does `scrape_page` consistently return Markdown, even if the source page is messy?**
Yes, it always returns clean Markdown regardless of how messy the original webpage was. This automatic conversion handles formatting issues and ensures your agent receives usable, structured text from any single URL.

**How does the MCP handle rate limits when running `crawl_site` across many pages?**
The platform meters usage through credits assigned to your API key. Track your consumption as the crawl runs; once your free credits are used up, top up your account to keep going.

**Before using `scrape_page`, how can I check the entire site structure with `map_site`?**
Run `map_site` first to generate a list of the URLs it discovers on the domain. This lets you understand the site's architecture before deciding exactly which pages need scraping.

**How does Firecrawl pricing work?**
Firecrawl uses a credit-based system. You get 500 free lifetime credits to start (no credit card required). Base cost is 1 credit per page scraped. Advanced features like JSON extraction (+4 credits) or enhanced mode (+4 credits) consume additional credits per page. Paid plans start at $16/month with 3,000 monthly credits.
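
As a rough worked example of that arithmetic, the sketch below estimates credits for a job using the figures quoted above. It assumes the two add-ons stack additively per page, which this listing does not state explicitly, and `credits_needed` is a hypothetical helper, not part of any Firecrawl API.

```python
BASE_CREDITS_PER_PAGE = 1   # base cost per page scraped
JSON_EXTRACTION_EXTRA = 4   # extra credits per page for JSON extraction
ENHANCED_MODE_EXTRA = 4     # extra credits per page for enhanced mode
FREE_CREDITS = 500          # free lifetime credits


def credits_needed(pages: int, json_extraction: bool = False, enhanced_mode: bool = False) -> int:
    """Estimate credits for a job, assuming add-on costs simply add up per page."""
    per_page = BASE_CREDITS_PER_PAGE
    if json_extraction:
        per_page += JSON_EXTRACTION_EXTRA
    if enhanced_mode:
        per_page += ENHANCED_MODE_EXTRA
    return pages * per_page


print(credits_needed(47))                                            # 47
print(credits_needed(47, json_extraction=True))                      # 235
print(credits_needed(47, json_extraction=True, enhanced_mode=True))  # 423
print(FREE_CREDITS // credits_needed(1, True, True))                 # 55 pages fit in the free tier
```

Under these assumptions, a plain scrape of a 47-page site uses under a tenth of the free credits, while the same site with both add-ons uses most of them.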

**Can Firecrawl handle JavaScript-heavy websites?**
Yes! Firecrawl renders pages in a full browser environment before extracting content, so it handles React, Next.js, Angular, and other JavaScript frameworks. It also automatically bypasses common anti-bot protections, removes cookie consent banners, and waits for dynamic content to load before extraction.

**What formats does Firecrawl return?**
Firecrawl can return content in multiple formats: Markdown (default and most popular for LLM consumption), HTML, raw HTML, structured JSON (with LLM-powered extraction), screenshots, links, and page metadata. You can request multiple formats in a single call.