
Firecrawl MCP Server: Deep Website Crawling for AI

Crawl and extract structured data from entire websites. Learn how to use Firecrawl MCP for comprehensive web scraping, content extraction, and site analysis.

Rajesh Praharaj

Aug 11, 2025 · Updated Dec 27, 2025

TL;DR - Firecrawl MCP Quick Start

Crawl entire websites for AI analysis - Deep extraction and structured data.

🆕 2025: Firecrawl v2.5 introduces a Semantic Index for 40% faster data access, an Agent endpoint, and AI-native search capabilities! For an introduction to MCP, see the MCP Introduction guide.

Quick Setup:

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-your-api-key-here"
      }
    }
  }
}

What you can do:

  • 🕷️ Crawl: Entire websites following links
  • 📄 Extract: Structured content and metadata
  • 🔍 Scrape: Single pages with JS rendering
  • 📊 Map: Discover all pages on a site
  • 🎯 Extract: Custom data schemas

Example conversation:

You: Crawl docs.example.com and summarize the documentation structure

Claude: Crawling docs.example.com...

        **Site Structure (47 pages)**
        
        📁 Getting Started (5 pages)
        ├── Introduction
        ├── Installation
        ├── Quick Start
        ├── Configuration
        └── First Steps
        
        📁 Core Concepts (12 pages)
        ├── Architecture
        ├── Components
        ...
        
        📁 API Reference (30 pages)
        ├── Authentication
        ├── Endpoints
        ...

💡 Requires Firecrawl API key - Get one at firecrawl.dev

🤖 AI-Native: Designed specifically for LLMs, RAG pipelines, and agentic systems. For more on RAG, see the RAG, Embeddings & Vector Databases guide.


Firecrawl vs Other Web MCPs

When to use each web-focused MCP:

| MCP | Best For | JS Rendering | Speed |
|-----|----------|--------------|-------|
| Firecrawl | Entire sites, structured extraction | ✅ Yes | Fast (parallel) |
| Fetch | Single pages, simple content | ❌ No | Fastest |
| Playwright | Interactive pages, forms, testing | ✅ Yes | Slower |

Decision Guide

Need to crawl a whole site?

    ┌────┴────┐
    ▼         ▼
   Yes        No
    │         │
    ▼         ▼
Firecrawl   Single page?

        ┌────┴────┐
        ▼         ▼
       Yes        No (interactive)
        │         │
        ▼         ▼
      Fetch    Playwright

Prerequisites

1. Firecrawl API Key

  1. Go to firecrawl.dev
  2. Sign up for an account
  3. Navigate to API Keys
  4. Create and copy your API key

Pricing Tiers:

| Tier | Pages/Month | Features |
|------|-------------|----------|
| Free | 500 | Basic crawling |
| Starter | 3,000 | Custom extraction |
| Standard | 50,000 | Priority processing |
| Scale | Unlimited | Dedicated support |

2025 Updated Pricing

| Plan | Price | Credits/Month |
|------|-------|---------------|
| Free | $0 | 500 |
| Hobby | $16/mo | 3,000 |
| Standard | $83/mo | 100,000 |
| Growth | $333/mo | 500,000 |
| Enterprise | Custom | Unlimited |

2. Node.js v18+

node --version  # Should be v18+

Installation & Configuration

Claude Desktop Setup

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-xxxxxxxxxxxxx"
      }
    }
  }
}
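The server also reads a few optional environment variables, e.g. for pointing at a self-hosted Firecrawl instance or tuning retries. A hedged example follows; the variable names below match the firecrawl-mcp README at the time of writing, so verify them against the current docs before relying on them:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-xxxxxxxxxxxxx",
        "FIRECRAWL_API_URL": "https://firecrawl.your-domain.example",
        "FIRECRAWL_RETRY_MAX_ATTEMPTS": "5"
      }
    }
  }
}
```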

Cursor Setup

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-xxxxxxxxxxxxx"
      }
    }
  }
}

Verify Installation

Restart your AI client and test:

You: Crawl example.com

Claude: Crawling example.com...

        **Crawl Results:**
        - Pages found: 1
        - Content extracted: Yes
        
        **Page: Example Domain**
        This domain is for use in illustrative examples...

Available Tools

Core Operations

| Tool | Description | Example Prompt |
|------|-------------|----------------|
| firecrawl_crawl | Crawl entire website | "Crawl all of docs.example.com" |
| firecrawl_scrape | Scrape single page | "Scrape the pricing page" |
| firecrawl_map | Get site map/structure | "Map all pages on example.com" |
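Under MCP, each of these tools receives a small JSON arguments object from your client. As a rough sketch of the shape (parameter names follow the firecrawl-mcp tool schemas as of this writing; check the server's tool listing for the authoritative version), a single-page scrape call might look like:

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/pricing",
    "formats": ["markdown"]
  }
}
```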

Extraction Options

| Option | Description | Use Case |
|--------|-------------|----------|
| limit | Max pages to crawl | Large sites |
| excludePaths | Skip certain paths | Avoid duplicates |
| includePaths | Only crawl matching paths | Focus on section |
| extractSchema | Custom data extraction | Prices, contacts |
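These options combine in a single crawl call. A hedged sketch of what a focused crawl's arguments might look like (option names are taken from the table above; Firecrawl treats include/exclude paths as patterns, and exact schemas can differ between server versions):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "limit": 50,
    "includePaths": ["/guides/.*", "/api/.*"],
    "excludePaths": ["/deprecated/.*"]
  }
}
```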

Practical Workflows

Workflow 1: Documentation Analysis

Crawl and analyze documentation sites. For more on AI research tools, see the AI Search Engines guide.

You: Crawl the React documentation and create a summary 
     of all available hooks

Claude: Crawling React docs...

        **React Hooks Summary** (from react.dev)
        
        **State Hooks:**
        - `useState` - Add state to components
        - `useReducer` - Complex state management
        
        **Context Hooks:**
        - `useContext` - Subscribe to context
        
        **Ref Hooks:**
        - `useRef` - Reference DOM elements
        - `useImperativeHandle` - Customize ref behavior
        
        **Effect Hooks:**
        - `useEffect` - Side effects
        - `useLayoutEffect` - DOM-sync effects
        - `useInsertionEffect` - CSS-in-JS
        
        **Performance Hooks:**
        - `useMemo` - Memoize values
        - `useCallback` - Memoize functions
        - `useTransition` - Non-blocking updates
        - `useDeferredValue` - Defer updates
        
        **Other Hooks:**
        - `useId` - Unique IDs
        - `useSyncExternalStore` - External stores
        - `useActionState` - Form actions
        
        Found 15 documented hooks across 47 pages.

Workflow 2: Competitive Analysis

Analyze competitor websites:

You: Crawl competitor.com and analyze their product offerings

Claude: Crawling competitor.com...

        **Competitor Analysis**
        
        **Products Found (8):**
        
        | Product | Price | Category |
        |---------|-------|----------|
        | Pro Plan | $29/mo | SaaS |
        | Team Plan | $99/mo | SaaS |
        | Enterprise | Custom | SaaS |
        | Widget Toolkit | $199 | One-time |
        | API Access | $0.01/call | Usage |
        
        **Key Features Advertised:**
        - Real-time collaboration
        - 99.9% uptime SLA
        - SOC 2 compliance
        - 24/7 support (Enterprise)
        
        **Content Strategy:**
        - 23 blog posts (SEO focus)
        - 15 case studies
        - 8 integration guides
        
        **Target Audience Signals:**
        - Enterprise language prominent
        - Security messaging strong
        - Developer-focused docs

Workflow 3: Content Inventory

Audit a website’s content:

You: Map all pages on blog.company.com and categorize 
     the content types

Claude: Mapping blog.company.com...

        **Content Inventory** (156 pages)
        
        **By Category:**
        | Category | Count | % of Total |
        |----------|-------|------------|
        | Engineering | 45 | 29% |
        | Product Updates | 38 | 24% |
        | Tutorials | 32 | 21% |
        | Company News | 25 | 16% |
        | Case Studies | 16 | 10% |
        
        **By Year:**
        - 2024: 42 posts
        - 2023: 58 posts
        - 2022: 36 posts
        - 2021: 20 posts
        
        **Top Authors:**
        - Alice Chen: 28 posts
        - Bob Developer: 24 posts
        - Carol Writer: 19 posts
        
        **Content Gaps Identified:**
        - No posts about AI/ML topics
        - Limited video content
        - Few beginner tutorials

Workflow 4: Structured Data Extraction

Extract specific data fields:

You: Crawl realestate.example.com and extract all property listings 
     with price, location, and number of bedrooms

Claude: Crawling with extraction schema...

        **Property Listings** (45 found)
        
        | Address | Price | Beds | Baths | Sqft |
        |---------|-------|------|-------|------|
        | 123 Oak St | $450,000 | 3 | 2 | 1,850 |
        | 456 Maple Ave | $325,000 | 2 | 1 | 1,200 |
        | 789 Pine Rd | $875,000 | 5 | 3 | 3,400 |
        ...
        
        **Statistics:**
        - Average price: $524,000
        - Median beds: 3
        - Price range: $250K - $1.2M
        
        **Location Breakdown:**
        - Downtown: 12 listings
        - Suburbs: 28 listings
        - Waterfront: 5 listings

Workflow 5: Site Migration Prep

Prepare for website migration. For more on AI-powered workflows, see the AI-Powered Workflows guide.

You: Crawl oldsite.com completely and create an inventory 
     for migration to the new platform

Claude: Full site crawl starting...

        **Migration Inventory** (234 pages)
        
        **Page Types:**
        | Type | Count | Action Needed |
        |------|-------|---------------|
        | Static pages | 45 | Direct migrate |
        | Blog posts | 128 | CMS import |
        | Product pages | 32 | DB migration |
        | Landing pages | 18 | Redesign |
        | Redirects needed | 11 | 301 setup |
        
        **URLs with Issues:**
        - 5 pages return 404
        - 3 pages have duplicate content
        - 7 pages missing meta descriptions
        
        **External Links Found:** 156
        - Need verification: 23 (old/potentially broken)
        
        **Images:** 892 total
        - Average size: 245KB
        - Need optimization: 234 (over 500KB)
        
        **Recommended Migration Order:**
        1. Static pages (simple)
        2. Product pages (business critical)
        3. Blog content (volume)
        4. Legacy pages (assess need)

Example Prompts

Full Site Crawling

# Complete crawl
"Crawl all of docs.example.com"
"Crawl company.com with a limit of 100 pages"
"Crawl blog.site.com but exclude /archive/*"

# Focused crawl
"Crawl only the /docs section of example.com"
"Crawl pages matching /api/* on the docs site"
"Crawl the getting-started guides only"

Site Mapping

# Structure discovery
"Map all pages on example.com"
"Show me the site structure of docs.company.io"
"What pages exist on this documentation site?"

# Link analysis
"Find all internal links on this site"
"What external sites does this page link to?"
"Show the navigation structure"

Single Page Scraping

# Content extraction
"Scrape the pricing page from competitor.com"
"Extract the main content from this article"
"Get the product details from this page"

# Specific data
"Scrape contact information from the about page"
"Extract all code examples from this tutorial"
"Get the FAQ content from the support page"

Structured Extraction

# E-commerce
"Crawl the shop and extract product names, prices, and availability"
"Find all products under $50 on this store"

# Real estate
"Extract property listings with addresses and prices"
"Get all rental listings from this site"

# Jobs
"Crawl the careers page and list all open positions"
"Extract job postings with titles and locations"

# Events
"Find all upcoming events with dates and venues"
"Extract conference schedule information"

Analysis Requests

# Content analysis
"Crawl the blog and identify the most common topics"
"Analyze the documentation structure and suggest improvements"
"Find content gaps by comparing to competitor's site"

# SEO analysis
"Crawl the site and check for missing meta descriptions"
"Find pages with duplicate content"
"Analyze internal linking patterns"

# Technical audit
"Check for broken internal links"
"Find pages with slow load indicators"
"Identify missing alt text on images"

Advanced Features

Custom Extraction Schemas

Extract specific structured data:

You: Crawl products.example.com and extract:
     - Product name
     - Price
     - SKU
     - Availability (in stock / out of stock)

Claude: Crawling with custom schema...

        **Extracted Products:**
        
        ```json
        [
          {
            "name": "Widget Pro 2000",
            "price": 149.99,
            "sku": "WP-2000",
            "availability": "in_stock"
          },
          {
            "name": "Gadget Plus",
            "price": 79.99,
            "sku": "GP-100",
            "availability": "out_of_stock"
          }
          ...
        ]
        ```

        Extracted 45 products with complete data.
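Behind the scenes, a request like this maps to a JSON Schema that Firecrawl's extraction uses to shape its output. A plausible schema for the prompt above (illustrative only; the MCP server builds its own schema from your wording):

```json
{
  "type": "object",
  "properties": {
    "products": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "sku": { "type": "string" },
          "availability": { "type": "string", "enum": ["in_stock", "out_of_stock"] }
        },
        "required": ["name", "price"]
      }
    }
  }
}
```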


Crawl Limits and Filters

Control crawl scope:
You: Crawl docs.example.com but:
     - Limit to 50 pages max
     - Only include /guides and /api sections
     - Exclude any /deprecated pages

Claude: Crawling with filters...

        Configuration:
        - Max pages: 50
        - Include: /guides/*, /api/*
        - Exclude: /deprecated/*
        
        Found 47 matching pages...

Handling Large Sites

For enterprise sites:

You: This is a large site. Crawl it in sections:
     First, map the structure, then we'll crawl each section

Claude: Good approach for large sites!

        **Step 1: Site Map**
        Main sections found:
        - /docs (estimated 200 pages)
        - /blog (estimated 500 pages)
        - /api (estimated 150 pages)
        - /help (estimated 100 pages)
        
        Total estimated: ~950 pages
        
        Which section should I crawl first?
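Expressed as raw tool calls, that two-step pattern is a firecrawl_map followed by scoped firecrawl_crawl calls, one per section. A hedged sketch (same caveat as earlier about exact parameter names):

```json
[
  { "name": "firecrawl_map", "arguments": { "url": "https://example.com" } },
  {
    "name": "firecrawl_crawl",
    "arguments": {
      "url": "https://example.com",
      "includePaths": ["/docs/.*"],
      "limit": 200
    }
  }
]
```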

Combining with Other MCPs

Firecrawl + Filesystem

Save crawled content:

You: Crawl the documentation and save each page as markdown 
     in my docs folder

Claude: [Uses Firecrawl to crawl docs]
        [Uses Filesystem to save files]
        
        Saved 47 documentation pages:
        /docs/external/
        ├── getting-started.md
        ├── configuration.md
        ├── api-reference/
        │   ├── authentication.md
        │   └── endpoints.md
        ...

Firecrawl + Memory

Remember site structures:

You: Crawl the competitor's site and remember how it's organized

Claude: [Uses Firecrawl to analyze site]
        [Uses Memory to store structure]
        
        Memorized competitor site structure:
        - Product tiers: Free, Pro, Enterprise
        - Content sections: Blog, Docs, Resources
        - Key differentiators: Real-time sync, API access
        
        I'll remember this for comparisons!

Firecrawl + Notion

Store findings in workspace:

You: Crawl the documentation and create a summary page in Notion

Claude: [Uses Firecrawl to extract docs]
        [Uses Notion to create summary page]
        
        Created Notion page: "External API Documentation Summary"
        - 15 endpoints documented
        - Authentication patterns
        - Rate limits and quotas
        - Code examples saved

Troubleshooting

Issue: “API key invalid”

Symptoms: Authentication fails

Solutions:

| Cause | Solution |
|-------|----------|
| Wrong key | Re-copy the key from the Firecrawl dashboard |
| Key expired | Generate a new key |
| No key set | Check the env variable in your config |

Issue: “Crawl taking too long”

Symptoms: Timeout or slow progress

Solutions:

  • Add page limit: limit: 50
  • Focus on specific paths
  • Check if site is slow
  • Large sites may need multiple crawls

Issue: “Blocked by site”

Symptoms: Access denied errors

Solutions:

| Cause | Solution |
|-------|----------|
| robots.txt blocking | Check site policies |
| Rate limit | Slow down requests |
| Bot detection | May not be possible to crawl |
| IP blocked | Contact Firecrawl support |

Issue: “Missing content”

Symptoms: Pages not fully extracted

Solutions:

  • JS-heavy content should render (Firecrawl uses a headless browser)
  • Check if content is loaded dynamically after delay
  • Login-required content won’t be accessible

Best Practices

Ethical Crawling

| ✅ Do | ❌ Don’t |
|-------|----------|
| Check robots.txt first | Ignore site policies |
| Respect rate limits | Overload servers |
| Crawl public content | Scrape personal data |
| Use for legitimate purposes | Violate terms of service |

For more on responsible AI tool usage, see the Understanding AI Safety, Ethics, and Limitations guide.

Efficient Usage

| Practice | Why |
|----------|-----|
| Start with map | Understand site structure first |
| Set limits | Avoid unnecessary API usage |
| Filter paths | Focus on needed content |
| Cache results | Don’t re-crawl unchanged content |

Complementary MCP Servers

| Server | Complements Firecrawl By… |
|--------|---------------------------|
| Fetch MCP | Quick single-page fetches |
| Playwright MCP | Interactive automation |
| Filesystem MCP | Saving crawled content |
| Memory MCP | Remembering site analysis |

Summary

The Firecrawl MCP Server enables comprehensive website crawling:

  • Full site crawling with link following
  • JavaScript rendering for modern sites
  • Structured extraction for specific data
  • Site mapping for structure discovery
  • Fast parallel processing
  • AI-native - designed for LLMs and RAG

2025 Updates (v2.5):

  • Semantic Index - 40% faster, historical data access
  • Agent endpoint - for agentic AI systems
  • Stealth proxies - access difficult sites
  • AI-native search - built for LLMs

Best use cases:

  • Documentation analysis
  • Competitive research
  • Content inventories
  • Site migration prep
  • Structured data extraction

Comparison:

  • Firecrawl: Whole sites, structured data
  • Fetch: Single pages, simple content
  • Playwright: Interactive, testing

🎉 Phase 3 Complete! Continue to Phase 4 for enterprise integrations.


Questions about Firecrawl MCP? Check firecrawl.dev/docs or the Firecrawl GitHub.
