firecrawl


Name: firecrawl
Version: 4.3.6
Home page: https://github.com/firecrawl/firecrawl
Summary: Python SDK for Firecrawl API
Upload time: 2025-09-07 19:09:40
Author: Mendable.ai
Requires Python: >=3.8
License: MIT License
Keywords: sdk, api, firecrawl
# Firecrawl Python SDK

The Firecrawl Python SDK is a library that allows you to easily scrape and crawl websites, and output the data in a format ready for use with large language models (LLMs). It provides a simple and intuitive interface for interacting with the Firecrawl API.

## Installation

To install the Firecrawl Python SDK, you can use pip:

```bash 
pip install firecrawl-py
```

## Usage

1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `Firecrawl` class.
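The fallback logic in step 2 can be sketched as a small stdlib-only helper (the function name `resolve_api_key` is illustrative, not part of the SDK):

```python
import os

def resolve_api_key(explicit_key=None):
    """Return an explicit key if given, else fall back to FIRECRAWL_API_KEY."""
    key = explicit_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set FIRECRAWL_API_KEY")
    return key

# An explicit key always wins over the environment variable.
os.environ["FIRECRAWL_API_KEY"] = "fc-from-env"
print(resolve_api_key("fc-explicit"))  # fc-explicit
print(resolve_api_key())               # fc-from-env
```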

Here's an example of how to use the SDK:

```python 
from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a website (v2):
data = firecrawl.scrape(
  'https://firecrawl.dev', 
  formats=['markdown', 'html']
)
print(data)

# Crawl a website (v2 waiter):
crawl_status = firecrawl.crawl(
  'https://firecrawl.dev', 
  limit=100, 
  scrape_options=ScrapeOptions(formats=['markdown', 'html'])
)
print(crawl_status)
```

### Scraping a URL

To scrape a single URL, use the `scrape` method. It takes the URL as a parameter and returns a document with the requested formats.

```python 
# Scrape a website (v2):
scrape_result = firecrawl.scrape('https://firecrawl.dev', formats=['markdown', 'html'])
print(scrape_result)
```
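Downstream code often wants just one of the requested formats. Here is a minimal, hypothetical helper, assuming the returned document behaves like a mapping with `markdown` / `html` keys:

```python
def pick_format(document, preferred=("markdown", "html")):
    """Return the first available format from a dict-like scrape result."""
    for fmt in preferred:
        content = document.get(fmt)
        if content:
            return fmt, content
    raise KeyError(f"none of {preferred} present in document")

# Demo against a stand-in document; a real scrape result may differ in shape.
doc = {"html": "<h1>Firecrawl</h1>", "metadata": {"title": "Firecrawl"}}
print(pick_format(doc))  # ('html', '<h1>Firecrawl</h1>')
```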

### Crawling a Website

To crawl a website, use the `crawl` method. It takes the starting URL and optional parameters as arguments. You can control depth, limits, formats, and more.

```python 
crawl_status = firecrawl.crawl(
  'https://firecrawl.dev', 
  limit=100, 
  scrape_options=ScrapeOptions(formats=['markdown', 'html']),
  poll_interval=30
)
print(crawl_status)
```

### Asynchronous Crawling

<Tip>Looking for async operations? Check out the [Async Class](#async-class) section below.</Tip>

To enqueue a crawl asynchronously, use `start_crawl`. It returns the crawl job ID, which you can use to check the status of the crawl job.

```python 
crawl_job = firecrawl.start_crawl(
  'https://firecrawl.dev', 
  limit=100, 
  scrape_options=ScrapeOptions(formats=['markdown', 'html']),
)
print(crawl_job)
```

### Checking Crawl Status

To check the status of a crawl job, use the `get_crawl_status` method. It takes the job ID as a parameter and returns the current status of the crawl job.

```python 
crawl_status = firecrawl.get_crawl_status("<crawl_id>")
print(crawl_status)
```
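Together, `start_crawl` and `get_crawl_status` support a simple poll-until-done loop. The sketch below uses a stand-in status function, and the terminal state names (`completed`, `failed`, `cancelled`) are an assumption about the status payload:

```python
import time

def wait_for_crawl(get_status, crawl_id, poll_interval=2, timeout=300):
    """Poll get_status(crawl_id) until it reports a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(crawl_id)
        if status["status"] in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish within {timeout}s")

# Demo with a fake status source that completes on the third poll.
calls = iter([{"status": "scraping"}, {"status": "scraping"}, {"status": "completed"}])
print(wait_for_crawl(lambda _id: next(calls), "job-123", poll_interval=0))
```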

### Cancelling a Crawl

To cancel an asynchronous crawl job, use the `cancel_crawl` method. It takes the job ID of the asynchronous crawl as a parameter and returns the cancellation status.

```python 
cancel_result = firecrawl.cancel_crawl("<crawl_id>")
print(cancel_result)
```

### Map a Website

Use `map` to generate a list of URLs from a website. Options let you customize the mapping process, including whether to use the sitemap or include subdomains.

```python 
# Map a website (v2):
map_result = firecrawl.map('https://firecrawl.dev')
print(map_result)
```
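Mapped URLs often need filtering before starting a crawl. A stdlib-only sketch that keeps links under a given path (the flat list of URL strings is an assumption about the map result's shape):

```python
from urllib.parse import urlparse

def filter_links(links, path_prefix="/blog"):
    """Keep only URLs whose path starts with the given prefix."""
    return [u for u in links if urlparse(u).path.startswith(path_prefix)]

links = [
    "https://firecrawl.dev/blog/launch-week",
    "https://firecrawl.dev/pricing",
    "https://firecrawl.dev/blog/changelog",
]
print(filter_links(links))  # keeps the two /blog URLs
```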


### Crawling a Website with WebSockets

To crawl a website with WebSockets, use the `crawl_url_and_watch` method. It takes the starting URL plus optional keyword arguments that control the crawl job, such as the maximum number of pages to crawl, excluded paths, and the output format.

```python 
import nest_asyncio

# inside an async function...
nest_asyncio.apply()

# Define event handlers
def on_document(detail):
    print("DOC", detail)

def on_error(detail):
    print("ERR", detail['error'])

def on_done(detail):
    print("DONE", detail['status'])

# Function to start the crawl-and-watch process
async def start_crawl_and_watch():
    # Initiate the crawl job and get the watcher
    watcher = firecrawl.crawl_url_and_watch('firecrawl.dev', exclude_paths=['blog/*'], limit=5)

    # Add event listeners
    watcher.add_event_listener("document", on_document)
    watcher.add_event_listener("error", on_error)
    watcher.add_event_listener("done", on_done)

    # Start the watcher
    await watcher.connect()

# Run the event loop
await start_crawl_and_watch()
```
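The watcher's listener mechanics amount to a plain event dispatcher. A minimal stand-alone sketch of that pattern (the `MiniWatcher` class is illustrative, not the SDK's watcher):

```python
from collections import defaultdict

class MiniWatcher:
    """Tiny event emitter mirroring the add_event_listener/dispatch pattern."""
    def __init__(self):
        self._listeners = defaultdict(list)

    def add_event_listener(self, event, handler):
        self._listeners[event].append(handler)

    def dispatch(self, event, detail):
        for handler in self._listeners[event]:
            handler(detail)

watcher = MiniWatcher()
seen = []
watcher.add_event_listener("document", seen.append)
watcher.dispatch("document", {"url": "https://firecrawl.dev"})
print(seen)  # [{'url': 'https://firecrawl.dev'}]
```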

## Error Handling

The SDK handles errors returned by the Firecrawl API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.
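Since the SDK signals failures by raising exceptions, callers can wrap requests in a generic retry helper. This sketch is not part of the SDK; it catches a plain `Exception` and uses a stub function for illustration:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.0):
    """Call fn(), retrying on exceptions with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(base_delay * attempt)

# Demo: a call that fails twice, then succeeds on the third attempt.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(with_retries(flaky))  # ok
```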

## Async Class

For async operations, you can use the `AsyncFirecrawl` class. Its methods mirror the `Firecrawl` class, but you `await` them.

```python 
from firecrawl import AsyncFirecrawl

firecrawl = AsyncFirecrawl(api_key="YOUR_API_KEY")

# Async Scrape (v2)
async def example_scrape():
  scrape_result = await firecrawl.scrape(url="https://example.com")
  print(scrape_result)

# Async Crawl (v2)
async def example_crawl():
  crawl_result = await firecrawl.crawl(url="https://example.com")
  print(crawl_result)
```
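The main benefit of the async client is concurrency, for example scraping several URLs at once with `asyncio.gather`. The sketch below substitutes a stub coroutine for the real client so it runs stand-alone:

```python
import asyncio

async def fake_scrape(url):
    """Stand-in for an async scrape call; returns immediately."""
    await asyncio.sleep(0)
    return {"url": url, "markdown": f"# {url}"}

async def scrape_many(urls):
    # gather() runs all scrapes concurrently and preserves input order.
    return await asyncio.gather(*(fake_scrape(u) for u in urls))

results = asyncio.run(scrape_many(["https://a.example", "https://b.example"]))
print([r["url"] for r in results])  # ['https://a.example', 'https://b.example']
```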

## v1 compatibility

For legacy code paths, v1 remains available under `firecrawl.v1` with the original method names.

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="YOUR_API_KEY")

# v1 methods (feature-frozen)
doc_v1 = firecrawl.v1.scrape_url('https://firecrawl.dev', formats=['markdown', 'html'])
crawl_v1 = firecrawl.v1.crawl_url('https://firecrawl.dev', limit=100)
map_v1 = firecrawl.v1.map_url('https://firecrawl.dev')
```

            
