# Firecrawl Python SDK
The Firecrawl Python SDK is a library that lets you easily scrape and crawl websites and output the data in a format ready for use with large language models (LLMs). It provides a simple and intuitive interface for interacting with the Firecrawl API.
## Installation
To install the Firecrawl Python SDK, you can use pip:
```bash
pip install firecrawl-py
```
## Usage
1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `FirecrawlApp` class.
Here's an example of how to use the SDK:
```python
from firecrawl import FirecrawlApp, ScrapeOptions
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")
# Scrape a website:
data = app.scrape_url(
    'https://firecrawl.dev',
    formats=['markdown', 'html']
)
print(data)
# Crawl a website:
crawl_status = app.crawl_url(
    'https://firecrawl.dev',
    limit=100,
    scrape_options=ScrapeOptions(formats=['markdown', 'html'])
)
print(crawl_status)
```
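If you prefer the environment-variable approach mentioned in step 2, you can construct the client without passing a key. A minimal sketch, assuming `FIRECRAWL_API_KEY` is already exported in your shell:
```python
from firecrawl import FirecrawlApp

# Assumes the FIRECRAWL_API_KEY environment variable is set, e.g.
#   export FIRECRAWL_API_KEY="fc-YOUR_API_KEY"
app = FirecrawlApp()  # the key is read from the environment

data = app.scrape_url('https://firecrawl.dev', formats=['markdown'])
print(data)
```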
### Scraping a URL
To scrape a single URL, use the `scrape_url` method. It takes the URL as a parameter and returns the scraped data as a dictionary.
```python
# Scrape a website:
scrape_result = app.scrape_url('firecrawl.dev', formats=['markdown', 'html'])
print(scrape_result)
```
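Since the requested formats determine which fields come back, you can pull out a single format from the result. A minimal sketch, assuming the returned dictionary exposes the formats under keys of the same name (field names may differ between SDK versions):
```python
scrape_result = app.scrape_url('https://firecrawl.dev', formats=['markdown', 'html'])

# Assumption: the result is a dict-like object keyed by format name.
markdown_content = scrape_result.get('markdown')
html_content = scrape_result.get('html')

if markdown_content:
    print(markdown_content[:200])  # preview the first 200 characters
```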
### Crawling a Website
To crawl a website, use the `crawl_url` method. It takes the starting URL as its first argument; optional keyword arguments let you configure the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output formats.
```python
crawl_status = app.crawl_url(
    'https://firecrawl.dev',
    limit=100,
    scrape_options=ScrapeOptions(formats=['markdown', 'html']),
    poll_interval=30
)
print(crawl_status)
```
### Asynchronous Crawling
<Tip>Looking for async operations? Check out the [Async Class](#async-class) section below.</Tip>
To crawl a website asynchronously, use the `async_crawl_url` method. It returns the crawl `ID`, which you can use to check the status of the crawl job. It takes the starting URL as its first argument; optional keyword arguments let you configure the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output formats.
```python
crawl_status = app.async_crawl_url(
    'https://firecrawl.dev',
    limit=100,
    scrape_options=ScrapeOptions(formats=['markdown', 'html']),
)
print(crawl_status)
```
### Checking Crawl Status
To check the status of a crawl job, use the `check_crawl_status` method. It takes the job ID as a parameter and returns the current status of the crawl job.
```python
crawl_status = app.check_crawl_status("<crawl_id>")
print(crawl_status)
```
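If you started the crawl with `async_crawl_url`, you can poll `check_crawl_status` until the job finishes. A minimal sketch, assuming the status response is a dict-like object with a `status` field that becomes `"completed"` when the crawl is done (exact field names may vary by SDK version):
```python
import time

crawl_job = app.async_crawl_url('https://firecrawl.dev', limit=100)
crawl_id = crawl_job.get('id')  # assumption: the response exposes the job ID under 'id'

while True:
    crawl_status = app.check_crawl_status(crawl_id)
    # Assumption: 'status' reports values such as 'scraping' or 'completed'.
    if crawl_status.get('status') == 'completed':
        break
    time.sleep(5)  # wait before polling again

print(crawl_status)
```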
### Cancelling a Crawl
To cancel an asynchronous crawl job, use the `cancel_crawl` method. It takes the job ID of the asynchronous crawl as a parameter and returns the cancellation status.
```python
cancel_result = app.cancel_crawl("<crawl_id>")
print(cancel_result)
```
### Map a Website
Use `map_url` to generate a list of URLs from a website. Optional keyword arguments let you customize the mapping process, including options to exclude subdomains or to use the sitemap.
```python
# Map a website:
map_result = app.map_url('https://firecrawl.dev')
print(map_result)
```
{/* ### Extracting Structured Data from Websites
To extract structured data from websites, use the `extract` method. It takes the URLs to extract data from, a prompt, and a schema as arguments. The schema is a Pydantic model that defines the structure of the extracted data.
<ExtractPythonShort /> */}
### Crawling a Website with WebSockets
To crawl a website with WebSockets, use the `crawl_url_and_watch` method. It takes the starting URL as its first argument; optional keyword arguments let you configure the crawl job, such as the maximum number of pages to crawl, paths to exclude, and the output formats.
```python
# inside an async function (or a notebook cell with a running event loop)...
import nest_asyncio  # allows re-entering an already-running event loop

nest_asyncio.apply()

# Define event handlers
def on_document(detail):
    print("DOC", detail)

def on_error(detail):
    print("ERR", detail['error'])

def on_done(detail):
    print("DONE", detail['status'])

# Function to start the crawl and watch process
async def start_crawl_and_watch():
    # Initiate the crawl job and get the watcher
    watcher = app.crawl_url_and_watch('firecrawl.dev', exclude_paths=['blog/*'], limit=5)

    # Add event listeners
    watcher.add_event_listener("document", on_document)
    watcher.add_event_listener("error", on_error)
    watcher.add_event_listener("done", on_done)

    # Start the watcher
    await watcher.connect()

# Run the event loop
await start_crawl_and_watch()
```
## Error Handling
The SDK handles errors returned by the Firecrawl API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.
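For example, you can wrap calls in a standard `try`/`except` block. A minimal sketch, catching the generic `Exception` base class since the SDK's specific exception types are not listed here:
```python
try:
    scrape_result = app.scrape_url('https://firecrawl.dev', formats=['markdown'])
    print(scrape_result)
except Exception as exc:
    # The SDK raises an exception with a descriptive message when the
    # Firecrawl API returns an error (e.g. invalid API key, rate limits).
    print(f"Scrape failed: {exc}")
```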
## Async Class
For async operations, you can use the `AsyncFirecrawlApp` class. Its methods are the same as the `FirecrawlApp` class, but they don't block the main thread.
```python
from firecrawl import AsyncFirecrawlApp
app = AsyncFirecrawlApp(api_key="YOUR_API_KEY")
# Async Scrape
async def example_scrape():
    scrape_result = await app.scrape_url(url="https://example.com")
    print(scrape_result)

# Async Crawl
async def example_crawl():
    crawl_result = await app.crawl_url(url="https://example.com")
    print(crawl_result)
```
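To execute these coroutines from synchronous code, you can run them with `asyncio.run` (standard Python, shown here as a usage sketch):
```python
import asyncio

asyncio.run(example_scrape())
asyncio.run(example_crawl())
```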