scrapy-impersonate


+ Name: scrapy-impersonate
+ Version: 1.4.0
+ Home page: https://github.com/jxlil/scrapy-impersonate
+ Summary: Scrapy download handler that can impersonate browser fingerprints
+ Upload time: 2024-08-09 13:51:26
+ Author: Jalil SA (jxlil)
+ License: MIT
+ Requires Python: >=3.8
+ Requirements: curl-cffi, scrapy

# scrapy-impersonate
[![version](https://img.shields.io/pypi/v/scrapy-impersonate.svg)](https://pypi.python.org/pypi/scrapy-impersonate)

`scrapy-impersonate` is a Scrapy download handler. It performs HTTP requests with [curl_cffi](https://github.com/yifeikong/curl_cffi), so it can impersonate browsers' TLS (JA3) fingerprints.


## Installation

```
pip install scrapy-impersonate
```

## Activation

Replace the default `http` and/or `https` download handlers through [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html#download-handlers):

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
```

Also, be sure to [install the asyncio-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):

```python
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
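If you prefer to build these two settings programmatically (for example for a spider's `custom_settings`), a minimal sketch is shown below. The helper `impersonate_settings` is hypothetical and not part of `scrapy-impersonate`; only the two dotted paths come from this README. It replaces the handler only for the schemes you pass, which mirrors the "`http` and/or `https`" choice above.

```python
HANDLER_PATH = "scrapy_impersonate.ImpersonateDownloadHandler"
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def impersonate_settings(schemes=("http", "https")):
    """Return Scrapy settings that route `schemes` through scrapy-impersonate.

    Hypothetical helper for illustration only; not shipped with the library.
    """
    if not schemes:
        raise ValueError("at least one scheme is required")
    unknown = set(schemes) - {"http", "https"}
    if unknown:
        raise ValueError(f"unsupported scheme(s): {sorted(unknown)}")
    return {
        # Only the listed schemes are overridden; Scrapy keeps its
        # default handlers for every other scheme.
        "DOWNLOAD_HANDLERS": {scheme: HANDLER_PATH for scheme in schemes},
        # The asyncio-based reactor is required, as noted above.
        "TWISTED_REACTOR": ASYNCIO_REACTOR,
    }
```

For example, `custom_settings = impersonate_settings(("https",))` would send only HTTPS traffic through `curl_cffi`, while plain `http` URLs keep Scrapy's default handler.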

## Basic usage

Set the `impersonate` [Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta) key to download a request using `curl_cffi`:

```python
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        yield {"ja3_hash": response.json()["ja3_hash"]}
```
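The `ja3_hash` values above are JA3 fingerprints of the TLS ClientHello each impersonated browser sends. As a rough sketch of how such a hash is derived (following the published JA3 method, not code from this project): the decimal values of the TLS version, cipher suites, extensions, elliptic curves and point formats are joined with `-` inside each field and `,` between fields, GREASE values (RFC 8701) are skipped, and the resulting string is MD5-hashed. The ClientHello values in the usage example are made up for illustration.

```python
import hashlib


def is_grease(value):
    """GREASE values look like 0x?A?A with both bytes equal (RFC 8701)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 text: five comma-separated fields of dash-separated values."""
    fields = [[tls_version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(v) for v in field if not is_grease(v)) for field in fields
    )


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    text = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative (invented) ClientHello values; 0x0A0A and 0x1A1A are GREASE.
example = ja3_string(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23, 65281], [29, 23, 24], [0])
# example == "771,4865-4866,0-23-65281,29-23-24,0"
```

Filtering GREASE matters because browsers insert random GREASE values into each handshake; without dropping them the same browser would produce a different hash on every connection.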

## Supported browsers

The following browsers can be impersonated. Pass the value from the `Name` column as the `impersonate` meta key:

| Browser | Version | Build | OS | Name |
| --- | --- | --- | --- | --- |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 99 | 99.0.4844.51 | Windows 10 | `chrome99` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 99 | 99.0.4844.73 | Android 12 | `chrome99_android` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 100 | 100.0.4896.75 | Windows 10 | `chrome100` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 101 | 101.0.4951.67 | Windows 10 | `chrome101` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 104 | 104.0.5112.81 | Windows 10 | `chrome104` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 107 | 107.0.5304.107 | Windows 10 | `chrome107` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 110 | 110.0.5481.177 | Windows 10 | `chrome110` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 116 | 116.0.5845.180 | Windows 10 | `chrome116` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 119 | 119.0.6045.199 | macOS Sonoma | `chrome119` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 120 | 120.0.6099.109 | macOS Sonoma | `chrome120` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 123 | 123.0.6312.124 | macOS Sonoma | `chrome123` |
| ![Chrome](https://raw.githubusercontent.com/alrra/browser-logos/main/src/chrome/chrome_24x24.png "Chrome") | 124 | 124.0.6367.60 | macOS Sonoma | `chrome124` |
| ![Edge](https://raw.githubusercontent.com/alrra/browser-logos/main/src/edge/edge_24x24.png "Edge") | 99 | 99.0.1150.30 | Windows 10 | `edge99` |
| ![Edge](https://raw.githubusercontent.com/alrra/browser-logos/main/src/edge/edge_24x24.png "Edge") | 101 | 101.0.1210.47 | Windows 10 | `edge101` |
| ![Safari](https://raw.githubusercontent.com/alrra/browser-logos/main/src/safari/safari_24x24.png "Safari") | 15.3 | 16612.4.9.1.8 | macOS Big Sur | `safari15_3` |
| ![Safari](https://raw.githubusercontent.com/alrra/browser-logos/main/src/safari/safari_24x24.png "Safari") | 15.5 | 17613.2.7.1.8 | macOS Monterey | `safari15_5` |
| ![Safari](https://raw.githubusercontent.com/alrra/browser-logos/main/src/safari/safari_24x24.png "Safari") | 17.0 | unclear | macOS Sonoma | `safari17_0` |
| ![Safari](https://raw.githubusercontent.com/alrra/browser-logos/main/src/safari/safari_24x24.png "Safari") | 17.2 | unclear | iOS 17.2 | `safari17_2_ios` |
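To catch a mistyped target before a request is scheduled, you could validate names against this table. A minimal sketch with hypothetical helpers (none of them ship with `scrapy-impersonate`): `SUPPORTED` copies the `Name` column, `describe` splits a name into browser, version and optional platform, and `impersonate_meta` returns a `meta` dict. The `<browser><major>[_<minor>][_<platform>]` naming pattern is inferred from the table, not documented by the project, and newer `curl_cffi` releases may accept targets not listed here, so treat the set as a snapshot.

```python
import re

# Snapshot of the "Name" column above (assumption: may lag behind curl_cffi).
SUPPORTED = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "edge99", "edge101", "safari15_3",
    "safari15_5", "safari17_0", "safari17_2_ios",
})

# <browser><major>[_<minor>...][_<platform>], inferred from the table.
_NAME_RE = re.compile(
    r"^(?P<browser>[a-z]+)(?P<version>\d+(?:_\d+)*)(?:_(?P<platform>[a-z]+))?$"
)


def describe(name):
    """Split a target name, e.g. "safari17_2_ios" -> ("safari", "17.2", "ios")."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised target name: {name!r}")
    version = match["version"].replace("_", ".")
    return match["browser"], version, match["platform"]


def impersonate_meta(name, **extra):
    """Return a Request.meta dict for a target listed in the table."""
    if name not in SUPPORTED:
        raise ValueError(f"{name!r} is not in the supported browsers table")
    return {"impersonate": name, **extra}
```

In a spider this would be used as `scrapy.Request(url, meta=impersonate_meta("chrome110"))`, so an unknown name fails immediately in `start_requests` rather than at download time.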

## Thanks

This project is inspired by the following projects:

+ [curl_cffi](https://github.com/yifeikong/curl_cffi) - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
+ [curl-impersonate](https://github.com/lwthiker/curl-impersonate) - A special build of curl that can impersonate Chrome and Firefox.
+ [scrapy-playwright](https://github.com/scrapy-plugins/scrapy-playwright) - Playwright integration for Scrapy.

            

Raw data

+ Requirements: curl-cffi >=0.7.0, scrapy >=2.10.1

| File | Type | Size (bytes) | Uploaded (UTC) | SHA-256 |
| --- | --- | --- | --- | --- |
| [`scrapy_impersonate-1.4.0-py3-none-any.whl`](https://files.pythonhosted.org/packages/b2/3f/5bb806cbc6960ebdcf40edba545c717237420745f7128a99c1d137514f30/scrapy_impersonate-1.4.0-py3-none-any.whl) | bdist_wheel (py3) | 5847 | 2024-08-09 13:51:25 | `917caaf1c006f04a3d94d8d30e29c24aeac5d597b342f438ae4c470769202d39` |
| [`scrapy_impersonate-1.4.0.tar.gz`](https://files.pythonhosted.org/packages/8e/f9/a7da1ed4d452a0d12bdbb7a56f2e314d48ff23cdfa4952f9ee85f6937df6/scrapy_impersonate-1.4.0.tar.gz) | sdist | 5217 | 2024-08-09 13:51:26 | `29df3452ef7aac7302d246cd208de226cd180548ce7eda09da6a344351dd8148` |
        