# scrapy-impersonate
`scrapy-impersonate` is a Scrapy download handler. This project integrates [curl_cffi](https://github.com/yifeikong/curl_cffi) to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
## Installation
```
pip install scrapy-impersonate
```
## Activation
Replace the default `http` and/or `https` download handlers through [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html#download-handlers):
```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
```
Also, be sure to [install the asyncio-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):
```python
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
## Basic usage
Set the `impersonate` [Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta) key to download a request using `curl_cffi`:
```python
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        yield {"ja3_hash": response.json()["ja3_hash"]}
```
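Because `impersonate` is a plain `Request.meta` key, the browser profile can also be chosen dynamically per request. The sketch below is not part of the project's API; `impersonate_meta` is a hypothetical helper that picks a random profile so a crawl does not present a single TLS fingerprint throughout:

```python
import random

# Hypothetical helper: a few of the supported profile names
# (see the table in "Supported browsers").
BROWSERS = ["chrome110", "chrome116", "edge101", "safari15_5"]


def impersonate_meta(browsers=BROWSERS):
    """Return a Request.meta dict with a randomly chosen browser profile."""
    return {"impersonate": random.choice(browsers)}


# Usage inside a spider:
#   yield scrapy.Request(url, meta=impersonate_meta())
print(impersonate_meta())  # e.g. {'impersonate': 'chrome116'}
```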
## Supported browsers
The following browsers can be impersonated:
| Browser | Version | Build | OS | Name |
| --- | --- | --- | --- | --- |
| Chrome | 99 | 99.0.4844.51 | Windows 10 | `chrome99` |
| Chrome | 99 | 99.0.4844.73 | Android 12 | `chrome99_android` |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | `chrome100` |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | `chrome101` |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | `chrome104` |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | `chrome107` |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | `chrome110` |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | `chrome116` |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | `chrome119` |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | `chrome120` |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | `chrome123` |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | `chrome124` |
| Edge | 99 | 99.0.1150.30 | Windows 10 | `edge99` |
| Edge | 101 | 101.0.1210.47 | Windows 10 | `edge101` |
| Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | `safari15_3` |
| Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | `safari15_5` |
| Safari | 17.0 | unclear | macOS Sonoma | `safari17_0` |
| Safari | 17.2 | unclear | iOS 17.2 | `safari17_2_ios` |
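A typo in the profile name only surfaces when the download handler runs. A minimal sketch (not part of this project; `SUPPORTED_PROFILES` and `check_profile` are hypothetical, with names copied verbatim from the table above) validates the value before a request is built:

```python
# Names taken verbatim from the supported-browsers table above.
SUPPORTED_PROFILES = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101",
    "chrome104", "chrome107", "chrome110", "chrome116",
    "chrome119", "chrome120", "chrome123", "chrome124",
    "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
})


def check_profile(name: str) -> str:
    """Fail early on an unknown profile instead of inside the handler."""
    if name not in SUPPORTED_PROFILES:
        raise ValueError(f"unknown impersonation profile: {name!r}")
    return name


print(check_profile("chrome120"))  # -> chrome120
```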
## Thanks
This project is inspired by the following projects:
+ [curl_cffi](https://github.com/yifeikong/curl_cffi) - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
+ [curl-impersonate](https://github.com/lwthiker/curl-impersonate) - A special build of curl that can impersonate Chrome & Firefox
+ [scrapy-playwright](https://github.com/scrapy-plugins/scrapy-playwright) - Playwright integration for Scrapy