scrapy-tls-client


Name: scrapy-tls-client
Version: 0.0.5
home_page: https://github.com/dylankeepon/TlsClientMiddleware.git
Summary: tls client downloader middleware for scrapy, send request by tls client.
upload_time: 2023-09-13 06:50:23
docs_url: None
author: Dylan Chen
requires_python: >=3.7.0
license: MIT
requirements: No requirements were recorded.
# Scrapy Tls Client Downloader Middleware

This package lets Scrapy send its requests through tls_client. The options are the same as tls_client's own, but you set them in your Scrapy project's settings.py.

## Installation

```shell script
pip3 install scrapy-tls-client
```

You also need to enable `TlsClientDownloaderMiddleware` in `DOWNLOADER_MIDDLEWARES`:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
}
```

Note that you must specify a User-Agent. Otherwise, requests may be blocked by Cloudflare on sites that use its bot detection, and compression errors may occur. For requests that carry their own headers, setting those headers is enough; for requests that don't, disable Scrapy's default User-Agent middleware:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```

If compression errors still occur, you can also disable Scrapy's default `HttpCompressionMiddleware`:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': None
}
```
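
To see why the priority 543 and the `None` entries matter, here is a simplified model (not Scrapy's source) of how Scrapy merges `DOWNLOADER_MIDDLEWARES` with its built-in `DOWNLOADER_MIDDLEWARES_BASE`: entries set to `None` are dropped, and `process_request()` runs in ascending priority order. The priorities in `BASE` are a subset of Scrapy's documented defaults; the `request_order` helper is hypothetical.

```python
# Simplified, hypothetical model of Scrapy's downloader-middleware ordering.
# BASE holds a subset of Scrapy's documented DOWNLOADER_MIDDLEWARES_BASE priorities.
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 590,
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": 600,
    "scrapy.downloadermiddlewares.cookies.CookiesMiddleware": 700,
}


def request_order(base, custom):
    """Return middleware paths in the order process_request() would be called."""
    merged = {**base, **custom}  # project settings override the base entries
    # a value of None disables the middleware entirely
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)  # lowest number runs first


custom = {
    "scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware": 543,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": None,
}
for path in request_order(BASE, custom):
    print(path.rsplit(".", 1)[-1])  # print just the class name
```

With 543, `UserAgentMiddleware` (500) has already set the User-Agent header by the time this middleware sees a request. Scrapy's documentation also states that `process_response()` of every enabled middleware is called on every response, even one returned directly from `process_request()`; that may be why disabling `HttpCompressionMiddleware` avoids compression errors, though this README does not say so.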

## Usage

Once this middleware is enabled, all requests are sent through tls_client.

Usage is simple: session-level tls_client options go in your Scrapy project's settings.py, and per-request options go in the request's `meta`.

Note: you do not need to specify every parameter shown below; set only the ones you need.
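
The session setting names look like upper-case forms of tls_client `Session` keyword arguments (for example `CLIENT_IDENTIFIER` and `client_identifier`). Assuming that correspondence, which this README does not state, a middleware could forward only the settings you actually set, roughly as in this hypothetical helper (`SESSION_SETTINGS` and `session_kwargs` are illustrative names, not this package's code):

```python
# Hypothetical sketch: the settings-to-kwargs name mapping is an assumption.
SESSION_SETTINGS = [
    "CLIENT_IDENTIFIER", "JA3_STRING", "H2_SETTINGS", "H2_SETTINGS_ORDER",
    "SUPPORTED_SIGNATURE_ALGORITHMS", "SUPPORTED_DELEGATED_CREDENTIALS_ALGORITHMS",
    "SUPPORTED_VERSIONS", "KEY_SHARE_CURVES", "CERT_COMPRESSION_ALGO",
    "ADDITIONAL_DECODE", "PSEUDO_HEADER_ORDER", "CONNECTION_FLOW",
    "PRIORITY_FRAMES", "HEADER_ORDER", "HEADER_PRIORITY",
    "RANDOM_TLS_EXTENSION_ORDER", "FORCE_HTTP1", "CATCH_PANICS",
]


def session_kwargs(settings):
    """Collect only the settings that are present, as lower-case keyword arguments."""
    return {
        name.lower(): settings[name]
        for name in SESSION_SETTINGS
        if settings.get(name) is not None  # unset options are simply left out
    }


project_settings = {"CLIENT_IDENTIFIER": "chrome_112", "RANDOM_TLS_EXTENSION_ORDER": True}
print(session_kwargs(project_settings))
```

Options such as `RAW_RESPONSE_TYPE` and the `RANDOM_*_IDENTIFIER` flags appear to be specific to this middleware, so the sketch leaves them out.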

### Settings for Tls_Client Session

To use one of tls_client's preset client profiles:

```python
CLIENT_IDENTIFIER = 'chrome_112'
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
```

or

```python
RANDOM_CHROME_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
```

or

```python
RANDOM_APP_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
```

For a fully custom client fingerprint:

```python
JA3_STRING = '771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0'
H2_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "MAX_CONCURRENT_STREAMS": 1000,
    "INITIAL_WINDOW_SIZE": 6291456,
    "MAX_HEADER_LIST_SIZE": 262144
}
H2_SETTINGS_ORDER = [
    "HEADER_TABLE_SIZE",
    "MAX_CONCURRENT_STREAMS",
    "INITIAL_WINDOW_SIZE",
    "MAX_HEADER_LIST_SIZE"
]
SUPPORTED_SIGNATURE_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_DELEGATED_CREDENTIALS_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_VERSIONS = [
    "GREASE",
    "1.3",
    "1.2"
]
KEY_SHARE_CURVES = [
    "GREASE",
    "X25519"
]
CERT_COMPRESSION_ALGO = 'brotli'
ADDITIONAL_DECODE = 'gzip'
PSEUDO_HEADER_ORDER = [
    ":method",
    ":authority",
    ":scheme",
    ":path"
]
CONNECTION_FLOW = 15663105
PRIORITY_FRAMES = [
  {
    "streamID": 3,
    "priorityParam": {
      "weight": 201,
      "streamDep": 0,
      "exclusive": False
    }
  },
  {
    "streamID": 5,
    "priorityParam": {
      "weight": 101,
      "streamDep": 0,
      "exclusive": False
    }
  }
]
HEADER_ORDER = [
    "accept",
    "user-agent",
    "accept-encoding",
    "accept-language"
]
HEADER_PRIORITY = {
  "streamDep": 1,
  "exclusive": True,
  "weight": 1
}
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
```
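
A JA3 string has five comma-separated fields: TLS version, cipher suites, extensions, elliptic curves (supported groups) and EC point formats, with list values joined by dashes; the JA3 fingerprint itself is the MD5 hash of the whole string. The sketch below is not part of this package; `parse_ja3` is a hypothetical helper that splits the `JA3_STRING` from the settings above into those fields, which can help when checking a custom value.

```python
import hashlib

# Same value as JA3_STRING in the custom settings above.
JA3_STRING = '771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0'

FIELDS = ("tls_version", "ciphers", "extensions", "curves", "point_formats")


def parse_ja3(ja3):
    """Split a JA3 string into its five fields; list fields become lists of ints."""
    parts = ja3.split(",")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    parsed = {"tls_version": int(parts[0])}
    for name, value in zip(FIELDS[1:], parts[1:]):
        # an empty field (e.g. no point formats) becomes an empty list
        parsed[name] = [int(x) for x in value.split("-")] if value else []
    return parsed


fields = parse_ja3(JA3_STRING)
print(fields["tls_version"])  # 771 == 0x0303, the TLS 1.2 ClientHello version value
print(len(fields["ciphers"]), len(fields["extensions"]))
print(hashlib.md5(JA3_STRING.encode()).hexdigest())  # the JA3 fingerprint hash
```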

### Settings for Request

```python
import json  # used by the JSON-string variant of meta_data below

import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'  # placeholder spider name

    def start_requests(self):
        url = 'https://example.com'  # placeholder URL
        # placeholder User-Agent; a User-Agent is required (see the note above)
        headers = {'User-Agent': 'Mozilla/5.0 ...'}

        params = {
            'key1': 'value1',
            'key2': 'value2',
        }
        data = {
            'key1': 'value1',
            'key2': 'value2',
        }
        # cookies as a plain dict (convert a cookie jar to a dict first)
        cookies = {
            'key1': 'value1',
            'key2': 'value2',
        }
        payload = {
            'key1': 'value1',
            'key2': 'value2',
        }
        # a single proxy (https:// also works)...
        proxy_ = 'http://username:password@ip:port'
        # ...or a list, in which case every request gets a random proxy from it:
        # proxy_ = [
        #     'http://username:password@ip:port',
        #     'http://username:password@ip:port',
        # ]

        # set only the keys you need
        meta_data = {
            'params': params,
            'data': data,
            'cookies': cookies,
            'json': payload,
            'allow_redirects': False,
            'insecure_skip_verify': False,
            'timeout_seconds': 10,
            'proxy_': proxy_,
        }
        # Alternatively, the same values can be passed as JSON strings:
        # meta_data = {
        #     'params': json.dumps(params),
        #     'data': json.dumps(data),
        #     'cookies': json.dumps(cookies),
        #     'json': json.dumps(payload),
        #     'allow_redirects': False,
        #     'insecure_skip_verify': False,
        #     'timeout_seconds': 10,
        #     'proxy_': json.dumps(proxy_),
        # }
        yield scrapy.Request(url=url, headers=headers, meta=meta_data)
```
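
Because `meta` values may be Python objects or JSON strings, and `proxy_` may be a single URL or a list, something has to normalise them before the request reaches tls_client. The following is a hypothetical sketch of that step; `resolve_meta`, `JSON_KEYS` and the exact decoding rules are assumptions, not this package's code.

```python
import json
import random

# Hypothetical: the meta keys whose values may arrive as JSON strings.
JSON_KEYS = ("params", "data", "cookies", "json", "proxy_")


def resolve_meta(meta, rng=random):
    """Decode JSON-string values and pick one proxy if a list was given."""
    resolved = dict(meta)  # leave the caller's meta untouched
    for key in JSON_KEYS:
        value = resolved.get(key)
        if isinstance(value, str):
            try:
                resolved[key] = json.loads(value)
            except ValueError:
                pass  # plain strings (e.g. a bare proxy URL) are kept as-is
    proxy = resolved.get("proxy_")
    if isinstance(proxy, list):
        # one random proxy per request, as the comment in the example describes
        resolved["proxy_"] = rng.choice(proxy) if proxy else None
    return resolved


meta = {
    "params": json.dumps({"key1": "value1"}),
    "proxy_": json.dumps(["http://username:password@ip:port"]),
    "timeout_seconds": 10,
}
print(resolve_meta(meta))
```

Passing `rng` explicitly (for example a seeded `random.Random`) makes the proxy choice reproducible in tests.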

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dylankeepon/TlsClientMiddleware.git",
    "name": "scrapy-tls-client",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "Dylan Chen",
    "author_email": "dylankeep@163.com",
    "download_url": "https://files.pythonhosted.org/packages/c4/27/ee3598d151b7238b6f33fddaf421fb249d235cc68cc2c2a331c5f89b3a0c/scrapy-tls-client-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "tls client downloader middleware for scrapy, send request by tls client.",
    "version": "0.0.5",
    "project_urls": {
        "Homepage": "https://github.com/dylankeepon/TlsClientMiddleware.git"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1f2d1c32f80fc9bc1779c0a9885ec9f3e755aa8402b5dae6af2da0cc4f57c7dd",
                "md5": "f83621e2ed1c9b4c532ae4558b27187b",
                "sha256": "3f6d621adb7c8d068f7d8f2ccae16cd1e05dab49f363ea30cf78bc4cd0e6c0d7"
            },
            "downloads": -1,
            "filename": "scrapy_tls_client-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f83621e2ed1c9b4c532ae4558b27187b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7.0",
            "size": 9536,
            "upload_time": "2023-09-13T06:50:21",
            "upload_time_iso_8601": "2023-09-13T06:50:21.131073Z",
            "url": "https://files.pythonhosted.org/packages/1f/2d/1c32f80fc9bc1779c0a9885ec9f3e755aa8402b5dae6af2da0cc4f57c7dd/scrapy_tls_client-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c427ee3598d151b7238b6f33fddaf421fb249d235cc68cc2c2a331c5f89b3a0c",
                "md5": "35d90333d1051782baa5765caf21a1f7",
                "sha256": "4749233852959a6deef176b13d0c07f0f8adee80fee02002dcd5b9f497d39e2d"
            },
            "downloads": -1,
            "filename": "scrapy-tls-client-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "35d90333d1051782baa5765caf21a1f7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7.0",
            "size": 10576,
            "upload_time": "2023-09-13T06:50:23",
            "upload_time_iso_8601": "2023-09-13T06:50:23.104529Z",
            "url": "https://files.pythonhosted.org/packages/c4/27/ee3598d151b7238b6f33fddaf421fb249d235cc68cc2c2a331c5f89b3a0c/scrapy-tls-client-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-13 06:50:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dylankeepon",
    "github_project": "TlsClientMiddleware",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "scrapy-tls-client"
}