scrapy-nodriver

Name	scrapy-nodriver JSON
Version	0.0.7 JSON
	download
home_page	https://github.com/Ehsan-U/scrapy-nodriver
Summary	Nodriver integration for Scrapy
upload_time	2024-12-18 02:53:34
maintainer	None
docs_url	None
author	Ehsan U.
requires_python	>=3.8
license	BSD
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # scrapy-nodriver: Nodriver integration for Scrapy
[![version](https://img.shields.io/pypi/v/scrapy-nodriver.svg)](https://pypi.python.org/pypi/scrapy-nodriver)
[![pyversions](https://img.shields.io/pypi/pyversions/scrapy-nodriver.svg)](https://pypi.python.org/pypi/scrapy-nodriver)


A [Scrapy](https://github.com/scrapy/scrapy) Download Handler which performs requests using
[Nodriver](https://github.com/ultrafunkamsterdam/nodriver).
It can be used to handle pages that require JavaScript (among other things),
while adhering to the regular Scrapy workflow (i.e. without interfering
with request scheduling, item processing, etc).


What makes this package different from package like [Scrapy-Playwright](https://github.com/scrapy-plugins/scrapy-playwright), is the optimization to stay undetected for most anti-bot solutions.
[CDP](https://chromedevtools.github.io/devtools-protocol/) communication provides even better resistance against web applicatinon firewalls (WAF’s), while performance gets a massive boost.



## Requirements

After the release of [version 2.0](https://docs.scrapy.org/en/latest/news.html#scrapy-2-0-0-2020-03-03),
which includes [coroutine syntax support](https://docs.scrapy.org/en/2.0/topics/coroutines.html)
and [asyncio support](https://docs.scrapy.org/en/2.0/topics/asyncio.html), Scrapy allows
to integrate `asyncio`-based projects such as `Nodriver`.<br>
Note: Chrome must be installed on the system.

### Minimum required versions

* Python >= 3.8
* Scrapy >= 2.0 (!= 2.4.0)


## Installation

`scrapy-nodriver` is available on PyPI and can be installed with `pip`:

```
pip install scrapy-nodriver
```

`nodriver` is defined as a dependency so it gets installed automatically,



## Activation

### Download handler

Replace the default `http` and/or `https` Download Handlers through
[`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html):

```python
# settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_nodriver.handler.ScrapyNodriverDownloadHandler",
    "https": "scrapy_nodriver.handler.ScrapyNodriverDownloadHandler",
}
```

Note that the `ScrapyNodriverDownloadHandler` class inherits from the default
`http/https` handler. Unless explicitly marked (see [Basic usage](#basic-usage)),
requests will be processed by the regular Scrapy download handler.


### Twisted reactor

[Install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):

```python
# settings.py
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

This is the default in new projects since [Scrapy 2.7](https://github.com/scrapy/scrapy/releases/tag/2.7.0).


## Basic usage

Set the [`nodriver`](#nodriver) [Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)
key to download a request using Nodriver:

```python
import scrapy

class AwesomeSpider(scrapy.Spider):
    name = "awesome"

    def start_requests(self):
        # GET request
        yield scrapy.Request("https://httpbin.org/get", meta={"nodriver": True})

    def parse(self, response, **kwargs):
        # 'response' contains the page as seen by the browser
        return {"url": response.url}
```


### `NODRIVER_MAX_CONCURRENT_PAGES`
Type `Optional[int]`, defaults to the value of Scrapy's `CONCURRENT_REQUESTS` setting

Maximum amount of allowed concurrent Nodriver pages.

```python
NODRIVER_MAX_CONCURRENT_PAGES = 8
```

### `NODRIVER_BLOCKED_URLS`
Type `Optional[List]`, default `None`

Block resources on the page.

```python
NODRIVER_BLOCKED_URLS = [
    "*/*.jpg",
    "*/*.png",
    "*/*.gif",
    "*/*.webp",
    "*/*.svg",
    "*/*.ico"
]
```

### `NODRIVER_HEADLESS`
Type `Optional[bool]`, default `True`

```python
NODRIVER_HEADLESS = True
```



## Supported [`Request.meta`](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta) keys

### `nodriver`
Type `bool`, default `False`

If set to a value that evaluates to `True` the request will be processed by Nodriver.

```python
return scrapy.Request("https://example.org", meta={"nodriver": True})
```

### `nodriver_include_page`
Type `bool`, default `False`

If `True`, the [Nodriver page]
that was used to download the request will be available in the callback at
`response.meta['nodriver_page']`. If `False` (or unset) the page will be
closed immediately after processing the request.

**Important!**

This meta key is entirely optional, it's NOT necessary for the page to load or for any
asynchronous operation to be performed (specifically, it's NOT necessary for `PageMethod`
objects to be applied). Use it only if you need access to the Page object in the callback
that handles the response.

For more information and important notes see
[Receiving Page objects in callbacks](#receiving-page-objects-in-callbacks).

```python
return scrapy.Request(
    url="https://example.org",
    meta={"nodriver": True, "nodriver_include_page": True},
)
```

### `nodriver_page_methods`
Type `Iterable[PageMethod]`, default `()`

An iterable of [`scrapy_nodriver.page.PageMethod`](#pagemethod-class)
objects to indicate actions to be performed on the page before returning the
final response. See [Executing actions on pages](#executing-actions-on-pages).

### `nodriver_page`
Type `Optional[nodriver.Tab]`, default `None`

A [Nodriver page]() to be used to
download the request. If unspecified, a new page is created for each request.
This key could be used in conjunction with `nodriver_include_page` to make a chain of
requests using the same page. For instance:

```python
from nodriver import Tab

def start_requests(self):
    yield scrapy.Request(
        url="https://httpbin.org/get",
        meta={"nodriver": True, "nodriver_include_page": True},
    )

def parse(self, response, **kwargs):
    page: Tab = response.meta["nodriver_page"]
    yield scrapy.Request(
        url="https://httpbin.org/headers",
        callback=self.parse_headers,
        meta={"nodriver": True, "nodriver_page": page},
    )
```

```python
from nodriver import Tab
import scrapy

class AwesomeSpiderWithPage(scrapy.Spider):
    name = "page_spider"

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.org",
            callback=self.parse_first,
            meta={"nodriver": True, "nodriver_include_page": True},
            errback=self.errback_close_page,
        )

    def parse_first(self, response):
        page: Tab = response.meta["nodriver_page"]
        return scrapy.Request(
            url="https://example.com",
            callback=self.parse_second,
            meta={"nodriver": True, "nodriver_include_page": True, "nodriver_page": page},
            errback=self.errback_close_page,
        )

    async def parse_second(self, response):
        page: Tab = response.meta["nodriver_page"]
        title = await page.title()  # "Example Domain"
        await page.close()
        return {"title": title}

    async def errback_close_page(self, failure):
        page: Tab = failure.request.meta["nodriver_page"]
        await page.close()
```

**Notes:**

* When passing `nodriver_include_page=True`, make sure pages are always closed
  when they are no longer used. It's recommended to set a Request errback to make
  sure pages are closed even if a request fails (if `nodriver_include_page=False`
  pages are automatically closed upon encountering an exception).
  This is important, as open pages count towards the limit set by
  `NODRIVER_MAX_CONCURRENT_PAGES` and crawls could freeze if the limit is reached
  and pages remain open indefinitely.
* Defining callbacks as `async def` is only necessary if you need to `await` things,
  it's NOT necessary if you just need to pass over the Page object from one callback
  to another (see the example above).
* Any network operations resulting from awaiting a coroutine on a Page object
  (`get`, etc) will be executed directly by Nodriver, bypassing the
  Scrapy request workflow (Scheduler, Middlewares, etc).



## Executing actions on pages

A sorted iterable (e.g. `list`, `tuple`) of `PageMethod` objects
could be passed in the `nodriver_page_methods`
[Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)
key to request methods to be invoked on the `Page` object before returning the final
`Response` to the callback.

This is useful when you need to perform certain actions on a page (like scrolling
down or clicking links) and you want to handle only the final result in your callback.

### `PageMethod` class

#### `scrapy_nodriver.page.PageMethod(method: str, *args, **kwargs)`:

Represents a method to be called (and awaited if necessary) on a
`nodriver.Tab` object (e.g. "select", "save_screenshot", "evaluate", etc).
`method` is the name of the method, `*args` and `**kwargs`
are passed when calling such method. The return value
will be stored in the `PageMethod.result` attribute.

For instance:
```python
def start_requests(self):
    yield Request(
        url="https://example.org",
        meta={
            "nodriver": True,
            "nodriver_page_methods": [
                PageMethod("save_screenshot", filename="example.jpeg", full_page=True),
            ],
        },
    )

def parse(self, response, **kwargs):
    screenshot = response.meta["nodriver_page_methods"][0]
    # screenshot.result contains the image file path
```

produces the same effect as:
```python
def start_requests(self):
    yield Request(
        url="https://example.org",
        meta={"nodriver": True, "nodriver_include_page": True},
    )

async def parse(self, response, **kwargs):
    page = response.meta["nodriver_page"]
    filepath = await page.save_screenshot(filename="example.jpeg", full_page=True)
    await page.close()
```


### Supported methods

Refer to the [upstream docs for the `Tab` class](https://github.com/ultrafunkamsterdam/nodriver)
to see available methods.


**Scroll down on an infinite scroll page, take a screenshot of the full page**

```python
class ScrollSpider(scrapy.Spider):
    name = "scroll"

    def start_requests(self):
        yield scrapy.Request(
            url="http://quotes.toscrape.com/scroll",
            meta=dict(
                nodriver=True,
                nodriver_include_page=True,
                nodriver_page_methods=[
                    PageMethod("wait_for", "div.quote"),
                    PageMethod("evaluate", "window.scrollBy(0, document.body.scrollHeight)"),
                    PageMethod("wait_for", "div.quote:nth-child(11)"),  # 10 per page
                ],
            ),
        )

    async def parse(self, response, **kwargs):
        page = response.meta["nodriver_page"]
        await page.save_screenshot(filename="quotes.jpeg", full_page=True)
        await page.close()
        return {"quote_count": len(response.css("div.quote"))}  # quotes from several pages
```



## Known issues

### No proxy support
Specifying a proxy via the `proxy` Request meta key is not supported.

## Reporting issues

Before opening an issue please make sure the unexpected behavior can only be
observed by using this package and not with standalone Nodriver. To do this,
translate your spider code to a reasonably close Nodriver script: if the
issue also occurs this way, you should instead report it
[upstream](https://github.com/ultrafunkamsterdam/nodriver).
For instance:

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.org",
            meta=dict(
                nodriver=True,
                nodriver_page_methods=[
                    PageMethod("save_screenshot", filename="example.jpeg", full_page=True),
                ],
            ),
        )
```

translates roughly to:

```python
import asyncio
import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get("https://example.org")
    await page.save_screenshot(filename="example.jpeg", full_page=True)
    await page.close()

if __name__ == '__main__':
    uc.loop().run_until_complete(main())
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Ehsan-U/scrapy-nodriver",
    "name": "scrapy-nodriver",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Ehsan U.",
    "author_email": "lancerf562@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/cf/04/5f2acedd4883494cff3e6ce71afe23d1743c3171a6c70ce43af14147a728/scrapy_nodriver-0.0.7.tar.gz",
    "platform": null,
    "description": "# scrapy-nodriver: Nodriver integration for Scrapy\n[![version](https://img.shields.io/pypi/v/scrapy-nodriver.svg)](https://pypi.python.org/pypi/scrapy-nodriver)\n[![pyversions](https://img.shields.io/pypi/pyversions/scrapy-nodriver.svg)](https://pypi.python.org/pypi/scrapy-nodriver)\n\n\nA [Scrapy](https://github.com/scrapy/scrapy) Download Handler which performs requests using\n[Nodriver](https://github.com/ultrafunkamsterdam/nodriver).\nIt can be used to handle pages that require JavaScript (among other things),\nwhile adhering to the regular Scrapy workflow (i.e. without interfering\nwith request scheduling, item processing, etc).\n\n\nWhat makes this package different from package like [Scrapy-Playwright](https://github.com/scrapy-plugins/scrapy-playwright), is the optimization to stay undetected for most anti-bot solutions.\n[CDP](https://chromedevtools.github.io/devtools-protocol/) communication provides even better resistance against web applicatinon firewalls (WAF\u2019s), while performance gets a massive boost.\n\n\n\n## Requirements\n\nAfter the release of [version 2.0](https://docs.scrapy.org/en/latest/news.html#scrapy-2-0-0-2020-03-03),\nwhich includes [coroutine syntax support](https://docs.scrapy.org/en/2.0/topics/coroutines.html)\nand [asyncio support](https://docs.scrapy.org/en/2.0/topics/asyncio.html), Scrapy allows\nto integrate `asyncio`-based projects such as `Nodriver`.<br>\nNote: Chrome must be installed on the system.\n\n### Minimum required versions\n\n* Python >= 3.8\n* Scrapy >= 2.0 (!= 2.4.0)\n\n\n## Installation\n\n`scrapy-nodriver` is available on PyPI and can be installed with `pip`:\n\n```\npip install scrapy-nodriver\n```\n\n`nodriver` is defined as a dependency so it gets installed automatically,\n\n\n\n## Activation\n\n### Download handler\n\nReplace the default `http` and/or `https` Download Handlers through\n[`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html):\n\n```python\n# settings.py\nDOWNLOAD_HANDLERS = {\n    \"http\": \"scrapy_nodriver.handler.ScrapyNodriverDownloadHandler\",\n    \"https\": \"scrapy_nodriver.handler.ScrapyNodriverDownloadHandler\",\n}\n```\n\nNote that the `ScrapyNodriverDownloadHandler` class inherits from the default\n`http/https` handler. Unless explicitly marked (see [Basic usage](#basic-usage)),\nrequests will be processed by the regular Scrapy download handler.\n\n\n### Twisted reactor\n\n[Install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):\n\n```python\n# settings.py\nTWISTED_REACTOR = \"twisted.internet.asyncioreactor.AsyncioSelectorReactor\"\n```\n\nThis is the default in new projects since [Scrapy 2.7](https://github.com/scrapy/scrapy/releases/tag/2.7.0).\n\n\n## Basic usage\n\nSet the [`nodriver`](#nodriver) [Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)\nkey to download a request using Nodriver:\n\n```python\nimport scrapy\n\nclass AwesomeSpider(scrapy.Spider):\n    name = \"awesome\"\n\n    def start_requests(self):\n        # GET request\n        yield scrapy.Request(\"https://httpbin.org/get\", meta={\"nodriver\": True})\n\n    def parse(self, response, **kwargs):\n        # 'response' contains the page as seen by the browser\n        return {\"url\": response.url}\n```\n\n\n### `NODRIVER_MAX_CONCURRENT_PAGES`\nType `Optional[int]`, defaults to the value of Scrapy's `CONCURRENT_REQUESTS` setting\n\nMaximum amount of allowed concurrent Nodriver pages.\n\n```python\nNODRIVER_MAX_CONCURRENT_PAGES = 8\n```\n\n### `NODRIVER_BLOCKED_URLS`\nType `Optional[List]`, default `None`\n\nBlock resources on the page.\n\n```python\nNODRIVER_BLOCKED_URLS = [\n    \"*/*.jpg\",\n    \"*/*.png\",\n    \"*/*.gif\",\n    \"*/*.webp\",\n    \"*/*.svg\",\n    \"*/*.ico\"\n]\n```\n\n### `NODRIVER_HEADLESS`\nType `Optional[bool]`, default `True`\n\n```python\nNODRIVER_HEADLESS = True\n```\n\n\n\n## Supported [`Request.meta`](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta) keys\n\n### `nodriver`\nType `bool`, default `False`\n\nIf set to a value that evaluates to `True` the request will be processed by Nodriver.\n\n```python\nreturn scrapy.Request(\"https://example.org\", meta={\"nodriver\": True})\n```\n\n### `nodriver_include_page`\nType `bool`, default `False`\n\nIf `True`, the [Nodriver page]\nthat was used to download the request will be available in the callback at\n`response.meta['nodriver_page']`. If `False` (or unset) the page will be\nclosed immediately after processing the request.\n\n**Important!**\n\nThis meta key is entirely optional, it's NOT necessary for the page to load or for any\nasynchronous operation to be performed (specifically, it's NOT necessary for `PageMethod`\nobjects to be applied). Use it only if you need access to the Page object in the callback\nthat handles the response.\n\nFor more information and important notes see\n[Receiving Page objects in callbacks](#receiving-page-objects-in-callbacks).\n\n```python\nreturn scrapy.Request(\n    url=\"https://example.org\",\n    meta={\"nodriver\": True, \"nodriver_include_page\": True},\n)\n```\n\n### `nodriver_page_methods`\nType `Iterable[PageMethod]`, default `()`\n\nAn iterable of [`scrapy_nodriver.page.PageMethod`](#pagemethod-class)\nobjects to indicate actions to be performed on the page before returning the\nfinal response. See [Executing actions on pages](#executing-actions-on-pages).\n\n### `nodriver_page`\nType `Optional[nodriver.Tab]`, default `None`\n\nA [Nodriver page]() to be used to\ndownload the request. If unspecified, a new page is created for each request.\nThis key could be used in conjunction with `nodriver_include_page` to make a chain of\nrequests using the same page. For instance:\n\n```python\nfrom nodriver import Tab\n\ndef start_requests(self):\n    yield scrapy.Request(\n        url=\"https://httpbin.org/get\",\n        meta={\"nodriver\": True, \"nodriver_include_page\": True},\n    )\n\ndef parse(self, response, **kwargs):\n    page: Tab = response.meta[\"nodriver_page\"]\n    yield scrapy.Request(\n        url=\"https://httpbin.org/headers\",\n        callback=self.parse_headers,\n        meta={\"nodriver\": True, \"nodriver_page\": page},\n    )\n```\n\n```python\nfrom nodriver import Tab\nimport scrapy\n\nclass AwesomeSpiderWithPage(scrapy.Spider):\n    name = \"page_spider\"\n\n    def start_requests(self):\n        yield scrapy.Request(\n            url=\"https://example.org\",\n            callback=self.parse_first,\n            meta={\"nodriver\": True, \"nodriver_include_page\": True},\n            errback=self.errback_close_page,\n        )\n\n    def parse_first(self, response):\n        page: Tab = response.meta[\"nodriver_page\"]\n        return scrapy.Request(\n            url=\"https://example.com\",\n            callback=self.parse_second,\n            meta={\"nodriver\": True, \"nodriver_include_page\": True, \"nodriver_page\": page},\n            errback=self.errback_close_page,\n        )\n\n    async def parse_second(self, response):\n        page: Tab = response.meta[\"nodriver_page\"]\n        title = await page.title()  # \"Example Domain\"\n        await page.close()\n        return {\"title\": title}\n\n    async def errback_close_page(self, failure):\n        page: Tab = failure.request.meta[\"nodriver_page\"]\n        await page.close()\n```\n\n**Notes:**\n\n* When passing `nodriver_include_page=True`, make sure pages are always closed\n  when they are no longer used. It's recommended to set a Request errback to make\n  sure pages are closed even if a request fails (if `nodriver_include_page=False`\n  pages are automatically closed upon encountering an exception).\n  This is important, as open pages count towards the limit set by\n  `NODRIVER_MAX_CONCURRENT_PAGES` and crawls could freeze if the limit is reached\n  and pages remain open indefinitely.\n* Defining callbacks as `async def` is only necessary if you need to `await` things,\n  it's NOT necessary if you just need to pass over the Page object from one callback\n  to another (see the example above).\n* Any network operations resulting from awaiting a coroutine on a Page object\n  (`get`, etc) will be executed directly by Nodriver, bypassing the\n  Scrapy request workflow (Scheduler, Middlewares, etc).\n\n\n\n## Executing actions on pages\n\nA sorted iterable (e.g. `list`, `tuple`) of `PageMethod` objects\ncould be passed in the `nodriver_page_methods`\n[Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)\nkey to request methods to be invoked on the `Page` object before returning the final\n`Response` to the callback.\n\nThis is useful when you need to perform certain actions on a page (like scrolling\ndown or clicking links) and you want to handle only the final result in your callback.\n\n### `PageMethod` class\n\n#### `scrapy_nodriver.page.PageMethod(method: str, *args, **kwargs)`:\n\nRepresents a method to be called (and awaited if necessary) on a\n`nodriver.Tab` object (e.g. \"select\", \"save_screenshot\", \"evaluate\", etc).\n`method` is the name of the method, `*args` and `**kwargs`\nare passed when calling such method. The return value\nwill be stored in the `PageMethod.result` attribute.\n\nFor instance:\n```python\ndef start_requests(self):\n    yield Request(\n        url=\"https://example.org\",\n        meta={\n            \"nodriver\": True,\n            \"nodriver_page_methods\": [\n                PageMethod(\"save_screenshot\", filename=\"example.jpeg\", full_page=True),\n            ],\n        },\n    )\n\ndef parse(self, response, **kwargs):\n    screenshot = response.meta[\"nodriver_page_methods\"][0]\n    # screenshot.result contains the image file path\n```\n\nproduces the same effect as:\n```python\ndef start_requests(self):\n    yield Request(\n        url=\"https://example.org\",\n        meta={\"nodriver\": True, \"nodriver_include_page\": True},\n    )\n\nasync def parse(self, response, **kwargs):\n    page = response.meta[\"nodriver_page\"]\n    filepath = await page.save_screenshot(filename=\"example.jpeg\", full_page=True)\n    await page.close()\n```\n\n\n### Supported methods\n\nRefer to the [upstream docs for the `Tab` class](https://github.com/ultrafunkamsterdam/nodriver)\nto see available methods.\n\n\n**Scroll down on an infinite scroll page, take a screenshot of the full page**\n\n```python\nclass ScrollSpider(scrapy.Spider):\n    name = \"scroll\"\n\n    def start_requests(self):\n        yield scrapy.Request(\n            url=\"http://quotes.toscrape.com/scroll\",\n            meta=dict(\n                nodriver=True,\n                nodriver_include_page=True,\n                nodriver_page_methods=[\n                    PageMethod(\"wait_for\", \"div.quote\"),\n                    PageMethod(\"evaluate\", \"window.scrollBy(0, document.body.scrollHeight)\"),\n                    PageMethod(\"wait_for\", \"div.quote:nth-child(11)\"),  # 10 per page\n                ],\n            ),\n        )\n\n    async def parse(self, response, **kwargs):\n        page = response.meta[\"nodriver_page\"]\n        await page.save_screenshot(filename=\"quotes.jpeg\", full_page=True)\n        await page.close()\n        return {\"quote_count\": len(response.css(\"div.quote\"))}  # quotes from several pages\n```\n\n\n\n## Known issues\n\n### No proxy support\nSpecifying a proxy via the `proxy` Request meta key is not supported.\n\n## Reporting issues\n\nBefore opening an issue please make sure the unexpected behavior can only be\nobserved by using this package and not with standalone Nodriver. To do this,\ntranslate your spider code to a reasonably close Nodriver script: if the\nissue also occurs this way, you should instead report it\n[upstream](https://github.com/ultrafunkamsterdam/nodriver).\nFor instance:\n\n```python\nimport scrapy\n\nclass ExampleSpider(scrapy.Spider):\n    name = \"example\"\n\n    def start_requests(self):\n        yield scrapy.Request(\n            url=\"https://example.org\",\n            meta=dict(\n                nodriver=True,\n                nodriver_page_methods=[\n                    PageMethod(\"save_screenshot\", filename=\"example.jpeg\", full_page=True),\n                ],\n            ),\n        )\n```\n\ntranslates roughly to:\n\n```python\nimport asyncio\nimport nodriver as uc\n\nasync def main():\n    browser = await uc.start()\n    page = await browser.get(\"https://example.org\")\n    await page.save_screenshot(filename=\"example.jpeg\", full_page=True)\n    await page.close()\n\nif __name__ == '__main__':\n    uc.loop().run_until_complete(main())\n```",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Nodriver integration for Scrapy",
    "version": "0.0.7",
    "project_urls": {
        "Homepage": "https://github.com/Ehsan-U/scrapy-nodriver"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3378ab994f21aa1c446ef2207d97cb7e0107871c456ca117272ba24368be67ba",
                "md5": "9b1b7a4ecd0a20ecd35b8026f5fa8cea",
                "sha256": "9ac8929aa2b2cc58bbb11674ba48012b8ff7f2bcdc0f301d176bb96ea147f907"
            },
            "downloads": -1,
            "filename": "scrapy_nodriver-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b1b7a4ecd0a20ecd35b8026f5fa8cea",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9520,
            "upload_time": "2024-12-18T02:53:30",
            "upload_time_iso_8601": "2024-12-18T02:53:30.015767Z",
            "url": "https://files.pythonhosted.org/packages/33/78/ab994f21aa1c446ef2207d97cb7e0107871c456ca117272ba24368be67ba/scrapy_nodriver-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cf045f2acedd4883494cff3e6ce71afe23d1743c3171a6c70ce43af14147a728",
                "md5": "76d7c6896a53e795fb9f6517781da5d2",
                "sha256": "339c441649a56f59d39189de0c8d5d61b7dd6c1632d69dc1a4ce17ef138d1dbb"
            },
            "downloads": -1,
            "filename": "scrapy_nodriver-0.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "76d7c6896a53e795fb9f6517781da5d2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 11921,
            "upload_time": "2024-12-18T02:53:34",
            "upload_time_iso_8601": "2024-12-18T02:53:34.917955Z",
            "url": "https://files.pythonhosted.org/packages/cf/04/5f2acedd4883494cff3e6ce71afe23d1743c3171a6c70ce43af14147a728/scrapy_nodriver-0.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-18 02:53:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Ehsan-U",
    "github_project": "scrapy-nodriver",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "scrapy-nodriver"
}

Ehsan U.