# Scrapy with selenium

- **Name:** scrapy-selenium-enhanced
- **Version:** 0.0.5
- **Summary:** Scrapy with selenium and more
- **Author:** Liu Hongyu
- **Requires-Python:** >=3.13
- **Keywords:** scrapy, selenium, webdriver, scraping, crawler
- **Upload time:** 2025-09-08 09:15:35

[![PyPI](https://img.shields.io/pypi/v/scrapy-selenium-enhanced.svg)](https://pypi.python.org/pypi/scrapy-selenium-enhanced) [![Maintainability](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/maintainability)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/maintainability)

Scrapy middleware to handle JavaScript pages using Selenium.

## Installation

```
$ pip install scrapy-selenium-enhanced
```

This package requires **Python >= 3.13**.
You will also need one of the Selenium [compatible browsers](http://www.seleniumhq.org/about/platforms.jsp).

## Configuration

1. In the Scrapy settings, declare the browser to use, the path to the driver executable, and the arguments to pass to it:
   
```python
from shutil import which

SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
# Use '-headless' (single dash) if using Firefox instead of Chrome
SELENIUM_DRIVER_ARGUMENTS = ['--headless', f'user-agent={USER_AGENT}']  # assumes USER_AGENT is defined in your settings
```

Optionally, set the path to the browser executable:

```python
SELENIUM_BROWSER_EXECUTABLE_PATH = which('chrome')
```

To use a remote Selenium driver, specify `SELENIUM_COMMAND_EXECUTOR` instead of `SELENIUM_DRIVER_EXECUTABLE_PATH`:
    
```python
SELENIUM_COMMAND_EXECUTOR = 'http://localhost:4444/wd/hub'
```
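
For local development, one convenient way to get a remote Selenium server listening on that address is the official Selenium Docker image (an example setup, not part of this package):

```
$ docker run -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome
```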

2. Add the `SeleniumMiddleware` to the downloader middlewares:
   
```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium_enhanced.SeleniumMiddleware': 800
}
```
## Usage

Use `scrapy_selenium_enhanced.SeleniumRequest` instead of the Scrapy built-in `Request`, like below:

```python
from scrapy_selenium_enhanced import SeleniumRequest
yield SeleniumRequest(url=url, callback=self.parse_result)
```
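
For context, here is a minimal sketch of a complete spider using this request (the spider name and URL are placeholders):

```python
import scrapy
from scrapy_selenium_enhanced import SeleniumRequest


class ExampleSpider(scrapy.Spider):
    name = 'example'

    def start_requests(self):
        # Hand the request to the Selenium middleware instead of Scrapy's default downloader
        yield SeleniumRequest(url='https://example.com', callback=self.parse_result)

    def parse_result(self, response):
        # The response body is the HTML as rendered by the browser
        self.logger.info(response.url)
```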

The request will be handled by Selenium, and the request will gain an additional `meta` key named `driver`, containing the Selenium driver that processed it.

```python
def parse_result(self, response):
    print(response.request.meta['driver'].title)
```
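
Since this is a regular Selenium WebDriver instance, you can keep interacting with the live page from the callback. A sketch, assuming the page has an element with the (hypothetical) id `load-more`:

```python
from selenium.webdriver.common.by import By

def parse_result(self, response):
    driver = response.request.meta['driver']
    # Interact with the live page; 'load-more' is a placeholder id
    driver.find_element(By.ID, 'load-more').click()
    print(driver.current_url)
```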

For more information about the available driver methods and attributes, refer to the [Selenium Python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver).

The `selector` response attribute works as usual (but contains the HTML processed by the Selenium driver):

```python
def parse_result(self, response):
    print(response.xpath('//title/text()').get())
```
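
The same works with Scrapy's CSS selectors:

```python
def parse_result(self, response):
    print(response.css('title::text').get())
```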

### Additional arguments

The `scrapy_selenium_enhanced.SeleniumRequest` accepts four additional arguments:

#### `wait_time` / `wait_until`

When used, Selenium will perform an [explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) before returning the response to the spider.

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    wait_time=10,
    wait_until=EC.element_to_be_clickable((By.ID, 'someid'))
)
```
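
Assuming the middleware passes `wait_until` to Selenium's `WebDriverWait.until` (which the explicit-wait mechanism implies), any callable taking the driver and returning a truthy value also works:

```python
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    wait_time=10,
    # Wait until the DOM reports it has finished loading
    wait_until=lambda driver: driver.execute_script('return document.readyState') == 'complete',
)
```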

#### `screenshot`

When used, Selenium will take a screenshot of the page, and the binary data of the captured PNG will be added to the response `meta`:

```python
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    screenshot=True
)

def parse_result(self, response):
    with open('image.png', 'wb') as image_file:
        image_file.write(response.meta['screenshot'])
```
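
When crawling many pages, you might derive the filename from the URL so screenshots don't overwrite each other (a sketch; the naming scheme is an assumption, not part of the package):

```python
from pathlib import Path
from urllib.parse import urlparse

def parse_result(self, response):
    # Build a filesystem-safe name from the page's host and path
    parsed = urlparse(response.url)
    name = (parsed.netloc + parsed.path).strip('/').replace('/', '_') or 'index'
    Path(f'{name}.png').write_bytes(response.meta['screenshot'])
```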

#### `script`

When used, Selenium will execute the given JavaScript code in the browser.

```python
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    script='window.scrollTo(0, document.body.scrollHeight);',
)
```
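
Another common use is cleaning up the page before parsing, assuming (as in the upstream scrapy-selenium) that the response body is captured after the script runs; the `.cookie-banner` selector is a placeholder:

```python
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    # Placeholder selector: remove an overlay before the HTML is captured
    script="document.querySelector('.cookie-banner')?.remove();",
)
```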

            
