scrachy


Name: scrachy
Version: 0.5.9
Home page: https://bitbucket.org/reidswanson/scrachy
Summary: Enhanced caching modules for scrapy.
Upload time: 2023-12-13 18:01:08
Maintainer: Reid Swanson
Author: Reid Swanson
License: lgpl-v3
Requirements: No requirements were recorded.

# Scrachy
Scrachy was primarily developed to provide a flexible cache storage backend for [Scrapy](https://scrapy.org/) that stores its data in a relational database using [SQLAlchemy](https://www.sqlalchemy.org/).
It has since gained several other features, including middleware that uses Selenium to download requests.
It also comes with a downloader middleware that can optionally ignore requests that are already in the cache.

# Install
You can install the latest version from git:

```
pip install git+https://bitbucket.org/reidswanson/scrachy.git
```

or from PyPI:

```
pip install scrachy
```

# Documentation
A brief guide to the minimal setup of the cache storage engine and the Selenium middleware is given below.
For other configuration options and features, please see the full documentation on [Read the Docs](https://scrachy.readthedocs.io/en/latest).

## Storage Backend
For a minimal setup of the storage backend, enable caching by adding the following to your `settings.py` file:
```python
# Enable caching
HTTPCACHE_ENABLED = True

# Set the storage backend to the one provided by Scrachy.
HTTPCACHE_STORAGE = 'scrachy.middleware.httpcache.AlchemyCacheStorage'

# One of the supported SQLAlchemy dialects
SCRACHY_DB_DIALECT = '<database-dialect>'

# The name of the database driver to use (it must be installed as an extra).
SCRACHY_DB_DRIVER = '<database-driver>'

# Options for connecting to the database
SCRACHY_DB_HOST = '<database-hostname>'
SCRACHY_DB_PORT = '<database-port>'
SCRACHY_DB_SCHEMA = '<database-schema>'
SCRACHY_DB_DATABASE = '<database-name>'
SCRACHY_DB_USERNAME = '<username>'

# Note: do not store the real password in the settings file. Load it from
# an environment variable or with python-dotenv instead.
SCRACHY_DB_PASSWORD = '<password>'

# A dictionary of other connection arguments
SCRACHY_DB_CONNECT_ARGS = {}

# There may be a conflict with the compression middleware. If you encounter
# errors, either disable it (as shown here) or move it after the caching
# middleware.
DOWNLOADER_MIDDLEWARES = {
    # ... your other middleware entries ...
    'scrapy.downloadermiddlewares.http.compression.HttpCompressionMiddleware': None,
}
```
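
The note about `SCRACHY_DB_PASSWORD` can be followed with a small helper like the one below. This is only a sketch: the helper name `read_db_password` and the choice of an environment variable that is also called `SCRACHY_DB_PASSWORD` are assumptions made for illustration, not part of Scrachy.

```python
import os


def read_db_password(env_var="SCRACHY_DB_PASSWORD", environ=None):
    """Return the database password stored in an environment variable.

    `read_db_password` and the default variable name are hypothetical;
    pick whatever name fits your deployment.
    """
    # Allow a custom mapping to be passed in (handy for testing);
    # fall back to the real process environment otherwise.
    environ = os.environ if environ is None else environ
    password = environ.get(env_var)
    if not password:
        # Fail early instead of connecting with an empty password.
        raise RuntimeError(f"Environment variable {env_var!r} is not set")
    return password


# In settings.py you could then write:
# SCRACHY_DB_PASSWORD = read_db_password()
```

If you prefer python-dotenv, calling its `load_dotenv()` before the helper should populate the environment from a `.env` file, so the same helper would work unchanged.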

## Selenium
Scrachy provides two Selenium middleware classes: `SeleniumMiddleware` and `AsyncSeleniumMiddleware`.
To use one, first add it to the `DOWNLOADER_MIDDLEWARES` setting:

```python
DOWNLOADER_MIDDLEWARES = {
    # ... your other middleware entries ...
    'scrachy.middleware.selenium.SeleniumMiddleware': 800,  # or AsyncSeleniumMiddleware
    # ... your other middleware entries ...
}
```

Then, in your spider's parsing code, use a `SeleniumRequest` instead of a `scrapy.http.Request`.


# License
Scrachy is released under the GNU Lesser General Public License, version 3.
See the [LICENSE](LICENSE.md) file for more details.
Files that are adapted from, or use code from, other sources say so either at the top of the file or at the location of the code snippet.
Some of these files were adapted from code released under a 3-clause BSD license.
Those files should indicate the original copyright in a comment at the top of the file.
See the [BSD_LICENSE](BSD_LICENSE.md) file for details of this license.

            

Raw data

{
    "_id": null,
    "home_page": "https://bitbucket.org/reidswanson/scrachy",
    "name": "scrachy",
    "maintainer": "Reid Swanson",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "reid@reidswanson.com",
    "keywords": "",
    "author": "Reid Swanson",
    "author_email": "reid@reidswanson.com",
    "download_url": "https://files.pythonhosted.org/packages/5c/ef/62c8b87db9c83ff680abf4fc5198862a46cc10cfcf2837093607890edfd8/scrachy-0.5.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "lgpl-v3",
    "summary": "Enhanced caching modules for scrapy.",
    "version": "0.5.9",
    "project_urls": {
        "Homepage": "https://bitbucket.org/reidswanson/scrachy"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "195826cef76285e3c9e58dcf0cee9d858ebe460511384dda7dde3a20e835ff15",
                "md5": "b87b05691aaec65d42c0cee3e7415708",
                "sha256": "3e4af4ec1b284e9977a20dd661f8a9dd7cb2d90f9745087172ced2e77bf94fe7"
            },
            "downloads": -1,
            "filename": "scrachy-0.5.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b87b05691aaec65d42c0cee3e7415708",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 89698,
            "upload_time": "2023-12-13T18:01:04",
            "upload_time_iso_8601": "2023-12-13T18:01:04.909145Z",
            "url": "https://files.pythonhosted.org/packages/19/58/26cef76285e3c9e58dcf0cee9d858ebe460511384dda7dde3a20e835ff15/scrachy-0.5.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5cef62c8b87db9c83ff680abf4fc5198862a46cc10cfcf2837093607890edfd8",
                "md5": "9ce028dee079e0575d5771c753b6cf7e",
                "sha256": "28778a5940cf1a9f0724c82555fd56612226f9f3283c1ea2a223a68eaa093e0a"
            },
            "downloads": -1,
            "filename": "scrachy-0.5.9.tar.gz",
            "has_sig": false,
            "md5_digest": "9ce028dee079e0575d5771c753b6cf7e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 53259,
            "upload_time": "2023-12-13T18:01:08",
            "upload_time_iso_8601": "2023-12-13T18:01:08.307488Z",
            "url": "https://files.pythonhosted.org/packages/5c/ef/62c8b87db9c83ff680abf4fc5198862a46cc10cfcf2837093607890edfd8/scrachy-0.5.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-13 18:01:08",
    "github": false,
    "gitlab": false,
    "bitbucket": true,
    "codeberg": false,
    "bitbucket_user": "reidswanson",
    "bitbucket_project": "scrachy",
    "lcname": "scrachy"
}
        