**scrapelib** is a library for making requests to less-than-reliable websites.
Source: [https://github.com/jamesturk/scrapelib](https://github.com/jamesturk/scrapelib)
Documentation: [https://jamesturk.github.io/scrapelib/](https://jamesturk.github.io/scrapelib/)
Issues: [https://github.com/jamesturk/scrapelib/issues](https://github.com/jamesturk/scrapelib/issues)
[![PyPI badge](https://badge.fury.io/py/scrapelib.svg)](https://badge.fury.io/py/scrapelib)
[![Test badge](https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg)](https://github.com/jamesturk/scrapelib/actions?query=workflow%3ATest)
## Features
**scrapelib** originated as part of the [Open States](http://openstates.org/)
project to scrape the websites of all 50 state legislatures, and was therefore
designed with features desirable when dealing with sites that have
intermittent errors or require rate-limiting.
Advantages of using scrapelib over using requests as-is:
- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- highly configurable request throttling
- configurable retries for non-permanent site failures
- all of the power of the superb [requests](http://python-requests.org) library
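Throttling and retries are plain constructor arguments on `Scraper`. A minimal sketch using the documented `requests_per_minute`, `retry_attempts`, and `retry_wait_seconds` parameters (the URL and values are just placeholders):

``` python
import scrapelib

# A Scraper is a drop-in replacement for a requests.Session,
# with throttling and retry behavior layered on top.
s = scrapelib.Scraper(
    requests_per_minute=30,   # throttle to 30 requests per minute
    retry_attempts=3,         # retry transient failures up to 3 times
    retry_wait_seconds=10,    # wait between retry attempts
)

# get() works just like requests.get(), including kwargs
# such as headers= and timeout=.
response = s.get("https://example.com")
print(response.status_code)
```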
## Installation
*scrapelib* is on [PyPI](https://pypi.org/project/scrapelib/), and can be installed via any standard package management tool:
    poetry add scrapelib

or:

    pip install scrapelib
## Example Usage
``` python
import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.get('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
```
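Caching works the same way: attach one of the pluggable backends to `cache_storage`. A short sketch, assuming the `FileCache` backend and `cache_write_only` flag from the scrapelib documentation (the directory name is arbitrary):

``` python
import scrapelib
from scrapelib.cache import FileCache

s = scrapelib.Scraper(requests_per_minute=10)

# Store responses on disk in the given directory.
s.cache_storage = FileCache("scrapelib-cache")

# The cache is write-only by default; allow reads so that
# repeated requests are served from disk instead of the network.
s.cache_write_only = False

s.get("http://example.com")   # fetched over the network, then cached
s.get("http://example.com")   # served from the cache
```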