scrapelib

Name: scrapelib
Version: 2.3.0
Home page: https://github.com/jamesturk/scrapelib
Author: James Turk
Requires Python: >=3.7,<4.0
License: BSD-2-Clause
Upload time: 2023-12-15 22:24:14
**scrapelib** is a library for making requests to less-than-reliable websites.

Source: [https://github.com/jamesturk/scrapelib](https://github.com/jamesturk/scrapelib)

Documentation: [https://jamesturk.github.io/scrapelib/](https://jamesturk.github.io/scrapelib/)

Issues: [https://github.com/jamesturk/scrapelib/issues](https://github.com/jamesturk/scrapelib/issues)

[![PyPI badge](https://badge.fury.io/py/scrapelib.svg)](https://badge.fury.io/py/scrapelib)
[![Test badge](https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg)](https://github.com/jamesturk/scrapelib/actions?query=workflow%3ATest)

## Features

**scrapelib** originated as part of the [Open States](http://openstates.org/)
project to scrape the websites of all 50 state legislatures, and was therefore
designed with features desirable when dealing with sites that have
intermittent errors or require rate-limiting.

Advantages of using scrapelib over using requests directly:

- HTTP(S) and FTP requests via an identical API
- simple caching with pluggable cache backends
- highly configurable request throttling
- configurable retries for non-permanent site failures
- all of the power of the superb [requests](http://python-requests.org) library


## Installation

*scrapelib* is on [PyPI](https://pypi.org/project/scrapelib/), and can be installed via any standard package management tool:

    poetry add scrapelib

or:

    pip install scrapelib


## Example Usage

``` python
import scrapelib

s = scrapelib.Scraper(requests_per_minute=10)

# Grab the Google front page
s.get('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
```

## Raw data

{
    "_id": null,
    "home_page": "https://github.com/jamesturk/scrapelib",
    "name": "scrapelib",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "James Turk",
    "author_email": "dev@jamesturk.net",
    "download_url": "https://files.pythonhosted.org/packages/d2/79/1a285d79e417ef509a84f9a6b58f106bb438b13b16228b9681c32d688c8c/scrapelib-2.3.0.tar.gz",
    "platform": null,
    "description": "**scrapelib** is a library for making requests to less-than-reliable websites.\n\nSource: [https://github.com/jamesturk/scrapelib](https://github.com/jamesturk/scrapelib)\n\nDocumentation: [https://jamesturk.github.io/scrapelib/](https://jamesturk.github.io/scrapelib/)\n\nIssues: [https://github.com/jamesturk/scrapelib/issues](https://github.com/jamesturk/scrapelib/issues)\n\n[![PyPI badge](https://badge.fury.io/py/scrapelib.svg)](https://badge.fury.io/py/scrapelib)\n[![Test badge](https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg)](https://github.com/jamesturk/scrapelib/actions?query=workflow%3ATest)\n\n## Features\n\n**scrapelib** originated as part of the [Open States](http://openstates.org/)\nproject to scrape the websites of all 50 state legislatures and as a result\nwas therefore designed with features desirable when dealing with sites that\nhave intermittent errors or require rate-limiting.\n\nAdvantages of using scrapelib over using requests as-is:\n\n- HTTP(S) and FTP requests via an identical API\n- support for simple caching with pluggable cache backends\n- highly-configurable request throtting\n- configurable retries for non-permanent site failures\n- All of the power of the suberb [requests](http://python-requests.org) library.\n\n\n## Installation\n\n*scrapelib* is on [PyPI](https://pypi.org/project/scrapelib/), and can be installed via any standard package management tool:\n\n    poetry add scrapelib\n\nor:\n\n    pip install scrapelib\n\n\n## Example Usage\n\n``` python\n\n  import scrapelib\n  s = scrapelib.Scraper(requests_per_minute=10)\n\n  # Grab Google front page\n  s.get('http://google.com')\n\n  # Will be throttled to 10 HTTP requests per minute\n  while True:\n      s.get('http://example.com')\n```\n",
    "bugtrack_url": null,
    "license": "BSD-2-Clause",
    "summary": "",
    "version": "2.3.0",
    "project_urls": {
        "Homepage": "https://github.com/jamesturk/scrapelib",
        "Repository": "https://github.com/jamesturk/scrapelib"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "82d689bbe05bcc4edf399473c8f4f52f61534ff40cb2be60ee60381959fbfc72",
                "md5": "470e8740e2580171d5b590fb602db031",
                "sha256": "4004b717ebe916533c9937b7671fcbe7ef64d998fb54fcad54a5497fc276a7bf"
            },
            "downloads": -1,
            "filename": "scrapelib-2.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "470e8740e2580171d5b590fb602db031",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<4.0",
            "size": 16981,
            "upload_time": "2023-12-15T22:24:13",
            "upload_time_iso_8601": "2023-12-15T22:24:13.319558Z",
            "url": "https://files.pythonhosted.org/packages/82/d6/89bbe05bcc4edf399473c8f4f52f61534ff40cb2be60ee60381959fbfc72/scrapelib-2.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d2791a285d79e417ef509a84f9a6b58f106bb438b13b16228b9681c32d688c8c",
                "md5": "5767c096d9692ab3343ca4681671b402",
                "sha256": "e99b327340b2a9162e1598a8c0664259d16eddef9ebb8389f93ca5428f3c58da"
            },
            "downloads": -1,
            "filename": "scrapelib-2.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "5767c096d9692ab3343ca4681671b402",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<4.0",
            "size": 15343,
            "upload_time": "2023-12-15T22:24:14",
            "upload_time_iso_8601": "2023-12-15T22:24:14.480580Z",
            "url": "https://files.pythonhosted.org/packages/d2/79/1a285d79e417ef509a84f9a6b58f106bb438b13b16228b9681c32d688c8c/scrapelib-2.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-15 22:24:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jamesturk",
    "github_project": "scrapelib",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "scrapelib"
}