scrapesession

Name: scrapesession
Version: 0.0.5 (PyPI)
Home page: https://github.com/8W9aG/scrapesession
Summary: A requests session meant for scraping with caching, backoffs and historical fallbacks.
Upload time: 2025-07-10 21:42:45
Author: Will Sackfield
License: MIT
Keywords: session, scraping
Requirements: numpy, requests_cache, wayback, func-timeout, random_user_agent, tenacity, playwright
            # scrapesession

<a href="https://pypi.org/project/scrapesession/">
    <img alt="PyPi" src="https://img.shields.io/pypi/v/scrapesession">
</a>

A requests session meant for scraping with caching, backoffs and historical fallbacks.

## Dependencies :globe_with_meridians:

Python 3.11.6:

- [numpy](https://numpy.org/)
- [requests-cache](https://requests-cache.readthedocs.io/en/stable/)
- [wayback](https://github.com/edgi-govdata-archiving/wayback)
- [func-timeout](https://github.com/kata198/func_timeout)
- [random_user_agent](https://github.com/Luqman-Ud-Din/random_user_agent)
- [tenacity](https://github.com/jd/tenacity)
- [playwright](https://playwright.dev/)

## Raison D'être :thought_balloon:

`scrapesession` is a requests session that layers heavy caching and other scraping tools on top of `requests` so that sites can be scraped efficiently.

## Architecture :triangular_ruler:

`scrapesession` is a requests session that has the following properties:

1. Handles non-302 redirects (such as redirects performed in JavaScript).
2. Retries and backoffs.
3. User-Agent rotation.
4. Caching.
5. Wayback Machine integration.
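The retry/backoff and User-Agent rotation behaviors (points 2 and 3 above) come from the tenacity and random_user_agent dependencies; the library's internals are not shown here. As a rough stdlib-only sketch of the same idea (the `fetch_with_retries` helper and `USER_AGENTS` list are illustrative names, not part of the library's API):

```python
import itertools
import time

# A tiny pool of User-Agent strings to rotate through on each attempt.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
_ua_cycle = itertools.cycle(USER_AGENTS)


def fetch_with_retries(fetch, max_attempts=3, base_delay=0.01):
    """Call ``fetch(headers)``, retrying with exponential backoff.

    Each attempt uses the next User-Agent from the rotation; the last
    failure is re-raised once the attempt budget is exhausted.
    """
    for attempt in range(max_attempts):
        headers = {"User-Agent": next(_ua_cycle)}  # rotate per attempt
        try:
            return fetch(headers)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
```

In the real library these concerns are handled by tenacity (retry policy) and random_user_agent (a much larger UA pool) rather than hand-rolled loops.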

## Installation :inbox_tray:

This is a Python package hosted on PyPI, so to install it simply run the following command:

`pip install scrapesession`

or install from a local checkout of this repository:

`python setup.py install --old-and-unmanageable`

## Usage example :eyes:

Because `scrapesession` is a library, it is used entirely from code. A scrape session has exactly the same semantics as a requests session:

```python
from scrapesession.scrapesession import create_scrape_session


session = create_scrape_session()

response = session.get("http://www.helloworld.com")
print(response.text)
```
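The caching behavior itself is provided by the requests-cache dependency. As a purely illustrative sketch of the underlying idea (an in-memory cache keyed by URL; the `CachingFetcher` class is a hypothetical name, not how scrapesession actually stores responses), a minimal memoizing fetcher might look like:

```python
class CachingFetcher:
    """Memoize fetch results by URL so repeated gets hit the cache."""

    def __init__(self, fetch):
        self._fetch = fetch   # e.g. a function performing a real HTTP GET
        self._cache = {}      # url -> response body
        self.misses = 0       # number of times the network was actually hit

    def get(self, url):
        if url not in self._cache:  # only fetch on a cache miss
            self.misses += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

requests-cache goes further than this sketch, persisting responses to a backend (SQLite by default) and honoring expiry, so repeated scraping runs avoid re-downloading unchanged pages.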

## License :memo:

The project is available under the [MIT License](LICENSE).

            

Raw data

The release's PyPI JSON, condensed (its `description` field duplicates the README above verbatim): version 0.0.5, MIT license, sdist `scrapesession-0.0.5.tar.gz` (9987 bytes, uploaded 2025-07-10T21:42:45Z), source hosted on GitHub at 8W9aG/scrapesession. Minimum requirement versions: numpy>=1.26.4, requests_cache>=1.2.1, wayback>=0.4.5, func-timeout>=4.3.5, random_user_agent>=1.0.1, tenacity>=9.1.2, playwright>=1.53.0.