# scrapesession
<a href="https://pypi.org/project/scrapesession/">
<img alt="PyPi" src="https://img.shields.io/pypi/v/scrapesession">
</a>
A requests session meant for scraping, with caching, backoffs, and historical fallbacks.
## Dependencies :globe_with_meridians:
Python 3.11.6:
- [numpy](https://numpy.org/)
- [requests-cache](https://requests-cache.readthedocs.io/en/stable/)
- [wayback](https://github.com/edgi-govdata-archiving/wayback)
- [func-timeout](https://github.com/kata198/func_timeout)
- [random_user_agent](https://github.com/Luqman-Ud-Din/random_user_agent)
- [tenacity](https://github.com/jd/tenacity)
- [playwright](https://playwright.dev/)
## Raison D'ĂȘtre :thought_balloon:
`scrapesession` is a requests session that combines heavy caching with several other tools in order to scrape sites efficiently.
## Architecture :triangular_ruler:
`scrapesession` is a requests session with the following properties (a rough sketch of how several of these layers could be combined by hand follows the list):
1. Handles non-302 redirects (such as redirects performed in JavaScript).
2. Retries with backoffs.
3. User-Agent rotation.
4. Caching.
5. Wayback Machine integration.
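The redirect handling and Wayback fallback are internal to the library, but the retry, rotation, and caching layers map directly onto the listed dependencies. As an illustration only (not `scrapesession`'s actual implementation), here is how those three pieces could be wired together by hand; the cache name `demo_cache` and the retry limits are arbitrary choices:

```python
import requests
import requests_cache
from random_user_agent.user_agent import UserAgent
from tenacity import retry, stop_after_attempt, wait_exponential

ua_rotator = UserAgent()  # rotates through a pool of real User-Agent strings
session = requests_cache.CachedSession("demo_cache")  # transparent HTTP caching

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def fetch(url: str) -> requests.Response:
    # Pick a fresh User-Agent on each attempt; raise on HTTP errors so
    # tenacity retries the request with exponential backoff.
    headers = {"User-Agent": ua_rotator.get_random_user_agent()}
    response = session.get(url, headers=headers)
    response.raise_for_status()
    return response
```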
## Installation :inbox_tray:
This is a Python package hosted on PyPI, so to install it simply run the following command:
`pip install scrapesession`
or install using this local repository:
`python setup.py install --old-and-unmanageable`
## Usage example :eyes:
Because `scrapesession` is a library, it is used entirely through code. A scrape session has exactly the same semantics as a requests session:
```python
from scrapesession.scrapesession import create_scrape_session

# Create a session with caching, retries, and fallbacks preconfigured.
session = create_scrape_session()

# Use it exactly like a requests session.
response = session.get("http://www.helloworld.com")
print(response.text)
```
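Because the session is backed by requests-cache, a repeated request for the same URL can be served locally. The `from_cache` flag below is a requests-cache feature; whether `scrapesession` responses expose it unchanged is an assumption:

```python
# Hypothetical: repeat the same GET and check whether requests-cache
# answered it locally instead of hitting the network again.
second = session.get("http://www.helloworld.com")
print(getattr(second, "from_cache", False))  # True when served from the cache
```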
## License :memo:
The project is available under the [MIT License](LICENSE).