Name | stealth-requests |
Version | 2.0.4 |
home_page | None |
Summary | Undetected web-scraping & seamless HTML parsing in Python! |
upload_time | 2025-07-11 03:30:56 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT |
keywords | http, requests, scraping, browser |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<p align="center">
<img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/173df6b8a8ef53bd1fd514b85291c5f98530a462/logo.png?raw=true">
</p>
<h1 align="center">The Easiest Way to Scrape the Web</h1>
<p align="center"><a href="https://github.com/jpjacobpadilla/stealth-requests/blob/main/LICENSE"><img src="https://img.shields.io/github/license/jpjacobpadilla/stealth-requests.svg?color=green"></a> <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.9%2B-green" alt="Python 3.9+"></a> <a href="https://pypi.org/project/stealth-requests/"><img alt="PyPI" src="https://img.shields.io/pypi/v/stealth-requests.svg?color=green"></a> <a href="https://pepy.tech/project/stealth-requests"><img alt="PyPI installs" src="https://img.shields.io/pepy/dt/stealth-requests?label=pypi%20installs&color=green"></a></p>
### Features
- **Realistic HTTP Requests:**
  - Mimics the Chrome browser for undetected scraping using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
  - Automatically rotates User Agents between requests
  - Tracks and updates the `Referer` header to simulate realistic request chains
  - Built-in retry logic for failed requests (e.g. 429, 503, 522)
- **Faster and Easier Parsing:**
  - Extract emails, phone numbers, images, and links from responses
  - Automatically extract metadata (title, description, author, etc.) from HTML-based responses
  - Seamlessly convert responses into [Lxml](https://lxml.de/apidoc/lxml.html) and [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) objects for further parsing
  - Easily convert full or specific sections of HTML to Markdown
### Install
```
$ pip install stealth_requests
```
### Table of Contents
- [Sending Requests](#sending-requests)
- [Sending Requests With Asyncio](#sending-requests-with-asyncio)
- [Accessing Page Metadata](#accessing-page-metadata)
- [Extracting Emails, Phone Numbers, Images, and Links](#extracting-emails-phone-numbers-images-and-links)
- [More Parsing Options](#more-parsing-options)
- [Converting Responses to Markdown](#converting-responses-to-markdown)
- [Using Proxies](#using-proxies)
### Sending Requests
Stealth-Requests mimics the API of the [requests](https://requests.readthedocs.io/en/latest/) package, allowing you to use it in nearly the same way.
You can send one-off requests like this:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com')
```
Or you can use a `StealthSession` object, which keeps track of certain headers, such as `Referer`, between requests.
```python
from stealth_requests import StealthSession
with StealthSession() as session:
    resp = session.get('https://link-here.com')
```
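To make the `Referer` tracking idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `RefererTracker` and `headers_for` are hypothetical names, and the real `StealthSession` may decide when and how to update the header differently.

```python
from typing import Optional


class RefererTracker:
    """Illustrative only: remembers the last URL requested so the next
    request can send it as the Referer header."""

    def __init__(self) -> None:
        self.last_url: Optional[str] = None

    def headers_for(self, url: str) -> dict:
        # Send the previously requested URL as the Referer, if there is one.
        headers = {}
        if self.last_url is not None:
            headers["Referer"] = self.last_url
        # The URL being requested now becomes the Referer of the next request.
        self.last_url = url
        return headers


tracker = RefererTracker()
first = tracker.headers_for("https://link-here.com")
second = tracker.headers_for("https://link-here.com/about")
print(first)   # {}
print(second)  # {'Referer': 'https://link-here.com'}
```

The first request carries no `Referer`; the second one points back at the first URL, which is the kind of request chain a browser would produce.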
Stealth-Requests has a built-in retry feature that automatically waits 2 seconds and retries the request if it fails due to certain status codes (like 429, 503, etc.).
To enable retries, just pass the number of retry attempts using the `retry` argument:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com', retry=3)
```
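Conceptually, the retry behaviour described above can be sketched as a small loop. This is an illustration under assumptions, not the package's code: `get_with_retry`, `send`, and `FakeResponse` are hypothetical names, the set of retried status codes is taken from the examples in this README (the full set used by the library is not listed here), and the sketch assumes `retry=3` means up to three extra attempts after the first one.

```python
import time


def get_with_retry(send, url, retry=0, delay=2.0, retry_statuses=(429, 503, 522)):
    """Call send(url) and retry while the status code is in retry_statuses.

    `send` is any callable that returns an object with a `status_code` attribute.
    """
    resp = send(url)
    attempts_left = retry
    while resp.status_code in retry_statuses and attempts_left > 0:
        time.sleep(delay)  # wait before trying again
        resp = send(url)
        attempts_left -= 1
    return resp


class FakeResponse:
    """Stand-in response so the sketch runs without network access."""

    def __init__(self, status_code):
        self.status_code = status_code


# Simulate a server that fails twice and then succeeds.
statuses = iter([429, 503, 200])
resp = get_with_retry(
    lambda url: FakeResponse(next(statuses)),
    "https://link-here.com",
    retry=3,
    delay=0,  # no waiting in the demo
)
print(resp.status_code)  # 200
```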
### Sending Requests With Asyncio
Stealth-Requests also supports asyncio through `AsyncStealthSession`, which is used like `StealthSession` but with `async with` and `await`:
```python
import asyncio
from stealth_requests import AsyncStealthSession

async def main():
    async with AsyncStealthSession() as session:
        resp = await session.get('https://link-here.com')

asyncio.run(main())
```
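A common reason to use the async session is to fetch several pages at once. The helper below is a sketch that works with any object exposing an awaitable `get(url)` method; `fetch_all` and `EchoSession` are hypothetical names, and whether a single `AsyncStealthSession` is safe to share across concurrent requests is an assumption you should verify for your version.

```python
import asyncio


async def fetch_all(session, urls):
    """Request every URL concurrently and return the results in input order."""
    return await asyncio.gather(*(session.get(url) for url in urls))


class EchoSession:
    """Stand-in session so the sketch runs offline; it returns the URL itself."""

    async def get(self, url):
        await asyncio.sleep(0)  # yield control, as a real request would
        return url


results = asyncio.run(fetch_all(EchoSession(), ["https://a.example", "https://b.example"]))
print(results)  # ['https://a.example', 'https://b.example']
```

Inside an `async with AsyncStealthSession() as session:` block, the call would look like `responses = await fetch_all(session, urls)`.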
### Accessing Page Metadata
The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard [requests response object](https://requests.readthedocs.io/en/latest/api/#requests.Response), with a few added features. One of these is automatic parsing of metadata from the `<head>` of HTML-based responses. The metadata is available through the `meta` property, which exposes the following fields:
- title: `str | None`
- author: `str | None`
- description: `str | None`
- thumbnail: `str | None`
- canonical: `str | None`
- twitter_handle: `str | None`
- keywords: `tuple[str] | None`
- robots: `tuple[str] | None`
Here's an example of how to get the title of a page:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com')
print(resp.meta.title)
```
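Because every field can be `None`, it is often handy to collect only the values that are present. The sketch below assumes the fields are plain attributes on `resp.meta`, as `resp.meta.title` suggests; `available_meta` is a hypothetical helper, and `sample` is a stand-in object used so the example runs offline.

```python
from types import SimpleNamespace

# Field names as listed above.
META_FIELDS = (
    "title", "author", "description", "thumbnail",
    "canonical", "twitter_handle", "keywords", "robots",
)


def available_meta(meta):
    """Return a dict containing only the metadata fields that are not None."""
    return {
        name: getattr(meta, name)
        for name in META_FIELDS
        if getattr(meta, name, None) is not None
    }


# Stand-in for resp.meta.
sample = SimpleNamespace(
    title="Example Domain",
    author=None,
    description="An example page",
    thumbnail=None,
    canonical=None,
    twitter_handle=None,
    keywords=("example", "demo"),
    robots=None,
)
found = available_meta(sample)
print(found)
# {'title': 'Example Domain', 'description': 'An example page', 'keywords': ('example', 'demo')}
```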
### Extracting Emails, Phone Numbers, Images, and Links
The `StealthResponse` object includes some helpful properties for extracting common data:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com')
print(resp.emails)
# Output: ('info@example.com', 'support@example.com')
print(resp.phone_numbers)
# Output: ('+1 (800) 123-4567', '212-555-7890')
print(resp.images)
# Output: ('https://example.com/logo.png', 'https://cdn.example.com/banner.jpg')
print(resp.links)
# Output: ('https://example.com/about', 'https://example.com/contact')
```
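The extracted tuples are ordinary Python data, so they can be post-processed with the standard library. As a sketch, the helper below keeps only links that belong to one site. It assumes `resp.links` yields absolute URL strings, as the sample output above suggests; `links_on_domain` is a hypothetical name, and the tuple here stands in for a real response.

```python
from urllib.parse import urlparse


def links_on_domain(links, domain):
    """Keep only the links whose host is `domain` or one of its subdomains."""
    kept = []
    for link in links:
        host = urlparse(link).hostname or ""
        if host == domain or host.endswith("." + domain):
            kept.append(link)
    return tuple(kept)


# Stand-in for resp.links.
links = (
    "https://example.com/about",
    "https://cdn.example.com/banner.jpg",
    "https://other.org/page",
)
internal = links_on_domain(links, "example.com")
print(internal)
# ('https://example.com/about', 'https://cdn.example.com/banner.jpg')
```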
### More Parsing Options
To make parsing HTML faster, I've also added two popular parsing packages to Stealth-Requests: Lxml and BeautifulSoup4. To use these add-ons, you need to install the `parsers` extra:
```
$ pip install 'stealth_requests[parsers]'
```
To get an Lxml tree, use `resp.tree()`; to get a BeautifulSoup object, use `resp.soup()`.
For simple parsing, I've also added the following convenience methods, from the Lxml package, right into the `StealthResponse` object:
- `text_content()`: Get all text content in a response
- `xpath()`: Go right to using XPath expressions instead of getting your own Lxml tree.
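As a usage sketch, the helper below pulls the first non-empty text match for an XPath expression. It assumes `resp.xpath()` takes the expression as a string and returns a list of matches, the way Lxml's own `xpath()` does; `first_text` is a hypothetical helper, and `FakeResponse` is a stand-in so the example runs without the package or a network connection.

```python
def first_text(resp, expression):
    """Return the first non-empty string matched by an XPath expression, or None."""
    for match in resp.xpath(expression):
        text = str(match).strip()
        if text:
            return text
    return None


class FakeResponse:
    """Stand-in for StealthResponse that returns canned XPath results."""

    def xpath(self, expression):
        return ["  ", "Example Domain"]


heading = first_text(FakeResponse(), "//h1/text()")
print(heading)  # Example Domain
```

With a real response, the call would be `first_text(resp, '//h1/text()')`.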
### Converting Responses to Markdown
In some cases, it’s easier to work with a webpage in Markdown format rather than HTML. After making a GET request that returns HTML, you can use the `resp.markdown()` method to convert the response into a Markdown string, providing a simplified and readable version of the page content!
`markdown()` has two optional parameters:
1. `content_xpath`: An XPath expression, in the form of a string, which can be used to narrow down what text is converted to Markdown. This can be useful if you don't want the header and footer of a webpage to be turned into Markdown.
2. `ignore_links`: A boolean value that tells `Html2Text` whether to leave links out of the Markdown output.
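The two parameters can be combined, for example to convert only the main content of a page and drop its links. This is a sketch: `main_content_markdown` is a hypothetical wrapper, the `//main` expression assumes the page has a `<main>` element, and `FakeResponse` is a stand-in that simply echoes the arguments it receives so the example runs offline.

```python
def main_content_markdown(resp, content_xpath="//main"):
    """Convert only the matched section to Markdown, leaving out links."""
    return resp.markdown(content_xpath=content_xpath, ignore_links=True)


class FakeResponse:
    """Stand-in for StealthResponse that reports how markdown() was called."""

    def markdown(self, content_xpath=None, ignore_links=False):
        return f"xpath={content_xpath} ignore_links={ignore_links}"


result = main_content_markdown(FakeResponse())
print(result)  # xpath=//main ignore_links=True
```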
### Using Proxies
Stealth-Requests supports proxy usage through a `proxies` dictionary argument, similar to the standard requests package.
You can pass both HTTP and HTTPS proxy URLs when making a request:
```python
import stealth_requests as requests
proxies = {
    "http": "http://username:password@proxyhost:port",
    "https": "http://username:password@proxyhost:port",
}
resp = requests.get('https://link-here.com', proxies=proxies)
```
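If the proxy credentials contain characters such as `@` or `:`, they need to be percent-encoded before being placed in the URL. The helper below is a small sketch for building the dictionary; `build_proxies` is a hypothetical name, and the host, port, and credentials are placeholders.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Build a proxies dict that routes HTTP and HTTPS traffic through one HTTP proxy."""
    auth = ""
    if username is not None and password is not None:
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("proxyhost", 8080, "user", "p@ss:word")
print(proxies["https"])  # http://user:p%40ss%3Aword@proxyhost:8080
```

The resulting dictionary has the same shape as the one shown above and can be passed as the `proxies` argument.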
### Contributing
Contributions are welcome! Feel free to open issues or submit pull requests.
Before submitting a pull request, please format your code with Ruff: `uvx ruff format stealth_requests/`
[↑ Back to top](#table-of-contents)
Raw data
{
"_id": null,
"home_page": null,
"name": "stealth-requests",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "HTTP, requests, scraping, browser",
"author": null,
"author_email": "Jacob Padilla <stealth-requests@jacobpadilla.com>",
"download_url": "https://files.pythonhosted.org/packages/b3/41/868f9fde654174b1d51fec665f35bd9ea85037f46217826c8a030152b78b/stealth_requests-2.0.4.tar.gz",
"platform": null,
"description": "<p align=\"center\">\n <img src=\"https://github.com/jpjacobpadilla/Stealth-Requests/blob/173df6b8a8ef53bd1fd514b85291c5f98530a462/logo.png?raw=true\">\n</p>\n\n<h1 align=\"center\">The Easiest Way to Scrape the Web</h1>\n\n<p align=\"center\"><a href=\"https://github.com/jpjacobpadilla/stealth-requests/blob/main/LICENSE\"><img src=\"https://img.shields.io/github/license/jpjacobpadilla/stealth-requests.svg?color=green\"></a> <a href=\"https://www.python.org/\"><img src=\"https://img.shields.io/badge/python-3.9%2B-green\" alt=\"Python 3.8+\"></a> <a href=\"https://pypi.org/project/stealth-requests/\"><img alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/stealth-requests.svg?color=green\"></a> <a href=\"https://pepy.tech/project/stealth-requests\"><img alt=\"PyPI installs\" src=\"https://img.shields.io/pepy/dt/stealth-requests?label=pypi%20installs&color=green\"></a></p>\n\n\n### Features\n- **Realistic HTTP Requests:**\n - Mimics the Chrome browser for undetected scraping using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package\n - Automatically rotates User Agents between requests\n - Tracks and updates the `Referer` header to simulate realistic request chains\n - Built-in retry logic for failed requests (e.g. 429, 503, 522)\n- **Faster and Easier Parsing:**\n - Extract emails, phone numbers, images, and links from responses\n - Automatically extract metadata (title, description, author, etc.) 
from HTML-based responses\n - Seamlessly convert responses into [Lxml](https://lxml.de/apidoc/lxml.html) and [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) objects for more parsing\n - Easily convert full or specific sections of HTML to Markdown\n\n\n### Install\n\n```\n$ pip install stealth_requests\n```\n\n\n### Table of Contents\n\n- [Sending Requests](#sending-requests)\n- [Sending Requests With Asyncio](#sending-requests-with-asyncio)\n- [Accessing Page Metadata](#accessing-page-metadata)\n- [Extracting Emails, Phone Numbers, Images, and Links](#extracting-emails-phone-numbers-images-and-links)\n- [More Parsing Options](#more-parsing-options)\n- [Converting Responses to Markdown](#converting-responses-to-markdown)\n- [Using Proxies](#using-proxies)\n\n\n### Sending Requests\n\nStealth-Requests mimics the API of the [requests](https://requests.readthedocs.io/en/latest/) package, allowing you to use it in nearly the same way.\n\nYou can send one-off requests like this:\n\n```python\nimport stealth_requests as requests\n\nresp = requests.get('https://link-here.com')\n```\n\nOr you can use a `StealthSession` object which will keep track of certain headers for you between requests such as the `Referer` header.\n\n```python\nfrom stealth_requests import StealthSession\n\nwith StealthSession() as session:\n resp = session.get('https://link-here.com')\n```\n\nStealth-Requests has a built-in retry feature that automatically waits 2 seconds and retries the request if it fails due to certain status codes (like 429, 503, etc.).\n\nTo enable retries, just pass the number of retry attempts using the `retry` argument:\n\n```python\nimport stealth_requests as requests\n\nresp = requests.get('https://link-here.com', retry=3)\n```\n\n### Sending Requests With Asyncio\n\nStealth-Requests supports Asyncio in the same way as the `requests` package:\n\n```python\nfrom stealth_requests import AsyncStealthSession\n\nasync with AsyncStealthSession() as session:\n 
resp = await session.get('https://link-here.com')\n```\n\n\n### Accessing Page Metadata\n\nThe response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard [requests response object](https://requests.readthedocs.io/en/latest/api/#requests.Response), with a few added features. One of these extra features is automatic parsing of header metadata from HTML-based responses. The metadata can be accessed from the `meta` property, which gives you access to the following metadata:\n\n- title: `str | None`\n- author: `str | None`\n- description: `str | None`\n- thumbnail: `str | None`\n- canonical: `str | None`\n- twitter_handle: `str | None`\n- keywords: `tuple[str] | None`\n- robots: `tuple[str] | None`\n\nHere's an example of how to get the title of a page:\n\n```python\nimport stealth_requests as requests\n\nresp = requests.get('https://link-here.com')\nprint(resp.meta.title)\n```\n\n\n### Extracting Emails, Phone Numbers, Images, and Links\n\nThe `StealthResponse` object includes some helpful properties for extracting common data:\n\n```python\nimport stealth_requests as requests\n\nresp = requests.get('https://link-here.com')\n\nprint(resp.emails)\n# Output: ('info@example.com', 'support@example.com')\n\nprint(resp.phone_numbers)\n# Output: ('+1 (800) 123-4567', '212-555-7890')\n\nprint(resp.images)\n# Output: ('https://example.com/logo.png', 'https://cdn.example.com/banner.jpg')\n\nprint(resp.links)\n# Output: ('https://example.com/about', 'https://example.com/contact')\n```\n\n\n### More Parsing Options\n\nTo make parsing HTML faster, I've also added two popular parsing packages to Stealth-Requests: Lxml and BeautifulSoup4. 
To use these add-ons, you need to install the `parsers` extra:\n\n```\n$ pip install 'stealth_requests[parsers]'\n```\n\nTo easily get an Lxml tree, you can use `resp.tree()` and to get a BeautifulSoup object, use the `resp.soup()` method.\n\nFor simple parsing, I've also added the following convenience methods, from the Lxml package, right into the `StealthResponse` object:\n\n- `text_content()`: Get all text content in a response\n- `xpath()`: Go right to using XPath expressions instead of getting your own Lxml tree.\n\n\n### Converting Responses to Markdown\n\nIn some cases, it\u2019s easier to work with a webpage in Markdown format rather than HTML. After making a GET request that returns HTML, you can use the `resp.markdown()` method to convert the response into a Markdown string, providing a simplified and readable version of the page content!\n\n`markdown()` has two optional parameters:\n\n1. `content_xpath` An XPath expression, in the form of a string, which can be used to narrow down what text is converted to Markdown. This can be useful if you don't want the header and footer of a webpage to be turned into Markdown.\n2. `ignore_links` A boolean value that tells `Html2Text` whether to include links in the Markdown output.\n\n\n### Using Proxies\n\nStealth-Requests supports proxy usage through a `proxies` dictionary argument, similar to the standard requests package.\n\nYou can pass both HTTP and HTTPS proxy URLs when making a request:\n\n```python\nimport stealth_requests as requests\n\nproxies = {\n \"http\": \"http://username:password@proxyhost:port\",\n \"https\": \"http://username:password@proxyhost:port\",\n}\n\nresp = requests.get('https://link-here.com', proxies=proxies)\n```\n\n\n### Contributing\n\nContributions are welcome! Feel free to open issues or submit pull requests.\n\nBefore submitting a pull request, please format your code with Ruff: `uvx ruff format stealth_requests/`\n\n\n[\u2191 Back to top](#table-of-contents)\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Undetected web-scraping & seamless HTML parsing in Python!",
"version": "2.0.4",
"project_urls": {
"Homepage": "https://github.com/jpjacobpadilla/Stealth-Requests"
},
"split_keywords": [
"http",
" requests",
" scraping",
" browser"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4c5db724610c042001a015cf871094e8c2fc494bc4cf5eeace42e1fa01197611",
"md5": "fb5167239bc167910e0193807e37fc79",
"sha256": "e345cb012f3764901c9bd38a8761ce4f4a587e8312db9f9875883d4e924fbe1c"
},
"downloads": -1,
"filename": "stealth_requests-2.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fb5167239bc167910e0193807e37fc79",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 9029,
"upload_time": "2025-07-11T03:30:55",
"upload_time_iso_8601": "2025-07-11T03:30:55.626970Z",
"url": "https://files.pythonhosted.org/packages/4c/5d/b724610c042001a015cf871094e8c2fc494bc4cf5eeace42e1fa01197611/stealth_requests-2.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b341868f9fde654174b1d51fec665f35bd9ea85037f46217826c8a030152b78b",
"md5": "0fd0f755daa41a0292f6c270ca0b2395",
"sha256": "881feb76ec17acf8ac919038cda1ac5e6facc04d11570fe588497a9e13684217"
},
"downloads": -1,
"filename": "stealth_requests-2.0.4.tar.gz",
"has_sig": false,
"md5_digest": "0fd0f755daa41a0292f6c270ca0b2395",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 11075,
"upload_time": "2025-07-11T03:30:56",
"upload_time_iso_8601": "2025-07-11T03:30:56.416107Z",
"url": "https://files.pythonhosted.org/packages/b3/41/868f9fde654174b1d51fec665f35bd9ea85037f46217826c8a030152b78b/stealth_requests-2.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-11 03:30:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jpjacobpadilla",
"github_project": "Stealth-Requests",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "stealth-requests"
}