Name | stealth-requests |
Version | 1.2.1 |
Summary | Make HTTP requests exactly like a browser. |
upload_time | 2024-10-22 00:49:45 |
home_page | None |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT |
keywords | http, requests, scraping, browser |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<p align="center">
<img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/0572cdf58d141239e945a1562490b1d00054379c/logo.png?raw=true">
</p>
<h1 align="center">Stay Undetected While Scraping the Web.</h1>
### The All-In-One Solution to Web Scraping:
- **Realistic HTTP Requests:**
- Mimics browser headers for undetected scraping, adapting to the requested file type
- Tracks dynamic headers such as `Referer` and `Host`
- Masks the TLS fingerprint of HTTP requests using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
- **Faster and Easier Parsing:**
- Automatically extracts metadata (title, description, author, etc.) from HTML-based responses
- Methods to extract all webpage and image URLs
- Seamlessly converts responses into [Lxml](https://lxml.de/apidoc/lxml.html) and [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) objects
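To give a feel for the first point (independent of this package), here is a minimal sketch of the kind of browser-like request headers a stealth client manages for you. The header values below are hypothetical examples, not the ones Stealth-Requests actually sends:

```python
from urllib.request import Request

# Illustration only (not Stealth-Requests code): the kind of browser-like
# headers a stealth client sends. These values are hypothetical examples.
browser_headers = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/118.0.0.0 Safari/537.36'),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://www.google.com/',
}

# Build (but do not send) a request carrying these headers
req = Request('https://example.com', headers=browser_headers)
print(req.get_header('Referer'))
```

Stealth-Requests builds headers like these automatically and keeps the dynamic ones (e.g. `Referer`) up to date between requests.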
### Install
```
$ pip install stealth_requests
```
### Sending Requests
Stealth-Requests mimics the API of the [requests](https://requests.readthedocs.io/en/latest/) package, allowing you to use it in nearly the same way.
You can send one-off requests like this:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com')
```
Or you can use a `StealthSession` object, which keeps track of certain headers for you between requests, such as the `Referer` header.
```python
from stealth_requests import StealthSession

with StealthSession() as session:
    resp = session.get('https://link-here.com')
```
When sending a request or creating a `StealthSession`, you can choose which browser to mimic by setting the `impersonate` argument (in `requests.get` or when initializing a `StealthSession`) to either `chrome` (the default) or `safari`.
### Sending Requests With Asyncio
This package supports Asyncio in the same way as the `requests` package:
```python
from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='safari') as session:
    resp = await session.get('https://link-here.com')
```
Or, for a one-off request:
```python
import stealth_requests as requests
resp = await requests.get('https://link-here.com', impersonate='safari')
```
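Note that `await` at the top level only works inside a running event loop (for example, a Jupyter cell). In a plain script, wrap the call in a coroutine and run it with `asyncio.run`; a minimal sketch:

```python
import asyncio

async def main():
    # In a real script this would be the awaited request, e.g.:
    # resp = await requests.get('https://link-here.com', impersonate='safari')
    return 'done'  # placeholder so this sketch runs without network access

print(asyncio.run(main()))
```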
### Getting Response Metadata
The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard [requests response object](https://requests.readthedocs.io/en/latest/api/#requests.Response), plus a few added features. One of these is automatic parsing of metadata from the `<head>` of HTML-based responses, available through the `meta` property, which exposes the following fields:
- title: `str | None`
- author: `str | None`
- description: `str | None`
- thumbnail: `str | None`
- canonical: `str | None`
- twitter_handle: `str | None`
- keywords: `tuple[str] | None`
- robots: `tuple[str] | None`
Here's an example of how to get the title of a page:
```python
import stealth_requests as requests
resp = requests.get('https://link-here.com')
print(resp.meta.title)
```
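For intuition, here is a package-independent sketch of this kind of `<head>` metadata extraction using only the standard library. This is not Stealth-Requests' actual implementation, just an illustration of the idea:

```python
from html.parser import HTMLParser

# Minimal, package-independent sketch of <head> metadata extraction.
class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta' and 'name' in attrs and 'content' in attrs:
            self.meta[attrs['name']] = attrs['content']

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta['title'] = data

html = '<html><head><title>Example</title><meta name="author" content="Jane"></head></html>'
parser = MetaExtractor()
parser.feed(html)
print(parser.meta)
```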
### Parsing Responses
To make parsing HTML faster, I've also integrated two popular parsing packages into Stealth-Requests: Lxml and BeautifulSoup4. To use these add-ons, install the `parsers` extra:
```
$ pip install stealth_requests[parsers]
```
To get an Lxml tree, use `resp.tree()`; to get a BeautifulSoup object, use the `resp.soup()` method.
For simple parsing, I've also added the following convenience methods from the Lxml package right into the `StealthResponse` object:
- `text_content()`: Get all of the text content in a response
- `xpath()`: Run XPath expressions directly, without building your own Lxml tree
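As a rough illustration of XPath-style querying, here is a sketch using the standard library's limited XPath support (the package's `xpath()` method uses Lxml's full XPath engine instead):

```python
import xml.etree.ElementTree as ET

# Illustration with the standard library's limited XPath support;
# Lxml-backed xpath() supports the full XPath language.
doc = ET.fromstring('<div><p>Hello</p><p>World</p></div>')
texts = [p.text for p in doc.findall('.//p')]
print(texts)  # ['Hello', 'World']
```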
### Get All Image and Page Links From a Response
To get all of the webpage URLs (`a` tags) from an HTML-based response, use the `links` property; to get all image URLs (`img` tags), use the `images` property of the response object.
```python
import stealth_requests as requests

resp = requests.get('https://link-here.com')

for image_url in resp.images:
    ...  # process each image URL
```
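Independent of Stealth-Requests, collecting `a` and `img` URLs from raw HTML can be sketched with the standard library alone, which is roughly what these properties do for you:

```python
from html.parser import HTMLParser

# Package-independent sketch: collect `a` hrefs and `img` srcs from HTML.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'href' in attrs:
            self.links.append(attrs['href'])
        elif tag == 'img' and 'src' in attrs:
            self.images.append(attrs['src'])

collector = LinkCollector()
collector.feed('<a href="/page">x</a><img src="/pic.png">')
print(collector.links, collector.images)
```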
### Getting HTML Responses in Markdown Format
In some cases, it's easier to work with a webpage in Markdown format rather than HTML. After making a GET request that returns HTML, you can use the `resp.markdown()` method to convert the response into a Markdown string, giving you a simplified, readable version of the page content.
`markdown()` has two optional parameters:
1. `content_xpath`: An XPath expression (as a string) that narrows down which part of the page is converted to Markdown. This is useful if you don't want a page's header and footer turned into Markdown.
2. `ignore_links`: A boolean that tells Html2Text whether to include links in the Markdown output.
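As a toy illustration of the idea (not the package's implementation, which delegates to Html2Text), a crude HTML-to-Markdown conversion handling only `h1`, `p`, and `a` tags might look like this:

```python
import re

# Toy sketch of HTML-to-Markdown conversion in the spirit of markdown().
# The real method uses Html2Text; this handles only h1, p, and a tags.
def to_markdown(html, ignore_links=False):
    html = re.sub(r'<h1>(.*?)</h1>', r'# \1\n\n', html)
    if ignore_links:
        html = re.sub(r'<a[^>]*>(.*?)</a>', r'\1', html)
    else:
        html = re.sub(r'<a href="([^"]*)"[^>]*>(.*?)</a>', r'[\2](\1)', html)
    html = html.replace('<p>', '').replace('</p>', '\n')
    return html.strip()

print(to_markdown('<h1>Title</h1><p>See <a href="/x">this</a></p>'))
```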
Raw data
{
"_id": null,
"home_page": null,
"name": "stealth-requests",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "HTTP, requests, scraping, browser",
"author": null,
"author_email": "Jacob Padilla <jp@jacobpadilla.com>",
"download_url": "https://files.pythonhosted.org/packages/e3/1b/22a556b133d7978634e89e6a52d6a33d2208f0ff4669b1e28a016f347f4c/stealth_requests-1.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Make HTTP requests exactly like a browser.",
"version": "1.2.1",
"project_urls": {
"Homepage": "https://github.com/jpjacobpadilla/Stealth-Requests"
},
"split_keywords": [
"http",
" requests",
" scraping",
" browser"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e5bb6fc6a17c3a37cc651ce09108508ef5cc581be8ec5b61be5d959ed4e0ff79",
"md5": "3a3a400cde174ec13be40f70265b6afd",
"sha256": "0a1c2b926d39c2dbd5074cb5a789973eead3c3ff3b604ead7b0ed739533a88f7"
},
"downloads": -1,
"filename": "stealth_requests-1.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3a3a400cde174ec13be40f70265b6afd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 8255,
"upload_time": "2024-10-22T00:49:43",
"upload_time_iso_8601": "2024-10-22T00:49:43.251679Z",
"url": "https://files.pythonhosted.org/packages/e5/bb/6fc6a17c3a37cc651ce09108508ef5cc581be8ec5b61be5d959ed4e0ff79/stealth_requests-1.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e31b22a556b133d7978634e89e6a52d6a33d2208f0ff4669b1e28a016f347f4c",
"md5": "13fba6d8c22ed52835fcb1def321d5f9",
"sha256": "48cf22d32f56ee987852f7b48203d802ca8b6a1d268e6dae659400ea88770c87"
},
"downloads": -1,
"filename": "stealth_requests-1.2.1.tar.gz",
"has_sig": false,
"md5_digest": "13fba6d8c22ed52835fcb1def321d5f9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 9886,
"upload_time": "2024-10-22T00:49:45",
"upload_time_iso_8601": "2024-10-22T00:49:45.543817Z",
"url": "https://files.pythonhosted.org/packages/e3/1b/22a556b133d7978634e89e6a52d6a33d2208f0ff4669b1e28a016f347f4c/stealth_requests-1.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-22 00:49:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jpjacobpadilla",
"github_project": "Stealth-Requests",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "stealth-requests"
}