stealth-requests

- Name: stealth-requests
- Version: 1.2.1
- Summary: Make HTTP requests exactly like a browser.
- Author: Jacob Padilla <jp@jacobpadilla.com>
- Homepage: https://github.com/jpjacobpadilla/Stealth-Requests
- Upload time: 2024-10-22 00:49:45
- Requires Python: >=3.9
- License: MIT
- Keywords: http, requests, scraping, browser
            <p align="center">
    <img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/0572cdf58d141239e945a1562490b1d00054379c/logo.png?raw=true">
</p>

<h1 align="center">Stay Undetected While Scraping the Web.</h1>

### The All-In-One Solution to Web Scraping:
- **Realistic HTTP Requests:**
    - Mimics browser headers for undetected scraping, adapting to the requested file type
    - Tracks dynamic headers such as `Referer` and `Host`
    - Masks the TLS fingerprint of HTTP requests using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
- **Faster and Easier Parsing:**
    - Automatically extracts metadata (title, description, author, etc.) from HTML-based responses
    - Methods to extract all webpage and image URLs
    - Seamlessly converts responses into [Lxml](https://lxml.de/apidoc/lxml.html) and [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) objects

### Install

```
$ pip install stealth_requests
```

### Sending Requests

Stealth-Requests mimics the API of the [requests](https://requests.readthedocs.io/en/latest/) package, so you can use it in nearly the same way.

You can send one-off requests like this:

```python
import stealth_requests as requests

resp = requests.get('https://link-here.com')
```

Or you can use a `StealthSession` object, which keeps track of certain headers between requests, such as the `Referer` header:

```python
from stealth_requests import StealthSession

with StealthSession() as session:
    resp = session.get('https://link-here.com')
```

When sending a request or creating a `StealthSession`, you can specify which browser to mimic by setting the `impersonate` argument (in `requests.get` or when initializing a `StealthSession`) to either `chrome`, the default, or `safari`.

### Sending Requests With Asyncio

This package supports Asyncio in the same way as the `requests` package:

```python
from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='safari') as session:
    resp = await session.get('https://link-here.com')
```

Or, for a one-off request:

```python
import stealth_requests as requests

resp = await requests.get('https://link-here.com', impersonate='safari')
```

### Getting Response Metadata

The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard [requests response object](https://requests.readthedocs.io/en/latest/api/#requests.Response), plus a few added features. One of these is automatic parsing of metadata from the `<head>` of HTML-based responses. The metadata can be accessed via the `meta` property, which exposes the following attributes:

- title: `str | None`
- author: `str | None`
- description: `str | None`
- thumbnail: `str | None`
- canonical: `str | None`
- twitter_handle: `str | None`
- keywords: `tuple[str] | None`
- robots: `tuple[str] | None`

Here's an example of how to get the title of a page:

```python
import stealth_requests as requests

resp = requests.get('https://link-here.com')
print(resp.meta.title)
```

### Parsing Responses

To make parsing HTML faster, I've also added support for two popular parsing packages to Stealth-Requests - Lxml and BeautifulSoup4. To use these add-ons, install the `parsers` extra:

```
$ pip install stealth_requests[parsers]
```

To easily get an Lxml tree, use `resp.tree()`, and to get a BeautifulSoup object, use the `resp.soup()` method.

For simple parsing, I've also added the following convenience methods, from the Lxml package, right into the `StealthResponse` object:

- `text_content()`: Get all of the text content in a response
- `xpath()`: Run XPath expressions directly, without building your own Lxml tree
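As a rough illustration of what these give you, here is plain Lxml parsing a small snippet: `resp.tree()` returns an `lxml.html` element like the one below, and `resp.xpath(...)` behaves like calling `xpath()` on that tree (the HTML string is a made-up stand-in for a response body):

```python
from lxml import html

# A tiny stand-in for an HTML response body (hypothetical content):
page = '<html><body><h1>Example Title</h1><a href="/about">About</a></body></html>'

# resp.tree() returns an Lxml element tree like this one:
tree = html.fromstring(page)

# resp.xpath(...) is equivalent to calling xpath() on that tree:
print(tree.xpath('//h1/text()'))  # ['Example Title']

# text_content() flattens everything to plain text:
print(tree.text_content())  # 'Example TitleAbout'
```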

### Get All Image and Page Links From a Response

If you would like to get all of the webpage URLs (`a` tags) from an HTML-based response, you can use the `links` property. If you'd like to get all image URLs (`img` tags) you can use the `images` property from a response object.

```python
import stealth_requests as requests

resp = requests.get('https://link-here.com')
for image_url in resp.images:
    print(image_url)
```


### Getting HTML Responses in Markdown Format

In some cases, it’s easier to work with a webpage in Markdown rather than HTML. After making a GET request that returns HTML, you can use the `resp.markdown()` method to convert the response into a Markdown string - a simplified, readable version of the page content.

`markdown()` has two optional parameters:

1. `content_xpath`: An XPath expression (as a string) that narrows down which part of the page is converted to Markdown. This is useful if you don't want the header and footer of a webpage to be turned into Markdown.
2. `ignore_links`: A boolean that tells Html2Text whether links should be included in the Markdown output.

            
