PlaywrightCapture


| Field | Value |
| --- | --- |
| Name | PlaywrightCapture |
| Version | 1.24.5 |
| Home page | https://github.com/Lookyloo/PlaywrightCapture |
| Summary | A simple library to capture websites using playwright |
| Upload time | 2024-04-19 10:36:54 |
| Author | Raphaël Vinot |
| Requires Python | <4.0,>=3.8 |
| License | BSD-3-Clause |

# Playwright Capture

Simple replacement for [splash](https://github.com/scrapinghub/splash) using [playwright](https://github.com/microsoft/playwright-python).

# Install

```bash
pip install playwrightcapture
```

# Usage

A very basic example:

```python
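import asyncio
from playwrightcapture import Capture

async def main(url: str):
    async with Capture() as capture:
        await capture.initialize_context()
        # Returns the capture results for the given URL
        return await capture.capture_page(url, max_depth_capture_time=90)

entries = asyncio.run(main("https://example.com"))  # placeholder URL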
```

`entries` is a dictionary that contains (if all goes well) the HAR, the screenshot, all the cookies of the session, the URL as it is in the browser at the end of the capture, and the full HTML page as rendered.
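
As an illustration of how the returned dictionary might be consumed, here is a minimal sketch that writes each artifact to disk. The helper name `save_capture` is hypothetical, and the key names (`har`, `png`, `html`, `cookies`, `last_redirected_url`) are assumptions that should be checked against what your installed version actually returns. Because some entries may be missing when a capture does not go well, every key is tested before use.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def save_capture(entries: Dict[str, Any], out_dir: Path) -> List[str]:
    """Write whichever capture artifacts are present to out_dir.

    The key names used here ('har', 'png', 'html', 'cookies',
    'last_redirected_url') are assumptions; verify them against the
    dictionary returned by your installed version.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # A failed or partial capture may omit entries, so test each key first.
    if entries.get("har") is not None:
        (out_dir / "capture.har").write_text(json.dumps(entries["har"]), encoding="utf-8")
        written.append("capture.har")
    if entries.get("png") is not None:
        (out_dir / "screenshot.png").write_bytes(entries["png"])
        written.append("screenshot.png")
    if entries.get("html") is not None:
        (out_dir / "page.html").write_text(entries["html"], encoding="utf-8")
        written.append("page.html")
    if entries.get("cookies") is not None:
        (out_dir / "cookies.json").write_text(json.dumps(entries["cookies"]), encoding="utf-8")
        written.append("cookies.json")
    if entries.get("last_redirected_url"):
        (out_dir / "final_url.txt").write_text(entries["last_redirected_url"], encoding="utf-8")
        written.append("final_url.txt")
    return written
```

Calling `save_capture(entries, Path("capture_output"))` after a capture would return the list of file names that were actually written, which makes it easy to spot a capture that produced no screenshot or HAR.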


# reCAPTCHA bypass

No black magic: it is just a reimplementation of a [well-known technique](https://github.com/NikolaiT/uncaptcha3)
as implemented [here](https://github.com/Binit-Dhakal/Google-reCAPTCHA-v3-solver-using-playwright-python)
and [here](https://github.com/embium/solverecaptchas).

This module will try to bypass reCAPTCHA on protected websites if you install it with the `recaptcha` extra:

```bash
pip install "playwrightcapture[recaptcha]"
```

This will install `requests`, `pydub` and `SpeechRecognition`. To work, `pydub`
requires `ffmpeg` or `libav`; see the [install guide](https://github.com/jiaaro/pydub#installation)
for more details.
`SpeechRecognition` uses the Google Speech Recognition API to turn the audio file into text (I hope you appreciate the irony).
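
Since the bypass depends on both Python packages and an external binary, a quick pre-flight check can save some debugging. The sketch below is not part of the library: the function name `missing_recaptcha_requirements` is hypothetical, and it only uses the standard library to look for the three import names and for an `ffmpeg` or `avconv` (libav) executable on the `PATH`.

```python
import importlib.util
import shutil
from typing import List


def missing_recaptcha_requirements() -> List[str]:
    """Return the names of missing pieces, or an empty list if all are found."""
    missing = []
    # Import names of the three packages; SpeechRecognition is imported
    # as `speech_recognition`.
    for module in ("requests", "pydub", "speech_recognition"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    # pydub relies on an ffmpeg (or libav `avconv`) executable to decode audio.
    if shutil.which("ffmpeg") is None and shutil.which("avconv") is None:
        missing.append("ffmpeg or libav")
    return missing


if __name__ == "__main__":
    print(missing_recaptcha_requirements() or "all requirements found")
```

An empty list only means the pieces were found, not that a given reCAPTCHA challenge will be solved.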


            

        