pcdt-scraper


| Field | Value |
|---|---|
| Name | pcdt-scraper |
| Version | 1.0.1 |
| Home page | https://github.com/jakbin/pcdt-scraper |
| Summary | A PyChromeDevTools based WebScraper and selenium like syntax. |
| Upload time | 2025-02-03 04:45:58 |
| Maintainer | None |
| Docs URL | None |
| Author | Jak Bin |
| Requires Python | >=3.6 |
| License | None |
| Keywords | webscraper, scraper, web-scraper, pcdt-scraper |
| Requirements | bs4, PyChromeDevTools |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# pcdt-scraper

A PyChromeDevTools-based web scraper with Selenium-like syntax.

[![Python package](https://github.com/jakbin/pcdt-scraper/actions/workflows/publish.yml/badge.svg)](https://github.com/jakbin/pcdt-scraper/actions/workflows/publish.yml)
[![PyPI version](https://badge.fury.io/py/pcdt-scraper.svg)](https://pypi.org/project/pcdt-scraper)
[![Downloads](https://pepy.tech/badge/pcdt-scraper/month)](https://pepy.tech/project/pcdt-scraper)
[![Downloads](https://static.pepy.tech/personalized-badge/pcdt-scraper?period=total&units=international_system&left_color=green&right_color=blue&left_text=Total%20Downloads)](https://pepy.tech/project/pcdt-scraper)
![GitHub commit activity](https://img.shields.io/github/commit-activity/m/jakbin/pcdt-scraper)
![GitHub last commit](https://img.shields.io/github/last-commit/jakbin/pcdt-scraper)

## Introduction

Some websites block requests made with `requests` or `aiohttp` but do not block requests that come from Chrome.

pcdt-scraper addresses this by sending its requests through a running Chrome or Chromium instance.

## Compatibility

Python 3.6+ is required.

## Installation

```sh
pip install pcdt-scraper
```

or 

```sh
pip3 install pcdt-scraper
```

## Usage

1. First, start a Chromium or Chrome instance with remote debugging enabled:

```sh
chromium --remote-debugging-port=9222 --remote-allow-origins="*"
```
Alternatively, you can run it in headless mode:

```sh
chromium --remote-debugging-port=9222 --remote-allow-origins="*" --headless
```
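Before moving on, it can help to confirm that the browser is actually listening on the debugging port. The sketch below is not part of pcdt-scraper; it only assumes that Chrome exposes its usual DevTools HTTP endpoint `/json/version` on the port chosen above. The helper names `devtools_version_url` and `chrome_is_listening` are hypothetical and used for illustration only.

```python
import http.client
import json
import urllib.request


def devtools_version_url(host="localhost", port=9222):
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def chrome_is_listening(host="localhost", port=9222, timeout=2.0):
    """Return True if a DevTools endpoint answers with a JSON object."""
    url = devtools_version_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: refused connection or timeout (URLError is a subclass).
        # ValueError: the port answered, but not with valid JSON.
        # HTTPException: the port answered with something that is not HTTP.
        return False
    return isinstance(info, dict)
```

Calling `chrome_is_listening()` with no arguments checks `localhost:9222`, which matches the port used in the commands above. A `False` result usually means the browser was not started with `--remote-debugging-port`.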

2. Then run the Python code:

```py
from pcdt_scraper import WebScraper

scraper = WebScraper()
url = "https://www.example.com/"
try:
    # Navigate to a page
    if scraper.get(url):

        # Get page content
        content = scraper.get_page_content()

        # Find an element by class name and read its text
        text = scraper.find_element_by_class_name('class_name').text()
        print(text)

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    scraper.close()
```
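The `try`/`finally` pattern in the example can also be wrapped in a context manager so that `close()` is always called. This is a minimal sketch, not something pcdt-scraper provides: `managed_scraper` is a hypothetical helper, and it assumes only that the object returned by the factory has a `close()` method, as `WebScraper` does in the example.

```python
from contextlib import contextmanager


@contextmanager
def managed_scraper(factory):
    """Create a scraper with factory() and always call close() on exit.

    `factory` is any zero-argument callable that returns an object with a
    close() method, for example the WebScraper class itself.
    """
    scraper = factory()
    try:
        yield scraper
    finally:
        # Runs on normal exit and when the with-body raises.
        scraper.close()


# Usage sketch (requires pcdt-scraper and a running Chrome instance):
#
# from pcdt_scraper import WebScraper
#
# with managed_scraper(WebScraper) as scraper:
#     if scraper.get("https://www.example.com/"):
#         print(scraper.get_page_content())
```

Passing a factory rather than an instance keeps object creation inside the helper, so the scraper is only constructed once the `with` block is entered.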

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jakbin/pcdt-scraper",
    "name": "pcdt-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "webscraper, scraper, web-scraper, pcdt-scraper",
    "author": "Jak Bin",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/d8/b1/0343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40/pcdt_scraper-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A PyChromeDevTools based WebScraper and selenium like syntax.",
    "version": "1.0.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/jakbin/pcdt-scraper/issues",
        "Homepage": "https://github.com/jakbin/pcdt-scraper"
    },
    "split_keywords": [
        "webscraper",
        " scraper",
        " web-scraper",
        " pcdt-scraper"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "653eda074c58be72e37425c5b92913ff1c04c8ce5c1f2475ab63ffdb39befb26",
                "md5": "aede98d87730e7864eee94ad2bcc46ec",
                "sha256": "e6ae36b6a248f11787b5be348b337a92e30413c7b71b259a24d6e30c9bf16981"
            },
            "downloads": -1,
            "filename": "pcdt_scraper-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aede98d87730e7864eee94ad2bcc46ec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 4023,
            "upload_time": "2025-02-03T04:45:57",
            "upload_time_iso_8601": "2025-02-03T04:45:57.121997Z",
            "url": "https://files.pythonhosted.org/packages/65/3e/da074c58be72e37425c5b92913ff1c04c8ce5c1f2475ab63ffdb39befb26/pcdt_scraper-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d8b10343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40",
                "md5": "f12fa8b0a286769114a81a951222645e",
                "sha256": "67679386b675f5b97df85447598836ad4022d29cdec23ba1d7a5f2189f9488fa"
            },
            "downloads": -1,
            "filename": "pcdt_scraper-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f12fa8b0a286769114a81a951222645e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 3785,
            "upload_time": "2025-02-03T04:45:58",
            "upload_time_iso_8601": "2025-02-03T04:45:58.646490Z",
            "url": "https://files.pythonhosted.org/packages/d8/b1/0343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40/pcdt_scraper-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-03 04:45:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jakbin",
    "github_project": "pcdt-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "bs4",
            "specs": []
        },
        {
            "name": "PyChromeDevTools",
            "specs": []
        }
    ],
    "lcname": "pcdt-scraper"
}
        