# pcdt-scraper
A PyChromeDevTools-based web scraper with Selenium-like syntax.
[![Python package](https://github.com/jakbin/pcdt-scraper/actions/workflows/publish.yml/badge.svg)](https://github.com/jakbin/pcdt-scraper/actions/workflows/publish.yml)
[![PyPI version](https://badge.fury.io/py/pcdt-scraper.svg)](https://pypi.org/project/pcdt-scraper)
[![Downloads](https://pepy.tech/badge/pcdt-scraper/month)](https://pepy.tech/project/pcdt-scraper)
[![Downloads](https://static.pepy.tech/personalized-badge/pcdt-scraper?period=total&units=international_system&left_color=green&right_color=blue&left_text=Total%20Downloads)](https://pepy.tech/project/pcdt-scraper)
![GitHub commit activity](https://img.shields.io/github/commit-activity/m/jakbin/pcdt-scraper)
![GitHub last commit](https://img.shields.io/github/last-commit/jakbin/pcdt-scraper)
## Introduction
Some websites block requests made with `requests` or `aiohttp` but do not block requests coming from Chrome.
pcdt-scraper works around this by sending its requests through a running Chrome or Chromium instance.
## Compatibility
Python 3.6+ is required.
## Installation
```sh
pip install pcdt-scraper
```
or
```sh
pip3 install pcdt-scraper
```
## Usage
1. First, start a Chromium or Chrome instance with remote debugging enabled:
```sh
chromium --remote-debugging-port=9222 '--remote-allow-origins=*'
```
Or run it in headless mode:
```sh
chromium --remote-debugging-port=9222 '--remote-allow-origins=*' --headless
```
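Before running the scraper, it can help to confirm that the browser is actually listening on the debugging port. The following is a minimal sketch, not part of pcdt-scraper: it queries `/json/version`, an HTTP endpoint that Chrome exposes when remote debugging is enabled. The function names `devtools_url` and `chrome_is_listening` are hypothetical, and the port is assumed to be 9222 as in the commands above.

```python
import http.client
import json
import urllib.error
import urllib.request


def devtools_url(host="localhost", port=9222):
    """Build the URL of Chrome's version endpoint (hypothetical helper)."""
    return f"http://{host}:{port}/json/version"


def chrome_is_listening(host="localhost", port=9222, timeout=2.0):
    """Return the browser's version info as a dict, or None if nothing answers."""
    try:
        with urllib.request.urlopen(devtools_url(host, port), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None


if __name__ == "__main__":
    info = chrome_is_listening()
    if info is None:
        print("No browser found on port 9222; start Chromium first.")
    else:
        print("Browser is ready:", info.get("Browser"))
```

If the check reports no browser, verify that the `--remote-debugging-port` value matches the port you are probing.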
2. Then run your Python code:
```py
from pcdt_scraper import WebScraper

scraper = WebScraper()
url = "https://www.example.com/"
try:
    # Navigate to a page
    if scraper.get(url):
        # Get page content
        content = scraper.get_page_content()

        # Find an element by class name
        text = scraper.find_element_by_class_name('class_name').text()
        print(text)
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Always release the browser connection
    scraper.close()
```
```
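The `try`/`finally` pattern above can also be wrapped in a small helper so that `close()` is always called. This is only a sketch: `fetch_text` is a hypothetical function, not part of pcdt-scraper, and it relies solely on the methods shown in the example (`get`, `find_element_by_class_name`, `text`, and `close`). It uses `contextlib.closing` from the standard library, which calls `close()` on exit.

```python
from contextlib import closing


def fetch_text(make_scraper, url, class_name):
    """Open a scraper, return the text of one element, and always close it.

    make_scraper is any zero-argument callable returning an object with the
    methods used in the README example (assumed, e.g. the WebScraper class).
    Returns None when navigation fails.
    """
    with closing(make_scraper()) as scraper:
        # get() is assumed to return a truthy value on success, as in the example.
        if not scraper.get(url):
            return None
        return scraper.find_element_by_class_name(class_name).text()
```

With the real class, the call would look like `fetch_text(WebScraper, "https://www.example.com/", "class_name")`, after importing `WebScraper` from `pcdt_scraper` and starting the browser as described in step 1.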
Raw data
{
"_id": null,
"home_page": "https://github.com/jakbin/pcdt-scraper",
"name": "pcdt-scraper",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "webscraper, scraper, web-scraper, pcdt-scraper",
"author": "Jak Bin",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d8/b1/0343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40/pcdt_scraper-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A PyChromeDevTools based WebScraper and selenium like syntax.",
"version": "1.0.1",
"project_urls": {
"Bug Tracker": "https://github.com/jakbin/pcdt-scraper/issues",
"Homepage": "https://github.com/jakbin/pcdt-scraper"
},
"split_keywords": [
"webscraper",
" scraper",
" web-scraper",
" pcdt-scraper"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "653eda074c58be72e37425c5b92913ff1c04c8ce5c1f2475ab63ffdb39befb26",
"md5": "aede98d87730e7864eee94ad2bcc46ec",
"sha256": "e6ae36b6a248f11787b5be348b337a92e30413c7b71b259a24d6e30c9bf16981"
},
"downloads": -1,
"filename": "pcdt_scraper-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "aede98d87730e7864eee94ad2bcc46ec",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 4023,
"upload_time": "2025-02-03T04:45:57",
"upload_time_iso_8601": "2025-02-03T04:45:57.121997Z",
"url": "https://files.pythonhosted.org/packages/65/3e/da074c58be72e37425c5b92913ff1c04c8ce5c1f2475ab63ffdb39befb26/pcdt_scraper-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d8b10343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40",
"md5": "f12fa8b0a286769114a81a951222645e",
"sha256": "67679386b675f5b97df85447598836ad4022d29cdec23ba1d7a5f2189f9488fa"
},
"downloads": -1,
"filename": "pcdt_scraper-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "f12fa8b0a286769114a81a951222645e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 3785,
"upload_time": "2025-02-03T04:45:58",
"upload_time_iso_8601": "2025-02-03T04:45:58.646490Z",
"url": "https://files.pythonhosted.org/packages/d8/b1/0343ce9a710fe1565936a2f2e3610dd864f586f662bdca97a384b4eebc40/pcdt_scraper-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-03 04:45:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jakbin",
"github_project": "pcdt-scraper",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "bs4",
"specs": []
},
{
"name": "PyChromeDevTools",
"specs": []
}
],
"lcname": "pcdt-scraper"
}