# ScrapPyJS
![Project Language](https://img.shields.io/static/v1?label=language&message=python&color=blue)
![Project Type](https://img.shields.io/static/v1?label=type&message=package&color=red)
[![PyPI project](https://img.shields.io/static/v1?label=PyPI&message=ScrapPyJS&color=blue)](https://pypi.org/project/ScrapPyJS/)
![Current Version](https://img.shields.io/static/v1?label=current-version&message=v1.0.2&color=lightgrey)
![Stable Version](https://img.shields.io/static/v1?label=stable-version&message=v1.0.0&color=brightgreen)
![Maintained](https://img.shields.io/static/v1?label=maintained&message=yes&color=green)
![Ask Me Anything](https://img.shields.io/static/v1?label=ask-me&message=anything&color=green)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
The `ScrapPyJS` class provides web scraping on top of Selenium: you scrape data by running a JavaScript snippet on the target page directly from Python.
## Installing
```terminal
pip install ScrapPyJS
```
## How to Use
### Including and Initiating
```python
from ScrapPyJS import ScrapPyJS
# initiate ScrapPyJS
scrappy = ScrapPyJS()
# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)
# rest of the code goes here...
# close ScrapPyJS
scrappy.end()
```
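The script passed to `set_script` is a plain JavaScript string that ends in a `return`, as in the example above. A minimal sketch of building such a string from Python follows; `build_text_script` is a hypothetical helper, not part of ScrapPyJS, and it assumes the script runs as a function body in the page.

```python
import json


def build_text_script(css_selector: str) -> str:
    """Return a JS snippet that collects the trimmed text of all matching elements.

    Hypothetical helper for illustration; it is not part of ScrapPyJS.
    """
    # json.dumps produces a valid JavaScript string literal, so quotes inside
    # the selector cannot break out of the generated script.
    selector_literal = json.dumps(css_selector)
    return (
        f"const nodes = document.querySelectorAll({selector_literal});"
        " return Array.from(nodes).map(node => node.textContent.trim());"
    )


JS_SCRIPT = build_text_script("h2.title")
print(JS_SCRIPT)
```

The resulting `JS_SCRIPT` could then be handed to `scrappy.set_script(JS_SCRIPT)`.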
### Simple way
1. Use the `scrap` method to scrape a webpage:
```python
result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
```
2. Retrieve the result of the scraping operation:
```python
print(result)
```
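If a scrape raises an exception, `end()` may never be reached. One way to guard against that is a small context manager, sketched below. `closing_scraper` and `FakeScraper` are hypothetical names used for illustration; the stand-in class only mimics the `scrap` and `end` method names so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def closing_scraper(scraper):
    """Yield `scraper` and always call its `end()` method afterwards."""
    try:
        yield scraper
    finally:
        scraper.end()


class FakeScraper:
    """Hypothetical stand-in with `scrap`/`end` methods; no browser is started."""

    def __init__(self):
        self.ended = False

    def scrap(self, url, **wait_options):
        if not url.startswith("http"):
            raise ValueError(f"not a URL: {url!r}")
        return f"scraped {url}"

    def end(self):
        self.ended = True


fake = FakeScraper()
with closing_scraper(fake) as scrappy:
    result = scrappy.scrap("https://url1.com/", wait=True, wait_for="id", wait_target="elementId")
print(result, fake.ended)
```

With the real class, `closing_scraper(ScrapPyJS())` would be used the same way, assuming `end()` is safe to call after a failed scrape.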
### Loop through list of URLs
1. Set up a list of target URLs:
```python
URLS = [
'https://url1.com/',
'https://url2.com/homepage/',
'https://url2.com/about',
]
```
2. Use the `loop_through` method to scrape each of the target webpages:
```python
# The result value will be a list if save mode is on, else a JSON string
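# loop_through is assumed here to take the URL list plus the same wait arguments as scrap; check CLASS_STRUCTURE.md
result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')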
```
3. Retrieve the result of the scraping operation:
```python
print(result)
```
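Because the result can be either a list or a JSON string depending on save mode, downstream code may want a single shape. A minimal sketch, assuming the string form is valid JSON; `as_python` is a hypothetical helper and the sample values stand in for real scrape results.

```python
import json


def as_python(result):
    """Return `result` as Python data, decoding it first if it is a JSON string."""
    if isinstance(result, str):
        return json.loads(result)
    return result


# Sample values standing in for real scrape results.
from_string = as_python('["Home", "About"]')
from_list = as_python(["Home", "About"])
print(from_string == from_list)
```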
### Save results to a file
#### Activate save mode
1. Via toggle:
```python
scrappy.toggle_save_mode()
```
Here, save mode, which is `False` by default, is toggled to `True`. The remaining save-file settings keep their default values.
2. Via `set_save_info` method:
```python
scrappy.set_save_info(save=True)
```
Here, save mode is set to `True` directly, leaving the other settings at their defaults.
#### Configure save mode
1. Via `set_save_info` method:
```python
FILE_NAME = "output"
FILE_FORMAT = "json"
SAVE_LOCATION = "path/to/file/"
scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
```
Please note that you will need to have the necessary `Selenium` and `WebDriver` dependencies installed to use this code.
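A quick way to check for these dependencies before scraping is sketched below. The driver executable names are assumptions, since this README does not say which browser ScrapPyJS drives, and `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil


def missing_dependencies(driver_names=("chromedriver", "geckodriver")):
    """Return human-readable notes about dependencies that were not found."""
    missing = []
    # find_spec returns None when the top-level package is not installed.
    if importlib.util.find_spec("selenium") is None:
        missing.append("Python package 'selenium' (pip install selenium)")
    # shutil.which looks each executable up on PATH; the names are assumed.
    if not any(shutil.which(name) for name in driver_names):
        missing.append("a WebDriver executable on PATH: " + ", ".join(driver_names))
    return missing


for note in missing_dependencies():
    print("Missing:", note)
```

A missing driver executable is not always a problem: recent Selenium releases can fetch a matching driver on their own, so treat the second note as a hint rather than an error.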
## Documentation
Details on the `ScrapPyJS` class are available in `.\CLASS_STRUCTURE.md`.
## License
This code is licensed under the `GNU AGPLv3` open-source copyleft license.
## Author
**NAME:** *Hind Sagar Biswas*
**Website:** [coderaptors.epizy.com](http://coderaptors.epizy.com/)
[![Author Facebook](https://img.shields.io/static/v1?label=facebook&message=hindsagar.biswas&style=social&logo=facebook)](https://m.facebook.com/hindsagar.biswas)
Raw data
{
"_id": null,
"home_page": "",
"name": "ScrapPyJS",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "python,web scrapping,scrape data",
"author": "Hind Sagar Biswas",
"author_email": "<hindsbhk@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/70/f4/3d84e2cf6aa25ce88e7ee6f7756801aa2f4d75a5bc9afffc4127eefaf62f/ScrapPyJS-1.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "An easy to use web scrapping library via JS scripts",
"version": "1.1.0",
"project_urls": null,
"split_keywords": [
"python",
"web scrapping",
"scrape data"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3b95d32c12e9ef88d2baa5a11e8359eaa3ea4b8cce132e1fe21364e3e88126fc",
"md5": "2c5e1469b30abac80263b97c9f0f8776",
"sha256": "4770bd9985be81327fd3092385f9cb6df2762125eb2cb6f6115580d4a35e18dd"
},
"downloads": -1,
"filename": "ScrapPyJS-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2c5e1469b30abac80263b97c9f0f8776",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5849,
"upload_time": "2023-05-20T20:00:20",
"upload_time_iso_8601": "2023-05-20T20:00:20.532569Z",
"url": "https://files.pythonhosted.org/packages/3b/95/d32c12e9ef88d2baa5a11e8359eaa3ea4b8cce132e1fe21364e3e88126fc/ScrapPyJS-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "70f43d84e2cf6aa25ce88e7ee6f7756801aa2f4d75a5bc9afffc4127eefaf62f",
"md5": "ddf4d6caffa9559f08e1886f8040c109",
"sha256": "f8fa767d2771b0b406b60b81571a304a4b179bf3347c40779f3de1a32f6f110f"
},
"downloads": -1,
"filename": "ScrapPyJS-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "ddf4d6caffa9559f08e1886f8040c109",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5876,
"upload_time": "2023-05-20T20:00:23",
"upload_time_iso_8601": "2023-05-20T20:00:23.802921Z",
"url": "https://files.pythonhosted.org/packages/70/f4/3d84e2cf6aa25ce88e7ee6f7756801aa2f4d75a5bc9afffc4127eefaf62f/ScrapPyJS-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-20 20:00:23",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "scrappyjs"
}