| Field | Value |
|-------|-------|
| Name | scraple |
| Version | 0.1.1 |
| home_page | |
| Summary | Simplify web scraping |
| upload_time | 2023-06-19 14:36:14 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.6 |
| license | MIT License Copyright (c) 2023 Jibril Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| keywords | css, scraping, selector, simple, webscraping |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Scraple
Scraple is a Python library designed to simplify web scraping,
making it easy both to scrape pages and to find the selectors you need.
## Version
v0.1.1 [changelog](https://github.com/max-efort/scraple/releases)
## Installation
The package is hosted on [PyPI](https://pypi.org/project/scraple/) and can be
installed with pip:
```shell
pip install scraple
```
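Once installed, you can confirm the package is visible to pip (a standard pip command, shown here for convenience):
```shell
pip show scraple
```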
## Main API
The package provides two main classes: `Rules` and `SimpleExtractor`.
#### 1. Rules
The `Rules` class lets you define extraction rules.
With the `add_field_rule` method you can pick a selector just by knowing a string that appears on the page:
the method automatically searches for the selector of the element whose text content matches that string.
`add_field_rule` also supports regular expression matching.
```python
from scraple import Rules
# To instantiate a Rules object you need a reference page.
some_rules = Rules("reference in the form of a string path to a local html file", "local")
some_rules.add_field_rule("a sentence or word that exists in the reference page", "field name 1")
some_rules.add_field_rule("some othe.*?text", "field name 2", re_flag=True)
# Add more field rules...

# add_field_rule searches for the selector automatically; to inspect the
# resulting rules, print the object:
# print(some_rules)
```
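To make the selector search concrete, here is a minimal, illustrative sketch of the general idea (not scraple's actual implementation): given a page and a string, find a simple CSS selector path to the element whose text contains that string. The `find_selector` helper below is hypothetical and uses BeautifulSoup directly.
```python
# Illustration of the concept only -- not scraple's internals.
from typing import Optional
from bs4 import BeautifulSoup

def find_selector(html: str, needle: str) -> Optional[str]:
    """Return a naive CSS selector path to the first element whose
    text content contains `needle`, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    for el in soup.find_all(True):  # every tag, in document order
        if el.string and needle in el.string:
            # Build a tag-name path from the root down to the match.
            path = [p.name for p in reversed(list(el.parents))
                    if p.name != "[document]"]
            return " > ".join(path + [el.name])
    return None

html = "<html><body><div><p>hello world</p></div></body></html>"
print(find_selector(html, "hello"))  # html > body > div > p
```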
#### 2. SimpleExtractor
The `SimpleExtractor` class performs the actual scraping based on a defined rule.
A `Rules` object specifies *what* to extract, and the `SimpleExtractor` does the extracting.
First, pass a `Rules` object to the `SimpleExtractor` constructor, then call the
`perform_extraction` method to get a generator that yields dictionaries of
extracted elements.
```python
from scraple import SimpleExtractor
extractor = SimpleExtractor(some_rules) # some_rules from above code snippet
result = extractor.perform_extraction(
"web page in the form of beautifulSoup4 object",
"parsed"
)
# print(next(result))
# {
# "field name 1": [element, ...],
# "field name 2": ...,
# ...
# }
```
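Putting both classes together, an end-to-end run could look like the sketch below. The file names (`reference.html`, `target.html`) and field names are illustrative assumptions; the API calls are the ones documented above.
```python
# Hypothetical end-to-end example; file and field names are made up.
from bs4 import BeautifulSoup
from scraple import Rules, SimpleExtractor

rules = Rules("reference.html", "local")          # reference page on disk
rules.add_field_rule("Price", "price")            # pick selectors by visible text
rules.add_field_rule("In stock", "availability")

with open("target.html", encoding="utf-8") as f:  # a page with the same layout
    page = BeautifulSoup(f.read(), "html.parser")

extractor = SimpleExtractor(rules)
for record in extractor.perform_extraction(page, "parsed"):
    print(record)  # {"price": [element, ...], "availability": [...]}
```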
For more information and a tutorial, see the [documentation](https://github.com/max-efort/scraple/doc) or
visit the main [repository](https://github.com/max-efort/scraple).
# Raw data

```json
{
"_id": null,
"home_page": "",
"name": "scraple",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "CSS,scraping,selector,simple,webscraping",
"author": "",
"author_email": "Jibril <erikfortran@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/83/a3/3c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691/scraple-0.1.1.tar.gz",
"platform": null,
"description": "# Scraple\n\nScraple is a Python library designed to simplify the process of web scraping, \nproviding easy scraping and easy searching for selectors.\n\n## Version \nv0.1.1 [changelog](https://github.com/max-efort/scraple/releases)\n\n\n## Installation\nThe package is hosted in [Pypi](https://pypi.org/project/scraple/) and can be \ninstalled using pip:\n\n```shell\npip install scraple\n```\n\n## Main API\nThe package provides two main classes: Rules and SimpleExtractor.\n\n#### 1. Rules\nThe Rules class allows you to define rules of extraction. \nYou can pick selector just by knowing what string present in that page using the `add_field_rule` method. \nThis method automatically searches for selector of element which text content match the string. \nAdditionally, the `add_field_rule` method supports regular expression matching.\n\n```python\nfrom scraple import Rules\n\n#To instantiate Rules object you need to have the reference page.\nsome_rules = Rules(\"reference in the form of string path to local html file\", \"local\")\nsome_rules.add_field_rule(\"a sentence or word exist in reference page\", \"field name 1\")\nsome_rules.add_field_rule(\"some othe.*?text\", \"field name 2\", re_flag=True)\n# Add more field rules...\n\n# It automatically search for the selector, to see it you can see the rule in console\n# or by printing it\n# print(rules)\n```\n\n#### 2. SimpleExtractor\nThe SimpleExtractor class performs the actual scraping based on a defined rule.\nA Rules object act as the \"which to extract\" and the SimpleExtractor do the \"extract\" or \nscraping. First, pass a Rules object\nto SimpleExtractor constructor and use the \n`perform_extraction` method to create a generator object that iterate dictionary of\nelements extracted.\n\n```python\nfrom scraple import SimpleExtractor\n\nextractor = SimpleExtractor(some_rules) # some_rules from above code snippet\nresult = extractor.perform_extraction(\n \"web page in the form of beautifulSoup4 object\",\n \"parsed\"\n)\n\n# print(next(result))\n# {\n# \"field name 1\": [element, ...],\n# \"field name 2\": ...,\n# ...\n# }\n```\nFor more information and tutorial, see the [documentation](https://github.com/max-efort/scraple/doc) or \nvisit the main [repository](https://github.com/max-efort/scraple)\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2023 Jibril Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "Simplify web scraping",
"version": "0.1.1",
"project_urls": {
"changelog": "https://github.com/max-efort/scraple/releases",
"repository": "https://github.com/max-efort/scraple"
},
"split_keywords": [
"css",
"scraping",
"selector",
"simple",
"webscraping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0bb625042702af812f70518e652f91b76eda78ed3447c628a577dcaa5b3d1451",
"md5": "0643e2f17f2460da4446e6c79bdb6f26",
"sha256": "9ef6eb0d678614e8d181453c1bf15d79e45c52054f42b1a6f6c67d481b771ce0"
},
"downloads": -1,
"filename": "scraple-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0643e2f17f2460da4446e6c79bdb6f26",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 10015,
"upload_time": "2023-06-19T14:36:12",
"upload_time_iso_8601": "2023-06-19T14:36:12.007174Z",
"url": "https://files.pythonhosted.org/packages/0b/b6/25042702af812f70518e652f91b76eda78ed3447c628a577dcaa5b3d1451/scraple-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "83a33c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691",
"md5": "4a84297cfe45313f9e5e0ffd4ff76787",
"sha256": "4c2dc6538a1436a43e4bbd59edb300ea6b3599190a6a97e82e5209939a4a38a7"
},
"downloads": -1,
"filename": "scraple-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "4a84297cfe45313f9e5e0ffd4ff76787",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 41827,
"upload_time": "2023-06-19T14:36:14",
"upload_time_iso_8601": "2023-06-19T14:36:14.026062Z",
"url": "https://files.pythonhosted.org/packages/83/a3/3c66c8e7ad76c630ed2c225c9e8b6fa00cefda1af5947d0cd9e9d351f691/scraple-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-19 14:36:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "max-efort",
"github_project": "scraple",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "scraple"
}
```