|axe| Chopper
=============
|pypi| |github-actions| |readthedocs|
Chopper extracts elements from an HTML document while preserving their ancestors and the CSS rules that still match.

Compatible with Python >= 3.8.
Installation
------------
``pip install chopper``
Full documentation
------------------
http://chopper.readthedocs.org/en/latest/
Quick start
-----------
.. code-block:: python

    from chopper.extractor import Extractor

    HTML = """
    <html>
      <head>
        <title>Test</title>
      </head>
      <body>
        <div id="header"></div>
        <div id="main">
          <div class="iwantthis">
            HELLO WORLD
            <a href="/nope">Do not want</a>
          </div>
        </div>
        <div id="footer"></div>
      </body>
    </html>
    """

    CSS = """
    div { border: 1px solid black; }
    div#main { color: blue; }
    div.iwantthis { background-color: red; }
    a { color: green; }
    div#footer { border-top: 2px solid red; }
    """

    extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
    html, css = extractor.extract(HTML, CSS)

The result is:
.. code-block:: python

    >>> html
    """
    <html>
      <body>
        <div id="main">
          <div class="iwantthis">
            HELLO WORLD
          </div>
        </div>
      </body>
    </html>"""

    >>> css
    """
    div{border:1px solid black;}
    div#main{color:blue;}
    div.iwantthis{background-color:red;}
    """

.. |axe| image:: http://icons.iconarchive.com/icons/aha-soft/desktop-halloween/32/Hatchet-icon.png
.. |pypi| image:: http://img.shields.io/pypi/v/chopper.svg?style=flat
    :target: https://pypi.python.org/pypi/chopper
.. |github-actions| image:: https://github.com/jurismarches/chopper/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/jurismarches/chopper/actions/
.. |readthedocs| image:: https://readthedocs.org/projects/chopper/badge/?version=latest
    :target: https://readthedocs.org/projects/chopper

Raw data
{
"_id": null,
"home_page": "https://github.com/jurismarches/chopper",
"name": "chopper",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Jurismarches",
"author_email": "contact@octopusmind.info",
"download_url": "https://files.pythonhosted.org/packages/50/b9/f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240/chopper-0.6.0.linux-x86_64.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Lib to extract html elements by preserving ancestors and cleaning CSS",
"version": "0.6.0",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "50b9f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240",
"md5": "b853b838758be139f9d34493d0226ffd",
"sha256": "1d80edbdbe1775e678c548b548e47f3865f3c21db73d65113fd54e985570d301"
},
"downloads": -1,
"filename": "chopper-0.6.0.linux-x86_64.tar.gz",
"has_sig": false,
"md5_digest": "b853b838758be139f9d34493d0226ffd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 22575,
"upload_time": "2023-04-26T10:16:25",
"upload_time_iso_8601": "2023-04-26T10:16:25.415730Z",
"url": "https://files.pythonhosted.org/packages/50/b9/f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240/chopper-0.6.0.linux-x86_64.tar.gz",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9b9db361aa78acb0ea02e456db57e9988a064b527ecd9c266a51ee1ef2f462fc",
"md5": "f12cbf817031f7e13b669c743d542d89",
"sha256": "662f87c1922c5661c4560c7b770fb7e59fcc846fe29b0911f865dc80ef664e6e"
},
"downloads": -1,
"filename": "chopper-0.6.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f12cbf817031f7e13b669c743d542d89",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 16409,
"upload_time": "2023-04-26T10:16:23",
"upload_time_iso_8601": "2023-04-26T10:16:23.494279Z",
"url": "https://files.pythonhosted.org/packages/9b/9d/b361aa78acb0ea02e456db57e9988a064b527ecd9c266a51ee1ef2f462fc/chopper-0.6.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-26 10:16:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "jurismarches",
"github_project": "chopper",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [],
"lcname": "chopper"
}