|axe| Chopper
=============

:Name: chopper
:Version: 0.6.0
:Home page: https://github.com/jurismarches/chopper
:Summary: Lib to extract HTML elements by preserving ancestors and cleaning CSS
:Upload time: 2023-04-26 10:16:25
:Author: Jurismarches

|pypi| |github-actions| |readthedocs|

Chopper extracts elements from an HTML document while preserving their ancestors and the CSS rules that apply to them.

Compatible with Python >= 3.8


Installation
------------

``pip install chopper``


Full documentation
------------------

http://chopper.readthedocs.org/en/latest/


Quick start
-----------

.. code-block:: python

  from chopper.extractor import Extractor

  HTML = """
  <html>
    <head>
      <title>Test</title>
    </head>
    <body>
      <div id="header"></div>
      <div id="main">
        <div class="iwantthis">
          HELLO WORLD
          <a href="/nope">Do not want</a>
        </div>
      </div>
      <div id="footer"></div>
    </body>
  </html>
  """

  CSS = """
  div { border: 1px solid black; }
  div#main { color: blue; }
  div.iwantthis { background-color: red; }
  a { color: green; }
  div#footer { border-top: 2px solid red; }
  """

  extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
  html, css = extractor.extract(HTML, CSS)

The result is:

.. code-block:: python

  >>> html
  """
  <html>
    <body>
      <div id="main">
        <div class="iwantthis">
          HELLO WORLD
        </div>
      </div>
    </body>
  </html>"""

  >>> css
  """
  div{border:1px solid black;}
  div#main{color:blue;}
  div.iwantthis{background-color:red;}
  """
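
The keep-and-discard idea shown above can be illustrated without Chopper itself. The following is a minimal sketch that uses only the standard library's ``xml.etree.ElementTree``. It is not Chopper's implementation: the helper name ``keep_with_ancestors`` is hypothetical, it assumes well-formed markup, it is limited to the XPath subset that ElementTree supports, and it ignores CSS entirely.

.. code-block:: python

  import xml.etree.ElementTree as ET


  def keep_with_ancestors(root, keep_path, discard_path=None):
      """Prune ``root`` in place (hypothetical helper, not part of Chopper).

      Elements matching ``keep_path`` survive together with their subtrees
      and their ancestors; elements matching ``discard_path`` are dropped.
      """
      # ElementTree has no parent pointers, so build a child -> parent map.
      parent_of = {child: parent for parent in root.iter() for child in parent}

      wanted = set()
      for match in root.findall(keep_path):
          # Keep the whole subtree of each match...
          wanted.update(match.iter())
          # ...and every ancestor up to the root.
          node = match
          while node is not None:
              wanted.add(node)
              node = parent_of.get(node)

      # Discarded elements are removed along with everything inside them.
      if discard_path:
          for match in root.findall(discard_path):
              wanted.difference_update(match.iter())

      # Walk a snapshot of the tree and detach whatever is not wanted.
      for parent in list(root.iter()):
          for child in list(parent):
              if child not in wanted:
                  parent.remove(child)
      return root


  doc = ET.fromstring(
      '<html><head><title>Test</title></head><body>'
      '<div id="header"/>'
      '<div id="main"><div class="iwantthis">HELLO WORLD'
      '<a href="/nope">Do not want</a></div></div>'
      '<div id="footer"/></body></html>'
  )
  keep_with_ancestors(doc, ".//div[@class='iwantthis']", ".//a")
  result = ET.tostring(doc, encoding="unicode")
  print(result)

With the sample document this leaves ``html``, ``body`` and ``div#main`` in place as ancestors of the kept ``div``, and removes the ``head``, the header and footer ``div`` elements, and the link. Real-world HTML is rarely well-formed XML, so a tolerant parser is needed in practice.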

.. |axe| image:: http://icons.iconarchive.com/icons/aha-soft/desktop-halloween/32/Hatchet-icon.png
.. |pypi| image:: http://img.shields.io/pypi/v/chopper.svg?style=flat
    :target: https://pypi.python.org/pypi/chopper
.. |github-actions| image:: https://github.com/jurismarches/chopper/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/jurismarches/chopper/actions/
.. |readthedocs| image:: https://readthedocs.org/projects/chopper/badge/?version=latest
    :target: https://readthedocs.org/projects/chopper



            

Raw data
--------

.. code-block:: json

  {
      "_id": null,
      "home_page": "https://github.com/jurismarches/chopper",
      "name": "chopper",
      "maintainer": "",
      "docs_url": null,
      "requires_python": "",
      "maintainer_email": "",
      "keywords": "",
      "author": "Jurismarches",
      "author_email": "contact@octopusmind.info",
      "download_url": "https://files.pythonhosted.org/packages/50/b9/f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240/chopper-0.6.0.linux-x86_64.tar.gz",
      "platform": null,
      "bugtrack_url": null,
      "license": "",
      "summary": "Lib to extract html elements by preserving ancestors and cleaning CSS",
      "version": "0.6.0",
      "split_keywords": [],
      "urls": [
          {
              "comment_text": "",
              "digests": {
                  "blake2b_256": "50b9f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240",
                  "md5": "b853b838758be139f9d34493d0226ffd",
                  "sha256": "1d80edbdbe1775e678c548b548e47f3865f3c21db73d65113fd54e985570d301"
              },
              "downloads": -1,
              "filename": "chopper-0.6.0.linux-x86_64.tar.gz",
              "has_sig": false,
              "md5_digest": "b853b838758be139f9d34493d0226ffd",
              "packagetype": "sdist",
              "python_version": "source",
              "requires_python": null,
              "size": 22575,
              "upload_time": "2023-04-26T10:16:25",
              "upload_time_iso_8601": "2023-04-26T10:16:25.415730Z",
              "url": "https://files.pythonhosted.org/packages/50/b9/f85a586995dedd16998408d08e07c6c5b6cb2c65cdebc5f7d248faf95240/chopper-0.6.0.linux-x86_64.tar.gz",
              "yanked": false,
              "yanked_reason": null
          },
          {
              "comment_text": "",
              "digests": {
                  "blake2b_256": "9b9db361aa78acb0ea02e456db57e9988a064b527ecd9c266a51ee1ef2f462fc",
                  "md5": "f12cbf817031f7e13b669c743d542d89",
                  "sha256": "662f87c1922c5661c4560c7b770fb7e59fcc846fe29b0911f865dc80ef664e6e"
              },
              "downloads": -1,
              "filename": "chopper-0.6.0-py3-none-any.whl",
              "has_sig": false,
              "md5_digest": "f12cbf817031f7e13b669c743d542d89",
              "packagetype": "bdist_wheel",
              "python_version": "py3",
              "requires_python": null,
              "size": 16409,
              "upload_time": "2023-04-26T10:16:23",
              "upload_time_iso_8601": "2023-04-26T10:16:23.494279Z",
              "url": "https://files.pythonhosted.org/packages/9b/9d/b361aa78acb0ea02e456db57e9988a064b527ecd9c266a51ee1ef2f462fc/chopper-0.6.0-py3-none-any.whl",
              "yanked": false,
              "yanked_reason": null
          }
      ],
      "upload_time": "2023-04-26 10:16:25",
      "github": true,
      "gitlab": false,
      "bitbucket": false,
      "github_user": "jurismarches",
      "github_project": "chopper",
      "travis_ci": false,
      "coveralls": true,
      "github_actions": true,
      "requirements": [],
      "lcname": "chopper"
  }