html5lib-modern


:Name: html5lib-modern
:Version: 1.2
:Summary: HTML parser based on the WHATWG HTML specification
:Home page: https://github.com/html5lib/html5lib-python
:Maintainer: James Graham
:Requires Python: >=3.8
:Requirements: none recorded
:Uploaded: 2024-09-25 04:19:49
:License: Copyright (c) 2006-2013 James Graham and other contributors

  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

html5lib
========

.. image:: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml/badge.svg
    :target: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml

html5lib is a pure-Python library for parsing HTML. It is designed to
conform to the WHATWG HTML specification, which is implemented by all
major web browsers.


Usage
-----

Simple usage follows this pattern:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      document = html5lib.parse(f)

or:

.. code-block:: python

  import html5lib
  document = html5lib.parse("<p>Hello World!")

By default, the ``document`` will be an ``xml.etree`` element instance.
Whenever possible, html5lib chooses the accelerated ``ElementTree``
implementation.
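
When querying that tree, note that ``html5lib.parse`` places HTML elements
in the XHTML namespace by default (its ``namespaceHTMLElements`` argument
defaults to ``True``), so ``ElementTree`` tag names carry a
``{http://www.w3.org/1999/xhtml}`` prefix, and the returned element is the
root ``<html>`` element. A minimal sketch of looking up elements with and
without the namespace:

.. code-block:: python

  import html5lib

  # Namespace prefix that html5lib uses for HTML elements by default
  XHTML_NS = "{http://www.w3.org/1999/xhtml}"

  # Default: namespaced tags; the returned element is the root <html>
  document = html5lib.parse("<p>Hello World!")
  print(document.tag)  # {http://www.w3.org/1999/xhtml}html
  print(document.find(f".//{XHTML_NS}p").text)  # Hello World!

  # With namespaceHTMLElements=False, tags are plain element names
  plain = html5lib.parse("<p>Hello World!", namespaceHTMLElements=False)
  print(plain.tag)  # html
  print(plain.find(".//p").text)  # Hello World!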

Two other tree types are supported: ``xml.dom.minidom`` and
``lxml.etree``. To use an alternative format, specify the name of
a treebuilder:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
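
Each treebuilder returns a different object type. As a quick sketch, the
following compares the default ``"etree"`` builder with ``"dom"``; ``"dom"``
is used instead of ``"lxml"`` here only so that the example runs without the
optional ``lxml`` dependency:

.. code-block:: python

  import xml.dom.minidom
  import xml.etree.ElementTree as ET

  import html5lib

  markup = "<p>Hello World!"

  # Default treebuilder: an ElementTree element for the root <html>
  etree_doc = html5lib.parse(markup)
  # "dom" treebuilder: an xml.dom.minidom Document
  dom_doc = html5lib.parse(markup, treebuilder="dom")

  print(isinstance(etree_doc, ET.Element))  # True
  print(isinstance(dom_doc, xml.dom.minidom.Document))  # True
  print(dom_doc.getElementsByTagName("p")[0].firstChild.data)  # Hello World!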

When using ``urllib.request``, pass the charset from the HTTP
``Content-Type`` header into html5lib as follows:

.. code-block:: python

  from urllib.request import urlopen
  import html5lib

  with urlopen("http://example.com/") as f:
      document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())
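
The header lookup in the example above can be tried offline. The sketch below
assumes a ``Content-Type`` header with an explicit charset;
``parse_with_http_charset`` is a hypothetical helper, and a plain
``email.message.Message`` stands in for the header object that ``f.info()``
returns (an ``http.client.HTTPMessage``, which subclasses it):

.. code-block:: python

  from email.message import Message

  import html5lib

  def parse_with_http_charset(body, content_type):
      # Hypothetical helper: a plain Message stands in for the
      # HTTPMessage returned by urlopen(...).info(), so no network is needed
      headers = Message()
      headers["Content-Type"] = content_type
      # get_content_charset() returns the lower-cased charset parameter,
      # or None when the header has none (html5lib then detects it itself)
      charset = headers.get_content_charset()
      return html5lib.parse(body, transport_encoding=charset,
                            namespaceHTMLElements=False)

  # 0xE9 is "é" in ISO-8859-1 but is not valid on its own in UTF-8
  doc = parse_with_http_charset(b"<p>caf\xe9", "text/html; charset=ISO-8859-1")
  print(doc.find(".//p").text)  # café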

To have more control over the parser, create a parser object explicitly.
For instance, to make the parser raise exceptions on parse errors, use:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      parser = html5lib.HTMLParser(strict=True)
      document = parser.parse(f)
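
Parse errors are not lost in the default mode: a non-strict parser records
them in its ``errors`` attribute instead of raising. A minimal sketch
contrasting the two modes; ``ParseError`` is imported from
``html5lib.html5parser``, and a missing doctype is used as the example error:

.. code-block:: python

  import html5lib
  from html5lib.html5parser import ParseError

  # No <!DOCTYPE html>, which the HTML specification treats as a parse error
  markup = "<p>Hello World!"

  # Default (lenient) mode: errors are collected, not raised. Each entry
  # is a (position, error_code, data) tuple.
  lenient = html5lib.HTMLParser()
  lenient.parse(markup)
  codes = [code for _position, code, _data in lenient.errors]
  print(codes)

  # strict=True raises ParseError at the first parse error
  strict = html5lib.HTMLParser(strict=True)
  try:
      strict.parse(markup)
      strict_error = None
  except ParseError as exc:
      strict_error = str(exc)
  print(strict_error)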

When you're instantiating parser objects explicitly, pass a treebuilder
class as the ``tree`` keyword argument to use an alternative document
format:

.. code-block:: python

  import html5lib
  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
  minidom_document = parser.parse("<p>Hello World!")
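
``getTreeBuilder`` returns a treebuilder *class* rather than an instance;
``HTMLParser`` instantiates it, and the same parser object can be reused for
several documents. The sketch below (``list_items`` is a hypothetical helper)
also shows the parser supplying end tags that the markup leaves implicit:

.. code-block:: python

  import html5lib

  # getTreeBuilder("dom") returns a TreeBuilder class; HTMLParser instantiates it
  TreeBuilder = html5lib.getTreeBuilder("dom")
  parser = html5lib.HTMLParser(tree=TreeBuilder)

  def list_items(markup):
      # Hypothetical helper: parse one document with the shared parser
      # and collect the text of each <li> element
      document = parser.parse(markup)
      return [li.firstChild.data for li in document.getElementsByTagName("li")]

  # </li> end tags are optional in HTML; the parser closes each item itself
  print(list_items("<ul><li>one<li>two</ul>"))  # ['one', 'two']
  print(list_items("<ol><li>first</ol>"))  # ['first']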

More documentation is available at https://html5lib.readthedocs.io/.


Installation
------------

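html5lib works on CPython 3.8+ and PyPy. This distribution is published on
PyPI as ``html5lib-modern`` and is imported as ``html5lib``. To install:

.. code-block:: bash

    $ pip install html5lib-modern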

The goal is to support a (non-strict) superset of the versions that `pip
supports
<https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_.

Optional Dependencies
---------------------

The following third-party libraries may be used for additional
functionality:

- ``lxml`` is supported as a tree format (for both building and
  walking) under CPython (but *not* under PyPy, where it is known to
  cause segfaults);

- ``genshi`` has a treewalker (but not a treebuilder); and

- ``chardet`` can be used as a fallback when character encoding cannot
  be determined.
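
Since ``lxml`` is optional and not recommended under PyPy, code that prefers
it can fall back to the standard-library ``etree`` builder. A minimal sketch;
the helper ``pick_treebuilder`` is hypothetical:

.. code-block:: python

  import importlib.util
  import platform

  import html5lib

  def pick_treebuilder():
      # Hypothetical helper: use lxml only on CPython (it is known to
      # segfault under PyPy) and only if it is installed; otherwise
      # fall back to the standard-library ElementTree builder.
      if (platform.python_implementation() == "CPython"
              and importlib.util.find_spec("lxml") is not None):
          return "lxml"
      return "etree"

  treebuilder = pick_treebuilder()
  document = html5lib.parse("<p>Hello World!", treebuilder=treebuilder)
  print(treebuilder, type(document))

``importlib.util.find_spec`` checks whether a top-level module is available
without importing it.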


Bugs
----

Please report any bugs on the `issue tracker
<https://github.com/html5lib/html5lib-python/issues>`_.


Tests
-----

Unit tests require the ``pytest`` and ``mock`` libraries and can be
run using the ``pytest`` command in the root directory.

Test data are kept in a separate `html5lib-tests
<https://github.com/html5lib/html5lib-tests>`_ repository that is included
as a git submodule, so in a git checkout it must be initialized::

  $ git submodule init
  $ git submodule update

If you have all compatible Python implementations available on your
system, you can run tests on all of them using the ``tox`` utility,
which can be found on PyPI.


Questions?
----------

Check out `the docs <https://html5lib.readthedocs.io/en/latest/>`_. Still
need help? Go to our `GitHub Discussions
<https://github.com/html5lib/html5lib-python/discussions>`_.

You can also browse the archives of the `html5lib-discuss mailing list
<https://www.mail-archive.com/html5lib-discuss@googlegroups.com/>`_.

Credits
=======

``html5lib`` is written and maintained by:

- James Graham
- Sam Sneddon
- Łukasz Langa
- Will Kahn-Greene


Patches and suggestions
-----------------------
(In chronological order, by first commit:)

- Anne van Kesteren
- Lachlan Hunt
- lantis63
- Sam Ruby
- Thomas Broyer
- Tim Fletcher
- Mark Pilgrim
- Ryan King
- Philip Taylor
- Edward Z. Yang
- fantasai
- Philip Jägenstedt
- Ms2ger
- Mohammad Taha Jahangir
- Andy Wingo
- Andreas Madsack
- Karim Valiev
- Juan Carlos Garcia Segovia
- Mike West
- Marc DM
- Simon Sapin
- Michael[tm] Smith
- Ritwik Gupta
- Marc Abramowitz
- Tony Lopes
- lilbludevil
- Kevin
- Drew Hubl
- Austin Kumbera
- Jim Baker
- Jon Dufresne
- Donald Stufft
- Alex Gaynor
- Nik Nyby
- Jakub Wilk
- Sigmund Cherem
- Gabi Davar
- Florian Mounier
- neumond
- Vitalik Verhovodov
- Kovid Goyal
- Adam Chainz
- John Vandenberg
- Eric Amorde
- Benedikt Morbach
- Jonathan Vanasco
- Tom Most
- Ville Skyttä
- Hugo van Kemenade
- Mark Vasilkov


HTML5Lib-modern
---------------

- Ashley Sommer

            
