Name | html5lib-modern |
Version | 1.2 |
home_page | https://github.com/html5lib/html5lib-python |
Summary | HTML parser based on the WHATWG HTML specification |
upload_time | 2024-09-25 04:19:49 |
maintainer | James Graham |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | Copyright (c) 2006-2013 James Graham and other contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
html5lib
========
.. image:: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml/badge.svg
   :target: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml

html5lib is a pure-Python library for parsing HTML. It is designed to
conform to the WHATWG HTML specification, as implemented by all major
web browsers.
Usage
-----
Simple usage follows this pattern:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      document = html5lib.parse(f)

or:

.. code-block:: python

  import html5lib
  document = html5lib.parse("<p>Hello World!")

By default, the ``document`` will be an ``xml.etree`` element instance.
Whenever possible, html5lib chooses the accelerated ``ElementTree``
implementation.
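
Because the default tree is an ordinary ``xml.etree.ElementTree`` element,
it can be searched with the usual ElementTree API. One detail worth knowing
is that html5lib places HTML elements in the XHTML namespace by default, so
tag names carry a ``{http://www.w3.org/1999/xhtml}`` prefix. The helper
below is only a sketch: ``paragraph_texts`` is a hypothetical name, not part
of html5lib, and it is exercised on a hand-built tree of the same shape so
that it runs with the standard library alone:

.. code-block:: python

  import xml.etree.ElementTree as ET

  XHTML_NS = "http://www.w3.org/1999/xhtml"


  def paragraph_texts(document):
      """Return the text of every <p> element at or below ``document``."""
      # Clark notation ({namespace}tag) matches namespaced elements.
      return ["".join(p.itertext()) for p in document.iter(f"{{{XHTML_NS}}}p")]


  # Hand-built stand-in for the element that html5lib.parse() would return.
  html = ET.Element(f"{{{XHTML_NS}}}html")
  body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
  ET.SubElement(body, f"{{{XHTML_NS}}}p").text = "Hello World!"

  print(paragraph_texts(html))  # ['Hello World!']

With html5lib installed, ``paragraph_texts(html5lib.parse("<p>Hello World!"))``
should give the same list. Passing ``namespaceHTMLElements=False`` to
``html5lib.parse`` yields plain tag names instead.
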
Two other tree types are supported: ``xml.dom.minidom`` and
``lxml.etree``. To use an alternative format, specify the name of
a treebuilder:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      lxml_etree_document = html5lib.parse(f, treebuilder="lxml")

When using html5lib with ``urllib.request`` (Python 3), the charset from
the HTTP response should be passed to html5lib as follows:

.. code-block:: python

  from urllib.request import urlopen
  import html5lib

  with urlopen("http://example.com/") as f:
      document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())

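
``f.info()`` returns the response headers as an ``email.message.Message``
subclass, and ``get_content_charset()`` reads the ``charset`` parameter of
the ``Content-Type`` header, returning ``None`` when the header does not
declare one. The sketch below illustrates that behaviour on hand-built
headers, without network access; ``charset_from_content_type`` is a
hypothetical helper name used only for illustration:

.. code-block:: python

  from email.message import Message


  def charset_from_content_type(content_type):
      """Mimic what f.info().get_content_charset() reports for a header value."""
      headers = Message()
      headers["Content-Type"] = content_type
      return headers.get_content_charset()


  print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
  print(charset_from_content_type("text/html"))  # None

A ``None`` result is equivalent to omitting ``transport_encoding``, in which
case html5lib relies on its other encoding-detection steps.
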
To have more control over the parser, create a parser object explicitly.
For instance, to make the parser raise exceptions on parse errors, use:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      parser = html5lib.HTMLParser(strict=True)
      document = parser.parse(f)

When you're instantiating parser objects explicitly, pass a treebuilder
class as the ``tree`` keyword argument to use an alternative document
format:

.. code-block:: python

  import html5lib
  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
  minidom_document = parser.parse("<p>Hello World!")

More documentation is available at https://html5lib.readthedocs.io/.
Installation
------------
html5lib works on CPython 3.8+ and PyPy. This distribution is published
as ``html5lib-modern`` and is imported as ``html5lib``. To install:

.. code-block:: bash

  $ pip install html5lib-modern

The goal is to support a (non-strict) superset of the versions that `pip
supports
<https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_.
Optional Dependencies
---------------------
The following third-party libraries may be used for additional
functionality:

- ``lxml`` is supported as a tree format (for both building and
  walking) under CPython (but *not* PyPy where it is known to cause
  segfaults);

- ``genshi`` has a treewalker (but not builder); and

- ``chardet`` can be used as a fallback when character encoding cannot
  be determined.
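
To see which of these optional libraries are importable in the current
environment, a small check along these lines can help. It is only a
sketch: ``available_optional_modules`` is a hypothetical helper, not part
of html5lib, and it tests importability rather than whether html5lib will
actually use the module:

.. code-block:: python

  import importlib.util

  OPTIONAL_MODULES = ("lxml", "genshi", "chardet")


  def available_optional_modules():
      """Map each optional dependency to whether it can be imported here."""
      return {
          name: importlib.util.find_spec(name) is not None
          for name in OPTIONAL_MODULES
      }


  for name, present in available_optional_modules().items():
      print(f"{name}: {'installed' if present else 'not installed'}")
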
Bugs
----
Please report any bugs on the `issue tracker
<https://github.com/html5lib/html5lib-python/issues>`_.
Tests
-----
Unit tests require the ``pytest`` and ``mock`` libraries and can be
run using the ``pytest`` command in the root directory.
Test data are contained in a separate `html5lib-tests
<https://github.com/html5lib/html5lib-tests>`_ repository and included
as a submodule, thus for git checkouts they must be initialized::

  $ git submodule init
  $ git submodule update

If you have all compatible Python implementations available on your
system, you can run tests on all of them using the ``tox`` utility,
which can be found on PyPI.
Questions?
----------
Check out `the docs <https://html5lib.readthedocs.io/en/latest/>`_. Still
need help? Go to our `GitHub Discussions
<https://github.com/html5lib/html5lib-python/discussions>`_.
You can also browse the archives of the `html5lib-discuss mailing list
<https://www.mail-archive.com/html5lib-discuss@googlegroups.com/>`_.
Credits
=======
``html5lib`` is written and maintained by:
- James Graham
- Sam Sneddon
- Łukasz Langa
- Will Kahn-Greene
Patches and suggestions
-----------------------
(In chronological order, by first commit:)
- Anne van Kesteren
- Lachlan Hunt
- lantis63
- Sam Ruby
- Thomas Broyer
- Tim Fletcher
- Mark Pilgrim
- Ryan King
- Philip Taylor
- Edward Z. Yang
- fantasai
- Philip Jägenstedt
- Ms2ger
- Mohammad Taha Jahangir
- Andy Wingo
- Andreas Madsack
- Karim Valiev
- Juan Carlos Garcia Segovia
- Mike West
- Marc DM
- Simon Sapin
- Michael[tm] Smith
- Ritwik Gupta
- Marc Abramowitz
- Tony Lopes
- lilbludevil
- Kevin
- Drew Hubl
- Austin Kumbera
- Jim Baker
- Jon Dufresne
- Donald Stufft
- Alex Gaynor
- Nik Nyby
- Jakub Wilk
- Sigmund Cherem
- Gabi Davar
- Florian Mounier
- neumond
- Vitalik Verhovodov
- Kovid Goyal
- Adam Chainz
- John Vandenberg
- Eric Amorde
- Benedikt Morbach
- Jonathan Vanasco
- Tom Most
- Ville Skyttä
- Hugo van Kemenade
- Mark Vasilkov
HTML5Lib-modern
---------------
- Ashley Sommer
Raw data
{
"_id": null,
"home_page": "https://github.com/html5lib/html5lib-python",
"name": "html5lib-modern",
"maintainer": "James Graham",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "james@hoppipolla.co.uk",
"keywords": null,
"author": null,
"author_email": "Ashley Sommer <ashleysommer@gmail.com>, James Graham <james@hoppipolla.co.uk>",
"download_url": "https://files.pythonhosted.org/packages/af/6d/a773b5338f4341cdeca17d17cf0e56016ed1f9e7ea8377456b275b63a7b0/html5lib_modern-1.2.tar.gz",
"platform": null,
"description": "html5lib\n========\n\n.. image:: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml/badge.svg\n :target: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml\n\nhtml5lib is a pure-python library for parsing HTML. It is designed to\nconform to the WHATWG HTML specification, as is implemented by all major\nweb browsers.\n\n\nUsage\n-----\n\nSimple usage follows this pattern:\n\n.. code-block:: python\n\n import html5lib\n with open(\"mydocument.html\", \"rb\") as f:\n document = html5lib.parse(f)\n\nor:\n\n.. code-block:: python\n\n import html5lib\n document = html5lib.parse(\"<p>Hello World!\")\n\nBy default, the ``document`` will be an ``xml.etree`` element instance.\nWhenever possible, html5lib chooses the accelerated ``ElementTree``\nimplementation.\n\nTwo other tree types are supported: ``xml.dom.minidom`` and\n``lxml.etree``. To use an alternative format, specify the name of\na treebuilder:\n\n.. code-block:: python\n\n import html5lib\n with open(\"mydocument.html\", \"rb\") as f:\n lxml_etree_document = html5lib.parse(f, treebuilder=\"lxml\")\n\nWhen using with ``urllib.request`` (Python 3), the charset from HTTP\nshould be pass into html5lib as follows:\n\n.. code-block:: python\n\n from urllib.request import urlopen\n import html5lib\n\n with urlopen(\"http://example.com/\") as f:\n document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())\n\nTo have more control over the parser, create a parser object explicitly.\nFor instance, to make the parser raise exceptions on parse errors, use:\n\n.. code-block:: python\n\n import html5lib\n with open(\"mydocument.html\", \"rb\") as f:\n parser = html5lib.HTMLParser(strict=True)\n document = parser.parse(f)\n\nWhen you're instantiating parser objects explicitly, pass a treebuilder\nclass as the ``tree`` keyword argument to use an alternative document\nformat:\n\n.. 
code-block:: python\n\n import html5lib\n parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder(\"dom\"))\n minidom_document = parser.parse(\"<p>Hello World!\")\n\nMore documentation is available at https://html5lib.readthedocs.io/.\n\n\nInstallation\n------------\n\nhtml5lib works on CPython 3.8+ and PyPy. To install:\n\n.. code-block:: bash\n\n $ pip install html5lib\n\nThe goal is to support a (non-strict) superset of the versions that `pip\nsupports\n<https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_.\n\nOptional Dependencies\n---------------------\n\nThe following third-party libraries may be used for additional\nfunctionality:\n\n- ``lxml`` is supported as a tree format (for both building and\n walking) under CPython (but *not* PyPy where it is known to cause\n segfaults);\n\n- ``genshi`` has a treewalker (but not builder); and\n\n- ``chardet`` can be used as a fallback when character encoding cannot\n be determined.\n\n\nBugs\n----\n\nPlease report any bugs on the `issue tracker\n<https://github.com/html5lib/html5lib-python/issues>`_.\n\n\nTests\n-----\n\nUnit tests require the ``pytest`` and ``mock`` libraries and can be\nrun using the ``pytest`` command in the root directory.\n\nTest data are contained in a separate `html5lib-tests\n<https://github.com/html5lib/html5lib-tests>`_ repository and included\nas a submodule, thus for git checkouts they must be initialized::\n\n $ git submodule init\n $ git submodule update\n\nIf you have all compatible Python implementations available on your\nsystem, you can run tests on all of them using the ``tox`` utility,\nwhich can be found on PyPI.\n\n\nQuestions?\n----------\n\nCheck out `the docs <https://html5lib.readthedocs.io/en/latest/>`_. Still\nneed help? 
Go to our `GitHub Discussions\n<https://github.com/html5lib/html5lib-python/discussions>`_.\n\nYou can also browse the archives of the `html5lib-discuss mailing list \n<https://www.mail-archive.com/html5lib-discuss@googlegroups.com/>`_.\n\nCredits\n=======\n\n``html5lib`` is written and maintained by:\n\n- James Graham\n- Sam Sneddon\n- \u0141ukasz Langa\n- Will Kahn-Greene\n\n\nPatches and suggestions\n-----------------------\n(In chronological order, by first commit:)\n\n- Anne van Kesteren\n- Lachlan Hunt\n- lantis63\n- Sam Ruby\n- Thomas Broyer\n- Tim Fletcher\n- Mark Pilgrim\n- Ryan King\n- Philip Taylor\n- Edward Z. Yang\n- fantasai\n- Philip J\u00e4genstedt\n- Ms2ger\n- Mohammad Taha Jahangir\n- Andy Wingo\n- Andreas Madsack\n- Karim Valiev\n- Juan Carlos Garcia Segovia\n- Mike West\n- Marc DM\n- Simon Sapin\n- Michael[tm] Smith\n- Ritwik Gupta\n- Marc Abramowitz\n- Tony Lopes\n- lilbludevil\n- Kevin\n- Drew Hubl\n- Austin Kumbera\n- Jim Baker\n- Jon Dufresne\n- Donald Stufft\n- Alex Gaynor\n- Nik Nyby\n- Jakub Wilk\n- Sigmund Cherem\n- Gabi Davar\n- Florian Mounier\n- neumond\n- Vitalik Verhovodov\n- Kovid Goyal\n- Adam Chainz\n- John Vandenberg\n- Eric Amorde\n- Benedikt Morbach\n- Jonathan Vanasco\n- Tom Most\n- Ville Skytt\u00e4\n- Hugo van Kemenade\n- Mark Vasilkov\n\n\nHTML5Lib-modern\n---------------\n\n- Ashley Sommer\n",
"bugtrack_url": null,
"license": "Copyright (c) 2006-2013 James Graham and other contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "HTML parser based on the WHATWG HTML specification",
"version": "1.2",
"project_urls": {
"Homepage": "https://github.com/html5lib/html5lib-python"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b1cdddf0baebab2dfac62a90af9d7a1c2504d697f1411f2529b928e02b4e9cd0",
"md5": "fc6fa9f8e8dbce1d904ea0b0af18aa9f",
"sha256": "3458b6e31525ede4fcaac0ff42d9eeb5efaf755473768103cb56e0275caa8d99"
},
"downloads": -1,
"filename": "html5lib_modern-1.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "fc6fa9f8e8dbce1d904ea0b0af18aa9f",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 116249,
"upload_time": "2024-09-25T04:19:46",
"upload_time_iso_8601": "2024-09-25T04:19:46.627682Z",
"url": "https://files.pythonhosted.org/packages/b1/cd/ddf0baebab2dfac62a90af9d7a1c2504d697f1411f2529b928e02b4e9cd0/html5lib_modern-1.2-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "af6da773b5338f4341cdeca17d17cf0e56016ed1f9e7ea8377456b275b63a7b0",
"md5": "19c61ffaf0a57719d1c86f6550aff461",
"sha256": "1fadbfc27ea955431270e4e79a4a4c290ba11c3a3098a95cc22dc73e312a1768"
},
"downloads": -1,
"filename": "html5lib_modern-1.2.tar.gz",
"has_sig": false,
"md5_digest": "19c61ffaf0a57719d1c86f6550aff461",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 275189,
"upload_time": "2024-09-25T04:19:49",
"upload_time_iso_8601": "2024-09-25T04:19:49.004186Z",
"url": "https://files.pythonhosted.org/packages/af/6d/a773b5338f4341cdeca17d17cf0e56016ed1f9e7ea8377456b275b63a7b0/html5lib_modern-1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-25 04:19:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "html5lib",
"github_project": "html5lib-python",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"appveyor": true,
"requirements": [],
"tox": true,
"lcname": "html5lib-modern"
}