`PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
allows you, from Python code, to "fix" invalid (X)HTML markup. Some of the
library's many capabilities include:

* Clean up unclosed tags and unescaped characters such as ampersands
* Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
* Convert named entities to numeric entities, which can then be used in XML
  documents without an HTML doctype.
* Clean up HTML from programs such as Word (to an extent)
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
  which some (X)HTML indenting code overlooks.

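To make the named-to-numeric conversion in the list above concrete, here is a
minimal standard-library sketch of the idea. It is not how HTML Tidy or
PyTidyLib implements the feature; the function name ``named_to_numeric`` and
the regular expression are illustrative assumptions. It leaves the entities
that XML predefines untouched, since those are valid without an HTML doctype.

```python
import re
from html.entities import name2codepoint

# Matches named references such as &eacute; (the trailing semicolon is required here).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Entities that XML itself predefines; they need no HTML doctype.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(markup):
    """Hypothetical helper: rewrite HTML named entities as decimal references."""
    def replace(match):
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            # Leave XML built-ins and unknown names exactly as written.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_ENTITY.sub(replace, markup)


print(named_to_numeric("caf&eacute; &amp; cr&egrave;me"))
# prints: caf&#233; &amp; cr&#232;me
```

A real cleanup pass also has to repair the markup around the entities, which
is the part that HTML Tidy handles.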
Changes
=======

* 0.3.2: Initialization bug fix

* 0.3.1: ``find_library`` support while still allowing a list of library names

* 0.3.0: Refactored to use ``Tidy`` and ``PersistentTidy`` classes while
  keeping the functional interface (which will lazily create a global
  ``Tidy()`` object) for backward compatibility. You can now pass a list of
  library names and base options when instantiating ``Tidy``. The ``keep_doc``
  argument is now deprecated and does nothing; use ``PersistentTidy``.

* 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.

* 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.

Small example of use
====================

The following code cleans up an invalid HTML document and sets an option::

    from tidylib import tidy_document

    # tidy_document returns the cleaned markup and Tidy's error/warning report
    document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
                                     options={'numeric-entities': 1})
    print(document)
    print(errors)

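Because PyTidyLib only wraps the native HTML Tidy library, the call above
needs both the Python package and the shared library to be present. The
following is a hedged sketch of one way to degrade gracefully when either is
missing. The helper name ``tidy_or_passthrough`` is hypothetical, and catching
``OSError`` assumes that this is what PyTidyLib raises when it cannot load the
shared library.

```python
def tidy_or_passthrough(markup, options=None):
    """Hypothetical helper: return (document, errors), or the input unchanged
    plus a short message when Tidy cannot be used."""
    try:
        from tidylib import tidy_document
        return tidy_document(markup, options=options)
    except (ImportError, OSError) as exc:
        # ImportError: the PyTidyLib package is not installed.
        # OSError: assumed to signal that the native Tidy library was not found.
        return markup, "tidy unavailable: %s" % exc


document, errors = tidy_or_passthrough('<p>fõo <img src="bar.jpg">',
                                       {'numeric-entities': 1})
print(document)
print(errors)
```

Returning the input untouched keeps the caller's code path the same in both
cases; whether silently passing invalid markup through is acceptable depends
on the application.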
Docs
====

Documentation is shipped with the source distribution and is available at
the `PyTidyLib`_ web page.

.. _`HTML Tidy`: http://tidy.sourceforge.net/
.. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/

Raw data
{
"_id": null,
"home_page": "http://countergram.com/open-source/pytidylib/",
"name": "pytidylib",
"maintainer": null,
"docs_url": "https://pythonhosted.org/pytidylib/",
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Jason Stitt",
"author_email": "js@jasonstitt.com",
"download_url": "https://files.pythonhosted.org/packages/2d/5e/4d2b5e2d443d56f444e2a3618eb6d044c97d14bf47cab0028872c0a468e0/pytidylib-0.3.2.tar.gz",
"platform": "UNKNOWN",
"bugtrack_url": null,
"license": "UNKNOWN",
"summary": "Python wrapper for HTML Tidy (tidylib) on Python 2 and 3",
"version": "0.3.2",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "06569f09914df642da09ba83dbec3112",
"sha256": "22b1c8d75970d8064ff999c2369e98af1d0685417eda4c829a5c9f56764b0af3"
},
"downloads": -1,
"filename": "pytidylib-0.3.2.tar.gz",
"has_sig": false,
"md5_digest": "06569f09914df642da09ba83dbec3112",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 87669,
"upload_time": "2016-11-16T01:53:00",
"upload_time_iso_8601": "2016-11-16T01:53:00.990126Z",
"url": "https://files.pythonhosted.org/packages/2d/5e/4d2b5e2d443d56f444e2a3618eb6d044c97d14bf47cab0028872c0a468e0/pytidylib-0.3.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2016-11-16 01:53:00",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "pytidylib"
}