| |travisci| |version| |versions| |impls| |wheel| |coverage| |br-coverage|
.. |travisci| image:: https://api.travis-ci.org/jonathaneunice/namedentities.svg
:target: http://travis-ci.org/jonathaneunice/namedentities
.. |version| image:: http://img.shields.io/pypi/v/namedentities.svg?style=flat
:alt: PyPI Package latest release
:target: https://pypi.python.org/pypi/namedentities
.. |versions| image:: https://img.shields.io/pypi/pyversions/namedentities.svg
:alt: Supported versions
:target: https://pypi.python.org/pypi/namedentities
.. |impls| image:: https://img.shields.io/pypi/implementation/namedentities.svg
:alt: Supported implementations
:target: https://pypi.python.org/pypi/namedentities
.. |wheel| image:: https://img.shields.io/pypi/wheel/namedentities.svg
:alt: Wheel packaging support
:target: https://pypi.python.org/pypi/namedentities
.. |coverage| image:: https://img.shields.io/badge/test_coverage-100%25-6600CC.svg
:alt: Test line coverage
:target: https://pypi.python.org/pypi/namedentities
.. |br-coverage| image:: https://img.shields.io/badge/test_coverage-100%25-6600CC.svg
:alt: Test branch coverage
:target: https://pypi.python.org/pypi/namedentities
.. |oplus| unicode:: 0x2295 .. oplus
When reading HTML, named entities are neater and often easier to comprehend
than numeric entities (whether in decimal or hexadecimal notation), Unicode
characters, or a mixture. The |oplus| character, for example, is easier to
recognize and remember as ``&oplus;`` than ``&#8853;`` or ``&#x2295;`` or
``\u2295``. It's also a lot more compact than its verbose Unicode descriptor,
``CIRCLED PLUS``.
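As a quick check of the claim that these spellings all name the same character, here is a minimal sketch using only the Python 3 standard library (``html`` and ``unicodedata``), not ``namedentities`` itself. The variable names are illustrative.

```python
import html
import unicodedata

# Four spellings of the same character: named, decimal, hex, and literal.
spellings = ["&oplus;", "&#8853;", "&#x2295;", "\u2295"]

# html.unescape resolves entity references; a literal character passes through.
decoded = {html.unescape(s) for s in spellings}

print(len(decoded))                        # 1: every spelling decodes to one character
print(unicodedata.name(decoded.copy().pop()))  # CIRCLED PLUS
```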
Because they use only pure 7-bit ASCII characters, entities are safer to
use in databases, files, emails, and other contexts, especially given the
many encodings (UTF-8 and such) required to fit Unicode into byte-oriented
storage--and the many platform variations and quirks seen along the way.
This module helps convert from whatever mixture of characters and/or
entities you have into named HTML entities. Or, if you prefer,
into numeric HTML entities (either decimal or
hexadecimal). It will even help you go the other way,
mapping entities into Unicode.
Usage
=====
Python 2::

    from __future__ import print_function  # Python 2/3 compatibility
    from namedentities import *

    u = u'both em\u2014and&#x2013;dashes&hellip;'

    print("named:  ", repr(named_entities(u)))
    print("numeric:", repr(numeric_entities(u)))
    print("hex:    ", repr(hex_entities(u)))
    print("unicode:", repr(unicode_entities(u)))

yields::

    named:   'both em&mdash;and&ndash;dashes&hellip;'
    numeric: 'both em&#8212;and&#8211;dashes&#8230;'
    hex:     'both em&#x2014;and&#x2013;dashes&#x2026;'
    unicode: u'both em\u2014and\u2013dashes\u2026'

You can do just about the same thing in Python 3, but you have to use a
``print`` function rather than a ``print`` statement, and prior to 3.3, you
have to skip the ``u`` prefix that in Python 2 marks string literals as
being Unicode literals. In Python 3.3 and following, however, you can start
using the ``u`` marker again, if you like. While all Python 3 strings are
Unicode, it helps with cross-version code compatibility. (You can use the
``six`` cross-version compatibility library, as the tests do.)
One good use for ``unicode_entities`` is to create cross-platform,
cross-Python-version strings that conceptually contain
Unicode characters, but spelled out as named (or numeric) HTML entities. For
example::

    unicode_entities('This &rsquo;thing&rdquo; is great!')

This has the advantage of using only ASCII characters and common
string encoding mechanisms, yet rendering full Unicode strings upon
reconstitution. You can use the other functions, say ``named_entities()``,
to go from Unicode characters to named entities.
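The round trip described above can be approximated with the standard library alone. This is a sketch under the assumption that ``html.unescape`` is an acceptable stand-in for ``unicode_entities``; unlike the module's function, ``html.unescape`` also decodes ``&lt;``, ``&gt;``, and ``&amp;``.

```python
import html

# Pure-ASCII source text that "conceptually contains" curly quotes.
ascii_only = "This &rsquo;thing&rdquo; is great!"
is_ascii = all(ord(ch) < 128 for ch in ascii_only)

# Reconstitute the Unicode characters from the entity spellings.
restored = html.unescape(ascii_only)

print(is_ascii)          # True
print(ascii(restored))   # 'This \u2019thing\u201d is great!'
```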
Other APIs
==========
``entities(text, kind)`` takes text and the kind of entities
you'd like returned. ``kind`` can be ``'named'`` (the default), ``'numeric'``
(a.k.a. ``'decimal'``),
``'hex'``, ``'unicode'``, or ``'none'`` (or the actual ``None``).
It's an alternative to the
more explicit individual functions such as ``named_entities``,
and can be useful when the kind of entities you want to
generate is data-driven.
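To illustrate what "data-driven" selection might look like, here is a minimal standard-library sketch. The ``convert`` function and ``FORMATTERS`` table are hypothetical names, not part of ``namedentities``, and the sketch only re-spells non-ASCII characters; it does not parse entities already present in the input, and it omits the ``'unicode'`` and ``'none'`` kinds.

```python
from html.entities import codepoint2name


def _named(cp):
    # Fall back to a decimal reference when no HTML 4 name exists.
    name = codepoint2name.get(cp)
    return "&%s;" % name if name else "&#%d;" % cp


# Hypothetical lookup table: kind -> formatter for a single code point.
FORMATTERS = {
    "named": _named,
    "numeric": lambda cp: "&#%d;" % cp,
    "decimal": lambda cp: "&#%d;" % cp,
    "hex": lambda cp: "&#x%x;" % cp,
}


def convert(text, kind="named"):
    fmt = FORMATTERS[kind]  # raises KeyError for an unknown kind
    # ASCII passes through untouched; everything else is re-spelled.
    return "".join(ch if ord(ch) < 128 else fmt(ord(ch)) for ch in text)


# The kind can come from configuration, a request parameter, and so on.
for kind in ("named", "numeric", "hex"):
    print(kind, convert("em\u2014dash", kind))
```

With the real module, the equivalent call would be ``entities(text, kind)`` as described above.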
``unescape(text)`` changes all entities (save the HTML and XML syntactic
markers ``&lt;``, ``&gt;``, and ``&amp;``)
into Unicode characters. It has a near-alias, ``unicode_entities(text)``,
that provides parallelism with the other APIs.
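The idea of decoding everything except the syntactic markers can be sketched with the standard library. This is an illustration of the technique, not the module's actual implementation; ``unescape_keeping_markup`` and ``ENTITY`` are hypothetical names.

```python
import html
import re

# Matches named (&name;), decimal (&#123;), and hex (&#x7b;) references.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|\w+);")

# Characters with syntactic meaning in HTML/XML; keep these escaped.
SYNTACTIC = {"<", ">", "&"}


def unescape_keeping_markup(text):
    def resolve(match):
        entity = match.group(0)
        char = html.unescape(entity)
        # Leave the reference as written if it stands for <, >, or &.
        return entity if char in SYNTACTIC else char

    return ENTITY.sub(resolve, text)


result = unescape_keeping_markup("&lt;b&gt; R&amp;D &mdash; caf&eacute;")
print(ascii(result))  # '&lt;b&gt; R&amp;D \u2014 caf\xe9'
```

Because the check is made on the decoded character, numeric spellings such as ``&#60;`` are left alone as well.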
Encodings Akimbo
================
This module helps map strings between HTML entities (named, numeric, or hex)
and Unicode characters. It makes those mappings--previously somewhat obscure
and nitsy--easy. Yay us! It will not, however, specifically help you with
"encodings" of Unicode characters such as UTF-8; for these, use Python's
built-in features.
Python 3 tends to handle encoding/decoding pretty transparently.
Python 2, however, does not. Use the ``decode``
string method to get (byte) strings including UTF-8 into Unicode;
use ``encode`` to convert true ``unicode`` strings into UTF-8. Please convert
them to Unicode *before* processing with ``namedentities``::

    s = "String with some UTF-8 characters..."
    print(named_entities(s.decode("utf-8")))

The best strategy is to convert data to full Unicode as soon as
possible after ingesting it. Process everything uniformly in Unicode.
Then encode back to UTF-8 etc. as you write the data out. This strategy is
baked into Python 3, but must be done manually in Python 2.
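The decode-early, encode-late strategy looks like this in Python 3. It is a minimal sketch; the ``upper()`` call is only a stand-in for whatever processing you actually do, and the sample bytes are made up.

```python
# Bytes as they might arrive from a file or socket (UTF-8 encoded).
raw = "caf\u00e9 \u2014 menu".encode("utf-8")

text = raw.decode("utf-8")    # 1. decode to Unicode at the boundary
shout = text.upper()          # 2. process uniformly as Unicode
out = shout.encode("utf-8")   # 3. encode only when writing the data out

print(len(raw), len(text))    # 14 11: byte count differs from character count
```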
Escaping
========
Converting the character entities used in text strings to more
convenient encodings is the primary point of this module. This
role is different from that of "escaping" key characters
such as ``&``, ``<`` and ``>`` (and possibly quotation marks such as ``'``
and ``"``) that have special meaning in
HTML and XML. Still, the tasks overlap. They're both about
transforming strings using entity representations, and when
you want to do one, you will often need to do both. ``namedentities``
therefore provides a mechanism to make this convenient.
Any of this module's functions takes an optional ``escape``
keyword argument. If set to ``True``, strings are pre-processed
with the equivalent of the Python standard library's
``html.escape`` so that ``&``, ``<`` and ``>`` are replaced
with ``&amp;``, ``&lt;``, and ``&gt;`` respectively.
Quotation marks are not escaped by default.

If you provide a function instead of ``True``, that function
will be used as the escape transformation. E.g.::

    import html
    hex_entities('...', escape=html.escape)

This will escape all of the HTML-relevant characters, including quotation marks.
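The difference between the two behaviors can be seen with ``html.escape`` on its own. This sketch assumes that ``escape=True`` behaves like ``html.escape`` with ``quote=False``, which is how the description above reads; it does not call ``namedentities``.

```python
import html

s = '<a href="x">Q&A</a>'

# Markup characters only; quotation marks are left alone.
without_quotes = html.escape(s, quote=False)

# html.escape's own default also escapes " and '.
with_quotes = html.escape(s)

print(without_quotes)  # &lt;a href="x"&gt;Q&amp;A&lt;/a&gt;
print(with_quotes)     # &lt;a href=&quot;x&quot;&gt;Q&amp;A&lt;/a&gt;
```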
Notes
=====
* Version 1.9.4 achieves 100% branch testing coverage.
* Version 1.9 adds the convenience HTML escaping.
* Version 1.8.1 starts automatic test branch coverage with 96% coverage.
* Version 1.8 achieves 100% test line coverage.
* See ``CHANGES.yml`` for more historical changes.
* Doesn't attempt to encode ``<``, ``>``, or
``&`` (or their numerical equivalents) to avoid interfering
with HTML escaping.
* Automated multi-version testing managed with the wonderful
`pytest <http://pypi.python.org/pypi/pytest>`_,
`pytest-cov <http://pypi.python.org/pypi/pytest-cov>`_,
`coverage <http://pypi.python.org/pypi/coverage>`_,
and `tox <http://pypi.python.org/pypi/tox>`_.
Continuous integration testing
with `Travis-CI <https://travis-ci.org/jonathaneunice/namedentities>`_.
Packaging linting with `pyroma <https://pypi.python.org/pypi/pyroma>`_.
Successfully packaged for, and
tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3,
3.4, 3.5, 3.6, 3.7 pre-release, and late-model PyPy and PyPy3.
* This module started as basically a packaging of `Ian Beck's recipe
<http://beckism.com/2009/03/named_entities_python/>`_. While it's
moved forward since then, Ian's contribution to the core remains
key. Thank you, Ian!
* The author, `Jonathan Eunice <mailto:jonathan.eunice@gmail.com>`_
or `@jeunice on Twitter <http://twitter.com/jeunice>`_ welcomes
your comments and suggestions.
Installation
============
To install or upgrade to the latest version::

    pip install -U namedentities

You may need to prefix this command with ``sudo`` to authorize
installation. In environments without super-user privileges, you may want to
use ``pip``'s ``--user`` option, to install only for a single user, rather
than system-wide. You may also need to use version-specific ``pip2`` and
``pip3`` installers, depending on your local system configuration and desired
version of Python.
Testing
=======
To run the module tests, use one of these commands::

    tox                # normal run - speed optimized
    tox -e py36        # run for a specific version only (e.g. py27, py36)
    tox -c toxcov.ini  # run full coverage tests

Raw data
{
"_id": null,
"home_page": "http://bitbucket.org/jeunice/namedentities",
"name": "namedentities",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "HTML entities XML Unicode named numeric decimal hex hexadecimal glyph character set charset",
"author": "Jonathan Eunice",
"author_email": "jonathan.eunice@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/9f/6e/d28dda74e61f53976679ad2f6778dfd1e1780d217e53eb61e50ac5e65b09/namedentities-1.9.4.zip",
"platform": "",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Named (and numeric) HTML entities to/from each other or Unicode",
"version": "1.9.4",
"project_urls": {
"Homepage": "http://bitbucket.org/jeunice/namedentities"
},
"split_keywords": [
"html",
"entities",
"xml",
"unicode",
"named",
"numeric",
"decimal",
"hex",
"hexadecimal",
"glyph",
"character",
"set",
"charset"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8d10691788b896ae9ae840facf54c16453dfa0da8c7a915c29ea40431effb469",
"md5": "edd7541c5c8cf3b5de77f25a762ee595",
"sha256": "65ccdb2950ad13a651fc62f170169ad7c13697ad702ff655f7ab5aa4fcbd162e"
},
"downloads": -1,
"filename": "namedentities-1.9.4-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "edd7541c5c8cf3b5de77f25a762ee595",
"packagetype": "bdist_wheel",
"python_version": "3.6",
"requires_python": null,
"size": 12716,
"upload_time": "2017-05-31T19:44:44",
"upload_time_iso_8601": "2017-05-31T19:44:44.182648Z",
"url": "https://files.pythonhosted.org/packages/8d/10/691788b896ae9ae840facf54c16453dfa0da8c7a915c29ea40431effb469/namedentities-1.9.4-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9f6ed28dda74e61f53976679ad2f6778dfd1e1780d217e53eb61e50ac5e65b09",
"md5": "57a12e0a99b5c49752804a1ff6167d73",
"sha256": "0bd2a5e5f4136230429c72b7357b3370098f702fe116e09204f128dd6da614b2"
},
"downloads": -1,
"filename": "namedentities-1.9.4.zip",
"has_sig": false,
"md5_digest": "57a12e0a99b5c49752804a1ff6167d73",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 21830,
"upload_time": "2017-05-31T19:44:41",
"upload_time_iso_8601": "2017-05-31T19:44:41.517504Z",
"url": "https://files.pythonhosted.org/packages/9f/6e/d28dda74e61f53976679ad2f6778dfd1e1780d217e53eb61e50ac5e65b09/namedentities-1.9.4.zip",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2017-05-31 19:44:41",
"github": false,
"gitlab": false,
"bitbucket": true,
"codeberg": false,
"bitbucket_user": "jeunice",
"bitbucket_project": "namedentities",
"lcname": "namedentities"
}