best-effort representations using smaller coded character sets (ASCII,
ISO 8859, etc.). The translation tables used by the codecs are from
the ``transtab`` collection by Markus Kuhn.

Three types of transliterating codecs are provided:

    "long", using as many characters as needed to make a natural
    replacement. For example, \u00e4 LATIN SMALL LETTER A WITH
    DIAERESIS ``ä`` will be replaced with ``ae``.

    "short", using the minimum number of characters to make a
    replacement. For example, \u00e4 LATIN SMALL LETTER A WITH
    DIAERESIS ``ä`` will be replaced with ``a``.

    "one", only performing single-character replacements. Characters
    that cannot be transliterated with a single character are passed
    through unchanged. For example, \u2639 WHITE FROWNING FACE ``☹``
    will be passed through unchanged.
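
To make the difference between the three styles concrete, here is a minimal sketch built on ``str.translate`` and a tiny hypothetical table (``MINI_TABLE`` and ``transliterate`` are illustrative names, not part of translitcodec). It is not the package's implementation: the real codecs use tables generated from transtab, and this sketch assumes the "one" style simply keeps the replacements that are exactly one character long.

```python
# Hypothetical miniature table: character -> (long form, short form).
# The real codecs use much larger tables generated from transtab.
MINI_TABLE = {
    "\u00e4": ("ae", "a"),     # LATIN SMALL LETTER A WITH DIAERESIS
    "\u20ac": ("EUR", "E"),    # EURO SIGN
    "\u2639": (":-(", ":-("),  # WHITE FROWNING FACE
}


def build_map(style):
    """Return a str.translate() mapping for 'long', 'short' or 'one'."""
    mapping = {}
    for char, (long_form, short_form) in MINI_TABLE.items():
        if style == "long":
            mapping[ord(char)] = long_form
        elif style == "short":
            mapping[ord(char)] = short_form
        elif style == "one":
            # Only single-character replacements; anything else is
            # left out of the mapping and therefore passes through.
            if len(short_form) == 1:
                mapping[ord(char)] = short_form
        else:
            raise ValueError(f"unknown style: {style!r}")
    return mapping


def transliterate(text, style="long"):
    """Transliterate text using the hypothetical miniature table."""
    return text.translate(build_map(style))
```

With this table, ``transliterate("ä € ☹", "one")`` returns ``"a E ☹"``: the frowning face has no single-character replacement, so it is passed through unchanged, mirroring the description of the "one" codec above.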

Using the codecs is simple::

    >>> import translitcodec  # importing the package registers the codecs
    >>> import codecs
    >>> codecs.encode('fácil € ☺', 'translit/long')
    'facil EUR :-)'
    >>> codecs.encode('fácil € ☺', 'translit/short')
    'facil E :-)'

The codecs return ``str`` (Unicode) by default. To receive ``bytes`` back,
either encode the output of ``codecs.encode()`` with another codec, or
append the name of the desired byte encoding to the codec name::

    >>> codecs.encode('fácil € ☺', 'translit/one').encode('ascii', 'replace')
    b'facil E ?'
    >>> 'fácil € ☺'.encode('translit/one/ascii', 'replace')
    b'facil E ?'

The package also supplies a ``transliterate`` codec, an alias for
``translit/long``.
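
Another way to use the library is through its error handlers. The
following handlers are available:

* ``strict/translit/long``, ``strict/translit/short``, ``strict/translit/one``: similar to ``strict``
* ``ignore/translit/long``, ``ignore/translit/short``, ``ignore/translit/one``: similar to ``ignore``
* ``replace/translit/long``, ``replace/translit/short``, ``replace/translit/one``: similar to ``replace``

These handlers work like Python's built-in ``strict``, ``ignore`` and
``replace`` handlers, except that transliteration is attempted first.
Characters that still cannot be encoded after transliteration are then
handled as the corresponding built-in handler would handle them; in the
examples below, ``replace`` substitutes ``?`` and ``ignore`` drops them.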

    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/long').decode('ISO-8859-2')
    'Zażółć gęślą jaźń EUR :-)?!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/short').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E :-)?!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/one').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E ??!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/long').decode('ISO-8859-2')
    'Zażółć gęślą jaźń EUR :-)!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/short').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E :-)!@#'
    >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/one').decode('ISO-8859-2')
    'Zażółć gęślą jaźń E !@#'
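
The transliterate-first behaviour of these handlers can be sketched with the standard ``codecs.register_error`` hook. The handler names (``strict/demo``, ``ignore/demo``, ``replace/demo``), ``DEMO_MAP`` and ``make_handler`` below are hypothetical, and the two-entry table is a stand-in for the transtab data; this illustrates the mechanism and is not the package's source.

```python
import codecs

# Hypothetical miniature table; the real handlers use transtab data.
DEMO_MAP = {"\u20ac": "EUR", "\u263a": ":-)"}


def make_handler(fallback):
    """Return an encode-error handler that transliterates first, then
    falls back to 'strict', 'ignore' or 'replace' behaviour."""
    def handler(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        pieces = []
        # exc.object[exc.start:exc.end] is the run of unencodable characters.
        for char in exc.object[exc.start:exc.end]:
            replacement = DEMO_MAP.get(char)
            if replacement is None:
                if fallback == "strict":
                    raise exc
                replacement = "" if fallback == "ignore" else "?"
            pieces.append(replacement)
        # Return the substitute text and the position to resume encoding at.
        return "".join(pieces), exc.end
    return handler


for mode in ("strict", "ignore", "replace"):
    codecs.register_error(mode + "/demo", make_handler(mode))
```

The substitute text returned by a handler must itself be encodable in the target encoding; plain ASCII replacements such as ``EUR`` and ``:-)`` satisfy that for ISO-8859-2.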

translitcodec Changes
=====================

0.7.0
-----
Released on May 8, 2021

- Added support for error handlers
- Fixed conversion of the German eszett character (``ß``)

0.6.0
-----
Released on December 13, 2020

- Add support for Python 3.9

0.5.2
-----
Released on January 19, 2020

- Install package with setuptools

0.5.1
-----
Released on January 19, 2020

- Add ``python_requires`` to prevent installation on Python 2

0.5
---
Released on January 18, 2020

- Complete coverage of the Vietnamese alphabet
- Removed Python 2 support

0.4
---
Released on May 11, 2015

- Added Python 3 compatibility

0.3
---
Released on February 14, 2011

- Fixes to the transtab table rebuilding tool
- Added ``translitcodec.__version__``

0.2
---
Released on January 27, 2011

- Resolved the "TypeError: character mapping must return integer,
  None or unicode" raised when a blank value (e.g. \N{ZERO WIDTH SPACE},
  \u200B) was encoded. Unicode blanks are now returned.
- Characters in the ASCII range are no longer included in the translation
  tables.

0.1
---
Released on December 28, 2008

- Initial packaged release.
Raw data
{
"_id": null,
"home_page": "https://github.com/claudep/translitcodec",
"name": "translitcodec",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": "",
"keywords": "",
"author": "Jason Kirtland",
"author_email": "jek@discorporate.us",
"download_url": "https://files.pythonhosted.org/packages/f1/28/c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4/translitcodec-0.7.0.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT License",
"summary": "Unicode to 8-bit charset transliteration codec",
"version": "0.7.0",
"project_urls": {
"Homepage": "https://github.com/claudep/translitcodec"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f128c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4",
"md5": "a9699192bcc25bc5248e375703e28309",
"sha256": "3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8"
},
"downloads": -1,
"filename": "translitcodec-0.7.0.tar.gz",
"has_sig": false,
"md5_digest": "a9699192bcc25bc5248e375703e28309",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 52414,
"upload_time": "2021-05-08T18:39:09",
"upload_time_iso_8601": "2021-05-08T18:39:09.432301Z",
"url": "https://files.pythonhosted.org/packages/f1/28/c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4/translitcodec-0.7.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-05-08 18:39:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "claudep",
"github_project": "translitcodec",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "translitcodec"
}