translitcodec


Name: translitcodec
Version: 0.7.0
Home page: https://github.com/claudep/translitcodec
Summary: Unicode to 8-bit charset transliteration codec
Upload time: 2021-05-08 18:39:09
Author: Jason Kirtland
Requires Python: >=3
License: MIT License

This package provides codecs that transliterate Unicode text into
best-effort representations using smaller coded character sets (ASCII,
ISO 8859, etc.).  The translation tables used by the codecs are from
the ``transtab`` collection by Markus Kuhn.

Three types of transliterating codecs are provided:

  "long", using as many characters as needed to make a natural
  replacement.  For example, \u00e4 LATIN SMALL LETTER A WITH
  DIAERESIS ``ä`` will be replaced with ``ae``.

  "short", using the minimum number of characters to make a
  replacement.  For example, \u00e4 LATIN SMALL LETTER A WITH
  DIAERESIS ``ä`` will be replaced with ``a``.

  "one", only performing single character replacements.  Characters
  that cannot be transliterated with a single character are passed
  through unchanged. For example, \u2639 WHITE FROWNING FACE ``☹``
  will be passed through unchanged.
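
The difference between the three styles can be sketched with plain
dictionaries and ``str.translate``.  The tables below are tiny hand-written
stand-ins built from the examples in this document, not the package's real
``transtab`` data, and ``transliterate`` is a hypothetical helper name.
Deriving the "one" table by filtering the "short" table is an assumption
made for illustration only.

```python
# Hypothetical miniature tables; the real package ships far larger ones.
LONG = {"ä": "ae", "á": "a", "€": "EUR", "☺": ":-)"}
SHORT = {"ä": "a", "á": "a", "€": "E", "☺": ":-)"}
# Assumption for this sketch: "one" keeps only single-character replacements.
ONE = {src: dst for src, dst in SHORT.items() if len(dst) == 1}


def transliterate(text, table):
    """Replace every character found in ``table``; leave the rest untouched."""
    return text.translate(str.maketrans(table))


long_result = transliterate("fácil € ☺", LONG)    # 'facil EUR :-)'
short_result = transliterate("fácil € ☺", SHORT)  # 'facil E :-)'
one_result = transliterate("fácil € ☺", ONE)      # 'facil E ☺'
```

Characters missing from a table, such as ``☹`` here, simply pass through.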

Using the codecs is simple::

  >>> import translitcodec
  >>> import codecs
  >>> codecs.encode('fácil € ☺', 'translit/long')
  'facil EUR :-)'
  >>> codecs.encode('fácil € ☺', 'translit/short')
  'facil E :-)'

The codecs return Unicode strings (``str``) by default.  To receive a bytestring back,
either chain the output of encode() to another codec, or append the
name of the desired byte encoding to the codec name::

  >>> codecs.encode('fácil € ☺', 'translit/one').encode('ascii', 'replace')
  b'facil E ?'
  >>> 'fácil € ☺'.encode('translit/one/ascii', 'replace')
  b'facil E ?'
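
How a codec name can carry a trailing byte encoding may be easier to see in
a small stand-alone sketch.  The code below registers a hypothetical
``demo_translit/<encoding>`` codec through the standard ``codecs.register``
hook; the codec name, the two-entry table and the ``_search`` function are
illustrative assumptions, not the package's implementation.  Since some
Python versions normalize punctuation in codec names before calling search
functions, the sketch accepts both ``/`` and ``_`` as separators.

```python
import codecs

# Hypothetical single-character table; not the package's real data.
_TABLE = str.maketrans({"á": "a", "€": "E"})
_PREFIX = "demo_translit_"


def _search(name):
    """Codec search function for names like 'demo_translit/ascii'."""
    # The lookup machinery lower-cases the name and may already have
    # turned "/" into "_", so normalize the separator ourselves.
    name = name.replace("/", "_")
    if not name.startswith(_PREFIX):
        return None  # not ours; let other search functions try
    byte_encoding = name[len(_PREFIX):]

    def encode(text, errors="strict"):
        # Transliterate first, then hand the result to the byte encoding.
        replaced = text.translate(_TABLE)
        return replaced.encode(byte_encoding, errors), len(text)

    def decode(data, errors="strict"):
        raise NotImplementedError("this sketch only encodes")

    return codecs.CodecInfo(encode, decode, name=name)


codecs.register(_search)

data = "fácil € ☺".encode("demo_translit/ascii", "replace")  # b'facil E ?'
```

The ``errors`` argument is passed straight through to the byte encoding,
which is why ``☺`` ends up as ``?`` in this sketch.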

The package also supplies a 'transliterate' codec, an alias for
'translit/long'.

Another way to use the library is through an error handler.
The following error handlers are available:

  * 'strict/translit/long', 'strict/translit/short', 'strict/translit/one' - similar to 'strict'
  * 'ignore/translit/long', 'ignore/translit/short', 'ignore/translit/one' - similar to 'ignore'
  * 'replace/translit/long', 'replace/translit/short', 'replace/translit/one' - similar to 'replace'

These error handlers work like Python's built-in ones, except that
transliteration is attempted first::

  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/long').decode('ISO-8859-2')
  'Zażółć gęślą jaźń EUR :-)?!@#'
  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/short').decode('ISO-8859-2')
  'Zażółć gęślą jaźń E :-)?!@#'
  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/one').decode('ISO-8859-2')
  'Zażółć gęślą jaźń E ??!@#'
  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/long').decode('ISO-8859-2')
  'Zażółć gęślą jaźń EUR :-)!@#'
  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/short').decode('ISO-8859-2')
  'Zażółć gęślą jaźń E :-)!@#'
  >>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/one').decode('ISO-8859-2')
  'Zażółć gęślą jaźń E !@#'
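
The idea of "transliterate first, then fall back" can be sketched with the
standard ``codecs.register_error`` hook.  This is a minimal illustration
under stated assumptions, not the package's implementation: the handler
names ``demo/replace-translit`` and ``demo/ignore-translit``, the helper
``make_handler`` and the two-entry table are all hypothetical, and the
sketch assumes every replacement is plain ASCII so that the target encoding
can represent it.

```python
import codecs

# Hypothetical miniature "long" table; not the package's real data.
_LONG = {"€": "EUR", "☺": ":-)"}


def make_handler(table, fallback):
    """Build an encode error handler that transliterates before giving up."""

    def handler(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]  # the run of unencodable characters
        # Assumption: replacements and fallback are ASCII, so the target
        # encoding can always represent them.
        replacement = "".join(table.get(ch, fallback) for ch in bad)
        return replacement, exc.end  # resume encoding after the bad run

    return handler


codecs.register_error("demo/replace-translit", make_handler(_LONG, "?"))
codecs.register_error("demo/ignore-translit", make_handler(_LONG, ""))

text = "jaźń € ☺另!"
replaced = text.encode("iso-8859-2", "demo/replace-translit").decode("iso-8859-2")
ignored = text.encode("iso-8859-2", "demo/ignore-translit").decode("iso-8859-2")
# replaced == 'jaźń EUR :-)?!'
# ignored  == 'jaźń EUR :-)!'
```

Characters that ISO-8859-2 can represent (``ź``, ``ń``) never reach the
handler; only the unencodable ones are transliterated or replaced.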

translitcodec Changes
=====================

0.7.0
-----
Released on May 8, 2021

- Added support for error handlers
- Fixed conversion of the German eszett char

0.6.0
-----
Released on December 13, 2020

- Add support for Python 3.9

0.5.2
-----
Released on January 19, 2020

- Install package with setuptools

0.5.1
-----
Released on January 19, 2020

- Add python_requires to prevent installation under Python 2

0.5
---
Released on January 18, 2020

- Complete coverage of the Vietnamese alphabet

- Removed Python 2 support

0.4
---
Released on May 11, 2015

- Added Python 3 compatibility

0.3
---

Released on February 14, 2011

- Fixes to the transtab table rebuilding tool.

- Added translitcodec.__version__

0.2
---

Released on January 27, 2011

- Resolves issue of "TypeError: character mapping must return integer,
  None or unicode" when a blank value (e.g. \N{ZERO WIDTH SPACE}, \u200B)
  was encoded.  Unicode blanks are now returned.

- Characters in the ASCII range are no longer included in the translation
  tables.

0.1
---

Released on December 28, 2008

- Initial packaged release.
            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/claudep/translitcodec",
    "name": "translitcodec",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": "",
    "keywords": "",
    "author": "Jason Kirtland",
    "author_email": "jek@discorporate.us",
    "download_url": "https://files.pythonhosted.org/packages/f1/28/c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4/translitcodec-0.7.0.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Unicode to 8-bit charset transliteration codec",
    "version": "0.7.0",
    "project_urls": {
        "Homepage": "https://github.com/claudep/translitcodec"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f128c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4",
                "md5": "a9699192bcc25bc5248e375703e28309",
                "sha256": "3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8"
            },
            "downloads": -1,
            "filename": "translitcodec-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a9699192bcc25bc5248e375703e28309",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3",
            "size": 52414,
            "upload_time": "2021-05-08T18:39:09",
            "upload_time_iso_8601": "2021-05-08T18:39:09.432301Z",
            "url": "https://files.pythonhosted.org/packages/f1/28/c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4/translitcodec-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-05-08 18:39:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "claudep",
    "github_project": "translitcodec",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "translitcodec"
}
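
The ``sha256`` digest listed in the raw data can be used to check a
downloaded archive.  The following is a minimal sketch using only
``hashlib``; the helper name ``sha256_matches`` is an assumption, and the
archive must already have been downloaded from the listed URL to a local
path of your choosing.

```python
import hashlib

# Digest copied from the "digests" entry in the raw data above.
EXPECTED_SHA256 = "3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8"


def sha256_matches(path, expected_hex=EXPECTED_SHA256, chunk_size=65536):
    """Return True if the file at ``path`` hashes to ``expected_hex``."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

For example, ``sha256_matches("translitcodec-0.7.0.tar.gz")`` would be
expected to return ``True`` for an intact copy of the sdist.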
        