====================
Graph Transliterator
====================

.. image:: https://img.shields.io/pypi/v/graphtransliterator.svg
   :target: https://pypi.python.org/pypi/graphtransliterator
   :alt: PyPi Version

.. image:: https://readthedocs.org/projects/graphtransliterator/badge/?version=latest
   :target: https://graphtransliterator.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://pyup.io/repos/github/seanpue/graphtransliterator/shield.svg
   :target: https://pyup.io/repos/github/seanpue/graphtransliterator/
   :alt: PyUp Updates

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black
   :alt: Code Style: Black

.. image:: https://img.shields.io/pypi/pyversions/graphtransliterator
   :alt: PyPI - Python Version

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3558365.svg
   :target: https://doi.org/10.5281/zenodo.3558365
   :alt: Software repository DOI

.. image:: https://joss.theoj.org/papers/10.21105/joss.01717/status.svg
   :target: https://doi.org/10.21105/joss.01717
   :alt: Paper DOI

A graph-based transliteration tool that lets you convert the symbols of one
language or script to those of another using rules that you define.

* Free software: MIT license
* Documentation: https://graphtransliterator.readthedocs.io
* Repository: https://github.com/seanpue/graphtransliterator

Transliteration... What? Why?
-----------------------------

Moving text or data from one script or encoding to another is a common problem:

- Many languages are written in multiple scripts, and many people can only read one of
  them. Moving between them can be a complex but necessary task in order to make
  texts accessible.

- Both the identification of names and locations and machine translation
  benefit from transliteration.

- Library systems often require metadata to be in particular forms of romanization in
  addition to the original script.

- Linguists need to move between different methods of phonetic transcription.

- Documents in legacy fonts must now be converted to contemporary Unicode ones.

- Complex-script languages are frequently approached in natural language processing and
  in digital humanities research through transliteration, as it provides disambiguating
  information about pronunciation, morphological boundaries, and unwritten elements not
  present in the original script.

Graph Transliterator abstracts transliteration, offering an "easy reading" method for
developing transliterators that does not require writing a complex program. It also
contains bundled transliterators that are rigorously tested. These can be expanded to
handle many transliteration tasks.

Contributions are very welcome!

Features
--------

* Provides a transliteration tool that can be configured to convert the tokens
  of an input string into an output string using:

  * user-defined types of input **tokens** and **token classes**
  * **transliteration rules** based on:

    * a sequence of input tokens
    * specific input tokens that precede or follow the token sequence
    * classes of input tokens preceding or following specified tokens

  * **"on match" rules** for output to be inserted between transliteration
    rules involving particular token classes
  * defined rules for **whitespace**, including its optional consolidation

* Can be set up using:

  * an **"easy reading"** `YAML <https://yaml.org>`_ format that lets you
    quickly craft settings for the transliteration tool
  * a `JSON <https://json.org>`_ dump of a transliterator (quicker!)
  * **"direct"** settings, perhaps passed programmatically, using a dictionary

* **Automatically orders rules** by the number of tokens in a
  transliteration rule
* **Checks for ambiguity** in transliteration rules
* Can provide **details** about each transliteration rule match
* Allows **optional matching of all possible rules** in a particular location
* Permits **pruning of rules** with certain productions
* **Validates**, as well as **serializes** to and **deserializes** from JSON
  and Python data types, using accessible
  `marshmallow <https://marshmallow.readthedocs.io/>`_ schemas
* Provides **full support for Unicode**, including Unicode **character names**
  in the "easy reading" YAML format
* Constructs and uses a **directed tree** and performs a **best-first search**
  to find the most specific transliteration rule in a given context
* Includes **bundled transliterators** that *you* can add to and that check for
  full test coverage of the nodes and edges of the internal graph and any
  "on match" rules
* Includes a command-line interface to perform transliteration and other tasks

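
The idea of ordering rules so that the most specific one is tried first can be
illustrated with a small standalone sketch. It uses only the Python standard
library and is not Graph Transliterator's implementation: the ``Rule`` class, its
``specificity`` measure (which here also counts context tokens, an assumption), and
the ``transliterate`` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: tokens to match plus optional required context."""

    tokens: tuple
    production: str
    prev_tokens: tuple = ()
    next_tokens: tuple = ()

    @property
    def specificity(self):
        # Assumption: more tokens (matched or in context) = more specific.
        return len(self.prev_tokens) + len(self.tokens) + len(self.next_tokens)


def transliterate(tokens, rules):
    """Match left to right, trying the most specific rule first."""
    ordered = sorted(rules, key=lambda r: r.specificity, reverse=True)
    output, i = [], 0
    while i < len(tokens):
        for rule in ordered:
            start = i - len(rule.prev_tokens)
            end = i + len(rule.tokens)
            if start < 0:
                continue  # not enough preceding tokens for this rule
            after = tokens[end:end + len(rule.next_tokens)]
            if (
                tuple(tokens[start:i]) == rule.prev_tokens
                and tuple(tokens[i:end]) == rule.tokens
                and tuple(after) == rule.next_tokens
            ):
                output.append(rule.production)
                i = end
                break
        else:
            raise ValueError(f"no rule matches at position {i}: {tokens[i]!r}")
    return "".join(output)


rules = [
    Rule(tokens=("h",), production="ᴉ"),
    Rule(tokens=("i",), production="ɥ"),
    # Applies only when "i" follows "h"; more specific, so it is tried first.
    Rule(tokens=("i",), production="ɥ!", prev_tokens=("h",)),
]
result = transliterate(list("hi"), rules)
print(result)
```

In this sketch the context-dependent rule wins for the ``i`` in ``hi`` because it
sorts ahead of the plain ``i`` rule, while an ``i`` at the start of the input falls
back to the plain rule.
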
Sample Code and Graph
---------------------

.. code-block:: python

   from graphtransliterator import GraphTransliterator

   # Build a transliterator from "easy reading" YAML and apply it to "hi".
   GraphTransliterator.from_yaml("""
       tokens:
         h: [consonant]
         i: [vowel]
         " ": [whitespace]
       rules:
         h: \N{LATIN SMALL LETTER TURNED I}
         i: \N{LATIN SMALL LETTER TURNED H}
         <whitespace> i: \N{LATIN CAPITAL LETTER TURNED H}
         (<whitespace> h) i: \N{LATIN SMALL LETTER TURNED H}!
       onmatch_rules:
         - <whitespace> + <consonant>: ¡
       whitespace:
         default: " "
         consolidate: true
         token_class: whitespace
       metadata:
         title: "Upside Down Greeting Transliterator"
         version: "1.0.0"
       """).transliterate("hi")

.. code-block:: python

   '¡ᴉɥ!'

.. figure:: https://raw.githubusercontent.com/seanpue/graphtransliterator/master/docs/_static/sample_graph.png
   :alt: sample graph

   Sample directed tree created by Graph Transliterator. The `rule` nodes are in double
   circles, and `token` nodes are single circles. The numbers are the cost of the
   particular edge, and less costly edges are searched first. Previous token classes
   and previous tokens that must be present are found as constraints on the edges
   incident to the terminal leaf `rule` nodes.

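
The cost-ordered search described in the caption can be sketched with a priority
queue. The following is a standalone illustration using only the standard library;
the ``TREE`` layout, the edge tuples, and the ``best_first_rule`` function are
hypothetical and do not mirror the library's internal graph. It expands whichever
frontier entry has the lowest accumulated edge cost and checks previous-token
constraints on the edges leading to rule leaves.

```python
import heapq
from itertools import count

# Hypothetical tree: each node maps to its outgoing edges.
# An edge is (cost, token, constraint, child). Token edges carry a token to
# match; edges into a rule leaf have token None and may carry a constraint
# naming the token that must precede the match.
TREE = {
    "root": [(1, "h", None, "h"), (1, "i", None, "i")],
    "h": [(2, None, None, "rule: h -> ᴉ")],
    "i": [
        (1, None, "h", "rule: (h) i -> ɥ!"),  # cheaper edge: more specific rule
        (2, None, None, "rule: i -> ɥ"),
    ],
}


def best_first_rule(tokens, start):
    """Return the first rule leaf reached when cheapest entries go first."""
    tie = count()  # tie-breaker so the heap never compares node names
    frontier = [(0, next(tie), "root", start)]
    prev = tokens[start - 1] if start > 0 else None
    while frontier:
        cost, _, node, pos = heapq.heappop(frontier)
        if node.startswith("rule:"):
            return node
        for edge_cost, token, constraint, child in TREE[node]:
            if token is not None:
                # Token edge: follow it only if it matches the next input token.
                if pos < len(tokens) and tokens[pos] == token:
                    entry = (cost + edge_cost, next(tie), child, pos + 1)
                    heapq.heappush(frontier, entry)
            elif constraint is None or prev == constraint:
                # Rule edge: allowed only if its constraint is satisfied.
                entry = (cost + edge_cost, next(tie), child, pos)
                heapq.heappush(frontier, entry)
    return None


print(best_first_rule(list("hi"), 1))
```

Because the constrained edge is cheaper, the ``i`` that follows ``h`` reaches the
more specific rule first; without a preceding ``h`` that edge is skipped and the
search falls through to the general rule.
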
Get It Now
==========

.. code-block:: bash

   $ pip install -U graphtransliterator

Citation
========

To cite Graph Transliterator, please use:

  Pue, A. Sean (2019). Graph Transliterator: A graph-based transliteration tool.
  Journal of Open Source Software, 4(44), 1717, https://doi.org/10.21105/joss.01717