[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7734906.svg)](https://doi.org/10.5281/zenodo.7734906)
## What is CyrTranslit?
A Python package for bi-directional transliteration of Cyrillic script to Latin script and vice versa.
By default, CyrTranslit transliterates for the Serbian language. A language code can be passed in order to transliterate to and from Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.
## What is transliteration?
Transliteration is the conversion of a text from one script to another. For instance, a Latin alphabet transliteration of the Serbian phrase _"Мој ховеркрафт је пун јегуља"_ is _"Moj hoverkraft je pun jegulja"_.
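As a rough illustration of the idea, transliteration can be pictured as a character-by-character lookup. The sketch below is not CyrTranslit's implementation: the table `SR_CYR_TO_LAT` and the function `to_latin_sketch` are hypothetical names, and the table only covers the letters that appear in the example phrase.

```python
# Hypothetical, partial Serbian Cyrillic-to-Latin table (illustration only;
# this is NOT the mapping shipped with CyrTranslit).
SR_CYR_TO_LAT = {
    "М": "M", "о": "o", "ј": "j", "х": "h", "в": "v", "е": "e",
    "р": "r", "к": "k", "а": "a", "ф": "f", "т": "t", "п": "p",
    "у": "u", "н": "n", "г": "g",
    "љ": "lj",  # one Cyrillic letter can map to two Latin letters
}


def to_latin_sketch(text: str) -> str:
    """Replace each mapped character; keep unmapped ones (e.g. spaces) as-is."""
    return "".join(SR_CYR_TO_LAT.get(ch, ch) for ch in text)


print(to_latin_sketch("Мој ховеркрафт је пун јегуља"))
# prints: Moj hoverkraft je pun jegulja
```

A real transliterator needs a complete table per language, including uppercase letters, which is what the package provides.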
## Citation
A citation would be much appreciated if you use CyrTranslit in a research publication:
[Georges Labrèche. (2023). CyrTranslit (v1.1.1). Zenodo. https://doi.org/10.5281/zenodo.7734906](https://doi.org/10.5281/zenodo.7734906)
BibTeX entry:
```bibtex
@software{georges_labreche_2023_7734906,
author = {Georges Labrèche},
title = {CyrTranslit},
month = mar,
year = 2023,
note = {{A Python package for bi-directional
transliteration of Cyrillic script to Latin script
and vice versa. Supports transliteration for
Bulgarian, Montenegrin, Macedonian, Mongolian,
Russian, Serbian, Tajik, and Ukrainian.}},
publisher = {Zenodo},
version = {v1.1.1},
doi = {10.5281/zenodo.7734906},
url = {https://doi.org/10.5281/zenodo.7734906}
}
```
## Supporting research
CyrTranslit is actively used as a reliable tool to advance research! Here's an incomplete list of publications for research projects that have relied on CyrTranslit:
- Ljajić, Adela & Prodanović, Nikola & Medvecki, Darija & Bašaragin, Bojana & Mitrović, Jelena. (2022). "[Topic Modeling Technique on Covid19 Tweets in Serbian](https://www.researchgate.net/publication/364302202_Topic_Modeling_Technique_on_Covid19_Tweets_in_Serbian)," in 12th International Conference on Information Society and Technology (ICIST), Kopaonik, Serbia.
- Mussylmanbay, Meiirgali. (2022). "[Addresses Standardization and Geocoding using Natural Language Processing](https://nur.nu.edu.kz/handle/123456789/6705)," Nazarbayev University, Kazakhstan.
- Jokic, Danka & Stanković, Ranka & Krstev, Cvetana & Šandrih Todorović, Branislava. (2021). "[A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian](https://drops.dagstuhl.de/opus/volltexte/2021/14549/)," in 3rd Conference on Language, Data and Knowledge (LDK 2021). 10.4230/OASIcs.LDK.2021.13.
- Lakew, Surafel Melaku (2020). "[Thesis Multilingual Neural Machine Translation for Low Resource Languages](https://surafelml.github.io/phd-thesis/)," University of Trento, Italy.
- Filo, Denis. (2020). "[Neuronový strojový překlad pro jazykové páry s malým množstvím trénovacích dat: Low-Resource Neural Machine Translation](https://www.fit.vut.cz/study/thesis/23087/.en)," Brno University of Technology, Brno, Czechia.
- Batanović, Vuk & Nikolic, Bosko. (2019). "[Using Language Technologies to Automate the UNDP Rapid Integrated Assessment Mechanism in Serbian](https://www.researchgate.net/publication/339615659_Using_Language_Technologies_to_Automate_the_UNDP_Rapid_Integrated_Assessment_Mechanism_in_Serbian)," in International Conference on Language Technologies for All: Enabling Linguistic Diversity and Multilingualism Worldwide (LT4All), Paris, France.
- Brown, J. M. M. & Schmidt, Andreas & Wierzba, Marta (Eds.). (2019). "[Of trees and birds: A Festschrift for Gisbert Fanselow](https://publishup.uni-potsdam.de/opus4-ubp/frontdoor/deliver/index/docId/42654/file/of_trees_and_birds.pdf)," Universitätsverlag Potsdam, Potsdam.
- Lakew, Surafel Melaku & Erofeeva, Aliia & Federico, Marcello. (2018). "[Neural Machine Translation into Language Varieties](https://aclanthology.org/W18-6316/)," in 3rd Conference on Machine Translation: Research Papers, Brussels, Belgium.
- Ljajić, Adela & Marovac, Ulfeta. (2018). "[Improving sentiment analysis for twitter data by handling negation rules in the Serbian language](http://www.doiserbia.nb.rs/Article.aspx?ID=1820-02141800013L)," Computer Science and Information Systems. 16. 13-13. 10.2298/CSIS180122013L.
- Жабран, И., Кикоть, А., Гафияк, А., Бородина, Е., & Алёшин, С. (2017). "[Developing Q-Orca site backend using various Python programming language libraries](https://www.moderntechno.de/index.php/meit/article/view/meit07-03-021)," Modern Engineering and Innovative Technologies, 3(07-03), 48–53.
## How do I install this?
CyrTranslit is [hosted in the Python Package Index (PyPI)](https://pypi.python.org/pypi/cyrtranslit) so it can be installed using pip:
```bash
python -m pip install cyrtranslit            # latest version
python -m pip install cyrtranslit==1.1.1     # specific version
python -m pip install "cyrtranslit>=1.1.1"   # minimum version (quoted so the shell does not treat > as a redirect)
```
## What languages are supported?
CyrTranslit currently supports bi-directional transliteration of Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian:
```python
>>> import cyrtranslit
>>> cyrtranslit.supported()
['bg', 'me', 'mk', 'mn', 'ru', 'sr', 'tj', 'ua']
```
## How do I use this?
CyrTranslit can be used both programmatically and via the command line interface.
### Programmatically
#### Bulgarian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Съединението прави силата!", "bg")
"Săedinenieto pravi silata!"
>>> cyrtranslit.to_cyrillic("Săedinenieto pravi silata!", "bg")
"Съединението прави силата!"
```
#### Montenegrin
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Република", "me")
"Republika"
>>> cyrtranslit.to_cyrillic("Republika", "me")
"Република"
```
#### Macedonian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моето летачко возило е полно со јагули", "mk")
"Moeto letačko vozilo e polno so jaguli"
>>> cyrtranslit.to_cyrillic("Moeto letačko vozilo e polno so jaguli", "mk")
"Моето летачко возило е полно со јагули"
```
#### Mongolian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө", "mn")
"Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö"
>>> cyrtranslit.to_cyrillic("Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö", "mn")
"Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө"
```
#### Russian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моё судно на воздушной подушке полно угрей", "ru")
"Moyo sudno na vozdushnoj podushke polno ugrej"
>>> cyrtranslit.to_cyrillic("Moyo sudno na vozdushnoj podushke polno ugrej", "ru")
"Моё судно на воздушной подушке полно угрей"
```
#### Serbian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Мој ховеркрафт је пун јегуља")
"Moj hoverkraft je pun jegulja"
>>> cyrtranslit.to_cyrillic("Moj hoverkraft je pun jegulja")
"Мој ховеркрафт је пун јегуља"
```
#### Tajik
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Ман мактуб навишта истодам", "tj")
"Man maktub navišta istodam"
>>> cyrtranslit.to_cyrillic("Man maktub navišta istodam", "tj")
"Ман мактуб навишта истодам"
```
#### Ukrainian
```python
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Під лежачий камінь вода не тече", "ua")
"Pid ležačyj kamin' voda ne teče"
>>> cyrtranslit.to_cyrillic("Pid ležačyj kamin' voda ne teče", "ua")
"Під лежачий камінь вода не тече"
```
### Command Line Interface
Sample command line call to transliterate a Russian text file:
```bash
$ cyrtranslit -l RU -i tests/ru.txt -o tests/output.txt
```
Use the `-c` argument to accomplish the reverse, that is, to input Latin characters and output Cyrillic.
Use the -h argument for help.
You can also omit the input and output files and use standard input/output:
```bash
$ echo 'Мој ховеркрафт је пун јегуља' | cyrtranslit -l sr
Moj hoverkraft je pun jegulja
$ echo 'Moj hoverkraft je pun jegulja' | cyrtranslit -l sr -c
Мој ховеркрафт је пун јегуља
```
You can test the "script" by running it directly in the Python interactive interpreter, e.g.:
```python
>>> import sys
>>> import cyrtranslit.cyrtranslit
>>> sys.argv.extend(['-l', 'RU'])
>>> sys.argv.extend(['-i', 'tests/ru.txt'])
>>> sys.argv.extend(['-o', 'tests/output.txt'])
>>> cyrtranslit.cyrtranslit.main()
>>> exit()
```
## How can I contribute?
You can add support for other Cyrillic script alphabets. Follow these steps to do so:
1. Create a new transliteration dictionary in the **[mapping.py](https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/cyrtranslit/mapping.py)** file and reference it in the _**[TRANSLIT\_DICT](https://github.com/opendatakosovo/cyrillic-transliteration/blob/ab88bb466d12b9a9ad8d3eb6dc86d0bab871175d/cyrtranslit/mapping.py#L326-L360)**_ dictionary.
2. Watch out for cases where two consecutive Latin alphabet letters are meant to transliterate into a single Cyrillic script letter. These cases need to be explicitly checked for [inside the **to_cyrillic()** function in **\_\_init\_\_.py**](https://github.com/opendatakosovo/cyrillic-transliteration/blob/ab88bb466d12b9a9ad8d3eb6dc86d0bab871175d/cyrtranslit/__init__.py#L62-L191).
3. Add test cases inside of **[tests.py](https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/tests.py)**.
4. Add test CLI input files in the **[tests](https://github.com/opendatakosovo/cyrillic-transliteration/tree/master/tests)** directory.
5. Update the documentation in the **[README.md](https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/README.md)**.
6. List yourself as one of the contributors.
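
The digraph issue mentioned in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic in `to_cyrillic()`: the names `DIGRAPHS`, `SINGLES` and `to_cyrillic_sketch` are hypothetical, the tables are partial, and only lowercase input is handled.

```python
# Hypothetical, partial Serbian Latin-to-Cyrillic tables (illustration only;
# NOT the mapping or the algorithm shipped with CyrTranslit).
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}  # two Latin letters, one Cyrillic letter
SINGLES = {
    "l": "л", "n": "н", "j": "ј", "d": "д", "ž": "ж",
    "e": "е", "g": "г", "u": "у", "a": "а", "p": "п",
}


def to_cyrillic_sketch(text: str) -> str:
    """Convert lowercase Latin text, trying two-letter matches before single letters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Consume both Latin letters and emit a single Cyrillic letter.
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Fall back to a single-letter lookup; keep unmapped characters as-is.
            out.append(SINGLES.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_cyrillic_sketch("pun jegulja"))
# prints: пун јегуља
```

Without the two-letter check, "lj" would be converted letter by letter into "лј" instead of "љ", which is the kind of regression the test cases in step 3 should catch.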
Before tagging a release version and deploying to [PyPI](https://pypi.org/):
1. Update the `version` and `download_url` properties in [setup.py](https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/setup.py).
2. [Reserve a Zenodo DOI](https://cassgvp.github.io/github-for-collaborative-documentation/docs/tut/6-Zenodo-integration.html) for the release and update this readme's Zenodo badge and [citation instructions](https://github.com/opendatakosovo/cyrillic-transliteration#citation).
A big thank you to everyone who contributed:
- Bulgarian 🇧🇬: [@Syndamia](https://github.com/Syndamia) and [@Sparkycz](https://github.com/Sparkycz).
- Russian 🇷🇺: [@ratijas](https://github.com/ratijas) and [@rominf](https://github.com/rominf).
- Tajik 🇹🇯: [@diejani](https://github.com/diejani).
- Ukrainian 🇺🇦: [@AnonymousVoice1](https://github.com/AnonymousVoice1).
- Mongolian 🇲🇳: [@Serbipunk](https://github.com/Serbipunk).
- Command Line Interface (CLI): [@ZJaume](https://github.com/ZJaume).
Raw data
{
"_id": null,
"home_page": "https://github.com/opendatakosovo/cyrillic-transliteration",
"name": "cyrtranslit",
"maintainer": "",
"docs_url": "https://pythonhosted.org/cyrtranslit/",
"requires_python": "",
"maintainer_email": "",
"keywords": "cyrillic,latin,transliteration,transliterate,cyrtranslit,bulgarian,montenegrin,macedonian,mongolian,russian,serbian,tajik,ukrainian",
"author": "Georges Labr\u00e8che, Open Data Kosovo",
"author_email": "georges@tanagraspace.com",
"download_url": "https://files.pythonhosted.org/packages/32/3e/b9f6d47b8326a263251c4ca0d1710010778a1654f0ee1ebbdf75376726e8/cyrtranslit-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Bi-directional Cyrillic transliteration. Transliterate Cyrillic script to Latin script and vice versa. Supports transliteration for Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.",
"version": "1.1.1",
"split_keywords": [
"cyrillic",
"latin",
"transliteration",
"transliterate",
"cyrtranslit",
"bulgarian",
"montenegrin",
"macedonian",
"mongolian",
"russian",
"serbian",
"tajik",
"ukrainian"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b90996818910b17cae199c3183aab7a8e3e9b694337c4673b36dc7165bb0c440",
"md5": "45f7c720b48eae387179d6399453a304",
"sha256": "51b65cb0497042231bce2951fd2df1fb8f659fc20ac355a1044e578b97870256"
},
"downloads": -1,
"filename": "cyrtranslit-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "45f7c720b48eae387179d6399453a304",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14172,
"upload_time": "2023-03-14T22:41:51",
"upload_time_iso_8601": "2023-03-14T22:41:51.445343Z",
"url": "https://files.pythonhosted.org/packages/b9/09/96818910b17cae199c3183aab7a8e3e9b694337c4673b36dc7165bb0c440/cyrtranslit-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "323eb9f6d47b8326a263251c4ca0d1710010778a1654f0ee1ebbdf75376726e8",
"md5": "2e44ca23a2036985173b978dccb5cade",
"sha256": "04edd4c89b1a4611b81de609f5c91f91afae980c728dea2f09522d7b10f73228"
},
"downloads": -1,
"filename": "cyrtranslit-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "2e44ca23a2036985173b978dccb5cade",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 17927,
"upload_time": "2023-03-14T22:41:53",
"upload_time_iso_8601": "2023-03-14T22:41:53.715262Z",
"url": "https://files.pythonhosted.org/packages/32/3e/b9f6d47b8326a263251c4ca0d1710010778a1654f0ee1ebbdf75376726e8/cyrtranslit-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-14 22:41:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "opendatakosovo",
"github_project": "cyrillic-transliteration",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "cyrtranslit"
}