# Alphabetic
A lightweight Python module for querying language alphabets, codes, syllabaries and logographics
# Description
Alphabetic is a small project that was born out of the need to find out the alphabet of several languages for a private NLP project. Determining the alphabet of a language is required for various NLP tasks, e.g. for classifying the language of a given text or for normalizing it (e.g., by removing noisy/random strings).
The idea is simple: given the name of the [desired language](#Supported_Languages), Alphabetic first translates the language internally into an [ISO 639-2](https://www.loc.gov/standards/iso639-2/php/code_list.php) language code and then returns the corresponding alphabet.
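The two-step lookup described above can be pictured with a minimal sketch. The dictionaries `NAME_TO_CODE` and `CODE_TO_ALPHABET` and the function `lookup_alphabet` are hypothetical stand-ins invented for illustration; they are not Alphabetic's internal data structures, and the English letters are simply taken from Python's `string` module.

```python
import string

# Hypothetical illustration of the name -> code -> alphabet lookup.
# These dictionaries are NOT Alphabetic's real data; they only mimic the idea.
NAME_TO_CODE = {"Greek": "gre", "English": "eng"}

CODE_TO_ALPHABET = {
    "gre": list("ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρστυφχψω"),
    "eng": list(string.ascii_uppercase + string.ascii_lowercase),
}


def lookup_alphabet(language_name: str) -> list[str]:
    """Resolve a language name to its code, then the code to its letters."""
    try:
        code = NAME_TO_CODE[language_name]
    except KeyError:
        raise ValueError(f"Unsupported language: {language_name!r}") from None
    return CODE_TO_ALPHABET[code]


print(*lookup_alphabet("Greek")[:5])
# Α Β Γ Δ Ε
```

Keying the alphabets by code rather than by name keeps the data independent of how a language happens to be spelled in the calling code.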
# Installation
The easiest way to install Alphabetic is to use pip, where you can choose between (1) the PyPI repository and (2) this repository.
- (1) ```pip install alphabetic```
- (2) ```pip install git+https://github.com/Halvani/alphabetic.git```
The latter will pull and install the latest commit from this repository as well as the required Python dependencies.
# Usage
A simple lookup of a language's alphabet can be performed as follows:
```python
from alphabetic import Language, Alphabet
print(*Alphabet.by_language(Language.Greek))
# Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω
```
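For the normalization use case mentioned in the description, the letters returned by such a lookup can be turned into a set for fast membership tests. The sketch below hard-codes the Greek letters printed above so that it runs without the package installed; `keep_alphabetic_tokens` is a hypothetical helper, not part of Alphabetic.

```python
# Letters copied from the Greek output above; with the package installed,
# this list could instead come from Alphabet.by_language(Language.Greek).
greek_letters = (
    "Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω "
    "α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω"
).split()


def keep_alphabetic_tokens(text: str, letters: list[str]) -> list[str]:
    """Keep only whitespace-separated tokens made up entirely of the given letters."""
    allowed = set(letters)
    return [tok for tok in text.split() if all(ch in allowed for ch in tok)]


print(keep_alphabetic_tokens("καλη x9$k μερα 1234", greek_letters))
# ['καλη', 'μερα']
```

Note that the printed Greek alphabet contains neither accented vowels (e.g., ά) nor the final sigma (ς), so a filter this strict would also drop ordinary Greek words containing them; a real pipeline would likely need to extend the allowed set.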
```by_language``` returns a list of Unicode strings, one per letter. Depending on the [selected language](#Supported_Languages), the alphabet can be further restricted in terms of letter casing:
```python
from alphabetic import Language, Alphabet, LetterCase
print(*Alphabet.by_language(Language.Bosnian, letter_case=LetterCase.Lower))
# а б в г д е ж з и к л м н о п р с т у ф х ц ч ш ђ ј љ њ ћ џ
```
Note that some languages, such as *Hebrew* or *Arabic*, are written in so-called [non-bicameral](https://www.liquidbubble.co.uk/blog/the-comprehensive-guide-to-typography-jargon-for-designers/) scripts, which have **no** upper/lower case, so such restrictions are not possible. In these cases, the entire alphabet is returned:
```python
from alphabetic import Language, Alphabet, LetterCase
print(*Alphabet.by_language(Language.Hebrew, letter_case=LetterCase.Lower))
# א ב ג ד ה ו ז ח ט י כ ך ל מ ם נ ן ס ע פ ף צ ץ ק ר ש ת
print(*Alphabet.by_language(Language.Arabic, letter_case=LetterCase.Lower))
# ا ب ة ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي
```
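Whether a set of letters distinguishes case can also be checked with Python's built-in string methods, independently of Alphabetic. The helper `has_letter_case` below is a hypothetical illustration of that check and not the library's own logic.

```python
def has_letter_case(letters: list[str]) -> bool:
    """Return True if at least one letter differs between upper() and lower()."""
    return any(ch.lower() != ch.upper() for ch in letters)


greek_sample = ["Α", "α", "Β", "β"]
hebrew_sample = ["א", "ב", "ג"]

print(has_letter_case(greek_sample))   # True
print(has_letter_case(hebrew_sample))  # False
```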
According to [Wikipedia](https://en.wikipedia.org/wiki/List_of_writing_systems#Syllabaries):
"*A true alphabet contains separate letters (**not diacritic marks**) for both consonants and vowels.*" In order to strip out diacritics from a desired alphabet, you can restrict the output of ```by_language``` as follows:
```python
from alphabetic import Language, Alphabet
print(*Alphabet.by_language(Language.Czech, strip_diacritics=True))
# A B C C h D E F G H I J K L M N O P Q R S T U V W X Y Z a b c c h d e f g h i j k l m n o p q r s t u v w x y z
```
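To illustrate what stripping diacritics means at the character level, here is a minimal sketch based on Unicode decomposition from the standard library. It is one common technique and an assumption on our part, not necessarily how Alphabetic implements ```strip_diacritics```; the helpers `strip_marks` and `without_diacritics` are hypothetical names.

```python
import unicodedata


def strip_marks(letter: str) -> str:
    """Decompose a letter (NFD) and drop its combining marks."""
    decomposed = unicodedata.normalize("NFD", letter)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def without_diacritics(letters: list[str]) -> list[str]:
    """Strip marks from every letter, dropping duplicates but keeping order."""
    return list(dict.fromkeys(strip_marks(letter) for letter in letters))


czech_sample = ["A", "Á", "C", "Č", "E", "É", "Ě", "R", "Ř", "U", "Ú", "Ů"]
print(*without_diacritics(czech_sample))
# A C E R U
```

Letters that have no Unicode decomposition, such as the Polish *ł* or the Danish *ø*, pass through this approach unchanged.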
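Moreover, you can strip out the multi-letter graphemes that several languages include in their alphabets (e.g., *Dh* or *Gj* in Albanian). Strictly speaking these are digraphs rather than diphthongs, but the parameter is named ```strip_diphthongs```: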
```python
from alphabetic import Language, Alphabet
print(*Alphabet.by_language(Language.Albanian))
# Entire alphabet: A B C Ç D Dh E Ë F G Gj H I J K L Ll M N Nj O P Q R Rr S Sh T Th U V X Xh Y Z Zh a b c ç d dh e ë f g gj h i j k l ll m n nj o p q r rr s sh t th u v x xh y z zh
print(*Alphabet.by_language(Language.Albanian, strip_diphthongs=True))
# A B C Ç D E Ë F G H I J K L M N O P Q R S T U V X Y Z a b c ç d e ë f g h i j k l m n o p q r s t u v x y z
```
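The effect shown above can be approximated on any list of graphemes by keeping only single-character entries. The following is a rough sketch under that assumption; `single_letters_only` is a hypothetical helper and may differ from what ```strip_diphthongs``` does internally.

```python
import unicodedata


def single_letters_only(graphemes: list[str]) -> list[str]:
    """Keep entries that are exactly one code point after NFC normalization."""
    return [g for g in graphemes if len(unicodedata.normalize("NFC", g)) == 1]


albanian_sample = ["C", "Ç", "D", "Dh", "E", "Ë", "G", "Gj", "L", "Ll"]
print(*single_letters_only(albanian_sample))
# C Ç D E Ë G L
```

Normalizing to NFC before measuring the length ensures that a letter such as *Ç* counts as one character even if it was typed as a base letter plus a combining mark.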
# Features
- Currently 117 languages are supported, with more to follow over time
- At the heart of Alphabetic is a [json file](https://github.com/Halvani/alphabetic/blob/main/alphabetic/data/language_data.json) that can be used independently of the respective programming language or application
- Besides language alphabets, Alphabetic also provides codes (e.g., [Morse](https://en.wikipedia.org/wiki/Morse_code) or [NATO Phonetic Alphabet](https://en.wikipedia.org/wiki/NATO_phonetic_alphabet)), [syllabaries](https://en.wikipedia.org/wiki/Syllabary) and [logographics](https://en.wikipedia.org/wiki/Logogram)
<a name="Supported_Languages"></a>
# Supported languages
<details><summary>Open to view all supported languages</summary>

|Language|ISO 639-2 code|
|---|---|
|Abkhazian|abk|
|Afar|aar|
|Afrikaans|afr|
|Albanian|sqi|
|Amharic|amh|
|Arabic|ara|
|Armenian|arm|
|Assamese|asm|
|Avar|ava|
|Avestan|ave|
|Bambara|bam|
|Bashkir|bak|
|Basque|baq|
|Belarusian|bel|
|Bislama|bis|
|Boko|bqc|
|Bosnian|bos|
|Breton|bre|
|Bulgarian|bul|
|Buryat|bua|
|Catalan|cat|
|Chamorro|cha|
|Chechen|che|
|Cherokee|chr|
|Chichewa|nya|
|Chukchi|ckt|
|Chuvash|chv|
|Corsican|cos|
|Croatian|hrv|
|Czech|ces|
|Danish|dan|
|Dungan|dng|
|Dutch|nld|
|Dzongkha|dzo|
|English|eng|
|Esperanto|epo|
|Estonian|est|
|Ewe|ewe|
|Faroese|fao|
|Fijian|fij|
|Finnish|fin|
|French|fra|
|Gaelic|gla|
|Georgian|kat|
|German|deu|
|Greek|gre|
|Guarani|grn|
|Haitian|hat|
|Hausa|hau|
|Hawaiian|haw|
|Hebrew|heb|
|Herero|her|
|Hindi|hin|
|Icelandic|isl|
|Igbo|ibo|
|Indonesian|ind|
|Italian|ita|
|Javanese|jav|
|Kabardian|kbd|
|Kashubian|csb|
|Kazakh|kaz|
|Kirghiz|kir|
|Komi|kpv|
|Korean|kor|
|Kumyk|kum|
|Kurmanji|kmr|
|Latin|lat|
|Latvian|lav|
|Lezghian|lez|
|Lithuanian|lit|
|Luganda|lug|
|Macedonian|mkd|
|Malay|may|
|Maltese|mlt|
|Maori|mao|
|Mari|chm|
|Moksha|mdf|
|Moldovan|rum|
|Mongolian|mon|
|Mru|mro|
|Nepali|nep|
|Norwegian|nor|
|Occitan|oci|
|Pashto|pus|
|Persian|per|
|Polish|pol|
|Portuguese|por|
|Punjabi|pan|
|Quechua|que|
|Rohingya|rhg|
|Russian|rus|
|Samoan|smo|
|Sango|sag|
|Sanskrit|san|
|Serbian|srp|
|Slovak|slo|
|Slovenian|slv|
|Somali|som|
|Sorani|ckb|
|Spanish|spa|
|Sundanese|sun|
|Swedish|swe|
|Tajik|tgk|
|Tatar|tat|
|Turkish|tur|
|Turkmen|tuk|
|Tuvan|tyv|
|Twi|twi|
|Ukrainian|ukr|
|Uzbek|uzb|
|Venda|ven|
|Volapük|vol|
|Welsh|wel|
|Wolof|wol|
|Yakut|sah|
|Yiddish|yid|
|Zulu|zul|
</details>
# Contribution
If you like this project, you are welcome to support it, e.g. by providing additional languages (there is a lot to do with regard to [ISO 639-2](https://www.loc.gov/standards/iso639-2/php/code_list.php)). Feel free to fork the repository and create a pull request to suggest and collaborate on changes.
# Last remarks
As is usual with open source projects, we developers do not earn any money with what we do, but are primarily interested in giving something back to the community with fun, passion and joy. Nevertheless, we would be very happy if you rewarded all the time that has gone into the project with just a small star 🤗
Raw data
{
"_id": null,
"home_page": null,
"name": "alphabetic",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "alphabet, alphabet characters, alphabet letters, alphabet-list",
"author": "Oren Halvani",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f4/2b/7f63ccbd3e30b5cf93e9fd99a4ce43af3bd83b8a58f81f0d937b533a7ffc/alphabetic-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A lightweight Python module for querying language alphabets, codes, syllabaries and logographics",
"version": "0.0.4",
"project_urls": {
"Bug Tracker": "https://github.com/Halvani/alphabetic/issues",
"Homepage": "https://github.com/Halvani/alphabetic"
},
"split_keywords": [
"alphabet",
" alphabet characters",
" alphabet letters",
" alphabet-list"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "75fea88d7880f9d726a3daf8583c6b419c93445d54acf6f83b5402083ba8a612",
"md5": "12e2377c34552d2b59b4762b6e02c468",
"sha256": "62f390d3d2d3d6d1242f2aa794efe18311bc74ad96b1d2de50f481c41605e035"
},
"downloads": -1,
"filename": "alphabetic-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "12e2377c34552d2b59b4762b6e02c468",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 27176,
"upload_time": "2024-05-13T15:47:42",
"upload_time_iso_8601": "2024-05-13T15:47:42.446033Z",
"url": "https://files.pythonhosted.org/packages/75/fe/a88d7880f9d726a3daf8583c6b419c93445d54acf6f83b5402083ba8a612/alphabetic-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f42b7f63ccbd3e30b5cf93e9fd99a4ce43af3bd83b8a58f81f0d937b533a7ffc",
"md5": "095cf7de071883678deb4209d2a6d1dc",
"sha256": "f872da0e1eff731743c355defc25d18bd11ec4c123bbd963260e3bdbe0784d9d"
},
"downloads": -1,
"filename": "alphabetic-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "095cf7de071883678deb4209d2a6d1dc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 27892,
"upload_time": "2024-05-13T15:47:44",
"upload_time_iso_8601": "2024-05-13T15:47:44.012510Z",
"url": "https://files.pythonhosted.org/packages/f4/2b/7f63ccbd3e30b5cf93e9fd99a4ce43af3bd83b8a58f81f0d937b533a7ffc/alphabetic-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-13 15:47:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Halvani",
"github_project": "alphabetic",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "dcl",
"specs": [
[
"==",
"1.0.0"
]
]
}
],
"lcname": "alphabetic"
}