# numericnormalizer
A basic NLP library that converts numbers between numeric form (e.g. `5`) and word or character form (e.g. `five`, `五`).
## Installation
`pip install numericnormalizer`
## Usage
### Importing the module
```Python
from numericnormalizer import normalizer
```
### Convert a number to a word (e.g. 5 -> 'five')
```Python
normalizer.number_to_word(5, lang='en')
>> "five"
normalizer.number_to_word(5, lang='zh')
>> "五"
```
### Convert a word to a number (e.g. 'five' -> 5)
```Python
normalizer.word_to_number('five', lang='en')
>> 5
normalizer.word_to_number('五', lang='zh')
>> 5
```
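The two conversions above are inverses of each other. The following is a minimal sketch of that round trip using a plain lookup table. The names `NUMBER_WORDS`, `number_to_word_sketch`, and `word_to_number_sketch` are hypothetical and exist only for illustration; this README does not describe how the library stores or looks up its data.

```python
# Hypothetical lookup tables for illustration only; the library's real data
# and internals are not shown in this README.
NUMBER_WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "zh": ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十"],
}


def number_to_word_sketch(number: int, lang: str = "en") -> str:
    """Return the word for an integer from 0 to 10 in the given language."""
    words = NUMBER_WORDS[lang]
    if not 0 <= number < len(words):
        raise ValueError(f"unsupported number: {number}")
    return words[number]


def word_to_number_sketch(word: str, lang: str = "en") -> int:
    """Return the integer for a number word, ignoring case."""
    words = NUMBER_WORDS[lang]
    try:
        return words.index(word.lower())
    except ValueError:
        raise ValueError(f"unsupported word: {word!r}") from None


# Converting to a word and back should return the original number.
for lang in NUMBER_WORDS:
    for n in range(11):
        assert word_to_number_sketch(number_to_word_sketch(n, lang), lang) == n

print(number_to_word_sketch(5, "zh"))       # 五
print(word_to_number_sketch("Five", "en"))  # 5
```

A list indexed by the number keeps both directions in one place, which is enough for a range as small as 0 to 10.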
### Format numbers in a sentence
#### Example 1: default formatting
```Python
normalizer.format_sentence(
    sentence='What are the 6 principles of intercultural adaption?',
    lang='en'
)
>> "What are the six (6) principles of intercultural adaption?"
```
#### Example 2: Custom Formatting
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    formatting='{number} [{word}]',  # custom formatting
)
>> "I have 4 [four] apples and 5 [five] oranges."
```
#### Example 3: Number restricting
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    max_number=4  # restrict the max_number
)
>> "I have four (4) apples and five oranges."
```
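To make the three examples above easier to reason about, here is a rough, English-only sketch of how this kind of sentence formatting could work with `re.sub` and `str.format`. It is not the library's implementation: `WORDS`, `TOKEN`, and `format_sentence_sketch` are hypothetical names, and the default template `'{word} ({number})'` is inferred from the output of Example 1.

```python
import re

# Hypothetical English-only table; not the library's actual data.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

# Match a number word or a run of digits, each as a whole word.
TOKEN = re.compile(r"\b(?:" + "|".join(WORDS) + r"|\d+)\b", re.IGNORECASE)


def format_sentence_sketch(sentence, formatting="{word} ({number})", max_number=10):
    """Rewrite each supported number in `sentence` using `formatting`."""
    def replace(match):
        token = match.group(0)
        number = int(token) if token.isdigit() else WORDS.index(token.lower())
        if number > min(max_number, len(WORDS) - 1):
            return token  # out of range: leave the original text untouched
        return formatting.format(number=number, word=WORDS[number])

    return TOKEN.sub(replace, sentence)


print(format_sentence_sketch("I have 4 apples and five oranges."))
print(format_sentence_sketch("I have 4 apples and five oranges.",
                             formatting="{number} [{word}]"))
print(format_sentence_sketch("I have 4 apples and five oranges.", max_number=4))
```

Under these assumptions, the sketch treats digits and number words the same way, so the template decides the output order and `max_number` decides which matches are left as they are.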
## Language Support
The supported languages are from the [Azure Language Detect List](https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/language-detection/language-support):
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Bulgarian (bg)
- Burmese (my)
- Catalan (ca)
- Central Khmer (km)
- Chinese (zh)
- Chinese Simplified (zh_chs)
- Chinese Traditional (zh_cht)
- Chuvash (cv)
- Corsican (co)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dari (prs)
- Divehi (dv)
- Dutch (nl)
- English (en)
- Esperanto (eo)
- Estonian (et)
- Faroese (fo)
- Fijian (fj)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian (ht)
- Hausa (ha)
- Hebrew (he)
- Hindi (hi)
- Hmong Daw (mww)
- Hungarian (hu)
- Icelandic (is)
- Igbo (ig)
- Indonesian (id)
- Inuktitut (iu)
- Irish (ga)
- Italian (it)
- Japanese (ja)
- Javanese (jv)
- Kannada (kn)
- Kazakh (kk)
- Kinyarwanda (rw)
- Kirghiz (ky)
- Korean (ko)
- Kurdish (ku)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lithuanian (lt)
- Luxembourgish (lb)
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Maori (mi)
- Marathi (mr)
- Mongolian (mn)
- Nepali (ne)
- Norwegian (no)
- Norwegian Nynorsk (nn)
- Odia (or)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Queretaro Otomi (otq)
- Romanian (ro)
- Russian (ru)
- Samoan (sm)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
- Tagalog (tl)
- Tahitian (ty)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Tigrinya (ti)
- Tongan (to)
- Turkish (tr)
- Turkmen (tk)
- Upper Sorbian (hsb)
- Uyghur (ug)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Xhosa (xh)
- Yiddish (yi)
- Yoruba (yo)
- Yucatec Maya (yua)
- Zulu (zu)
However, `format_sentence` has not been tested thoroughly for every language in this early release. It currently works only for languages that separate words with spaces, because it relies on regex word-match notation.
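The space-delimited limitation can be illustrated with Python's `re` module. This sketch assumes that "regex word match notation" refers to the `\b` word-boundary anchor, which the README does not state explicitly. A number word is found when it stands alone, but a Chinese numeral written between other characters, with no spaces, has no word boundary on either side.

```python
import re

# Whole-word patterns for the English word "five" and the Chinese numeral "五".
english = re.compile(r"\bfive\b")
chinese = re.compile(r"\b五\b")

# English words are separated by spaces, so the boundaries exist.
print(bool(english.search("I have five oranges.")))  # True

# Chinese text has no spaces: "五" sits between two other word characters,
# so \b does not match on either side of it.
print(bool(chinese.search("我有五个橙子")))  # False

# The same numeral is found once it is separated by spaces.
print(bool(chinese.search("我有 五 个橙子")))  # True
```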
## Number support
Currently, only the integers 0 to 10 are supported. Negative numbers are not supported.
Raw data
{
"_id": null,
"home_page": "",
"name": "numericnormalizer",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "python,nlp,numeric,sentence,language,convert,regex",
"author": "mattcoulter7 (Matt Coulter)",
"author_email": "<mattcoul7@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Converting number formats",
"version": "0.1.0",
"project_urls": null,
"split_keywords": [
"python",
"nlp",
"numeric",
"sentence",
"language",
"convert",
"regex"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aa5b2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5",
"md5": "bac6b269f9cf3543e7ff67ae4509d992",
"sha256": "712370a059bb3338a4f8f83590f759ba6b2367e8fbee8a68a414273666c61ea5"
},
"downloads": -1,
"filename": "numericnormalizer-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bac6b269f9cf3543e7ff67ae4509d992",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12315,
"upload_time": "2023-05-10T10:04:35",
"upload_time_iso_8601": "2023-05-10T10:04:35.470387Z",
"url": "https://files.pythonhosted.org/packages/aa/5b/2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5/numericnormalizer-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3f29e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63",
"md5": "e20589ec00d1d6962c733ee0e6f05024",
"sha256": "269df4cdd8b15493ac81a3e34f63ea8c8db4d359b598c9cbbe433e2e4a3bd373"
},
"downloads": -1,
"filename": "numericnormalizer-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e20589ec00d1d6962c733ee0e6f05024",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14684,
"upload_time": "2023-05-10T10:04:38",
"upload_time_iso_8601": "2023-05-10T10:04:38.236598Z",
"url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-10 10:04:38",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "numericnormalizer"
}