numericnormalizer

Name	numericnormalizer
Version	0.1.0
Summary	Converting number formats
upload_time	2023-05-10 10:04:38
docs_url	None
author	mattcoulter7 (Matt Coulter)
keywords	python nlp numeric sentence language convert regex
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            
# numericnormalizer
 
A basic NLP library for converting numbers between numeric form (e.g. `5`) and written-word form (e.g. `five` or `五`).

## Installation
`pip install numericnormalizer`

## Usage
### Importing the module
```Python
from numericnormalizer import normalizer
```

### Convert a number to a word (e.g. 5 -> 'five')
```Python
normalizer.number_to_word(5, lang='en')  # returns "five"

normalizer.number_to_word(5, lang='zh')  # returns "五"
```
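Because only the numbers 0 to 10 are supported, a conversion like this can be pictured as a per-language lookup table. The sketch below is a hypothetical illustration, not the library's actual implementation: the `WORDS` table and this `number_to_word` function are assumptions that cover only English and Chinese.

```Python
# Hypothetical lookup table: index n holds the word for the number n (0-10).
WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "zh": ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十"],
}


def number_to_word(number: int, lang: str = "en") -> str:
    """Return the word for `number` in `lang` (sketch: 0-10 only)."""
    words = WORDS[lang]  # raises KeyError for languages missing from this sketch
    if not 0 <= number < len(words):
        raise ValueError(f"{number} is outside the supported range 0-{len(words) - 1}")
    return words[number]


print(number_to_word(5, lang="en"))  # five
print(number_to_word(5, lang="zh"))  # 五
```

Adding a language in this design only means adding one more 11-entry list.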

### Convert a word to a number (e.g. 'five' -> 5)
```Python
normalizer.word_to_number('five', lang='en')  # returns 5

normalizer.word_to_number('五', lang='zh')  # returns 5
```
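The reverse direction can be sketched by inverting a per-language word list once, so each lookup is a single dictionary access. As above, `WORDS`, `WORD_TO_NUMBER` and this `word_to_number` are hypothetical names for illustration, not the library's code.

```Python
# Hypothetical per-language word lists (0-10).
WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "zh": ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十"],
}

# Invert each list once: word -> number.
WORD_TO_NUMBER = {
    lang: {word: number for number, word in enumerate(words)}
    for lang, words in WORDS.items()
}


def word_to_number(word: str, lang: str = "en") -> int:
    """Return the number for `word` in `lang`; unknown words raise KeyError."""
    # Lowercasing lets "Five" match; it leaves Chinese characters unchanged.
    return WORD_TO_NUMBER[lang][word.lower()]


print(word_to_number("five", lang="en"))  # 5
print(word_to_number("五", lang="zh"))  # 5
```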

### Format numbers in a sentence
#### Example 1: Default formatting
```Python
normalizer.format_sentence(
    sentence='What are the 6 principles of intercultural adaption?',
    lang='en'
)
# returns "What are the six (6) principles of intercultural adaption?"
```

#### Example 2: Custom formatting
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    formatting='{number} [{word}]',  # custom template using {number} and {word}
)
# returns "I have 4 [four] apples and 5 [five] oranges."
```

#### Example 3: Restricting the maximum number
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    max_number=4  # numbers above 4 are left unchanged
)
# returns "I have four (4) apples and five oranges."
```
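The three examples above can be reproduced by a single regex substitution that matches whole-word digits or number words and rewrites each one through a `str.format` template. This is a minimal English-only sketch under stated assumptions: the default template `'{word} ({number})'` is inferred from Example 1, and `EN_WORDS`, `TOKEN` and this `format_sentence` are hypothetical rather than the library's code.

```Python
import re

# Hypothetical English word list: index n holds the word for n (0-10).
EN_WORDS = ["zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]
WORD_TO_NUMBER = {word: n for n, word in enumerate(EN_WORDS)}

# Match a digit string 0-10 or a number word as a whole word. The \b anchors
# stop partial matches such as "one" inside "money" or "1" inside "11".
TOKEN = re.compile(
    r"\b(" + "|".join(EN_WORDS + [str(n) for n in range(11)]) + r")\b",
    re.IGNORECASE,
)


def format_sentence(sentence: str, formatting: str = "{word} ({number})",
                    max_number: int = 10) -> str:
    """Rewrite each number in `sentence` using `formatting` (English-only sketch)."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        number = int(token) if token.isdigit() else WORD_TO_NUMBER[token.lower()]
        if number > max_number:
            return token  # leave numbers above the limit untouched
        return formatting.format(number=number, word=EN_WORDS[number])

    return TOKEN.sub(replace, sentence)


print(format_sentence("What are the 6 principles of intercultural adaption?"))
# What are the six (6) principles of intercultural adaption?
print(format_sentence("I have 4 apples and five oranges.", formatting="{number} [{word}]"))
# I have 4 [four] apples and 5 [five] oranges.
print(format_sentence("I have 4 apples and five oranges.", max_number=4))
# I have four (4) apples and five oranges.
```

Passing a function to `re.sub` keeps the digit-versus-word decision and the `max_number` check in one place, and the template only needs the two named fields.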

## Language Support
The supported languages are from the [Azure Language Detect List](https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/language-detection/language-support):
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Bulgarian (bg)
- Burmese (my)
- Catalan (ca)
- Central Khmer (km)
- Chinese (zh)
- Chinese Simplified (zh_chs)
- Chinese Traditional (zh_cht)
- Chuvash (cv)
- Corsican (co)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dari (prs)
- Divehi (dv)
- Dutch (nl)
- English (en)
- Esperanto (eo)
- Estonian (et)
- Faroese (fo)
- Fijian (fj)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian (ht)
- Hausa (ha)
- Hebrew (he)
- Hindi (hi)
- Hmong Daw (mww)
- Hungarian (hu)
- Icelandic (is)
- Igbo (ig)
- Indonesian (id)
- Inuktitut (iu)
- Irish (ga)
- Italian (it)
- Japanese (ja)
- Javanese (jv)
- Kannada (kn)
- Kazakh (kk)
- Kinyarwanda (rw)
- Kirghiz (ky)
- Korean (ko)
- Kurdish (ku)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lithuanian (lt)
- Luxembourgish (lb)
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Maori (mi)
- Marathi (mr)
- Mongolian (mn)
- Nepali (ne)
- Norwegian (no)
- Norwegian Nynorsk (nn)
- Odia (or)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Queretaro Otomi (otq)
- Romanian (ro)
- Russian (ru)
- Samoan (sm)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
- Tagalog (tl)
- Tahitian (ty)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Tigrinya (ti)
- Tongan (to)
- Turkish (tr)
- Turkmen (tk)
- Upper Sorbian (hsb)
- Uyghur (ug)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Xhosa (xh)
- Yiddish (yi)
- Yoruba (yo)
- Yucatec Maya (yua)
- Zulu (zu)

However, because this is an early release, the `format_sentence` feature has not been thoroughly tested in every language. It relies on regex word-boundary matching, so it is currently designed only for languages that separate words with spaces.
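To see why word-boundary matching is tied to spacing, compare the same whole-word pattern on English and Chinese text. This uses only Python's standard `re` module; the sample sentences are illustrative and say nothing about how the library itself handles Chinese.

```Python
import re

# \b marks a transition between a word character and a non-word character.
# In Python 3, \w is Unicode-aware, so Chinese characters count as word characters.
english = "I have five oranges."
chinese = "我有五个苹果"  # "I have five apples", written without spaces

# The English number word is delimited by spaces, so a whole-word match succeeds.
print(re.search(r"\bfive\b", english).group())  # five

# '五' sits between the word characters '有' and '个', so there is no boundary
# on either side and the same whole-word approach finds nothing.
print(re.search(r"\b五\b", chinese))  # None
```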

## Number Support
Only the integers 0 to 10 are currently supported. Negative numbers are not supported.
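Callers who may pass arbitrary values could validate them before converting. The helper below is hypothetical (it is not part of `numericnormalizer`) and simply encodes the stated 0 to 10, non-negative limit.

```Python
def check_supported(number: int) -> int:
    """Return `number` if it is an int in 0-10; otherwise raise TypeError or ValueError."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(number, bool) or not isinstance(number, int):
        raise TypeError(f"expected an int, got {type(number).__name__}")
    if not 0 <= number <= 10:
        raise ValueError(f"{number} is outside the supported range 0-10")
    return number


for value in (0, 10, -1, 11):
    try:
        check_supported(value)
        print(value, "ok")
    except ValueError as exc:
        print(exc)
```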

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "numericnormalizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,nlp,numeric,sentence,language,convert,regex",
    "author": "mattcoulter7 (Matt Coulter)",
    "author_email": "<mattcoul7@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Converting number formats",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        "nlp",
        "numeric",
        "sentence",
        "language",
        "convert",
        "regex"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa5b2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5",
                "md5": "bac6b269f9cf3543e7ff67ae4509d992",
                "sha256": "712370a059bb3338a4f8f83590f759ba6b2367e8fbee8a68a414273666c61ea5"
            },
            "downloads": -1,
            "filename": "numericnormalizer-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bac6b269f9cf3543e7ff67ae4509d992",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 12315,
            "upload_time": "2023-05-10T10:04:35",
            "upload_time_iso_8601": "2023-05-10T10:04:35.470387Z",
            "url": "https://files.pythonhosted.org/packages/aa/5b/2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5/numericnormalizer-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3f29e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63",
                "md5": "e20589ec00d1d6962c733ee0e6f05024",
                "sha256": "269df4cdd8b15493ac81a3e34f63ea8c8db4d359b598c9cbbe433e2e4a3bd373"
            },
            "downloads": -1,
            "filename": "numericnormalizer-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e20589ec00d1d6962c733ee0e6f05024",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 14684,
            "upload_time": "2023-05-10T10:04:38",
            "upload_time_iso_8601": "2023-05-10T10:04:38.236598Z",
            "url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-10 10:04:38",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "numericnormalizer"
}
