numericnormalizer


- Name: numericnormalizer
- Version: 0.1.0
- Summary: Converting number formats
- Upload time: 2023-05-10 10:04:38
- Author: mattcoulter7 (Matt Coulter)
- Keywords: python, nlp, numeric, sentence, language, convert, regex
- Requirements: No requirements were recorded.

# numericnormalizer
 
A basic NLP library that converts numbers between numeric form (`5`) and word or character form (`five`, `五`).

## Installation
`pip install numericnormalizer`

## Usage
### Importing the module
```Python
from numericnormalizer import normalizer
```

### Convert a number to a word (e.g. 5 -> 'five')
```Python
normalizer.number_to_word(5, lang='en')
# >> "five"


normalizer.number_to_word(5, lang='zh')
# >> "五"
```

### Convert a word to a number (e.g. 'five' -> 5)
```Python
normalizer.word_to_number('five', lang='en')
# >> 5


normalizer.word_to_number('五', lang='zh')
# >> 5
```

### Format numbers in a sentence
#### Example 1: default formatting
```Python
normalizer.format_sentence(
    sentence='What are the 6 principles of intercultural adaption?',
    lang='en'  # the sentence is English, so the English language code is used
)
# >> "What are the six (6) principles of intercultural adaption?"
```

#### Example 2: custom formatting
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    formatting='{number} [{word}]',  # custom formatting
)
# >> "I have 4 [four] apples and 5 [five] oranges."
```

#### Example 3: restricting the maximum number
```Python
normalizer.format_sentence(
    sentence='I have 4 apples and five oranges.',
    lang='en',
    max_number=4  # numbers above max_number are left untouched
)
# >> "I have four (4) apples and five oranges."
```
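
The `formatting` argument in the examples above reads like a Python format template with `{number}` and `{word}` placeholders. The following is a minimal sketch of how such a template could be filled in. `render_pair` and `DEFAULT_FORMATTING` are hypothetical names, and the default template is inferred from the output of Example 1 rather than taken from the library's source.

```python
# Hypothetical helper for illustration; not part of numericnormalizer.
DEFAULT_FORMATTING = "{word} ({number})"  # assumed from Example 1's output


def render_pair(number: int, word: str, formatting: str = DEFAULT_FORMATTING) -> str:
    """Fill a template that uses the {number} and {word} placeholders."""
    return formatting.format(number=number, word=word)


print(render_pair(6, "six"))                        # six (6)
print(render_pair(4, "four", "{number} [{word}]"))  # 4 [four]
```

Because the placeholders are named, a template may use them in either order or leave one out.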

## Language Support
The supported languages are from the [Azure Language Detect List](https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/language-detection/language-support):
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Bulgarian (bg)
- Burmese (my)
- Catalan (ca)
- Central Khmer (km)
- Chinese (zh)
- Chinese Simplified (zh_chs)
- Chinese Traditional (zh_cht)
- Chuvash (cv)
- Corsican (co)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dari (prs)
- Divehi (dv)
- Dutch (nl)
- English (en)
- Esperanto (eo)
- Estonian (et)
- Faroese (fo)
- Fijian (fj)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian (ht)
- Hausa (ha)
- Hebrew (he)
- Hindi (hi)
- Hmong Daw (mww)
- Hungarian (hu)
- Icelandic (is)
- Igbo (ig)
- Indonesian (id)
- Inuktitut (iu)
- Irish (ga)
- Italian (it)
- Japanese (ja)
- Javanese (jv)
- Kannada (kn)
- Kazakh (kk)
- Kinyarwanda (rw)
- Kirghiz (ky)
- Korean (ko)
- Kurdish (ku)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lithuanian (lt)
- Luxembourgish (lb)
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Maori (mi)
- Marathi (mr)
- Mongolian (mn)
- Nepali (ne)
- Norwegian (no)
- Norwegian Nynorsk (nn)
- Odia (or)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Queretaro Otomi (otq)
- Romanian (ro)
- Russian (ru)
- Samoan (sm)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
- Tagalog (tl)
- Tahitian (ty)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Tigrinya (ti)
- Tongan (to)
- Turkish (tr)
- Turkmen (tk)
- Upper Sorbian (hsb)
- Uyghur (ug)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Xhosa (xh)
- Yiddish (yi)
- Yoruba (yo)
- Yucatec Maya (yua)
- Zulu (zu)

Because this is an early release, the `format_sentence` feature has not been thoroughly tested in every language. It currently targets only languages that separate words with spaces, because it relies on regex word matching.
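
The space-delimited restriction can be illustrated with Python's standard `re` module alone. This sketch assumes the matching is done with `\b` word boundaries, which this README does not confirm; `contains_word` is a hypothetical helper, not a library function.

```python
import re


def contains_word(sentence: str, word: str) -> bool:
    """Return True if `word` occurs in `sentence` between regex word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence) is not None


# Space-delimited text: "five" has a boundary on both sides and is found.
print(contains_word("I have five apples.", "five"))     # True

# "five" inside a longer word is not a whole-word match.
print(contains_word("I have fivefold gains.", "five"))  # False

# Chinese text has no spaces, and the neighbouring characters also count
# as word characters, so there is no boundary around "五".
print(contains_word("我有五个苹果", "五"))                # False
```

Under that assumption, languages written without spaces would need a different matching strategy, such as plain substring search or a tokenizer.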

## Number support
Currently, only the integers 0 to 10 are supported. Negative numbers are not supported.
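
Callers who want to fail early on unsupported input could check the range themselves before calling the library. The sketch below is one possible way to do that; `is_supported_number` and the two constants are hypothetical names, and how the library itself reacts to out-of-range input is not documented here.

```python
# Hypothetical pre-check; not part of numericnormalizer.
MIN_SUPPORTED = 0   # lower bound stated in this README
MAX_SUPPORTED = 10  # upper bound stated in this README


def is_supported_number(value: object) -> bool:
    """Return True for integers in the documented 0 to 10 range."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return MIN_SUPPORTED <= value <= MAX_SUPPORTED


print(is_supported_number(5))   # True
print(is_supported_number(11))  # False
print(is_supported_number(-1))  # False
```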

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "numericnormalizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,nlp,numeric,sentence,language,convert,regex",
    "author": "mattcoulter7 (Matt Coulter)",
    "author_email": "<mattcoul7@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Converting number formats",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        "nlp",
        "numeric",
        "sentence",
        "language",
        "convert",
        "regex"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa5b2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5",
                "md5": "bac6b269f9cf3543e7ff67ae4509d992",
                "sha256": "712370a059bb3338a4f8f83590f759ba6b2367e8fbee8a68a414273666c61ea5"
            },
            "downloads": -1,
            "filename": "numericnormalizer-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bac6b269f9cf3543e7ff67ae4509d992",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 12315,
            "upload_time": "2023-05-10T10:04:35",
            "upload_time_iso_8601": "2023-05-10T10:04:35.470387Z",
            "url": "https://files.pythonhosted.org/packages/aa/5b/2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5/numericnormalizer-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3f29e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63",
                "md5": "e20589ec00d1d6962c733ee0e6f05024",
                "sha256": "269df4cdd8b15493ac81a3e34f63ea8c8db4d359b598c9cbbe433e2e4a3bd373"
            },
            "downloads": -1,
            "filename": "numericnormalizer-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e20589ec00d1d6962c733ee0e6f05024",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 14684,
            "upload_time": "2023-05-10T10:04:38",
            "upload_time_iso_8601": "2023-05-10T10:04:38.236598Z",
            "url": "https://files.pythonhosted.org/packages/3f/29/e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63/numericnormalizer-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-10 10:04:38",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "numericnormalizer"
}
        