zh-normalization


|Field|Value|
|:--|:-|
|Name|zh-normalization|
|Version|0.0.1|
|Home page|https://github.com/shibing624/zh-normalization|
|Summary|Chinese Text Normalization (for speech recognition and text to speech)|
|Upload time|2024-02-05 11:33:35|
|Author|XuMing|
|License|Apache 2.0|
|Keywords|tts, asr, text to speech, speech|
|Requirements|No requirements were recorded.|

# zh-normalization
Normalization of non-standard words (NSW) in Chinese sentences, for speech recognition and text to speech.

## Supported NSW (Non-Standard-Word) Normalization

|NSW type|raw|normalized|
|:--|:-|:-|
|serial number|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
|cardinal|这块黄金重达324.75克<br>我们班的最高总分为583分|这块黄金重达三百二十四点七五克<br>我们班的最高总分为五百八十三分|
|numeric range |12\~23<br>-1.5\~2|十二到二十三<br>负一点五到二|
|date|她出生于86年8月18日,她弟弟出生于1995年3月1日|她出生于八六年八月十八日, 她弟弟出生于一九九五年三月一日|
|time|等会请在12:05请通知我|等会请在十二点零五分请通知我|
|temperature|今天的最低气温达到-10°C|今天的最低气温达到零下十度|
|fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
|percentage|明天有62%的概率降雨|明天有百分之六十二的概率降雨|
|money|随便来几个价格12块5,34.5元,20.1万|随便来几个价格十二块五,三十四点五元,二十点一万|
|telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|

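The serial number and cardinal rows above differ in how digits are read: a serial number is read digit by digit, while a cardinal is read with place-value units (十, 百, 千), and a decimal fraction is read digit by digit after 点. The following is a minimal sketch of that distinction only. It is not the package's implementation; the function names `read_digits`, `read_cardinal`, and `read_decimal` are hypothetical, and the sketch assumes non-negative values with an integer part below 10000.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # place-value units for ones, tens, hundreds, thousands


def read_digits(number: str) -> str:
    """Read a digit string one character at a time (serial-number style)."""
    return "".join(DIGITS[int(ch)] for ch in number)


def read_cardinal(value: int) -> str:
    """Read an integer in 0..9999 as a quantity, using place-value units."""
    if not 0 <= value < 10000:
        raise ValueError("this sketch only handles 0..9999")
    if value == 0:
        return DIGITS[0]
    digits = str(value)
    parts = []
    pending_zero = False
    for position, ch in enumerate(digits):
        d = int(ch)
        unit = UNITS[len(digits) - 1 - position]
        if d == 0:
            # Emit a single 零 later, and only if a non-zero digit follows.
            pending_zero = True
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[d] + unit)
    text = "".join(parts)
    # 10..19 are conventionally read 十, 十一, ... rather than 一十, 一十一, ...
    if text.startswith("一十"):
        text = text[1:]
    return text


def read_decimal(number: str) -> str:
    """Read a non-negative decimal: cardinal integer part, 点, then digits."""
    integer, _, fraction = number.partition(".")
    text = read_cardinal(int(integer))
    if fraction:
        text += "点" + read_digits(fraction)
    return text


print(read_digits("27149"))    # 二七一四九
print(read_cardinal(583))      # 五百八十三
print(read_decimal("324.75"))  # 三百二十四点七五
```

The three printed values match the normalized fragments in the serial number and cardinal rows of the table.
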
## Usage
```shell
pip install zh-normalization
```

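Run the following code to normalize a Chinese sentence:
```python
from zh_normalization import TextNormalizer

normalizer = TextNormalizer()
text = "电影中梁朝伟扮演的陈永仁的编号27149!"
# normalize() returns the normalized text as a list of sentences
sents = normalizer.normalize(text)
# join the sentences back into a single string
new_text = ''.join(sents)
print(new_text)
```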

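Output:
```shell
电影中梁朝伟扮演的陈永仁的编号二七幺四九!
```

Note that this output reads the digit 1 as 幺 (yāo), the form commonly used when reading digit strings aloud, whereas the serial number row in the table above shows 一.
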
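To normalize several lines at once, the same call can be wrapped in a small helper. The sketch below assumes only what the usage example shows, namely an object with a `normalize(text)` method that returns a list of strings. The helper `normalize_lines`, the `SentenceNormalizer` protocol, and the `UppercaseStub` stand-in are hypothetical names introduced for illustration and are not part of the package.

```python
from typing import Iterable, List, Protocol


class SentenceNormalizer(Protocol):
    """Anything with a normalize(text) method returning a list of strings."""

    def normalize(self, text: str) -> List[str]: ...


def normalize_lines(normalizer: SentenceNormalizer, lines: Iterable[str]) -> List[str]:
    """Normalize each non-empty line and join its sentences into one string."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        results.append("".join(normalizer.normalize(line)))
    return results


class UppercaseStub:
    """Stand-in normalizer so the sketch runs without the package installed."""

    def normalize(self, text: str) -> List[str]:
        return [text.upper()]


print(normalize_lines(UppercaseStub(), ["abc", "", "  def  "]))  # ['ABC', 'DEF']
```

With the package installed, passing `TextNormalizer()` in place of `UppercaseStub()` should work the same way, provided `normalize` behaves as in the usage example.
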
## References
[Pull request #658 of DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech/pull/658/files)



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/shibing624/zh-normalization",
    "name": "zh-normalization",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "TTS,ASR,text to speech,speech",
    "author": "XuMing",
    "author_email": "xuming624@qq.com",
    "download_url": "https://files.pythonhosted.org/packages/60/a8/56709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803/zh-normalization-0.0.1.tar.gz",
    "platform": "Windows",
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Chinese Text Normalization(for speech recognition and text to speech)",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/shibing624/zh-normalization"
    },
    "split_keywords": [
        "tts",
        "asr",
        "text to speech",
        "speech"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "60a856709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803",
                "md5": "dc6dc852a1fed606de6286dad2144c9f",
                "sha256": "c7873f9d3259a1975ea56a3bfb8a72edd5b0f37cbc47aa8c12d462763864c6ee"
            },
            "downloads": -1,
            "filename": "zh-normalization-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "dc6dc852a1fed606de6286dad2144c9f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 50841,
            "upload_time": "2024-02-05T11:33:35",
            "upload_time_iso_8601": "2024-02-05T11:33:35.602552Z",
            "url": "https://files.pythonhosted.org/packages/60/a8/56709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803/zh-normalization-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-05 11:33:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shibing624",
    "github_project": "zh-normalization",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "zh-normalization"
}
        