# zh-normalization
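Normalizes non-standard words (NSW) in Chinese sentences, such as numbers, dates, and telephone numbers, into their spoken form for speech recognition (ASR) and text to speech (TTS).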
## Supported NSW (Non-Standard-Word) Normalization
|NSW type|raw|normalized|
|:--|:-|:-|
|serial number|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
|cardinal|这块黄金重达324.75克<br>我们班的最高总分为583分|这块黄金重达三百二十四点七五克<br>我们班的最高总分为五百八十三分|
|numeric range |12\~23<br>-1.5\~2|十二到二十三<br>负一点五到二|
|date|她出生于86年8月18日,她弟弟出生于1995年3月1日|她出生于八六年八月十八日, 她弟弟出生于一九九五年三月一日|
|time|等会请在12:05请通知我|等会请在十二点零五分请通知我|
|temperature|今天的最低气温达到-10°C|今天的最低气温达到零下十度|
|fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
|percentage|明天有62%的概率降雨|明天有百分之六十二的概率降雨|
|money|随便来几个价格12块5,34.5元,20.1万|随便来几个价格十二块五,三十四点五元,二十点一万|
|telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|
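
The table mixes two ways of reading digits: digit by digit (serial numbers, telephone numbers) and as a quantity (cardinals). The following is a minimal sketch of that difference, not the package's implementation. The function names `read_digits`, `read_cardinal`, and `read_decimal` and the `alt_one` flag are hypothetical. The flag exists because the table above reads 1 as 一 while the usage output below reads it as 幺.

```python
# Illustrative sketch only; these helpers are NOT part of zh-normalization.
DIGIT_MAP = dict(zip("0123456789", "零一二三四五六七八九"))
UNITS = ["", "十", "百", "千"]


def read_digits(number: str, alt_one: bool = False) -> str:
    """Read each digit on its own, skipping separators such as '-' or ' '."""
    spoken = "".join(DIGIT_MAP[ch] for ch in number if ch in DIGIT_MAP)
    # Assumption: 幺 is offered as an optional spoken variant of 一.
    return spoken.replace("一", "幺") if alt_one else spoken


def read_cardinal(value: int) -> str:
    """Read an integer in the range 0..9999 as a quantity."""
    if not 0 <= value <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if value == 0:
        return DIGIT_MAP["0"]
    digits = str(value)
    parts = []
    pending_zero = False
    for position, ch in enumerate(digits):
        if ch == "0":
            # Emit a single 零 later, and only if a non-zero digit follows.
            pending_zero = True
            continue
        if pending_zero:
            parts.append(DIGIT_MAP["0"])
            pending_zero = False
        parts.append(DIGIT_MAP[ch] + UNITS[len(digits) - position - 1])
    text = "".join(parts)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    return text[1:] if text.startswith("一十") else text


def read_decimal(number: str) -> str:
    """Read an unsigned decimal: integer part as a quantity, fraction digit by digit."""
    integer, _, fraction = number.partition(".")
    spoken = read_cardinal(int(integer))
    if fraction:
        spoken += "点" + read_digits(fraction)
    return spoken


print(read_digits("27149"))                # serial number, digit by digit
print(read_digits("27149", alt_one=True))  # same digits with 幺 for 1
print(read_cardinal(583))                  # cardinal
print(read_decimal("324.75"))              # cardinal with a fractional part
```

Keeping the two readings in separate functions makes the choice explicit at each call site, which is what a rule table like the one above ultimately has to decide per NSW type.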
## Usage
```shell
pip install zh-normalization
```
Run the following code to normalize a Chinese sentence:
```python
from zh_normalization import TextNormalizer

normalizer = TextNormalizer()
text = "电影中梁朝伟扮演的陈永仁的编号27149!"
# The result is joined below, so it is treated as a sequence of sentence strings.
sents = normalizer.normalize(text)
new_text = ''.join(sents)
print(new_text)
```
Output:
```shell
电影中梁朝伟扮演的陈永仁的编号二七幺四九!
```
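
Rule types such as fraction, percentage, and numeric range from the table above can be approximated with ordered regular-expression substitutions. The sketch below is an assumption about how such rules might look, not the package's code. `say_int`, `say_number`, `RULES`, and `normalize_sketch` are hypothetical names, and only integers from 0 to 9999 are covered.

```python
import re

# Illustrative sketch only; none of these names come from zh-normalization.
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]
NUMBER = r"-?[0-9]+(?:\.[0-9]+)?"


def say_int(value: int) -> str:
    """Read an integer in the range 0..9999 as a quantity."""
    if not 0 <= value <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if value == 0:
        return DIGITS[0]
    digits = str(value)
    parts, pending_zero = [], False
    for position, ch in enumerate(digits):
        if ch == "0":
            pending_zero = True  # emit one 零 only if a non-zero digit follows
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[int(ch)] + UNITS[len(digits) - position - 1])
    text = "".join(parts)
    return text[1:] if text.startswith("一十") else text


def say_number(token: str) -> str:
    """Read a signed integer or decimal, e.g. '-1.5'."""
    sign = "负" if token.startswith("-") else ""
    integer, _, fraction = token.lstrip("-").partition(".")
    spoken = say_int(int(integer))
    if fraction:
        spoken += "点" + "".join(DIGITS[int(ch)] for ch in fraction)
    return sign + spoken


# Rules are applied in order; each maps a regex match to its spoken form.
RULES = [
    # fraction: numerator/denominator is read denominator first
    (re.compile(r"([0-9]+)/([0-9]+)"),
     lambda m: say_number(m.group(2)) + "分之" + say_number(m.group(1))),
    # percentage: accepts both the ASCII and the full-width percent sign
    (re.compile(rf"({NUMBER})[%％]"),
     lambda m: "百分之" + say_number(m.group(1))),
    # numeric range: accepts both the ASCII and the full-width tilde
    (re.compile(rf"({NUMBER})[~～]({NUMBER})"),
     lambda m: say_number(m.group(1)) + "到" + say_number(m.group(2))),
]


def normalize_sketch(text: str) -> str:
    for pattern, verbalize in RULES:
        text = pattern.sub(verbalize, text)
    return text


print(normalize_sketch("现场有7/12的观众投出了赞成票"))
print(normalize_sketch("明天有62%的概率降雨"))
print(normalize_sketch("-1.5~2"))
```

Rule order matters in a design like this: a broad pattern placed first can consume text that a more specific rule should have handled, for example a slash-separated date being read as a fraction.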
## References
[Pull request #658 of DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech/pull/658/files)
Raw data
{
"_id": null,
"home_page": "https://github.com/shibing624/zh-normalization",
"name": "zh-normalization",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "TTS,ASR,text to speech,speech",
"author": "XuMing",
"author_email": "xuming624@qq.com",
"download_url": "https://files.pythonhosted.org/packages/60/a8/56709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803/zh-normalization-0.0.1.tar.gz",
"platform": "Windows",
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "Chinese Text Normalization(for speech recognition and text to speech)",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/shibing624/zh-normalization"
},
"split_keywords": [
"tts",
"asr",
"text to speech",
"speech"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "60a856709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803",
"md5": "dc6dc852a1fed606de6286dad2144c9f",
"sha256": "c7873f9d3259a1975ea56a3bfb8a72edd5b0f37cbc47aa8c12d462763864c6ee"
},
"downloads": -1,
"filename": "zh-normalization-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "dc6dc852a1fed606de6286dad2144c9f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 50841,
"upload_time": "2024-02-05T11:33:35",
"upload_time_iso_8601": "2024-02-05T11:33:35.602552Z",
"url": "https://files.pythonhosted.org/packages/60/a8/56709db37cdcca2431dcb21403873fb3fc79eb8a69ef86f0e80ee690b803/zh-normalization-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-05 11:33:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "shibing624",
"github_project": "zh-normalization",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "zh-normalization"
}