# SinoPhone (中华音码)
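SinoPhone (中华音码) is a Python library that converts Chinese pinyin into fuzzy phonetic hash codes. It is mainly intended for handling homophone and dialect-confusion problems in Chinese speech recognition.
## Features
- 🎯 **Fuzzy phonetic matching**: handles common dialect confusions, such as merging l/n and f/h
- 🚀 **Efficient encoding**: converts each pinyin syllable into a short hash code
- 🔧 **Easy to use**: a simple API that accepts both Chinese text and pinyin text
- 📦 **Lightweight**: depends only on the pypinyin library
- 🌏 **Chinese-friendly**: designed specifically for Chinese speech processing
## Installation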
```bash
pip install sinophone-zh
```
## Quick start
### Basic usage
```python
from sinophone import chinese_to_sinophone, sinophone
# Convert Chinese text to a SinoPhone code
result = chinese_to_sinophone("中国")
print(result) # Output: "ZG UG"
# Convert pinyin to a SinoPhone code
result = sinophone("zhong guo")
print(result) # Output: "ZG UG"
# Join the syllable codes without spaces
result = chinese_to_sinophone("中国", join_with_space=False)
print(result) # Output: "ZGUG"
```
### Fuzzy phonetic matching examples
SinoPhone handles common dialect confusions:
```python
from sinophone import chinese_to_sinophone, sinophone
# l/n merged
print(chinese_to_sinophone("南")) # "NN"
print(chinese_to_sinophone("兰")) # "NN" (same code)
# f/h merged
print(chinese_to_sinophone("发")) # "HI"
print(chinese_to_sinophone("花")) # "HI" (same code)
# o/e confusion
print(sinophone("bo")) # "BE"
print(sinophone("be")) # "BE" (same code)
```
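## API reference
### `chinese_to_sinophone(chinese_text, join_with_space=True)`
Converts Chinese text to a SinoPhone code.
**Parameters:**
- `chinese_text` (str): the Chinese string to encode
- `join_with_space` (bool, optional): whether to join the syllable codes with spaces; defaults to True
**Returns:**
- str: the SinoPhone code string
### `sinophone(pinyin_text)`
Converts pinyin text to a SinoPhone code.
**Parameters:**
- `pinyin_text` (str): a pinyin string with syllables separated by spaces
**Returns:**
- str: the SinoPhone code string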
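As a usage illustration, either function can be wrapped in a small comparison helper. The sketch below is not part of the library: `sounds_alike` and `toy_encoder` are hypothetical names, and the encoder is passed in as a parameter so the snippet runs without SinoPhone installed. In practice you would presumably pass `sinophone` or `chinese_to_sinophone` as `encode`.

```python
from typing import Callable


def sounds_alike(a: str, b: str, encode: Callable[[str], str]) -> bool:
    """Return True when both inputs produce the same phonetic code."""
    return encode(a) == encode(b)


def toy_encoder(pinyin_text: str) -> str:
    """Hypothetical stand-in encoder: merges only the l/n and f/h initials."""
    merged = {"l": "n", "f": "h"}
    syllables = pinyin_text.lower().split()
    return " ".join(merged.get(s[0], s[0]) + s[1:] for s in syllables)


print(sounds_alike("lan", "nan", toy_encoder))  # True
print(sounds_alike("lan", "ban", toy_encoder))  # False
```

Injecting the encoder keeps the helper independent of which of the two entry points (Chinese text or pinyin) a caller prefers.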
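## Encoding rules
### Initial mapping
- Standard initials: b→B, p→P, m→M, f→H, d→D, t→T, n→N, l→N, etc.
- Merge rules: l/n → N, f/h → H
- Zero initial: y/w/yu → _
### Final mapping
- Standard finals: a/ia/ua→A, o/uo→E, e→E, ai/uai→I, etc.
- Merge rules: o/e → E
- Nasal finals: an/en series→N, ang/eng series→G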
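To make the table-driven idea concrete, here is a minimal sketch that encodes a single toneless syllable using only the initial and final mappings listed above. It is an illustration, not the library's implementation: `toy_encode`, `INITIALS`, and `FINALS` are hypothetical names, the tables are deliberately incomplete (no zero initials, no zh/ch/sh), and the real library's full tables may produce different codes.

```python
# Only mappings spelled out in the lists above; anything else is rejected.
INITIALS = {"b": "B", "p": "P", "m": "M", "f": "H", "h": "H",
            "d": "D", "t": "T", "n": "N", "l": "N"}
FINALS = {"a": "A", "ia": "A", "ua": "A", "o": "E", "uo": "E", "e": "E",
          "ai": "I", "uai": "I", "an": "N", "en": "N", "ang": "G", "eng": "G"}


def toy_encode(syllable: str) -> str:
    """Encode one toneless pinyin syllable as initial code + final code."""
    syllable = syllable.lower()
    initial, final = syllable[:1], syllable[1:]
    if initial not in INITIALS:
        raise ValueError(f"initial not covered by this sketch: {syllable!r}")
    if final not in FINALS:
        raise ValueError(f"final not covered by this sketch: {syllable!r}")
    return INITIALS[initial] + FINALS[final]


print(toy_encode("nan"), toy_encode("lan"))   # NN NN
print(toy_encode("bo"), toy_encode("be"))     # BE BE
print(toy_encode("fa") == toy_encode("hua"))  # True
```

Because merged sounds share a table value, confusable syllables collapse to the same code without any pairwise comparison logic.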
### Special syllables
- zhei → ZE (colloquial variant of "这")
- shei → SV (variant of "谁")
- ng → _E ("嗯")
- m → _U ("呣")
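One plausible way to handle such exceptions is a whole-syllable override table that is consulted before the general rules. The sketch below assumes that lookup order; whether the library works this way is not confirmed here, and `SPECIAL_SYLLABLES` and `encode_syllable` are hypothetical names.

```python
from typing import Callable

# Whole-syllable overrides, copied from the list above.
SPECIAL_SYLLABLES = {"zhei": "ZE", "shei": "SV", "ng": "_E", "m": "_U"}


def encode_syllable(syllable: str, fallback: Callable[[str], str]) -> str:
    """Check the override table first, then defer to the general rules."""
    key = syllable.lower()
    if key in SPECIAL_SYLLABLES:
        return SPECIAL_SYLLABLES[key]
    return fallback(key)


# `str.upper` is only a placeholder for the regular initial/final mapping.
print(encode_syllable("shei", str.upper))  # SV
print(encode_syllable("ma", str.upper))    # MA
```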
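## Use cases
- 🎤 **Speech recognition**: resolving homophone confusion
- 🔍 **Fuzzy search**: phonetic-similarity matching for Chinese text
- 📝 **Input methods**: error tolerance for pinyin input
- 🗣️ **Dialect handling**: mapping between Standard Mandarin and dialect pronunciations
- 🤖 **NLP applications**: extracting phonetic features from Chinese text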
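The fuzzy-search use case can be sketched as an index keyed by phonetic code, so that sound-alike entries land in the same bucket. Everything below is illustrative: `build_index`, `lookup`, and `toy_encoder` are hypothetical names, and the toy encoder merges only the l/n and f/h initials so the snippet runs without the library. With SinoPhone installed, `sinophone` or `chinese_to_sinophone` would presumably take its place.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_index(words: Iterable[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words by phonetic code so sound-alike entries share a bucket."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[encode(word)].append(word)
    return dict(index)


def lookup(query: str, index: Dict[str, List[str]], encode: Callable[[str], str]) -> List[str]:
    """Return every indexed word whose code equals the query's code."""
    return index.get(encode(query), [])


def toy_encoder(pinyin_text: str) -> str:
    """Hypothetical stand-in encoder: merges only the l/n and f/h initials."""
    merged = {"l": "n", "f": "h"}
    return " ".join(merged.get(s[0], s[0]) + s[1:] for s in pinyin_text.lower().split())


index = build_index(["lan se", "nan se", "hong se"], toy_encoder)
print(lookup("nan se", index, toy_encoder))  # ['lan se', 'nan se']
```

Each lookup costs one encoding plus one dictionary access, which is why hash-style phonetic codes suit search better than pairwise similarity scoring.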
## Development
### Install development dependencies
```bash
pip install -e .[dev]
```
### Run tests
```bash
pytest
```
### Format code
```bash
black .
```
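## Contributing
Issues and pull requests are welcome!
## License
MIT License. See the [LICENSE](LICENSE) file for details.
## Changelog
### v0.0.1
- Initial release
- Supports converting Chinese text and pinyin text to SinoPhone codes
- Implements fuzzy phonetic matching rules
- Supports special-syllable handling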
Raw data
{
"_id": null,
"home_page": "https://github.com/Johnless31/SinoPhone",
"name": "sinophone-zh",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "johnless <346656208@qq.com>",
"keywords": "chinese, pinyin, phonetic, hash, encoding, sinophone, \u4e2d\u6587, \u62fc\u97f3, \u8bed\u97f3\u7f16\u7801",
"author": "johnless",
"author_email": "johnless <346656208@qq.com>",
"download_url": "https://files.pythonhosted.org/packages/24/d3/cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e/sinophone_zh-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "\u4e2d\u534e\u97f3\u7801\uff08SinoPhone\uff09- \u4e2d\u6587\u62fc\u97f3\u8bed\u97f3\u6a21\u7cca\u54c8\u5e0c\u7f16\u7801\u7b97\u6cd5",
"version": "0.0.2",
"project_urls": {
"Documentation": "https://github.com/Johnless31/SinoPhone#readme",
"Homepage": "https://github.com/Johnless31/SinoPhone",
"Issues": "https://github.com/Johnless31/SinoPhone/issues",
"Repository": "https://github.com/Johnless31/SinoPhone"
},
"split_keywords": [
"chinese",
" pinyin",
" phonetic",
" hash",
" encoding",
" sinophone",
" \u4e2d\u6587",
" \u62fc\u97f3",
" \u8bed\u97f3\u7f16\u7801"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4e8e00c2b5acf709a7f75a7efe5b6ad8ee7d99c0fc58f938580ad9ba04b1916f",
"md5": "2bbf68b4dcd7f084e7c541b5f0aaeb5f",
"sha256": "50dc7ad9ea55caea2104ca75703ed8abe8e81d3e2d3dd2ba642d4498b3ee71cd"
},
"downloads": -1,
"filename": "sinophone_zh-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2bbf68b4dcd7f084e7c541b5f0aaeb5f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 7110,
"upload_time": "2025-09-04T06:32:35",
"upload_time_iso_8601": "2025-09-04T06:32:35.969562Z",
"url": "https://files.pythonhosted.org/packages/4e/8e/00c2b5acf709a7f75a7efe5b6ad8ee7d99c0fc58f938580ad9ba04b1916f/sinophone_zh-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "24d3cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e",
"md5": "5eac74bc5bd107aeff969b1535854c7d",
"sha256": "d983a600ad1c102672d36de2128783a08f91bbb27559f59f356e26aa3e6e255c"
},
"downloads": -1,
"filename": "sinophone_zh-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "5eac74bc5bd107aeff969b1535854c7d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 23569,
"upload_time": "2025-09-04T06:32:38",
"upload_time_iso_8601": "2025-09-04T06:32:38.648131Z",
"url": "https://files.pythonhosted.org/packages/24/d3/cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e/sinophone_zh-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-04 06:32:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Johnless31",
"github_project": "SinoPhone",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pypinyin",
"specs": [
[
">=",
"0.44.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"6.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"2.10.0"
]
]
},
{
"name": "pytest-timeout",
"specs": [
[
">=",
"1.4.0"
]
]
},
{
"name": "pytest-xdist",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "psutil",
"specs": [
[
">=",
"5.8.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"5.0.0"
]
]
}
],
"lcname": "sinophone-zh"
}