sinophone-zh


- Name: sinophone-zh
- Version: 0.0.2
- Home page: https://github.com/Johnless31/SinoPhone
- Summary: 中华音码 (SinoPhone): a phonetic fuzzy-hash encoding algorithm for Chinese pinyin
- Upload time: 2025-09-04 06:32:38
- Author: johnless
- Requires Python: >=3.6
- License: MIT
- Keywords: chinese, pinyin, phonetic, hash, encoding, sinophone, 中文, 拼音, 语音编码
- Requirements: pypinyin, pytest, pytest-cov, pytest-timeout, pytest-xdist, psutil, flake8

# SinoPhone (中华音码)

[![PyPI version](https://badge.fury.io/py/sinophone-zh.svg)](https://badge.fury.io/py/sinophone-zh)
[![Python](https://img.shields.io/pypi/pyversions/sinophone-zh.svg)](https://pypi.org/project/sinophone-zh/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

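SinoPhone (中华音码) is a Python library that converts Chinese pinyin into a phonetic fuzzy-hash code. It is mainly intended for handling homophones and dialect-induced confusions in Chinese speech recognition.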

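## Features

- 🎯 **Fuzzy phonetic matching**: handles common dialect confusions, such as merging l/n and f/h
- 🚀 **Efficient encoding**: converts each pinyin syllable into a short hash code
- 🔧 **Easy to use**: a simple API that accepts both Chinese text and pinyin text
- 📦 **Lightweight**: depends only on the pypinyin library
- 🌏 **Chinese-friendly**: designed specifically for Chinese speech processing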

## Installation

```bash
pip install sinophone-zh
```

## Quick Start

### Basic Usage

```python
from sinophone import chinese_to_sinophone, sinophone

# Convert Chinese text to a SinoPhone code
result = chinese_to_sinophone("中国")
print(result)  # Output: "ZG UG"

# Convert pinyin to a SinoPhone code
result = sinophone("zhong guo")
print(result)  # Output: "ZG UG"

# Join the syllable codes without spaces
result = chinese_to_sinophone("中国", join_with_space=False)
print(result)  # Output: "ZGUG"
```

### Fuzzy Phonetic Matching Examples

SinoPhone handles common dialect confusions:

```python
from sinophone import chinese_to_sinophone, sinophone

# l/n merger
print(chinese_to_sinophone("南"))  # "NN"
print(chinese_to_sinophone("兰"))  # "NN" (same code)

# f/h merger
print(chinese_to_sinophone("发"))  # "HI"
print(chinese_to_sinophone("花"))  # "HI" (same code)

# o/e confusion
print(sinophone("bo"))  # "BE"
print(sinophone("be"))  # "BE" (same code)
```
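
One way to use these shared codes is to treat two strings as phonetically equivalent when their codes match. The sketch below is illustrative only: `sounds_alike` and `demo_encode` are hypothetical names that are not part of the library, and `demo_encode` is a stand-in lookup table holding only the codes quoted above, so the snippet runs without the package installed. In real use, `chinese_to_sinophone` could presumably be passed as `encode` instead.

```python
from typing import Callable

# Stand-in code table (hypothetical): only the codes quoted in the examples above.
_DEMO_CODES = {"南": "NN", "兰": "NN", "发": "HI", "花": "HI"}


def demo_encode(text: str) -> str:
    """Stand-in encoder: look up each character and join the codes with spaces."""
    return " ".join(_DEMO_CODES[ch] for ch in text)


def sounds_alike(a: str, b: str, encode: Callable[[str], str]) -> bool:
    """Return True when both strings map to the same phonetic code."""
    return encode(a) == encode(b)


print(sounds_alike("南", "兰", demo_encode))  # True
print(sounds_alike("南", "花", demo_encode))  # False
```

Passing the encoder as a parameter keeps the comparison logic independent of any particular encoding scheme.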

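## API Reference

### `chinese_to_sinophone(chinese_text, join_with_space=True)`

Converts Chinese text to a SinoPhone code.

**Parameters:**
- `chinese_text` (str): the Chinese string to encode
- `join_with_space` (bool, optional): whether to join the syllable codes with spaces; defaults to True

**Returns:**
- str: the SinoPhone code string

### `sinophone(pinyin_text)`

Converts pinyin text to a SinoPhone code.

**Parameters:**
- `pinyin_text` (str): the pinyin string, with syllables separated by spaces

**Returns:**
- str: the SinoPhone code string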

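## Encoding Rules

### Initial Mapping
- Standard initials: b→B, p→P, m→M, f→H, d→D, t→T, n→N, l→N, etc.
- Confusion rules: l/n → N, f/h → H
- Zero initial: y/w/yu → _

### Final Mapping
- Standard finals: a/ia/ua→A, o/uo→E, e→E, ai/uai→I, etc.
- Confusion rules: o/e → E
- Nasal finals: an/en series→N, ang/eng series→G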
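
The initial and final mappings can be pictured as two lookup tables whose results are concatenated. The following is a minimal sketch under stated assumptions, not the library's implementation: the tables contain only the entries listed above (the "etc." and "series" entries are left out), and `encode_syllable`, `INITIALS`, and `FINALS` are hypothetical names. Syllables outside these partial tables return `None`.

```python
from typing import Optional

# Partial tables: only the mappings listed above; everything else is omitted.
INITIALS = {
    "b": "B", "p": "P", "m": "M", "f": "H", "h": "H",
    "d": "D", "t": "T", "n": "N", "l": "N",
    "y": "_", "w": "_", "yu": "_",
}
FINALS = {
    "a": "A", "ia": "A", "ua": "A",
    "o": "E", "uo": "E", "e": "E",
    "ai": "I", "uai": "I",
    "an": "N", "en": "N",
    "ang": "G", "eng": "G",
}


def encode_syllable(syllable: str) -> Optional[str]:
    """Encode one toneless pinyin syllable, or return None if it is not covered."""
    s = syllable.strip().lower()
    # Try longer initials first so that "yu" is tested before "y".
    for initial in sorted(INITIALS, key=len, reverse=True):
        final = s[len(initial):]
        if s.startswith(initial) and final in FINALS:
            return INITIALS[initial] + FINALS[final]
    return None


print(encode_syllable("nan"), encode_syllable("lan"))  # NN NN
print(encode_syllable("bo"), encode_syllable("be"))    # BE BE
print(encode_syllable("zhong"))                        # None
```

Because l and n share the code N, and o and e share the code E, the confusable pairs collapse to the same two-character code.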

### Special Syllables
- zhei → ZE (colloquial variant of "这")
- shei → SV (variant of "谁")
- ng → _E ("嗯")
- m → _U ("呣")
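
Special syllables such as these are naturally handled as an override table consulted before the regular initial/final rules. The sketch below assumes that lookup order, which the list above does not state explicitly; `encode_with_specials` and `no_regular_rules` are hypothetical names, and the fallback is a placeholder for whatever regular encoder is in use.

```python
from typing import Callable, Optional

# Special syllables exactly as listed above.
SPECIAL_SYLLABLES = {"zhei": "ZE", "shei": "SV", "ng": "_E", "m": "_U"}


def encode_with_specials(
    syllable: str,
    fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Check the special-syllable table first, then defer to the regular encoder."""
    s = syllable.strip().lower()
    if s in SPECIAL_SYLLABLES:
        return SPECIAL_SYLLABLES[s]
    return fallback(s)


def no_regular_rules(_: str) -> Optional[str]:
    """Placeholder fallback (hypothetical) that knows no regular syllables."""
    return None


print(encode_with_specials("shei", no_regular_rules))  # SV
print(encode_with_specials("ng", no_regular_rules))    # _E
print(encode_with_specials("ma", no_regular_rules))    # None
```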

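## Use Cases

- 🎤 **Speech recognition**: resolving homophone confusions
- 🔍 **Fuzzy search**: phonetic similarity matching for Chinese text
- 📝 **Input methods**: error tolerance for pinyin input
- 🗣️ **Dialect processing**: mapping between Standard Mandarin and dialects
- 🤖 **NLP applications**: extracting phonetic features from Chinese text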
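
For the fuzzy-search case, one plausible approach is to index words by their code so that a query retrieves every word sharing it. This is a rough sketch rather than a feature of the library: `build_phonetic_index`, `fuzzy_lookup`, and `demo_encode` are hypothetical names, and `demo_encode` is a stand-in table holding only codes quoted in the examples earlier in this README. A real encoder such as `chinese_to_sinophone` could presumably be passed in its place.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Stand-in code table (hypothetical): codes quoted in the README examples.
_DEMO_CODES = {"南": "NN", "兰": "NN", "发": "HI", "花": "HI"}


def demo_encode(text: str) -> str:
    """Stand-in encoder: look up each character and join the codes with spaces."""
    return " ".join(_DEMO_CODES[ch] for ch in text)


def build_phonetic_index(
    words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group words under their phonetic code, preserving input order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[encode(word)].append(word)
    return dict(index)


def fuzzy_lookup(
    index: Dict[str, List[str]], query: str, encode: Callable[[str], str]
) -> List[str]:
    """Return every indexed word whose code equals the query's code."""
    return index.get(encode(query), [])


index = build_phonetic_index(["南", "发", "花"], demo_encode)
print(fuzzy_lookup(index, "兰", demo_encode))  # ['南']
print(fuzzy_lookup(index, "发", demo_encode))  # ['发', '花']
```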

## Development

### Install development dependencies

```bash
pip install -e ".[dev]"
```

### Run tests

```bash
pytest
```

### Format code

```bash
black .
```

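## Contributing

Issues and pull requests are welcome!

## License

MIT License. See the [LICENSE](LICENSE) file for details.

## Changelog

### v0.0.1
- Initial release
- Converts Chinese text and pinyin text to SinoPhone codes
- Implements the fuzzy phonetic matching rules
- Handles special syllables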

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Johnless31/SinoPhone",
    "name": "sinophone-zh",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "johnless <346656208@qq.com>",
    "keywords": "chinese, pinyin, phonetic, hash, encoding, sinophone, \u4e2d\u6587, \u62fc\u97f3, \u8bed\u97f3\u7f16\u7801",
    "author": "johnless",
    "author_email": "johnless <346656208@qq.com>",
    "download_url": "https://files.pythonhosted.org/packages/24/d3/cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e/sinophone_zh-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "\u4e2d\u534e\u97f3\u7801\uff08SinoPhone\uff09- \u4e2d\u6587\u62fc\u97f3\u8bed\u97f3\u6a21\u7cca\u54c8\u5e0c\u7f16\u7801\u7b97\u6cd5",
    "version": "0.0.2",
    "project_urls": {
        "Documentation": "https://github.com/Johnless31/SinoPhone#readme",
        "Homepage": "https://github.com/Johnless31/SinoPhone",
        "Issues": "https://github.com/Johnless31/SinoPhone/issues",
        "Repository": "https://github.com/Johnless31/SinoPhone"
    },
    "split_keywords": [
        "chinese",
        " pinyin",
        " phonetic",
        " hash",
        " encoding",
        " sinophone",
        " \u4e2d\u6587",
        " \u62fc\u97f3",
        " \u8bed\u97f3\u7f16\u7801"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4e8e00c2b5acf709a7f75a7efe5b6ad8ee7d99c0fc58f938580ad9ba04b1916f",
                "md5": "2bbf68b4dcd7f084e7c541b5f0aaeb5f",
                "sha256": "50dc7ad9ea55caea2104ca75703ed8abe8e81d3e2d3dd2ba642d4498b3ee71cd"
            },
            "downloads": -1,
            "filename": "sinophone_zh-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2bbf68b4dcd7f084e7c541b5f0aaeb5f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 7110,
            "upload_time": "2025-09-04T06:32:35",
            "upload_time_iso_8601": "2025-09-04T06:32:35.969562Z",
            "url": "https://files.pythonhosted.org/packages/4e/8e/00c2b5acf709a7f75a7efe5b6ad8ee7d99c0fc58f938580ad9ba04b1916f/sinophone_zh-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "24d3cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e",
                "md5": "5eac74bc5bd107aeff969b1535854c7d",
                "sha256": "d983a600ad1c102672d36de2128783a08f91bbb27559f59f356e26aa3e6e255c"
            },
            "downloads": -1,
            "filename": "sinophone_zh-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "5eac74bc5bd107aeff969b1535854c7d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 23569,
            "upload_time": "2025-09-04T06:32:38",
            "upload_time_iso_8601": "2025-09-04T06:32:38.648131Z",
            "url": "https://files.pythonhosted.org/packages/24/d3/cf103d463b970378e391cc92eca0b63eeea9fdf484642022cc17a3720a9e/sinophone_zh-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-04 06:32:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Johnless31",
    "github_project": "SinoPhone",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pypinyin",
            "specs": [
                [
                    ">=",
                    "0.44.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "2.10.0"
                ]
            ]
        },
        {
            "name": "pytest-timeout",
            "specs": [
                [
                    ">=",
                    "1.4.0"
                ]
            ]
        },
        {
            "name": "pytest-xdist",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    ">=",
                    "5.8.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        }
    ],
    "lcname": "sinophone-zh"
}
        