# dict-from-pypinyin
[![PyPI](https://img.shields.io/pypi/v/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![PyPI](https://img.shields.io/pypi/pyversions/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![MIT](https://img.shields.io/github/license/stefantaubert/dict-from-pypinyin.svg)](https://github.com/stefantaubert/dict-from-pypinyin/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![PyPI](https://img.shields.io/pypi/implementation/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7524283.svg)](https://doi.org/10.5281/zenodo.7524283)
Command-line interface (CLI) to create a pronunciation dictionary by looking up pinyin transcriptions using [pypinyin](https://github.com/mozillazg/python-pinyin), with options to ignore punctuation and to split words on hyphens before transcribing them.
## Installation
```sh
pip install dict-from-pypinyin --user
```
## Usage
```sh
dict-from-pypinyin-cli
```
### Example
```sh
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
社会语言学?
㐻,
『㑐
鲜-亮。
『占斌?
『机具-机呀?
EOF
# Create dictionary from vocabulary
dict-from-pypinyin-cli \
/tmp/vocabulary.txt \
/tmp/result.dict \
--split-on-hyphen
cat /tmp/result.dict
```
Output:
```txt
社会语言学? shè huì yǔ yán xué ?
社会语言学? shè huì yǔ yàn xué ?
社会语言学? shè huì yǔ yín xué ?
社会语言学? shè huì yù yán xué ?
社会语言学? shè huì yù yàn xué ?
社会语言学? shè huì yù yín xué ?
社会语言学? shè kuài yǔ yán xué ?
社会语言学? shè kuài yǔ yàn xué ?
社会语言学? shè kuài yǔ yín xué ?
社会语言学? shè kuài yù yán xué ?
社会语言学? shè kuài yù yàn xué ?
社会语言学? shè kuài yù yín xué ?
㐻, nèi ,
『㑐 『 shū
鲜-亮。 xiān - liàng 。
鲜-亮。 xiān - liáng 。
鲜-亮。 xiǎn - liàng 。
鲜-亮。 xiǎn - liáng 。
『占斌? 『 zhàn bīn ?
『占斌? 『 zhān bīn ?
『占斌? 『 tiē bīn ?
『机具-机呀? 『 jī jù - jī ya ?
『机具-机呀? 『 jī jù - jī yā ?
『机具-机呀? 『 jī jù - jī xiā ?
『机具-机呀? 『 jī jù - wèi ya ?
『机具-机呀? 『 jī jù - wèi yā ?
『机具-机呀? 『 jī jù - wèi xiā ?
『机具-机呀? 『 wèi jù - jī ya ?
『机具-机呀? 『 wèi jù - jī yā ?
『机具-机呀? 『 wèi jù - jī xiā ?
『机具-机呀? 『 wèi jù - wèi ya ?
『机具-机呀? 『 wèi jù - wèi yā ?
『机具-机呀? 『 wèi jù - wèi xiā ?
```
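
The output above contains one line per combination of candidate readings: for `社会语言学？` the characters have 1, 2, 2, 3 and 1 readings, which gives 1 × 2 × 2 × 3 × 1 = 12 entries. The following is a minimal Python sketch of that enumeration, not the tool's actual implementation. The `READINGS` table and the function `enumerate_pronunciations` are hypothetical names introduced for illustration; the readings are copied from the example output rather than looked up with pypinyin, and symbols without an entry (hyphen, punctuation) are assumed to pass through unchanged.

```python
from itertools import product
from typing import Dict, List, Tuple

# Candidate readings per character, copied from the example output above.
# Assumption: the real tool obtains these from pypinyin; here they are hard-coded.
READINGS: Dict[str, List[str]] = {
    "鲜": ["xiān", "xiǎn"],
    "亮": ["liàng", "liáng"],
}


def enumerate_pronunciations(word: str) -> List[Tuple[str, ...]]:
    """Return every combination of readings for the characters in `word`.

    Characters without an entry in READINGS (e.g. "-" or "。") are kept as-is.
    """
    options = [READINGS.get(char, [char]) for char in word]
    return list(product(*options))


word = "鲜-亮。"
for pronunciation in enumerate_pronunciations(word):
    # Print the word followed by its space-separated symbols.
    print(word, " ".join(pronunciation))
```

For `鲜-亮。` this yields four combinations in the same order as the four `鲜-亮。` lines in the example output.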
## Dependencies
- `pronunciation-dictionary >= 0.0.5`
- `word-to-pronunciation >= 0.0.1`
- `ordered-set >= 4.1.0`
- `pypinyin >= 0.47.1, < 0.48`
- `tqdm`
## License
MIT License
## Acknowledgments
[pypinyin](https://github.com/mozillazg/python-pinyin)

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
## Citation
To cite this repository, use the BibTeX entry generated by GitHub (see *About => Cite this repository*).
Raw data
{
"_id": null,
"home_page": "",
"name": "dict-from-pypinyin",
"maintainer": "Stefan Taubert",
"docs_url": null,
"requires_python": "<4,>=3.8",
"maintainer_email": "pypi@stefantaubert.com",
"keywords": "Pronunciation,Dictionary,Chinese,Language,Pinyin,Speech Synthesis,TTS,Linguistics",
"author": "Stefan Taubert",
"author_email": "pypi@stefantaubert.com",
"download_url": "https://files.pythonhosted.org/packages/b8/c7/861052884ef15d5b21f5dd597a371369692dd52cf5c0f5c2e9bed9a7bef0/dict-from-pypinyin-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Command-line interface (CLI) to create a pronunciation dictionary by looking up pinyin transcriptions using pypinyin including the possibility of ignoring punctuation and splitting words on hyphens before transcribing them.",
"version": "0.0.1",
"split_keywords": [
"pronunciation",
"dictionary",
"chinese",
"language",
"pinyin",
"speech synthesis",
"tts",
"linguistics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "571886d4bfa11ce261dbdaade1c3df3fcbefeb18ede9bac812b1ee87f07eaa8e",
"md5": "f8815e63c89082eab76422045720a208",
"sha256": "a4c75feded68c96ab9a5e3de718fc92cffbc0c83ec23baa2f8dd69524187554c"
},
"downloads": -1,
"filename": "dict_from_pypinyin-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f8815e63c89082eab76422045720a208",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.8",
"size": 14839,
"upload_time": "2023-01-11T08:37:53",
"upload_time_iso_8601": "2023-01-11T08:37:53.061051Z",
"url": "https://files.pythonhosted.org/packages/57/18/86d4bfa11ce261dbdaade1c3df3fcbefeb18ede9bac812b1ee87f07eaa8e/dict_from_pypinyin-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b8c7861052884ef15d5b21f5dd597a371369692dd52cf5c0f5c2e9bed9a7bef0",
"md5": "10c28520cfa1741db15aa7cdcf262d3b",
"sha256": "2f0052f333d57fd9caa680498d739f554b5ccd590f3885470d268d21f372f614"
},
"downloads": -1,
"filename": "dict-from-pypinyin-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "10c28520cfa1741db15aa7cdcf262d3b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.8",
"size": 103499,
"upload_time": "2023-01-11T08:37:54",
"upload_time_iso_8601": "2023-01-11T08:37:54.743469Z",
"url": "https://files.pythonhosted.org/packages/b8/c7/861052884ef15d5b21f5dd597a371369692dd52cf5c0f5c2e9bed9a7bef0/dict-from-pypinyin-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-11 08:37:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "dict-from-pypinyin"
}