# dict-from-pypinyin

| Field | Value |
| --- | --- |
| Name | dict-from-pypinyin |
| Version | 0.0.1 |
| Summary | Command-line interface (CLI) to create a pronunciation dictionary by looking up pinyin transcriptions using pypinyin including the possibility of ignoring punctuation and splitting words on hyphens before transcribing them. |
| Upload time | 2023-01-11 08:37:54 |
| Maintainer | Stefan Taubert |
| Author | Stefan Taubert |
| Requires Python | `<4,>=3.8` |
| License | MIT |
| Keywords | pronunciation, dictionary, chinese, language, pinyin, speech synthesis, tts, linguistics |

[![PyPI](https://img.shields.io/pypi/v/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![PyPI](https://img.shields.io/pypi/pyversions/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![MIT](https://img.shields.io/github/license/stefantaubert/dict-from-pypinyin.svg)](https://github.com/stefantaubert/dict-from-pypinyin/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![PyPI](https://img.shields.io/pypi/implementation/dict-from-pypinyin.svg)](https://pypi.python.org/pypi/dict-from-pypinyin)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7524283.svg)](https://doi.org/10.5281/zenodo.7524283)

Command-line interface (CLI) to create a pronunciation dictionary by looking up pinyin transcriptions using [pypinyin](https://github.com/mozillazg/python-pinyin). Optionally, punctuation can be ignored and words can be split on hyphens before they are transcribed.
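The idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: the `READINGS` table is a hypothetical stand-in for the pypinyin lookup, the `PUNCTUATION` set is an assumption, and the function names `trim_punctuation` and `transcribe` are invented for illustration. The sketch trims punctuation from the word edges, splits the remainder on hyphens, and combines the alternative readings of each character.

```python
from itertools import product

# Hypothetical readings table; the real tool looks readings up via pypinyin.
READINGS = {
    "鲜": ["xiān", "xiǎn"],
    "亮": ["liàng", "liáng"],
}

# Assumed set of punctuation marks to trim from word edges (illustrative only).
PUNCTUATION = "。，？『』"


def trim_punctuation(word):
    """Split a word into (leading punctuation, core, trailing punctuation)."""
    start = 0
    while start < len(word) and word[start] in PUNCTUATION:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in PUNCTUATION:
        end -= 1
    return word[:start], word[start:end], word[end:]


def transcribe(word):
    """Return every pronunciation of a word as a list of symbol tuples."""
    leading, core, trailing = trim_punctuation(word)
    # One list of alternatives per output symbol position.
    slots = [[mark] for mark in leading]
    for index, part in enumerate(core.split("-")):
        if index > 0:
            slots.append(["-"])  # keep the hyphen as its own symbol
        slots.extend(READINGS[char] for char in part)
    slots.extend([mark] for mark in trailing)
    # Cartesian product over the alternatives at each position.
    return list(product(*slots))


for pronunciation in transcribe("鲜-亮。"):
    print("鲜-亮。", " ".join(pronunciation), sep="  ")
```

With two readings for each of the two characters, the sketch yields four pronunciations for `鲜-亮。`, one per combination of readings.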

## Installation

```sh
pip install dict-from-pypinyin --user
```

## Usage

```sh
dict-from-pypinyin-cli
```

### Example

```sh
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
社会语言学？
㐻，
『㑐
鲜-亮。
『占斌？
『机具-机呀？
EOF

# Create dictionary from vocabulary
dict-from-pypinyin-cli \
  /tmp/vocabulary.txt \
  /tmp/result.dict \
  --split-on-hyphen

cat /tmp/result.dict
```

Output:

```txt
社会语言学？  shè huì yǔ yán xué ？
社会语言学？  shè huì yǔ yàn xué ？
社会语言学？  shè huì yǔ yín xué ？
社会语言学？  shè huì yù yán xué ？
社会语言学？  shè huì yù yàn xué ？
社会语言学？  shè huì yù yín xué ？
社会语言学？  shè kuài yǔ yán xué ？
社会语言学？  shè kuài yǔ yàn xué ？
社会语言学？  shè kuài yǔ yín xué ？
社会语言学？  shè kuài yù yán xué ？
社会语言学？  shè kuài yù yàn xué ？
社会语言学？  shè kuài yù yín xué ？
㐻，  nèi ，
『㑐  『 shū
鲜-亮。  xiān - liàng 。
鲜-亮。  xiān - liáng 。
鲜-亮。  xiǎn - liàng 。
鲜-亮。  xiǎn - liáng 。
『占斌？  『 zhàn bīn ？
『占斌？  『 zhān bīn ？
『占斌？  『 tiē bīn ？
『机具-机呀？  『 jī jù - jī ya ？
『机具-机呀？  『 jī jù - jī yā ？
『机具-机呀？  『 jī jù - jī xiā ？
『机具-机呀？  『 jī jù - wèi ya ？
『机具-机呀？  『 jī jù - wèi yā ？
『机具-机呀？  『 jī jù - wèi xiā ？
『机具-机呀？  『 wèi jù - jī ya ？
『机具-机呀？  『 wèi jù - jī yā ？
『机具-机呀？  『 wèi jù - jī xiā ？
『机具-机呀？  『 wèi jù - wèi ya ？
『机具-机呀？  『 wèi jù - wèi yā ？
『机具-机呀？  『 wèi jù - wèi xiā ？
```
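Each word appears once per pronunciation, so a consumer of the file usually needs to group lines by word. The following is a minimal sketch of such a reader, assuming the layout shown in the output above: the word, then whitespace, then space-separated symbols. The function name `parse_dictionary` is hypothetical and not part of this package.

```python
from collections import defaultdict


def parse_dictionary(text):
    """Group pronunciations by word; each line is 'word<whitespace>symbols'."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # The first whitespace run separates the word from its pronunciation.
        word, symbols = line.split(None, 1)
        entries[word].append(symbols.split())
    return dict(entries)


# A few lines copied from the example output.
sample = (
    "『㑐  『 shū\n"
    "鲜-亮。  xiān - liàng 。\n"
    "鲜-亮。  xiān - liáng 。\n"
    "鲜-亮。  xiǎn - liàng 。\n"
    "鲜-亮。  xiǎn - liáng 。\n"
)

entries = parse_dictionary(sample)
for word, pronunciations in entries.items():
    print(word, len(pronunciations))
```

To read the file written by the example, pass the contents of `/tmp/result.dict` (opened with `encoding="utf-8"`) to the function instead of `sample`.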

## Dependencies

- `pronunciation-dictionary >= 0.0.5`
- `word-to-pronunciation >= 0.0.1`
- `ordered-set >= 4.1.0`
- `pypinyin >= 0.47.1, < 0.48`
- `tqdm`

## License

MIT License

## Acknowledgments

[pypinyin](https://github.com/mozillazg/python-pinyin)

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

## Citation

To cite this repository, use the BibTeX entry generated by GitHub (see *About => Cite this repository*).

            
