# pronunciation-dictionary-utils

| Field | Value |
| --- | --- |
| Name | pronunciation-dictionary-utils |
| Version | 0.0.5 |
| Summary | CLI and library to modify pronunciation dictionaries (any language). |
| Upload time | 2024-01-24 10:35:49 |
| Requires Python | `<3.13,>=3.8` |
| License | MIT |
| Keywords | arpabet, ipa, x-sampa, cmu, tts, text-to-speech, speech synthesis, language, linguistics |
| Docs URL | None |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

[![PyPI](https://img.shields.io/pypi/v/pronunciation-dictionary-utils.svg)](https://pypi.python.org/pypi/pronunciation-dictionary-utils)
[![PyPI](https://img.shields.io/pypi/pyversions/pronunciation-dictionary-utils.svg)](https://pypi.python.org/pypi/pronunciation-dictionary-utils)
[![MIT](https://img.shields.io/github/license/stefantaubert/pronunciation-dictionary-utils.svg)](LICENSE)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/pronunciation-dictionary-utils/latest/master.svg)](https://pypi.python.org/pypi/pronunciation-dictionary-utils)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10560153.svg)](https://doi.org/10.5281/zenodo.10560153)

Library and CLI to modify pronunciation dictionaries (any language).

## Features

- `export-vocabulary`: export vocabulary from dictionaries
- `export-phonemes`: export phoneme set from dictionaries
- `merge`: merge dictionaries together
- `extract`: extract subset of dictionary vocabulary
- `map-symbols-in-pronunciations`: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
- `map-symbols-in-pronunciations-json`: map phonemes/symbols in pronunciations to phoneme/symbol specified in file
- `remove-symbols-from-vocabulary`: remove phonemes/symbols from vocabulary
- `remove-symbols-from-pronunciations`: remove phonemes/symbols from pronunciations
- `remove-symbols-from-words`: remove characters/symbols from words
- `change-formatting`: change formatting of dictionaries
- `select-single-pronunciation`: select single pronunciation
- `change-word-casing`: transform all words to upper- or lower-case
- `sort-words`: sort dictionary entries by word
- `sort-pronunciations`: sort dictionary pronunciations
- `normalize-weights`: normalize pronunciation weights for each word
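
To make two of the operations above more concrete, here is a minimal Python sketch of what normalizing weights and exporting a phoneme set could look like. It does not use the library's API: the in-memory model (`word -> {pronunciation: weight}`) and the function names `normalize_weights` and `export_phonemes` are assumptions chosen for illustration only.

```python
from typing import Dict, Set, Tuple

Pronunciation = Tuple[str, ...]
# Assumed model for illustration only: word -> {pronunciation: weight}
Dictionary = Dict[str, Dict[Pronunciation, float]]


def normalize_weights(dictionary: Dictionary) -> Dictionary:
    """Return a copy in which each word's weights sum to 1."""
    normalized: Dictionary = {}
    for word, pronunciations in dictionary.items():
        total = sum(pronunciations.values())
        if total == 0:
            # Nothing to scale by; keep the entry unchanged.
            normalized[word] = dict(pronunciations)
            continue
        normalized[word] = {p: w / total for p, w in pronunciations.items()}
    return normalized


def export_phonemes(dictionary: Dictionary) -> Set[str]:
    """Collect every symbol that occurs in any pronunciation."""
    return {s for prons in dictionary.values() for p in prons for s in p}


# Illustrative entries, not taken from a real dictionary file
example: Dictionary = {
    "a": {("AH0",): 1.0, ("EY1",): 3.0},
    "butter": {("B", "AH1", "T", "ER0"): 2.0},
}
normalized = normalize_weights(example)
phonemes = export_phonemes(example)
```

In this sketch the two pronunciations of `a` end up with weights 0.25 and 0.75, and the single pronunciation of `butter` gets weight 1.0.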

## Roadmap

- Add tests
- Implement printing of statistics
- Support changing the pronunciation of a word via the CLI

## Installation

```sh
pip install pronunciation-dictionary-utils --user
```

## Usage

```txt
usage: dict-cli [-h] [-v]
                {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                ...

This program provides methods to modify pronunciation dictionaries.

positional arguments:
  {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                                        description
    export-vocabulary                   export vocabulary from dictionaries
    export-phonemes                     export phoneme set from dictionaries
    merge                               merge dictionaries together
    extract                             extract subset of dictionary vocabulary
    map-symbols-in-pronunciations       map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
    map-symbols-in-pronunciations-json  map phonemes/symbols in pronunciations to phoneme/symbol specified in file
    remove-symbols-from-vocabulary      remove phonemes/symbols from vocabulary
    remove-symbols-from-pronunciations  remove phonemes/symbols from pronunciations
    remove-symbols-from-words           remove characters/symbols from words
    change-formatting                   change formatting of dictionaries
    select-single-pronunciation         select single pronunciation
    change-word-casing                  transform all words to upper- or lower-case
    sort-words                          sort dictionary after words
    sort-pronunciations                 sort dictionary pronunciations
    normalize-weights                   normalize pronunciation weights for each word

optional arguments:
  -h, --help                            show this help message and exit
  -v, --version                         show program's version number and exit
```

### Example

```sh
# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
  -O "/tmp/example.dict"

# Change formatting: remove numbers from words, remove comments, and save as UTF-8
dict-cli change-formatting \
  "/tmp/example.dict" \
  --deserialization-encoding "ISO-8859-1" \
  --consider-numbers \
  --consider-pronunciation-comments \
  --serialization-encoding "UTF-8"

# Export phoneme set
dict-cli export-phonemes \
  "/tmp/example.dict" \
  "/tmp/example-phoneme-set.txt"
  
# Export vocabulary
dict-cli export-vocabulary \
  "/tmp/example.dict" \
  "/tmp/example-vocabulary.txt"

# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
  "/tmp/example.dict" \
  --mode "first"

# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
  "/tmp/example.dict" \
  "ER0" "ER"
```
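
As a rough illustration of what the formatting and mapping steps above do to individual entries, the following Python sketch parses CMU-style lines, strips trailing comments and variant numbers such as `(2)`, and replaces one symbol with another. It is not the tool's implementation: the helpers `parse_line` and `map_symbol`, the assumption that comments start with ` #`, and the sample lines are all hypothetical.

```python
import re
from typing import List, Tuple

# Variant marker at the end of a word, e.g. "a(2)"
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_line(line: str) -> Tuple[str, Tuple[str, ...]]:
    """Split one entry into (word, symbols), dropping comment and variant number."""
    entry = line.split(" #", 1)[0].strip()  # assumed comment syntax: " # ..."
    word, *symbols = entry.split()
    return VARIANT_SUFFIX.sub("", word), tuple(symbols)


def map_symbol(pronunciation: Tuple[str, ...], old: str, new: str) -> Tuple[str, ...]:
    """Replace every occurrence of `old` in a pronunciation with `new`."""
    return tuple(new if symbol == old else symbol for symbol in pronunciation)


# Illustrative lines in the style of the CMU dictionary
lines: List[str] = [
    "a AH0",
    "a(2) EY1",
    "butter B AH1 T ER0 # illustrative comment",
]
entries = [parse_line(line) for line in lines]
mapped = [(word, map_symbol(pron, "ER0", "ER")) for word, pron in entries]
```

After these steps both `a` entries share the same word, and the last symbol of `butter` is `ER` instead of `ER0`.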

## Contributing

### Development setup

```sh
# update
sudo apt update
# install Python 3.8-3.12 so that the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev
```

## Running the tests

```sh
# first install the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```

Final lines of test result output:

```log
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
```

## Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

## Citation

If you want to cite this repo, you can use the following citation generated by GitHub (see *About => Cite this repository*).

```txt
Taubert, S., and Przybysz, N. (2024). pronunciation-dictionary-utils (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10560153
```

            
