# dict-from-dict

- Version: 0.0.4
- Upload time: 2024-01-24 11:12:43
- Author and maintainer: Stefan Taubert
- Requires Python: >=3.8, <3.13
- License: MIT
- Keywords: language, linguistics

[![PyPI](https://img.shields.io/pypi/v/dict-from-dict.svg)](https://pypi.python.org/pypi/dict-from-dict)
[![PyPI](https://img.shields.io/pypi/pyversions/dict-from-dict.svg)](https://pypi.python.org/pypi/dict-from-dict)
[![MIT](https://img.shields.io/github/license/stefantaubert/pronunciation-dict-creation.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/dict-from-dict.svg)](https://pypi.python.org/pypi/dict-from-dict)
[![PyPI](https://img.shields.io/pypi/implementation/dict-from-dict.svg)](https://pypi.python.org/pypi/dict-from-dict)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/pronunciation-dict-creation/latest/master.svg)](https://github.com/stefantaubert/pronunciation-dict-creation/compare/v0.0.4...master)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10560441.svg)](https://doi.org/10.5281/zenodo.10560441)

Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option of ignoring punctuation and splitting on hyphens before lookup.

## Features

- ignore the casing of words during lookup
- trim symbols from the start and end of a word before lookup
- split words on hyphens before lookup
  - if the dictionary contains an entry for the hyphenated word itself, that entry is used first (see example below)
- support words with multiple pronunciations
  - weights are multiplied for hyphenated words (see example below)
- output out-of-vocabulary (OOV) words
- use multiprocessing
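
The lookup order implied by the features above can be sketched in Python. This is a minimal illustration under assumptions, not the package's actual implementation: the function name `lookup_word`, the in-memory dictionary layout, and the `trim` and `ignore_case` parameters are hypothetical. The sketch tries the word as written first, then retries with the trim symbols removed from both ends and re-attaches them as separate symbols, which matches the behavior visible in the example output below.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory layout: word -> list of (weight, phonemes)
Pronunciations = List[Tuple[float, Tuple[str, ...]]]


def lookup_word(word: str, dictionary: Dict[str, Pronunciations],
                trim: str, ignore_case: bool) -> Optional[Pronunciations]:
    """Return the pronunciations for `word`, or None if it is out of vocabulary."""

    def find(key: str) -> Optional[Pronunciations]:
        # Exact match first, then an optional case-insensitive scan.
        if key in dictionary:
            return dictionary[key]
        if ignore_case:
            lowered = key.lower()
            for entry, prons in dictionary.items():
                if entry.lower() == lowered:
                    return prons
        return None

    # 1. Try the word exactly as written, including any punctuation.
    direct = find(word)
    if direct is not None:
        return direct

    # 2. Strip the trim symbols from both ends, remembering what was removed.
    core = word.lstrip(trim)
    prefix = word[:len(word) - len(core)]
    stripped = core.rstrip(trim)
    suffix = core[len(stripped):]

    found = find(stripped)
    if found is None:
        return None  # the caller would record the word as OOV

    # 3. Re-attach the trimmed symbols as separate symbols around the phonemes.
    return [(weight, tuple(prefix) + phonemes + tuple(suffix))
            for weight, phonemes in found]
```

Hyphen splitting is left out here to keep the sketch short; in this model it would be a further fallback that runs only after both lookups above fail.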

## Installation

```sh
pip install dict-from-dict --user
```

## Usage

```sh
dict-from-dict-cli
```

### Example

```sh
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF

# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test  0.7  T E0 S T
test  0.3  T E1 S T
def  0.4  D E0 F
def  0.6  D E1 F
xyz  2.0  ?
"xyz?  1.0  ' X Y Z ??
uv  2.0  ?
w  2.0  ?
uv-w  1.0  U V - W
EOF

# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
  /tmp/vocabulary.txt \
  /tmp/dictionary.dict --consider-weights \
  /tmp/result.dict \
  --ignore-case --split-on-hyphen \
  --trim "?" "\"" "," "." \
  --n-jobs 4 \
  --oov-out /tmp/oov.txt

cat /tmp/result.dict
# -------
# Output:
# -------
# Test?  0.7  T E0 S T ?
# Test?  0.3  T E1 S T ?
# "def  0.4  " D E0 F
# "def  0.6  " D E1 F
# Test-def.  0.27999999999999997  T E0 S T - D E0 F .
# Test-def.  0.42  T E0 S T - D E1 F .
# Test-def.  0.12  T E1 S T - D E0 F .
# Test-def.  0.18  T E1 S T - D E1 F .
# "xyz?  1.0  ' X Y Z ??
# "uv-w?  1.0  " U V - W ?
# -------

cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
```
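
The four `Test-def.` entries in the result above arise because every pronunciation of `test` is paired with every pronunciation of `def`, and the weights of each pair are multiplied. The following is a minimal sketch of that combination step, assuming the parts have already been trimmed, lowercased, and split on the hyphen. The name `combine_hyphenated` and the data layout are hypothetical and are not taken from the package.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical in-memory layout: word -> list of (weight, phonemes)
Pronunciations = List[Tuple[float, Tuple[str, ...]]]


def combine_hyphenated(parts: List[str],
                       dictionary: Dict[str, Pronunciations]) -> Pronunciations:
    """Combine the pronunciations of hyphen-separated parts, multiplying weights."""
    # Raises KeyError if a part is missing; a real tool would report the word as OOV.
    options = [dictionary[part] for part in parts]
    combined: Pronunciations = []
    for choice in product(*options):
        weight = 1.0
        phonemes: Tuple[str, ...] = ()
        for index, (part_weight, part_phonemes) in enumerate(choice):
            weight *= part_weight
            if index > 0:
                phonemes += ("-",)  # keep the hyphen as its own symbol
            phonemes += part_phonemes
        combined.append((weight, phonemes))
    return combined


example = {
    "test": [(0.7, ("T", "E0", "S", "T")), (0.3, ("T", "E1", "S", "T"))],
    "def": [(0.4, ("D", "E0", "F")), (0.6, ("D", "E1", "F"))],
}
for weight, phonemes in combine_hyphenated(["test", "def"], example):
    print(weight, " ".join(phonemes))
```

Because the weights are floating-point products, a value such as 0.7 × 0.4 may be printed with rounding noise, as in the `0.27999999999999997` entry of the result file.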

## Development setup

```sh
# update the package lists
sudo apt update
# install Python 3.8 to 3.12 so that the tests can be run against each version
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dict-creation.git
cd pronunciation-dict-creation
# create virtual environment
python3.8 -m pipenv install --dev
```

## Running the tests

```sh
# first set up the tool as described in "Development setup"
# then navigate into the directory of the repo (if not already done)
cd pronunciation-dict-creation
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```

Final lines of test result output:

```log
  py38: commands succeeded
  py39: commands succeeded
  py310: commands succeeded
  py311: commands succeeded
  py312: commands succeeded
  congratulations :)
```

## License

MIT License

## Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

## Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see *About => Cite this repository*) or the following plain-text citation.

```txt
Taubert, S. (2024). dict-from-dict (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.10560441
```

            

        