phonemizer


* Name: phonemizer
* Version: 3.2.1
* Home page: https://github.com/bootphon/phonemizer
* Summary: Simple text to phones converter for multiple languages
* Upload time: 2022-06-09 19:46:24
* Author: Mathieu Bernard
* Requires Python: >=3.6
* License: GPL3
* Keywords: linguistics, g2p, phone, festival, espeak, tts

|         **Tests** | [![Linux][badge-test-linux]](https://github.com/bootphon/phonemizer/actions/workflows/linux.yaml) [![MacOS][badge-test-macos]](https://github.com/bootphon/phonemizer/actions/workflows/macos.yaml) [![Windows][badge-test-windows]](https://github.com/bootphon/phonemizer/actions/workflows/windows.yaml) [![Codecov][badge-codecov]](https://codecov.io/gh/bootphon/phonemizer) |
|------------------:| --- |
| **Documentation** | [![Doc](https://github.com/bootphon/phonemizer/actions/workflows/doc.yaml/badge.svg)](https://bootphon.github.io/phonemizer/) |
|       **Release** | [![GitHub release (latest SemVer)][badge-github-version]](https://github.com/bootphon/phonemizer/releases/latest) [![PyPI][badge-pypi-version]](https://pypi.python.org/pypi/phonemizer) [![downloads][badge-pypi-downloads]](https://pypi.python.org/pypi/phonemizer) |
|      **Citation** | [![status][badge-joss]](https://joss.theoj.org/papers/08d1ffc14f233f56942f78f3742b266e) [![DOI][badge-zenodo]](https://doi.org/10.5281/zenodo.1045825) |

---

# Phonemizer -- *foʊnmaɪzɚ*

* The phonemizer allows simple phonemization of words and texts in many languages.

* It provides both the `phonemize` command-line tool and the Python function
  `phonemizer.phonemize`. See [the package's documentation](https://bootphon.github.io/phonemizer/).

* It is based on four backends: **espeak**, **espeak-mbrola**, **festival** and
  **segments**. The backends have different properties and capabilities, summarized
  in the table below. The choice of backend is left to the user.

  * [espeak-ng](https://github.com/espeak-ng/espeak-ng) is text-to-speech
    software that supports many languages and produces IPA (International
    Phonetic Alphabet) output.

  * [espeak-ng-mbrola](https://github.com/espeak-ng/espeak-ng/blob/master/docs/mbrola.md)
    uses the SAMPA phonetic alphabet instead of IPA but does not preserve word
    boundaries.

  * [festival](http://www.cstr.ed.ac.uk/projects/festival) is another
    text-to-speech engine. Its phonemizer backend currently supports only
    American English. It uses a [custom phoneset][festival-phoneset], but it
    allows tokenization at the syllable level.

  * [segments](https://github.com/cldf/segments) is a Unicode tokenizer that
    builds a phonemization from a grapheme-to-phoneme mapping provided as a
    file by the user.

  |                              | espeak                   | espeak-mbrola           | festival                    | segments           |
  | ---:                         | ---                      | ---                     | ---                         | ---                |
  | **phone set**                | [IPA]                    | [SAMPA]                 | [custom][festival-phoneset] | user defined       |
  | **supported languages**      | [100+][espeak-languages] | [35][mbrola-languages] | US English                  | user defined       |
  | **processing speed**         | fast                     | slow                    | very slow                   | fast               |
  | **phone tokens**             | :heavy_check_mark:       | :heavy_check_mark:      | :heavy_check_mark:          | :heavy_check_mark: |
  | **syllable tokens**          | :x:                      | :x:                     | :heavy_check_mark:          | :x:                |
  | **word tokens**              | :heavy_check_mark:       | :x:                     | :heavy_check_mark:          | :heavy_check_mark: |
  | **punctuation preservation** | :heavy_check_mark:       | :x:                     | :heavy_check_mark:          | :heavy_check_mark: |
  | **stressed phones**          | :heavy_check_mark:       | :x:                     | :x:                         | :x:                |
  | [**tie**][tie-IPA]           | :heavy_check_mark:       | :x:                     | :x:                         | :x:                |
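
As an illustration, the feature rows of the table above can be transcribed into a small lookup that lists which backends satisfy a set of requirements. This is only a sketch: `SUPPORTED_FEATURES` and `backends_supporting` are hypothetical names that are not part of the phonemizer package, and the data is copied by hand from the table.

```python
# Features marked as supported in the table above, per backend.
# Hypothetical helper data, not part of the phonemizer package.
SUPPORTED_FEATURES = {
    "espeak": {
        "phone tokens", "word tokens", "punctuation preservation",
        "stressed phones", "tie",
    },
    "espeak-mbrola": {"phone tokens"},
    "festival": {
        "phone tokens", "syllable tokens", "word tokens",
        "punctuation preservation",
    },
    "segments": {"phone tokens", "word tokens", "punctuation preservation"},
}


def backends_supporting(*features):
    """Return the backends that support every requested feature."""
    wanted = set(features)
    return [
        backend
        for backend, supported in SUPPORTED_FEATURES.items()
        if wanted <= supported  # subset test: all wanted features present
    ]


print(backends_supporting("syllable tokens"))
print(backends_supporting("word tokens", "punctuation preservation"))
```

The returned names are the four backend names listed above, so one of them could then be passed as the `backend` argument of `phonemizer.phonemize`.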


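The idea behind the segments backend, building a phonemization from a user-provided grapheme-to-phoneme mapping, can be sketched in a few lines of Python. The example below is a toy greedy longest-match rewriter, not the segments library's own algorithm or file format; `TOY_MAPPING` and `apply_mapping` are hypothetical names, and the mapping is invented for illustration.

```python
# Toy grapheme -> phone mapping, invented for this example.
TOY_MAPPING = {"sh": "ʃ", "s": "s", "h": "h", "i": "ɪ", "p": "p", "o": "ɒ"}


def apply_mapping(word, mapping, phone_separator=" "):
    """Rewrite `word` as phones using a grapheme -> phone `mapping`."""
    longest = max(len(grapheme) for grapheme in mapping)
    phones = []
    position = 0
    while position < len(word):
        # Try the longest candidate first so that "sh" wins over "s" + "h".
        for size in range(min(longest, len(word) - position), 0, -1):
            chunk = word[position:position + size]
            if chunk in mapping:
                phones.append(mapping[chunk])
                position += size
                break
        else:
            # No grapheme of any length matched at this position.
            raise ValueError(f"no mapping for {word[position]!r} in {word!r}")
    return phone_separator.join(phones)


print(apply_mapping("ship", TOY_MAPPING))  # prints "ʃ ɪ p"
```

Trying longer graphemes first is what lets a digraph such as "sh" map to a single phone instead of being split into "s" and "h".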

## Citation

To reference the `phonemizer` in your own work, please cite the following [JOSS
paper](https://joss.theoj.org/papers/10.21105/joss.03958).

```bibtex
@article{Bernard2021,
  doi = {10.21105/joss.03958},
  url = {https://doi.org/10.21105/joss.03958},
  year = {2021},
  publisher = {The Open Journal},
  volume = {6},
  number = {68},
  pages = {3958},
  author = {Mathieu Bernard and Hadrien Titeux},
  title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
  journal = {Journal of Open Source Software}
}
```


## Licence

**Copyright 2015-2021 Mathieu Bernard**

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.


[badge-test-linux]: https://github.com/bootphon/phonemizer/actions/workflows/linux.yaml/badge.svg?branch=master
[badge-test-macos]: https://github.com/bootphon/phonemizer/actions/workflows/macos.yaml/badge.svg?branch=master
[badge-test-windows]: https://github.com/bootphon/phonemizer/actions/workflows/windows.yaml/badge.svg?branch=master
[badge-codecov]: https://img.shields.io/codecov/c/github/bootphon/phonemizer
[badge-github-version]: https://img.shields.io/github/v/release/bootphon/phonemizer
[badge-pypi-version]: https://img.shields.io/pypi/v/phonemizer
[badge-pypi-downloads]: https://img.shields.io/pypi/dm/phonemizer
[badge-joss]: https://joss.theoj.org/papers/08d1ffc14f233f56942f78f3742b266e/status.svg
[badge-zenodo]: https://zenodo.org/badge/56728069.svg
[phonemizer-1.0]: https://github.com/bootphon/phonemizer/releases/tag/v1.0
[festival-phoneset]: http://www.festvox.org/bsv/c4711.html
[IPA]: https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
[SAMPA]: https://en.wikipedia.org/wiki/SAMPA
[phonemize-function]: https://github.com/bootphon/phonemizer/blob/c5e2f3878d6db391ec7253173f44e4a85cfe41e3/phonemizer/phonemize.py#L33-L156
[tie-IPA]: https://en.wikipedia.org/wiki/Tie_(typography)#International_Phonetic_Alphabet
[espeak-languages]: https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md
[mbrola-languages]: https://github.com/numediart/MBROLA-voices



            
