| **Tests** | [![Linux][badge-test-linux]](https://github.com/bootphon/phonemizer/actions/workflows/linux.yaml) [![MacOS][badge-test-macos]](https://github.com/bootphon/phonemizer/actions/workflows/macos.yaml) [![Windows][badge-test-windows]](https://github.com/bootphon/phonemizer/actions/workflows/windows.yaml) [![Codecov][badge-codecov]](https://codecov.io/gh/bootphon/phonemizer) |
|------------------:| --- |
| **Documentation** | [![Doc](https://github.com/bootphon/phonemizer/actions/workflows/doc.yaml/badge.svg)](https://bootphon.github.io/phonemizer/) |
| **Release** | [![GitHub release (latest SemVer)][badge-github-version]](https://github.com/bootphon/phonemizer/releases/latest) [![PyPI][badge-pypi-version]](https://pypi.python.org/pypi/phonemizer) [![downloads][badge-pypi-downloads]](https://pypi.python.org/pypi/phonemizer) |
| **Citation** | [![status][badge-joss]](https://joss.theoj.org/papers/08d1ffc14f233f56942f78f3742b266e) [![DOI][badge-zenodo]](https://doi.org/10.5281/zenodo.1045825) |
---
# Phonemizer -- *foʊnmaɪzɚ*
* The phonemizer allows simple phonemization of words and texts in many languages.
* It provides both the `phonemize` command-line tool and the Python function
`phonemizer.phonemize`. See [the package's documentation](https://bootphon.github.io/phonemizer/).
* It is based on four backends: **espeak**, **espeak-mbrola**, **festival** and
**segments**. The backends have different properties and capabilities, which are
summarized in the table below. The choice of backend is left to the user.
* [espeak-ng](https://github.com/espeak-ng/espeak-ng) is Text-to-Speech
software that supports many languages and produces IPA (International Phonetic
Alphabet) output.
* [espeak-ng-mbrola](https://github.com/espeak-ng/espeak-ng/blob/master/docs/mbrola.md)
uses the SAMPA phonetic alphabet instead of IPA but does not preserve word
boundaries.
* [festival](http://www.cstr.ed.ac.uk/projects/festival) is another
Text-to-Speech engine. Its phonemizer backend currently supports only
American English. It uses a [custom phoneset][festival-phoneset], but it
allows tokenization at the syllable level.
* [segments](https://github.com/cldf/segments) is a Unicode tokenizer that
builds a phonemization from a grapheme-to-phoneme mapping provided as a file
by the user.
| | espeak | espeak-mbrola | festival | segments |
| ---: | --- | --- | --- | --- |
| **phone set** | [IPA] | [SAMPA] | [custom][festival-phoneset] | user defined |
| **supported languages** | [100+][espeak-languages] | [35][mbrola-languages] | US English | user defined |
| **processing speed** | fast | slow | very slow | fast |
| **phone tokens** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| **syllable tokens** | :x: | :x: | :heavy_check_mark: | :x: |
| **word tokens** | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: |
| **punctuation preservation** | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: |
| **stressed phones** | :heavy_check_mark: | :x: | :x: | :x: |
| [**tie**][tie-IPA] | :heavy_check_mark: | :x: | :x: | :x: |
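
Because the choice of backend is left to the user, the table above can be turned into a small lookup. The following is a minimal sketch only: the `CAPABILITIES` dictionary and the `backends_supporting` helper are hypothetical names introduced here for illustration and are not part of the phonemizer package. The feature names and check marks are transcribed from the table.

```python
# Capability table transcribed from the README table above.
# CAPABILITIES and backends_supporting are illustrative names,
# not part of the phonemizer package.
CAPABILITIES = {
    "espeak": {
        "phone tokens", "word tokens", "punctuation preservation",
        "stressed phones", "tie",
    },
    "espeak-mbrola": {"phone tokens"},
    "festival": {
        "phone tokens", "syllable tokens", "word tokens",
        "punctuation preservation",
    },
    "segments": {"phone tokens", "word tokens", "punctuation preservation"},
}


def backends_supporting(*features):
    """Return the backends that offer every requested feature, in table order."""
    required = set(features)
    known = set().union(*CAPABILITIES.values())
    unknown = required - known
    if unknown:
        # Fail loudly on a misspelled feature instead of returning an empty list.
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [name for name, caps in CAPABILITIES.items() if required <= caps]


print(backends_supporting("word tokens", "punctuation preservation"))
print(backends_supporting("syllable tokens"))
```

A name returned by this helper matches one of the four backend names listed above, so it could then be passed as the `backend` argument of `phonemizer.phonemize`, provided the corresponding engine is installed.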
## Citation
To reference `phonemizer` in your own work, please cite the following [JOSS
paper](https://joss.theoj.org/papers/10.21105/joss.03958).
```bibtex
@article{Bernard2021,
doi = {10.21105/joss.03958},
url = {https://doi.org/10.21105/joss.03958},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {68},
pages = {3958},
author = {Mathieu Bernard and Hadrien Titeux},
title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
journal = {Journal of Open Source Software}
}
```
## Licence
**Copyright 2015-2021 Mathieu Bernard**
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
[badge-test-linux]: https://github.com/bootphon/phonemizer/actions/workflows/linux.yaml/badge.svg?branch=master
[badge-test-macos]: https://github.com/bootphon/phonemizer/actions/workflows/macos.yaml/badge.svg?branch=master
[badge-test-windows]: https://github.com/bootphon/phonemizer/actions/workflows/windows.yaml/badge.svg?branch=master
[badge-codecov]: https://img.shields.io/codecov/c/github/bootphon/phonemizer
[badge-github-version]: https://img.shields.io/github/v/release/bootphon/phonemizer
[badge-pypi-version]: https://img.shields.io/pypi/v/phonemizer
[badge-pypi-downloads]: https://img.shields.io/pypi/dm/phonemizer
[badge-joss]: https://joss.theoj.org/papers/08d1ffc14f233f56942f78f3742b266e/status.svg
[badge-zenodo]: https://zenodo.org/badge/56728069.svg
[phonemizer-1.0]: https://github.com/bootphon/phonemizer/releases/tag/v1.0
[festival-phoneset]: http://www.festvox.org/bsv/c4711.html
[IPA]: https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
[SAMPA]: https://en.wikipedia.org/wiki/SAMPA
[phonemize-function]: https://github.com/bootphon/phonemizer/blob/c5e2f3878d6db391ec7253173f44e4a85cfe41e3/phonemizer/phonemize.py#L33-L156
[tie-IPA]: https://en.wikipedia.org/wiki/Tie_(typography)#International_Phonetic_Alphabet
[espeak-languages]: https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md
[mbrola-languages]: https://github.com/numediart/MBROLA-voices
Raw data
{
"_id": null,
"home_page": null,
"name": "pozalabs-phonemizer",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": null,
"keywords": "linguistics, G2P, phone, festival, espeak, TTS",
"author": "pozalabs",
"author_email": "contact@pozalabs.com",
"download_url": "https://files.pythonhosted.org/packages/37/e2/a42549ce1d17d0bc21f32e0507f165025c3f2bd71a5b6b004f1ee672c1fc/pozalabs_phonemizer-3.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": "Simple text to phones converter for multiple languages",
"version": "3.3.0",
"project_urls": null,
"split_keywords": [
"linguistics",
" g2p",
" phone",
" festival",
" espeak",
" tts"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3d58e0824e7f7c13d98e355baa87bbc60bb9e31115cfa713558c5f864ce0ef61",
"md5": "d08976fa8514e7d24fff64d7f3b56683",
"sha256": "f2d4d68ddd7805a9b20986ee5446cfde49a2e8cfe2a9e4685b132652a07a0c99"
},
"downloads": -1,
"filename": "pozalabs_phonemizer-3.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d08976fa8514e7d24fff64d7f3b56683",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 62894,
"upload_time": "2024-07-09T08:06:08",
"upload_time_iso_8601": "2024-07-09T08:06:08.194566Z",
"url": "https://files.pythonhosted.org/packages/3d/58/e0824e7f7c13d98e355baa87bbc60bb9e31115cfa713558c5f864ce0ef61/pozalabs_phonemizer-3.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "37e2a42549ce1d17d0bc21f32e0507f165025c3f2bd71a5b6b004f1ee672c1fc",
"md5": "d2b5748e41d1e30e1cbed817e7039038",
"sha256": "1d567ab945c31675030526e8a462984bb0fb5528ef3e52a516230677388cd220"
},
"downloads": -1,
"filename": "pozalabs_phonemizer-3.3.0.tar.gz",
"has_sig": false,
"md5_digest": "d2b5748e41d1e30e1cbed817e7039038",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.11",
"size": 45226,
"upload_time": "2024-07-09T08:06:10",
"upload_time_iso_8601": "2024-07-09T08:06:10.265471Z",
"url": "https://files.pythonhosted.org/packages/37/e2/a42549ce1d17d0bc21f32e0507f165025c3f2bd71a5b6b004f1ee672c1fc/pozalabs_phonemizer-3.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-09 08:06:10",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pozalabs-phonemizer"
}