Name | promo |
Version | 2.0.0 |
home_page | https://github.com/timmahrt/ProMo |
Summary | Library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech |
upload_time | 2023-07-15 11:47:36 |
maintainer | |
docs_url | None |
author | Tim Mahrt |
requires_python | >3.6.0 |
license | LICENSE |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | |
-----------------------
ProMo (Prosody Morph)
-----------------------
.. image:: https://travis-ci.org/timmahrt/ProMo.svg?branch=main
:target: https://travis-ci.org/timmahrt/ProMo
.. image:: https://coveralls.io/repos/github/timmahrt/ProMo/badge.svg?branch=main
:target: https://coveralls.io/github/timmahrt/ProMo?branch=main
.. image:: https://img.shields.io/badge/license-MIT-blue.svg?
:target: http://opensource.org/licenses/MIT
*Questions? Comments? Feedback? Chat with us on gitter!*
.. image:: https://badges.gitter.im/pythonProMo/Lobby.svg?
:alt: Join the chat at https://gitter.im/pythonProMo/Lobby
:target: https://gitter.im/pythonProMo/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
-----
A library for manipulating pitch and duration in an algorithmic way, for
resynthesizing speech.
This library can be used to resynthesize pitch in natural speech using pitch
contours taken from other speech samples, generated pitch contours,
or through algorithmic manipulations of the source pitch contour.
.. sectnum::
.. contents::
Common Use Cases
================
What can you do with this library?
Apply the pitch or duration from one speech sample to another.
- alignment happens both in time and in hertz
- after the morph process, the source pitch points will be at the same
absolute pitch and relative time as they are in the target file
- time is relative to the start and stop time of the interval being
considered (e.g. the pitch value at 80% of the duration of the interval).
Relative time is used so that the source and target files don't have to
be the same length.
- temporal morphing is a minor effect if the sampling frequency is high
but it can be significant when, for example, using a stylized pitch
contour with few pitch samples.
- modifications can be done between entire wav files or between
corresponding intervals as specified in a textgrid or other annotation
(indicating the boundaries of words, stressed vowels, etc.)
- the larger the file, the less useful the results are likely to be
without using a transcript of some sort
- the transcripts do not have to match in lexical content, only in the
number of intervals (same number of words or phones, etc.)
- modifications can be scaled (it is possible to generate a wav file with
a pitch contour that is 30% or 60% between the source and target contours).
- can also morph the pitch range and average pitch independently.
- resynthesis is performed by Praat.
- pitch can be obtained from praat (such as by using praatio)
or from other sources (e.g. ESPS getF0)
- plots of the resynthesis (such as the ones below) can be generated
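The alignment described in the list above (relative time within an interval, absolute pitch in hertz, and a scalable morph amount) can be sketched in plain Python. This is a minimal illustration and not ProMo's actual implementation: the function names (``relative_times``, ``pitch_at``, ``morph_pitch``), the use of linear interpolation on the source contour, and the sample values are all assumptions made for this example.

```python
from bisect import bisect_left


def relative_times(points, start, end):
    """Rescale the times of (time, f0) points to 0..1 within [start, end]."""
    duration = end - start
    return [((t - start) / duration, f0) for t, f0 in points]


def pitch_at(rel_points, r):
    """Linearly interpolate a time-sorted contour at relative time r."""
    times = [t for t, _ in rel_points]
    i = bisect_left(times, r)
    if i == 0:
        return rel_points[0][1]
    if i == len(rel_points):
        return rel_points[-1][1]
    (t0, f0), (t1, f1) = rel_points[i - 1], rel_points[i]
    return f0 + (f1 - f0) * (r - t0) / (t1 - t0)


def morph_pitch(source, target, src_span, tgt_span, amount):
    """Hypothetical morph: amount=0.0 keeps the source pitch, 1.0 uses the target pitch.

    Output times are the target's relative times mapped back onto the
    source interval, so the two recordings may differ in length.
    """
    src_rel = relative_times(source, *src_span)
    tgt_rel = relative_times(target, *tgt_span)
    src_start, src_end = src_span
    morphed = []
    for r, tgt_f0 in tgt_rel:
        src_f0 = pitch_at(src_rel, r)
        f0 = src_f0 + amount * (tgt_f0 - src_f0)
        morphed.append((src_start + r * (src_end - src_start), f0))
    return morphed


# Made-up contours: a 1 s source interval and a 2 s target interval.
source = [(0.0, 200.0), (1.0, 300.0)]
target = [(0.0, 100.0), (1.0, 120.0), (2.0, 100.0)]

halfway = morph_pitch(source, target, (0.0, 1.0), (0.0, 2.0), 0.5)
full = morph_pitch(source, target, (0.0, 1.0), (0.0, 2.0), 1.0)
print(halfway)  # [(0.0, 150.0), (0.5, 185.0), (1.0, 200.0)]
print(full)     # [(0.0, 100.0), (0.5, 120.0), (1.0, 100.0)]
```

With ``amount=1.0`` the output carries the target's pitch values at the target's relative times, squeezed into the source's one-second interval. When the recordings are annotated, the same idea would be applied once per pair of corresponding intervals.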
Illustrative example
======================
Consider the phrase "Mary rolled the barrel". In the first recording
(examples/mary1.wav), "Mary rolled the barrel" was said in response
to a question such as "Did John roll the barrel?". On the other hand,
in the second recording (examples/mary2.wav) the utterance was said
in response to a question such as "What happened yesterday".
"Mary" in "mary1.wav" is produced with more emphasis than in "mary2.wav".
It is longer and carries a more dramatic pitch excursion. Using
ProMo, we can make mary1.wav sound similar to mary2.wav, even
though they were spoken in different ways and by different speakers.
Duration and pitch carry meaning. Change these, and you can change the
meaning being conveyed.
``Note that modifying pitch and duration too much can introduce artifacts.
Such artifacts can be heard even in pitch morphing mary1.wav to mary2.wav.``
Pitch morphing (examples/pitch_morph_example.py):
The following image shows morphing of pitch from mary1.wav to mary2.wav
on a word-by-word level
in increments of 33% (33%, 66%, 100%). Note that the morph adjusts the
temporal dimension of the target signal to fit the duration of the source
signal (the source and generated contours are equally shorter
than the target contour). This occurs at the level of the file unless
the user specifies an equal number of segments to align in time
(e.g. using word-level transcriptions, as done here, or phone-level
transcriptions, etc.)
.. image:: examples/files/mary1_mary2_f0_morph.png
:width: 500px
With the ability to morph pitch range and average pitch, it becomes easier
to morph contours produced by different speakers:
The following image shows four different pitch manipulations. On the
**upper left** is the raw morph. Notice that the final output (black line) is
very close to the target. Differences stem from duration differences.
However, the average pitch and pitch range are qualities of speech that
can signify differences in gender in addition to other aspects of
speaker identity. By resetting the average pitch and pitch range to
that of the source, it is possible to morph the contour while maintaining
aspects of the source speaker's identity.
The image in the **upper right** contains a morph
followed by a reset of the average pitch to the source speaker's average
pitch. In the **bottom left** is a morph followed by a reset of the speaker's
pitch range. In the **bottom right** the pitch range was reset and then the
speaker's average pitch was reset.
The longer the speech sample, the more representative the pitch range and
mean pitch will be of the speaker. In this example both are skewed higher
by the pitch accent on the first word.
Here the average pitch of the source (a female speaker) is much higher
than the target (a male speaker) and the resulting morph sounds like it
comes from a different speaker than the source or target speakers.
The three recordings that involve resetting pitch range and/or average
pitch sound much more natural.
.. image:: examples/files/mary1_mary2_f0_morph_compare.png
:width: 500px
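The two resets discussed above can be pictured as a shift and a rescale of the morphed pitch values. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, "pitch range" is taken to mean ``max - min``, and the rescaling is done around the contour's mean. ProMo may define and compute these quantities differently.

```python
from statistics import mean


def reset_average_pitch(morphed, source):
    """Shift the morphed contour so its mean matches the source mean."""
    shift = mean(source) - mean(morphed)
    return [f0 + shift for f0 in morphed]


def reset_pitch_range(morphed, source):
    """Scale the morphed contour around its mean to match the source's max - min."""
    morphed_range = max(morphed) - min(morphed)
    if morphed_range == 0:
        return list(morphed)  # a flat contour has no range to rescale
    scale = (max(source) - min(source)) / morphed_range
    center = mean(morphed)
    return [center + (f0 - center) * scale for f0 in morphed]


# Made-up pitch values in hertz (times omitted for brevity).
source_f0 = [200.0, 240.0, 220.0]   # higher-pitched source speaker
morphed_f0 = [100.0, 120.0, 110.0]  # raw morph, close to the target speaker

print(reset_average_pitch(morphed_f0, source_f0))  # [210.0, 230.0, 220.0]
print(reset_pitch_range(morphed_f0, source_f0))    # [90.0, 130.0, 110.0]

# Range first, then average, as in the bottom-right panel.
both = reset_average_pitch(reset_pitch_range(morphed_f0, source_f0), source_f0)
print(both)  # [200.0, 240.0, 220.0]
```

The shape of the morphed contour is preserved in every case; only its register (mean) and excursion size (range) are moved back toward the source speaker.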
Duration morphing (examples/duration_manipulation_example.py):
The following image shows morphing of duration from mary1.wav to mary2.wav
on a word-by-word basis in increments of 33% (33%, 66%, 100%).
This process can operate over an entire file or, similar to pitch morphing,
with annotated segments, as done in this example.
.. image:: examples/files/mary1_mary2_dur_morph.png
:width: 500px
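As a rough sketch of the arithmetic behind a scaled, interval-by-interval duration morph, the snippet below computes one stretch factor per pair of corresponding intervals. The function name ``duration_factors`` and the interval values are hypothetical, and the linear scaling by ``amount`` is an assumption for illustration rather than a description of ProMo's internals.

```python
def duration_factors(source_intervals, target_intervals, amount):
    """Return one stretch factor per interval pair.

    Each interval is a (start, end) tuple in seconds. A factor of 1.0 leaves
    the source interval unchanged; amount=1.0 stretches or compresses it to
    the full target duration.
    """
    if len(source_intervals) != len(target_intervals):
        raise ValueError("source and target need the same number of intervals")
    factors = []
    for (s0, s1), (t0, t1) in zip(source_intervals, target_intervals):
        ratio = (t1 - t0) / (s1 - s0)
        factors.append(1.0 + amount * (ratio - 1.0))
    return factors


# Made-up word boundaries for two two-word recordings.
source_words = [(0.0, 0.5), (0.5, 1.0)]
target_words = [(0.0, 0.25), (0.25, 1.0)]

print(duration_factors(source_words, target_words, 0.5))  # [0.75, 1.25]
print(duration_factors(source_words, target_words, 1.0))  # [0.5, 1.5]
```

The length check mirrors the requirement stated earlier: the transcripts only need to agree in the number of intervals, not in lexical content.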
Tutorials
================
Tutorials for learning about prosody manipulation and how to use ProMo are available.
`Tutorial 1.1: Intro to ProMo <https://nbviewer.jupyter.org/github/timmahrt/ProMo/blob/main/tutorials/tutorial1_1_intro_to_promo.ipynb>`_
`Tutorial 1.2: Pitch manipulation tutorial <https://nbviewer.jupyter.org/github/timmahrt/ProMo/blob/main/tutorials/tutorial1_2_pitch_manipulations.ipynb>`_
Version History
================
*ProMo uses semantic versioning (Major.Minor.Patch)*
Please view `CHANGELOG.md <https://github.com/timmahrt/promo/blob/main/CHANGELOG.md>`_ for version history.
Requirements
==============
``Python 3.7`` or above
My praatIO library is required and can be downloaded
`here <https://github.com/timmahrt/praatIO>`_
Matplotlib is required if you want to plot graphs.
`Matplotlib website <http://matplotlib.org/>`_
SciPy is required if you want to use interpolation (typically if you have stylized
pitch contours, in praat PitchTier format, for example, that you want to use in
your morphing).
`Scipy website <http://scipy.org/>`_
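To show what interpolation means for a stylized contour, here is a dependency-free sketch that resamples a few pitch points onto a regular time grid with linear interpolation. The function name ``resample`` and the sample points are made up for this example. In practice a library routine such as ``numpy.interp`` or ``scipy.interpolate.interp1d`` does the same job, and this sketch makes no claim about which routine ProMo calls.

```python
def resample(points, step):
    """Linearly resample time-sorted (time, f0) points every `step` seconds.

    Needs at least two points with strictly increasing times.
    """
    start, end = points[0][0], points[-1][0]
    # The small epsilon guards against float error when the span is an exact multiple of step.
    count = int((end - start) / step + 1e-9) + 1
    samples = []
    seg = 0  # index of the segment [points[seg], points[seg + 1]] containing t
    for k in range(count):
        t = start + k * step
        while seg < len(points) - 2 and points[seg + 1][0] <= t:
            seg += 1
        (t0, f0), (t1, f1) = points[seg], points[seg + 1]
        samples.append((t, f0 + (f1 - f0) * (t - t0) / (t1 - t0)))
    return samples


# A stylized rise-fall contour described by only three points.
stylized = [(0.0, 100.0), (0.5, 200.0), (1.0, 100.0)]
dense = resample(stylized, 0.25)
print(dense)
# [(0.0, 100.0), (0.25, 150.0), (0.5, 200.0), (0.75, 150.0), (1.0, 100.0)]
```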
Matplotlib and SciPy are non-trivial to install, as they depend on several large
packages. You can
visit their websites for more information. **I recommend the following instructions to
install matplotlib**, which use *python wheels*. These will install all required
libraries in one fell swoop.
On Mac, open a terminal and type:
python -m pip install matplotlib
python -m pip install scipy
On Windows, open a cmd or powershell window and type:
<<path to python>> -m pip install matplotlib
<<path to python>> -m pip install scipy
e.g. C:\\python37\\python.exe -m pip install matplotlib
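Once the commands above have run, a quick way to confirm that the optional packages are visible to the interpreter you plan to use is to look them up with the standard library. The helper name ``missing_modules`` is made up for this sketch; ``importlib.util.find_spec`` is part of Python itself.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


# An empty list means both optional dependencies are importable.
print(missing_modules(["matplotlib", "scipy"]))
```

Run it with the same ``python`` executable you used for ``pip``. A package reported as missing usually means it was installed for a different interpreter.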
Otherwise, to manually install, after downloading the source from github, from a command-line shell, navigate to the directory containing setup.py and type::
python setup.py install
If python is not in your path, you'll need to enter the full path e.g.::
C:\Python37\python.exe setup.py install
If you are using ``Python 2.x`` or ``Python < 3.7``, you can use `Promo 1.x`.
Usage
=========
See /examples for example usages
Installation
================
If you are on Windows, you can use the installer found here (check that it is up to date though)
`Windows installer <http://www.timmahrt.com/python_installers>`_
Promo is on pypi and can be installed or upgraded from the command-line shell with pip like so::
python -m pip install promo --upgrade
Otherwise, to manually install, after downloading the source from github, from a command-line shell, navigate to the directory containing setup.py and type::
python setup.py install
If python is not in your path, you'll need to enter the full path e.g.::
C:\Python36\python.exe setup.py install
Citing ProMo
===============
If you use ProMo in your research, please cite it like so:
Tim Mahrt. ProMo: The Prosody-Morphing Library.
https://github.com/timmahrt/ProMo, 2016.
Acknowledgements
================
Development of ProMo was possible thanks to NSF grant **BCS 12-51343** to
Jennifer Cole, José I. Hualde, and Caroline Smith and to the A*MIDEX project
(n° **ANR-11-IDEX-0001-02**) to James Sneed German funded by the
Investissements d'Avenir French Government program,
managed by the French National Research Agency (ANR).
Raw data
{
"_id": null,
"home_page": "https://github.com/timmahrt/ProMo",
"name": "promo",
"maintainer": "",
"docs_url": null,
"requires_python": ">3.6.0",
"maintainer_email": "",
"keywords": "",
"author": "Tim Mahrt",
"author_email": "timmahrt@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d8/09/0b36dd812d7f1496ce560503049f18400564078e08db7f2d50de2219217f/promo-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LICENSE",
"summary": "Library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech",
"version": "2.0.0",
"project_urls": {
"Homepage": "https://github.com/timmahrt/ProMo"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ca451a9970cbdfa715fbdcd428922b41e9f91ac70635bcfefafa5a424885dd0c",
"md5": "94f6e379bcb3cbf1ab566dcf2b348a46",
"sha256": "c8256261c82d943a78e0b87821cfcf27cc5e51de0daa914d6444095e5b0ee641"
},
"downloads": -1,
"filename": "promo-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "94f6e379bcb3cbf1ab566dcf2b348a46",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">3.6.0",
"size": 20564,
"upload_time": "2023-07-15T11:47:34",
"upload_time_iso_8601": "2023-07-15T11:47:34.103481Z",
"url": "https://files.pythonhosted.org/packages/ca/45/1a9970cbdfa715fbdcd428922b41e9f91ac70635bcfefafa5a424885dd0c/promo-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d8090b36dd812d7f1496ce560503049f18400564078e08db7f2d50de2219217f",
"md5": "d5cb91cc4d602b441238e63a49562af5",
"sha256": "bd95040231e28971469ade0408f1cd4f1da7dfbadcd9066c426524a8ec80b731"
},
"downloads": -1,
"filename": "promo-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "d5cb91cc4d602b441238e63a49562af5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">3.6.0",
"size": 21403,
"upload_time": "2023-07-15T11:47:36",
"upload_time_iso_8601": "2023-07-15T11:47:36.785028Z",
"url": "https://files.pythonhosted.org/packages/d8/09/0b36dd812d7f1496ce560503049f18400564078e08db7f2d50de2219217f/promo-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-15 11:47:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "timmahrt",
"github_project": "ProMo",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "promo"
}