# textgrid-tools
[![PyPI](https://img.shields.io/pypi/v/textgrid-tools.svg)](https://pypi.python.org/pypi/textgrid-tools)
![PyPI](https://img.shields.io/pypi/pyversions/textgrid-tools.svg)
[![MIT](https://img.shields.io/github/license/stefantaubert/textgrid-ipa.svg)](https://github.com/stefantaubert/textgrid-ipa/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/textgrid-tools.svg)](https://pypi.python.org/pypi/textgrid-tools/#files)
![PyPI](https://img.shields.io/pypi/implementation/textgrid-tools.svg)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/textgrid-ipa/latest/main.svg)](https://github.com/stefantaubert/textgrid-ipa/compare/v0.0.7...main)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7528782.svg)](https://doi.org/10.5281/zenodo.7528782)
Command-line interface (CLI) to modify TextGrids and their corresponding audio files.
## Features
- grids
  - `merge`: merge grids together
  - `plot-durations`: plot durations
  - `mark-durations`: mark intervals with specific durations with a text
  - `create-dictionary`: create pronunciation dictionary out of a word and a pronunciation tier
  - `plot-stats`: plot statistics
  - `export-vocabulary`: export vocabulary out of multiple grid files
  - `export-marks`: exports marks of a tier to a file
  - `export-durations`: exports durations of grids to a file
  - `export-paths`: exports grid paths to a file
  - `export-audio-paths`: exports audio paths to a file
  - `import-paths`: import grids from paths written in a file
  - `import-audio-paths`: import audio files from paths written in a file
- grid
  - `create`: convert text files to grid files
  - `sync`: synchronize grid minTime and maxTime according to the corresponding audio file
  - `split`: split a grid file on intervals into multiple grid files (incl. audio files)
  - `print-stats`: print statistics
- tiers
  - `apply-mapping`: apply mapping table to marks
  - `transcribe`: transcribe words of tiers using a pronunciation dictionary
  - `remove`: remove tiers
- tier
  - `rename`: rename tier
  - `clone`: clone tier
  - `map`: map tier to other tiers
  - `move`: move tier to another position
  - `export`: export content of tier to a txt file
  - `import`: import content of tier from a txt file
- intervals
  - `join`: join adjacent intervals
  - `join-between-marks`: join intervals between marks
  - `join-by-boundary`: join intervals by boundaries of a tier
  - `join-by-duration`: join intervals by a duration
  - `join-marks`: join intervals containing specific marks
  - `join-symbols`: join intervals containing specific symbols
  - `join-template`: join intervals according to a template
  - `split`: split intervals
  - `fix-boundaries`: align boundaries of tiers according to a reference tier
  - `remove`: remove intervals
  - `plot-durations`: plot durations
  - `replace-text`: replace text using regex pattern
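
To make the idea behind the `intervals join-*` commands more concrete, here is a minimal Python sketch of merging neighbouring intervals into one interval that spans their combined time range. It is not the tool's implementation: the `(start, end, mark)` tuple, the function name `join_adjacent`, and the rule "an empty mark is a pause that ends a run" are assumptions made only for illustration.

```python
from typing import List, Tuple

# Hypothetical stand-in for a TextGrid interval: (start, end, mark).
Interval = Tuple[float, float, str]


def join_adjacent(
    intervals: List[Interval], pause_mark: str = "", sep: str = " "
) -> List[Interval]:
    """Join each run of consecutive non-pause intervals into a single interval."""
    joined: List[Interval] = []
    run: List[Interval] = []

    def flush() -> None:
        # Emit the collected run as one interval covering its full time span.
        if run:
            marks = sep.join(mark for _, _, mark in run)
            joined.append((run[0][0], run[-1][1], marks))
            run.clear()

    for interval in intervals:
        if interval[2] == pause_mark:
            # A pause closes the current run and is kept unchanged.
            flush()
            joined.append(interval)
        else:
            run.append(interval)
    flush()
    return joined


words = [(0.0, 0.4, "hello"), (0.4, 0.9, "world"), (0.9, 1.2, ""), (1.2, 1.5, "bye")]
print(join_adjacent(words))
```

In this sketch the two words before the pause become one interval from `0.0` to `0.9` with the mark `hello world`, while the pause and the last word are left as they are.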
## Roadmap
- Performance improvement
- Adding more tests
## Installation
```sh
pip install textgrid-tools --user
```
## Usage
```txt
usage: textgrid-tools-cli [-h] [-v] {grids,grid,tiers,tier,intervals} ...

This program provides methods to modify TextGrids (.TextGrid) and their corresponding audio files (.wav).

positional arguments:
  {grids,grid,tiers,tier,intervals}  description
    grids                            execute commands targeted at multiple grids at once
    grid                             execute commands targeted at single grids
    tiers                            execute commands targeted at multiple tiers at once
    tier                             execute commands targeted at single tiers
    intervals                        execute commands targeted at intervals of tiers

optional arguments:
  -h, --help                         show this help message and exit
  -v, --version                      show program's version number and exit
```
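
The help output above has the shape that Python's `argparse` produces for a parser with sub-commands. The following sketch rebuilds only that top-level layout, as an illustration of how the five command groups are selected; it is an assumption about the general pattern, not the tool's actual source, and `build_parser` and the `group` attribute are hypothetical names.

```python
import argparse

# Group names and help texts copied from the usage output above.
GROUPS = [
    ("grids", "execute commands targeted at multiple grids at once"),
    ("grid", "execute commands targeted at single grids"),
    ("tiers", "execute commands targeted at multiple tiers at once"),
    ("tier", "execute commands targeted at single tiers"),
    ("intervals", "execute commands targeted at intervals of tiers"),
]


def build_parser() -> argparse.ArgumentParser:
    """Build a top-level parser with one sub-parser per command group."""
    parser = argparse.ArgumentParser(prog="textgrid-tools-cli")
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s (sketch)"
    )
    # dest="group" stores the chosen group name on the parsed namespace.
    subparsers = parser.add_subparsers(dest="group", metavar="{grids,grid,tiers,tier,intervals}")
    for name, help_text in GROUPS:
        subparsers.add_parser(name, help=help_text)
    return parser


args = build_parser().parse_args(["intervals"])
print(args.group)
```

With this layout, every group gets its own `-h`/`--help` by default, so a call such as `textgrid-tools-cli intervals --help` would be the natural way to list the commands of a group, assuming the real CLI keeps that default.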
## Dependencies
- `numpy>=1.18.5`
- `scipy>=1.8.0`
- `tqdm>=4.63.0`
- `TextGrid>=1.5`
- `pandas>=1.4.0`
- `ordered_set>=4.1.0`
- `matplotlib>=3.5.0`
- `pronunciation_dictionary>=0.0.5`
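
If an installation misbehaves, it can help to see which of these packages are present in the active environment. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is hypothetical, and whether a name written with an underscore is matched against a distribution published with a hyphen may depend on the Python version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names as listed in the Dependencies section.
REQUIRED = [
    "numpy",
    "scipy",
    "tqdm",
    "TextGrid",
    "pandas",
    "ordered_set",
    "matplotlib",
    "pronunciation_dictionary",
]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Return the installed version per distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(REQUIRED).items():
    print(f"{name}: {installed or 'not installed'}")
```

The sketch only reports versions; comparing them against the minimum versions listed above is left out on purpose.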
## Contributing
If you notice an error, please don't hesitate to open an issue.
### Development setup
```sh
# update
sudo apt update
# install Python 3.8, 3.9, 3.10 & 3.11 so that the tests can be run against each version
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/stefantaubert/textgrid-ipa.git
cd textgrid-ipa
# create virtual environment
python3.8 -m pipenv install --dev
```
## Running the tests
```sh
# first set up the project as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd textgrid-ipa
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```
Final lines of test result output:
```log
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
congratulations :)
```
## Troubleshooting
If recordings/audio files are not in `.wav` format they need to be converted, e.g.:
```sh
sudo apt install ffmpeg -y
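# e.g., mp3 to wav conversion (one ffmpeg call per file; a glob cannot serve as the output name)
for f in *.mp3; do
  ffmpeg -i "$f" -acodec pcm_s16le -ar 22050 "${f%.mp3}.wav"
done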
```
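
After converting, the result can be checked with Python's standard `wave` module. The sketch below is only an illustration: the helper `describe_wav` and the file name `demo.wav` are hypothetical, and the demo writes one second of silence with the same parameters as the `ffmpeg` call above (16-bit PCM, 22050 Hz) so that it runs without any real recording.

```python
import tempfile
import wave
from pathlib import Path


def describe_wav(path: Path) -> dict:
    """Read basic format information from a PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.wav"
    # Write one second of mono silence as a stand-in for a converted recording.
    with wave.open(str(demo), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit, as with pcm_s16le
        out.setframerate(22050)
        out.writeframes(b"\x00\x00" * 22050)
    info = describe_wav(demo)

print(info)
```

Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM, which is itself a quick sign that a file still needs converting.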
## License
MIT License
## Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
## Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see *About => Cite this repository*).
## Changelog
- v0.0.7 (2023-01-12)
  - Fixed
    - Bugfix `grids import-paths` and `grids import-audio-paths`
  - Added
    - Added option `--ignore` to ignore custom marks in `grids export-vocabulary`
    - Added option `--mode` to `intervals replace-text` to replace text on different interval positions
    - Added returning of an exit code
  - Removed
    - Removed `tiers mark-silence` because `grids mark-durations` should be used
    - Removed `tiers remove-symbols` because `intervals replace-text` should be used
    - Removed `intervals join-between-pauses` because `join-between-marks` should be used
- v0.0.6 (2022-12-23)
  - improved validation for pronunciation dictionary creation
  - bugfix replace text logging
  - added intervals join-template
  - support Python 3.11
  - update pylint config
  - fix description of grid/audio import
- v0.0.5 (2022-11-25)
  - `intervals remove`: added parameter `mode` to better choose which intervals should be removed
  - Added method to plot statistics for all grids together
  - `tiers transcribe`: added option `assign-mark-to-missing` to replace missing transcriptions with a custom mark
  - Bugfix: `mark-durations` empty couldn't be assigned
  - Added `--min-count` to `mark-durations`
  - Improved sorting of phonemes in durations plotting
  - Changed marks exporting format to only contain tier marks
  - Added exporting/importing of audio paths
  - Added durations exporting
  - Added exporting/importing of grid paths
  - Added replacement of marks using regex pattern
  - Added `--dry` option to most methods
  - Make split symbol on split mandatory
  - Upper-cased metavars
- v0.0.4 (2022-06-09)
  - fixed bug while saving TextGrids
  - improved robustness against file system errors
- v0.0.3 (2022-05-31)
  - fixed invalid installation format and clarified dependencies
  - adjusted textgrid serialization equal to praat output
  - added option `include-empty` on vocabulary export
  - set default chunksize to `1`
  - added missing `__init__.py` files
  - improved logging
- v0.0.2 (2022-05-06)
  - improved logging
  - improved reading/saving speed of TextGrids
  - removed n_digits argument
  - added option to define encoding of TextGrids
  - added option to insert interval between grids which should be merged together
  - removed tier copy
  - added parser for tier export
- v0.0.1 (2022-04-29)
  - initial release