# Transcribe Allign TextGrid
A small wrapper package around [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped). Create force-aligned transcription TextGrids from raw audio.
# Installation
## Requirements
* Python 3.8 to 3.11.
    * Use the executable `python3.x` on Unix, available in most package managers, or `py -3.x` on Windows.
    * This command-line executable will be referred to as `[python-executable]` for the rest of the instructions.
    * Install pip on old Python versions with `[python-executable] -m ensurepip --default-pip`.
* `ffmpeg`: usually preinstalled on Linux. For Windows, see the installation instructions in the [whisper repository](https://github.com/openai/whisper).
* `git`: usually preinstalled on Linux. For Windows, visit [the git site](https://git-scm.com/download/win).
    * Needed to install whisper-timestamped, as it is not available on PyPI.
    * Note that it needs to be available from the command line; git-bash might not work.
## Installing Torch
Torch, on which Whisper is built, is a fairly low-level library, so the version you need depends on your OS and type of GPU. On Mac and Windows, pip installs a non-accelerated CPU version of the library by default. On Linux, it presumes you have a CUDA-capable (which is to say, NVIDIA) GPU. If you are on Windows and have an NVIDIA GPU you want to use, or are on Linux and either have no GPU or have an AMD GPU, follow the more detailed [Torch installation instructions](https://pytorch.org/get-started/locally/).
This should be done *before* installing `transcribe_allign_textgrid` and `whisper_timestamped`.
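
To confirm which Torch build ended up in your environment, a small check like the following can help. This is only a sketch and not part of this package; the function name `describe_torch_backend` is made up for illustration.

```python
def describe_torch_backend() -> str:
    """Return a short description of the Torch build found in this environment."""
    try:
        # Imported lazily so the check also works when Torch is missing.
        import torch
    except ImportError:
        return "torch is not installed"

    # torch.cuda.is_available() reports whether Torch can use a GPU.
    if torch.cuda.is_available():
        return f"torch {torch.__version__} with GPU acceleration"
    return f"torch {torch.__version__} on CPU only"


if __name__ == "__main__":
    print(describe_torch_backend())
```

If this reports a CPU-only build on a machine with a usable GPU, reinstall Torch following the instructions linked above before continuing.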
## Installing
Once the requirements are satisfied, you can install whisper-timestamped and this package.
Whisper-timestamped is not on PyPI, so a separate `git+` install is needed. (If you only want to use the package as a library instead of a CLI, whisper-timestamped is not a dependency, and this manual install is not needed.)
```bash
[python-executable] -m pip install git+https://github.com/linto-ai/whisper-timestamped
[python-executable] -m pip install transcribe_allign_textgrid
```
# Running from the command line
Once the application is installed, you can run it with:
```bash
[python-executable] -m transcribe_allign_textgrid [path]
```
Here, `path` is the path to the audio files.
* If a directory path is passed, all audio files in the directory will be transcribed, and force-aligned transcription TextGrids of the same name will be generated in this directory.
* If a file path is passed, a force-aligned transcription TextGrid will be generated into the same directory with the same name as the original file.
* If a glob is passed, the glob will be resolved and all matches will be processed as if the files were passed individually.
* By default, if a non-audio file is passed, an error is raised. To skip those instead, pass the `--skip` flag.
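
The rules above can be sketched in plain Python. This is an illustration of the described behaviour, not the package's actual implementation: the function names, the `AUDIO_SUFFIXES` set, and the `.TextGrid` output suffix are all assumptions.

```python
import glob
from pathlib import Path
from typing import List

# Hypothetical list of extensions; the real tool may recognise a different set.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def resolve_audio_paths(path: str, skip: bool = False) -> List[Path]:
    """Turn a directory, file, or glob into a list of audio files to process."""
    target = Path(path)
    if target.is_dir():
        # Directory: take every audio file directly inside it.
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
        )

    if target.is_file():
        candidates = [target]
    else:
        # Anything else is treated as a glob; each match counts as a passed file.
        candidates = sorted(Path(match) for match in glob.glob(path))

    audio = []
    for candidate in candidates:
        if candidate.suffix.lower() in AUDIO_SUFFIXES:
            audio.append(candidate)
        elif not skip:
            # Mirrors the default: a non-audio file is an error unless skipped.
            raise ValueError(f"Not an audio file: {candidate}")
    return audio


def textgrid_path_for(audio_path: Path) -> Path:
    """Same directory and name as the audio file, with a TextGrid suffix."""
    return audio_path.with_suffix(".TextGrid")
```

For example, `resolve_audio_paths("recordings/*.wav")` would return every matching file, and `textgrid_path_for` gives the location where each TextGrid would be written next to its source.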
## Selecting a different model
By default, the tool runs `tiny`, the smallest model: the fastest, but also the least accurate. To run another model, pass it as an argument:
```bash
[python-executable] -m transcribe_allign_textgrid [path] --model [model]
```
The available models are:
| name | Parameters | Required VRAM | Relative speed |
|:------:|:----------:|:-------------:|:--------------:|
| tiny | 39 M | ~1 GB | ~32x |
| base | 74 M | ~1 GB | ~16x |
| small | 244 M | ~2 GB | ~6x |
| medium | 769 M | ~5 GB | ~2x |
| large | 1550 M | ~10 GB | 1x |
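
If you are unsure which model to pass, the table can be turned into a small lookup. The helper below is a sketch with a made-up name; the VRAM figures are the approximate values from the table, so treat the result as a starting point only.

```python
# (model name, approximate required VRAM in GB), ordered from smallest to largest.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(vram_gb: float) -> str:
    """Pick the largest model whose approximate VRAM requirement fits."""
    fitting = [name for name, required in MODELS if required <= vram_gb]
    # Fall back to the smallest model when nothing fits.
    return fitting[-1] if fitting else "tiny"


if __name__ == "__main__":
    print(largest_model_that_fits(6))
```

The returned name can then be passed to `--model`.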
## Specifying what language to use
By default, the application will try to detect what language is used automatically. However, you can also specify this manually:
```bash
[python-executable] -m transcribe_allign_textgrid [path] --language [language]
# Or also specifying what model to use:
[python-executable] -m transcribe_allign_textgrid [path] --model [model] --language [language]
```
To see what languages are available, please see the [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) file in the Whisper source. (Yes, the OpenAI team themselves recommend finding it this way, too.)
# Using as a library
The tool can also be used as a library. It exports one function, `whisper_to_textgrid()`, which takes a transcription object (a nested dictionary) from [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) and returns a Textgrid object from [praatio](https://github.com/timmahrt/praatIO). The typical JSON output from whisper-timestamped works, too.
The library part of the package does not depend on whisper-timestamped, so that it stays fully installable and usable as a requirement via PyPI.
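
A minimal usage sketch, assuming the function can be imported from the top-level `transcribe_allign_textgrid` module and that you already have a JSON file produced by whisper-timestamped. The helper names and file paths are placeholders, and the `save()` arguments follow praatio 5.x.

```python
import json


def load_transcription(json_path: str) -> dict:
    """Load a whisper-timestamped JSON result as a nested dictionary."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def transcription_json_to_textgrid(json_path: str, textgrid_path: str) -> None:
    """Convert a saved transcription to a TextGrid file on disk."""
    # Assumed import path; imported lazily so the loader above works on its own.
    from transcribe_allign_textgrid import whisper_to_textgrid

    textgrid = whisper_to_textgrid(load_transcription(json_path))
    # praatio 5.x requires both the format and includeBlankSpaces arguments.
    textgrid.save(textgrid_path, format="long_textgrid", includeBlankSpaces=True)
```

Calling `transcription_json_to_textgrid("interview.json", "interview.TextGrid")` would then write a TextGrid that Praat can open.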
# Output
The output TextGrids have four TextGridTiers:
* `segments_text`: The text in a given segment (a speaker's turn).
* `segments_confidence`: The confidence the model has that this is the correct labeling and segmentation for the segment.
* `words_text`: The text of a given word.
* `words_confidence`: The confidence the model has that this is the correct labeling and segmentation for this word.
If one of these tiers would have been empty per the output of whisper-timestamped, to satisfy Praat's error handling, a tier with an empty interval (0.0, 0.1) is generated.
In Praat, it will look a little like this:
<p align="center">
<img src=".assets/sample_output.png" />
</p>
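
To make the tier layout concrete, here is a sketch of how the four tiers relate to a whisper-timestamped style dictionary. It is not the package's own code: it assumes each segment has `start`, `end`, `text`, `confidence`, and a `words` list with the same keys, and it represents intervals as plain tuples instead of praatio objects.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float, str]  # (start, end, label)

# Placeholder used when a tier would otherwise be empty, as described above.
EMPTY_PLACEHOLDER: Interval = (0.0, 0.1, "")


def tiers_from_transcription(transcription: dict) -> Dict[str, List[Interval]]:
    """Collect the entries of the four tiers from a transcription dictionary."""
    tiers: Dict[str, List[Interval]] = {
        "segments_text": [],
        "segments_confidence": [],
        "words_text": [],
        "words_confidence": [],
    }
    for segment in transcription.get("segments", []):
        span = (segment["start"], segment["end"])
        tiers["segments_text"].append((*span, segment["text"].strip()))
        tiers["segments_confidence"].append((*span, str(segment["confidence"])))
        for word in segment.get("words", []):
            word_span = (word["start"], word["end"])
            tiers["words_text"].append((*word_span, word["text"]))
            tiers["words_confidence"].append((*word_span, str(word["confidence"])))

    # Give every empty tier a single blank interval.
    for entries in tiers.values():
        if not entries:
            entries.append(EMPTY_PLACEHOLDER)
    return tiers
```

The text and confidence tiers share the same boundaries, which is why they line up when viewed in Praat.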
# Development
The package is quite trivial, but if you want to work on it, here are some instructions.
## Style
All code is formatted with the [Black](https://github.com/psf/black) code formatter. As for casing, Python standards are used, except where dependencies use a different convention.
I am dyslexic, and quite likely to make spelling errors in variables. If you find any, don't hesitate to send me a pull request!
## Running Tests
After cloning the repository, moving into it, and installing `pytest` and `pytest-cov` with pip, run tests with:
```bash
# Install the current version of the package locally to be able to test it.
[python-executable] -m pip install -e .
[python-executable] -m pytest --cov=transcribe_allign_textgrid tests/
```
Raw data
{
"_id": null,
"home_page": "",
"name": "transcribe-allign-textgrid",
"maintainer": "",
"docs_url": null,
"requires_python": "<3.12,>=3.8",
"maintainer_email": "",
"keywords": "praat,whisper,force-allign,TextGrid",
"author": "",
"author_email": "JJWRoeloffs <jelleroeloffs@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/bd/a4/b614688568e55186c95b9f69022bee9040cb70642ad795275ebbb99f1ffe/transcribe_allign_textgrid-0.1.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Create for-alligned transcription TextGrids from Audio",
"version": "0.1.5",
"project_urls": {
"Bug Tracker": "https://github.com/JJWRoeloffs/transcribe_allign_textgrid/issues",
"Homepage": "https://github.com/JJWRoeloffs/transcribe_allign_textgrid"
},
"split_keywords": [
"praat",
"whisper",
"force-allign",
"textgrid"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "59665a5d54d4046182135267f8615da65d82fbe2d054a59418a939e8b3eac2e9",
"md5": "1cae73284621f494fbb9f61e9208a510",
"sha256": "ae87dfb448429de3a5f43703ac4bd5c635b7bb33430c2cfd5ce9f2cc355dd8f4"
},
"downloads": -1,
"filename": "transcribe_allign_textgrid-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1cae73284621f494fbb9f61e9208a510",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.8",
"size": 20268,
"upload_time": "2023-11-13T21:10:34",
"upload_time_iso_8601": "2023-11-13T21:10:34.886974Z",
"url": "https://files.pythonhosted.org/packages/59/66/5a5d54d4046182135267f8615da65d82fbe2d054a59418a939e8b3eac2e9/transcribe_allign_textgrid-0.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bda4b614688568e55186c95b9f69022bee9040cb70642ad795275ebbb99f1ffe",
"md5": "180e756bcf2d2034221aa348ac3d0cfb",
"sha256": "5b68dd6c2506ddeb93f42bff51124548ce10152935f5a1fb2106dffb26fd9771"
},
"downloads": -1,
"filename": "transcribe_allign_textgrid-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "180e756bcf2d2034221aa348ac3d0cfb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.8",
"size": 21497,
"upload_time": "2023-11-13T21:10:36",
"upload_time_iso_8601": "2023-11-13T21:10:36.599039Z",
"url": "https://files.pythonhosted.org/packages/bd/a4/b614688568e55186c95b9f69022bee9040cb70642ad795275ebbb99f1ffe/transcribe_allign_textgrid-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-13 21:10:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "JJWRoeloffs",
"github_project": "transcribe_allign_textgrid",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "praatio",
"specs": [
[
"~=",
"5.1"
]
]
},
{
"name": "jsonschema",
"specs": [
[
"~=",
"4.1"
]
]
}
],
"lcname": "transcribe-allign-textgrid"
}