pyjuliusalign


- Name: pyjuliusalign
- Version: 4.0.0
- Home page: https://github.com/timmahrt/pyJuliusAlign
- Summary: A helper library for doing forced-alignment in Japanese with Julius.
- Upload time: 2023-07-15 14:29:03
- Author: Tim Mahrt
- Requires Python: >3.6.0
- License: MIT (per the README badge; the metadata field reads `LICENSE`)
            
# pyJuliusAlign

 [![](https://badges.gitter.im/pyJuliusAlign/Lobby.svg)](https://gitter.im/pyJuliusAlign/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![](https://img.shields.io/badge/license-MIT-blue.svg?)](http://opensource.org/licenses/MIT) [![](https://img.shields.io/pypi/v/pyjuliusalign.svg)](https://pypi.org/project/pyjuliusalign/)

*Questions?  Comments?  Feedback?  Chat with us on gitter!*

-----

Input and output of pyJuliusAlign:

![PyJuliusAlign example](./examples/files/pyjulius_example.png)

When we have a speech recording and a text transcript but don't know exactly where the words, vowels, and consonants occur in the audio, we can use a technique called "forced alignment" to find them. The speech recognition system "Julius" can do forced alignment for Japanese. However, it requires the pronunciation of what is said in the recording, and a text transcript usually contains only the written words. The "Cabocha" software can split sentences into individual words and give their pronunciations. The "pyJuliusAlign" library uses Julius and Cabocha together to force align Japanese. The aligned words, vowels, and consonants can be inserted directly into the TextGrid that transcribes the recording.
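
As a rough illustration of what forced-alignment output looks like, the sketch below models a phone tier as timed intervals and groups them into a word tier. It is not pyJuliusAlign's API: the `Interval` type, the `words_from_phones` helper, and all timings are made up for this example.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """One labeled stretch of audio (hypothetical type for illustration)."""
    start: float  # seconds
    end: float    # seconds
    label: str


def words_from_phones(
    phones: List[Interval], words: List[Tuple[str, int]]
) -> List[Interval]:
    """Group consecutive phone intervals into word intervals.

    `words` holds (word, phone_count) pairs, i.e. how many phones
    the pronunciation of each word contains.
    """
    result = []
    index = 0
    for word, count in words:
        chunk = phones[index:index + count]
        # A word spans from its first phone's start to its last phone's end
        result.append(Interval(chunk[0].start, chunk[-1].end, word))
        index += count
    return result


# Made-up timings for the phrase "猫が" (neko ga), for illustration only
phones = [
    Interval(0.10, 0.18, "n"),
    Interval(0.18, 0.30, "e"),
    Interval(0.30, 0.36, "k"),
    Interval(0.36, 0.50, "o"),
    Interval(0.50, 0.58, "g"),
    Interval(0.58, 0.70, "a"),
]
word_tier = words_from_phones(phones, [("猫", 4), ("が", 2)])
```

In a TextGrid, the word tier and the phone tier sit side by side, so each word interval lines up with the phones it contains.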

----

If you want to do forced alignment in other languages such as English, French, or Spanish, I recommend the SPPAS software.

[http://www.sppas.org](http://www.sppas.org/)


----

To get started:

*/examples/align_example.py* should be sufficient for a large number of cases.

*/pyjuliusalign/alignFromTextgrid.py* provides a good example of building your own custom alignment code (with different inputs and outputs than textgrids).  


# Table of contents
1. [Documentation](#documentation)
2. [Major Revisions](#major-revisions)
3. [Requirements](#requirements)
  * [Mac-specific Requirements Information](#mac-specific-requirements-information)
  * [Windows-specific Requirements Information](#windows-specific-requirements-information)
4. [Installation](#installation)
5. [Testing Installation](#testing-installation)
6. [Example Usage](#example-usage)
7. [Tests](#tests)
8. [Troubleshooting](#troubleshooting)

## Documentation

Automatically generated pdocs can be found here:

http://timmahrt.github.io/pyJuliusAlign/


## Major Revisions

*PyJuliusAlign uses semantic versioning (Major.Minor.Patch)*

Please view [CHANGELOG.md](https://github.com/timmahrt/praatIO/blob/main/CHANGELOG.md) for version history.


## Requirements

Python - https://www.python.org/

python-Levenshtein - https://github.com/ztane/python-Levenshtein

pyDub - https://github.com/jiaaro/pydub

praatIO - https://github.com/timmahrt/praatIO
 - for textgrid manipulations

Julius - https://github.com/julius-speech/julius
 - the speech recognition engine
 - pyJuliusAlign has been tested with Julius 4.5, released on January 2nd, 2019.

Julius Segmentation Kit - https://github.com/julius-speech/segmentation-kit
 - it's not something you "install"; put it in a stable folder where you can access it when needed
 - In `segment_julius.pl`, change line 33 to:
  ```perl
  ## data directory
  $datadir = "./wav";
  if (defined $ARGV[0]) {
    $datadir = $ARGV[0];
  }
  ```
  - Also in the configuration section, I recommend setting `$hmmdefs` to an absolute path, e.g. `$hmmdefs="/Users/tmahrt/segmentation-kit/models/hmmdefs_monof_mix16_gid.binhmm"; # monophone model`
  - Make sure to set silence handling appropriately. If you have clearly marked the edges of speech, turn off silence marking. If you have not (for example, your recording contains only a single utterance), have the segmentation kit expect silence at the start and end of your recording.
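
With the change to line 33 above, `segment_julius.pl` takes the data directory as its first command-line argument. A minimal sketch of calling it from Python follows; the helper names and every path are hypothetical, and this is not how pyJuliusAlign itself is guaranteed to invoke the script.

```python
import subprocess
from typing import List


def build_segmentation_command(
    perl_path: str, script_path: str, data_dir: str
) -> List[str]:
    """Return the argument list for running segment_julius.pl on data_dir."""
    # data_dir becomes $ARGV[0] in the modified script
    return [perl_path, script_path, data_dir]


def run_segmentation(perl_path: str, script_path: str, data_dir: str) -> None:
    """Run the segmentation kit; raises CalledProcessError on a non-zero exit."""
    subprocess.run(
        build_segmentation_command(perl_path, script_path, data_dir),
        check=True,
    )


# Hypothetical paths, for illustration only
command = build_segmentation_command(
    "perl", "/opt/segmentation-kit/segment_julius.pl", "/data/wavs"
)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.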

Sox - http://sox.sourceforge.net/
 - Converts the sampling frequency of the audio if needed.
 - Optional. If you choose not to install sox, you'll need to make sure your audio files are at the same sampling frequency as the model data (the included data is 16 kHz).
 - If you force the script to run Julius on audio that has a different sampling frequency, the alignment will fail completely.
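
If you skip sox, you can check a file's sampling frequency yourself before aligning. The sketch below uses only Python's standard `wave` module; the function names are hypothetical, and `model_rate` is whatever rate your acoustic model expects.

```python
import wave


def get_sampling_rate(wav_path: str) -> int:
    """Return the sampling frequency (Hz) stored in a WAV file's header."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(wav_path: str, model_rate: int) -> bool:
    """True if the audio's sampling frequency differs from the model's."""
    return get_sampling_rate(wav_path) != model_rate
```

Any file for which `needs_resampling` returns `True` should be converted (for example with sox) before it is passed to Julius.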

Cabocha - http://taku910.github.io/cabocha/ 
 - used to convert typical Japanese text into romaji/phones
 - the site is in Japanese (use Google Translate if you need it in English)
 - make a note of which encoding you use for the dictionary file; you'll need it in the code
 - you may need to configure Cabocha post-install; see https://github.com/timmahrt/pyJuliusAlign/issues/7

Perl (for Julius)


### Mac-specific Requirements Information

I use a Mac and was able to easily install many of the requirements using Homebrew. Here are some guides that I found useful (they translate well enough from Japanese using Google Translate):
 - Sox https://qiita.com/samurai20000@github/items/2af98b6c468af317bb09
 - Cabocha https://qiita.com/musaprg/items/9a572ad5c4e28f79d2ae
 - I manually built Julius using the configure and make scripts included in that project


### Windows-specific Requirements Information

I currently don't have access to a Windows machine. Earlier, I tested installation and got as far as running Julius. Perl tried to run gzip, which I couldn't get installed.

One user was able to get it working on Windows by installing Cygwin and adding Cygwin to the path in the environment variables. They also had to install MeCab before running Cabocha; otherwise, they received an exception saying there was something wrong with Cabocha.

## Installation

PyJuliusAlign is on PyPI and can be installed or upgraded from the command-line shell with pip like so:

    python -m pip install pyjuliusalign --upgrade

Otherwise, to install manually, download the source from GitHub, open a command-line shell, navigate to the directory containing setup.py, and type:

    python setup.py install

If python is not in your path, you'll need to enter the full path, e.g.:

    C:\Python36\python.exe setup.py install
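
To confirm that the package is visible to your interpreter after installing, a quick check like the one below may help. It is a minimal sketch using the standard library; `is_installed` is a hypothetical helper name, and `pyjuliusalign` is the import name implied by the package paths in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pyjuliusalign"):
        print("pyjuliusalign is installed")
    else:
        print("pyjuliusalign was not found; check which python you installed it for")
```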


## Testing Installation

In the 'examples' folder, run the file 'align_example.py'.

If sox, cabocha, julius, and perl are all in your path, you won't need to specify them in any of the arguments; leave them at their default values. Otherwise, you'll need to specify the full path to each of their executable files.

If you have difficulties running the code without specifying the full paths, try using the full paths anyway.

Also, you will need to configure "segment_julius.pl" which is a part of the Julius Segmentation Kit.
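
Before running the example, you can check which of the external tools are actually on your path. The sketch below uses the standard `shutil.which`; `find_missing_tools` is a hypothetical helper, and the executable names are assumptions that may differ on your system.

```python
import shutil
from typing import Iterable, List


def find_missing_tools(tool_names: Iterable[str]) -> List[str]:
    """Return the names of executables that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation
    missing = find_missing_tools(["sox", "cabocha", "julius", "perl"])
    if missing:
        print("Specify full paths for:", ", ".join(missing))
    else:
        print("All tools were found on the PATH")
```

Any tool reported as missing needs its full path passed explicitly in the example's arguments.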


## Example Usage

Please see /examples for an example usage.

There is pretty much only one way to use this library at the moment. Please contact me if you are having difficulties using this library.


## Tests

I run tests with the following command (this requires pytest and pytest-cov to be installed):

`pytest --cov=pyjuliusalign tests/`


## Troubleshooting

The scripts should catch any issues along the way with the exception of issues stemming from Julius.  If you get bogus/null results, most likely Julius hasn't been set up correctly.

The Julius Segmentation kit comes with an example.  If you can force align that, then you should be able to force align using this script as well.






            
