# Narizaka
Tool for building a high-quality text-to-speech (TTS) corpus from audiobooks and their texts.
## How it works
First, it transcribes the audio with Whisper ASR, saving word-level timestamps. Then it aligns this transcription with the original text. If the distance between a transcribed fragment and the text is very small, the fragment is considered a match and added to the dataset.
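
The alignment idea can be illustrated with a minimal Python sketch. It is not narizaka's implementation (narizaka lists `fuzzysearch` among its dependencies, so its matching likely differs). It assumes the ASR output is a list of `(word, start, end)` tuples, uses `1 - ratio` from Python's `difflib.SequenceMatcher` as the "distance", and uses a hypothetical `max_distance` threshold of 0.1. A window with the same number of words as the transcribed segment slides over the book text, and the closest window is kept only if it is close enough.

```python
# Hypothetical sketch of "align the transcription with the original text".
# Assumptions (not narizaka's actual code): ASR output is a list of
# (word, start_sec, end_sec) tuples, and "distance" is 1 - difflib ratio.
import re
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and keep only word characters so punctuation is ignored.
    return " ".join(re.findall(r"\w+", text.lower()))


def distance(a, b):
    # 0.0 means identical after normalization, 1.0 means nothing in common.
    return 1.0 - SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align_segment(words, book_text, max_distance=0.1):
    """Return (book_fragment, start_sec, end_sec) or None if no close match."""
    if not words:
        return None
    hypothesis = " ".join(w for w, _, _ in words)
    book_words = book_text.split()
    n = len(words)
    best = None  # (distance, fragment)
    # Slide a window with the same number of words over the book text.
    for i in range(len(book_words) - n + 1):
        fragment = " ".join(book_words[i:i + n])
        d = distance(hypothesis, fragment)
        if best is None or d < best[0]:
            best = (d, fragment)
    if best is None or best[0] > max_distance:
        return None  # too different: not added to the dataset
    # Keep the book's original spelling and punctuation as the transcript.
    return best[1], words[0][1], words[-1][2]


book = "Once upon a time, there lived an old man and his wife."
asr_words = [("there", 1.2, 1.5), ("lived", 1.5, 1.8), ("an", 1.8, 1.9),
             ("old", 1.9, 2.1), ("man", 2.1, 2.4)]
print(align_segment(asr_words, book))  # ('there lived an old man', 1.2, 2.4)
```

The brute-force window rescans the whole book for every segment, which keeps the sketch short; a practical aligner would search only near the previous match.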
## Installation
First, install several system dependencies:
On Debian-based Linux:
```
sudo apt install ffmpeg pandoc
```
On macOS:
```
brew install ffmpeg pandoc libmagic
```
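
Before running narizaka, you can confirm that the command-line tools installed above are on your `PATH` with a minimal Python sketch like the one below. The helper name `missing_tools` is hypothetical. `libmagic` is a shared library rather than a command, so it cannot be checked this way.

```python
# Minimal sketch: report which of the required command-line tools are
# missing from PATH. libmagic is a library, so it is not checked here.
import shutil


def missing_tools(tools=("ffmpeg", "pandoc")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("ffmpeg and pandoc found")
```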
Then you can install `narizaka`:
```
pip install narizaka
```
Or, to use the latest development version:
```
pip install git+https://github.com/patriotyk/narizaka.git
```
If you plan to modify the sources:
```
git clone https://github.com/patriotyk/narizaka.git
pip install -e narizaka/
```
The `-e` flag installs the package in editable mode: changes you make to the source files in the cloned directory are reflected the next time you run the `narizaka` command.
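
To confirm that Python picks up your editable copy, you can ask the import system where a module would be loaded from. This sketch assumes the package's import name is `narizaka` (the same as the command); `module_location` is a hypothetical helper.

```python
# Sketch: report which file Python would import a top-level module from.
# After `pip install -e narizaka/` the path should point into your clone.
# Assumption: the import name of the package is "narizaka".
import importlib.util


def module_location(name):
    """Return the origin path of a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(module_location("narizaka"))
```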
Every tagged commit on the `main` branch automatically builds an image and pushes it to Docker Hub, so you can also pull the image:
```
docker pull patriotyk/narizaka:latest
```
## How to use
The application takes as input a directory containing the audio data: either a folder of audio files (possibly with subfolders) or a single audio file. The directory must also contain one text file with the text that corresponds to this audio.
This text file can be any document format that `pandoc` accepts.
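
The expected input layout can be checked with a small Python sketch. It classifies files by extension (the `AUDIO_EXTS` set is an assumption; narizaka depends on `python-magic`, so its own detection may differ), treats every other non-hidden file as the book text, and shows how `pandoc` can convert that document to plain text. `find_book_inputs` and `to_plain_text` are hypothetical helpers.

```python
# Hypothetical check of the input layout: audio files (optionally in
# subfolders) plus exactly one text document per book directory.
import subprocess
import tempfile
from pathlib import Path

# Assumed set of audio extensions; narizaka may detect file types differently.
AUDIO_EXTS = {".mp3", ".m4a", ".m4b", ".wav", ".flac", ".ogg", ".opus"}


def find_book_inputs(book_dir):
    """Return (sorted audio files, the single text file) for one book."""
    root = Path(book_dir)
    files = [p for p in root.rglob("*")
             if p.is_file() and not p.name.startswith(".")]  # skip hidden files
    audio = sorted(p for p in files if p.suffix.lower() in AUDIO_EXTS)
    texts = [p for p in files if p.suffix.lower() not in AUDIO_EXTS]
    if not audio:
        raise ValueError(f"no audio files in {root}")
    if len(texts) != 1:
        raise ValueError(f"expected one text file in {root}, found {len(texts)}")
    return audio, texts[0]


def to_plain_text(document):
    """Convert any pandoc-readable document to plain text (needs pandoc)."""
    result = subprocess.run(
        ["pandoc", str(document), "-t", "plain", "--wrap=none"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Demo on a throwaway directory shaped like a single book.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "part1").mkdir()
        (root / "part1" / "01.mp3").touch()
        (root / "02.mp3").touch()
        (root / "book.epub").touch()
        audio, text = find_book_inputs(root)
        print([p.name for p in audio], text.name)
```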
For example, to process a single book:
```
narizaka test_data/farshrutka
```
Or, to process all books:
```
narizaka test_data
```
This repository contains a `test_data` directory with two audiobooks and their texts that you can use for testing.
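
If you drive narizaka from a Python script, for example to process only some of the books, a thin wrapper around the documented `narizaka <path>` command is enough. This is a sketch: `book_dirs` and `run_narizaka` are hypothetical helpers, and treating each immediate subdirectory of the data root as one book is an assumption based on the `test_data/farshrutka` example. No CLI flags beyond the path are used.

```python
# Hypothetical wrapper around the documented CLI form: `narizaka <path>`.
# Assumption: every immediate subdirectory of the data root is one book.
import subprocess
from pathlib import Path


def book_dirs(data_root):
    """Immediate subdirectories of data_root, sorted by name."""
    return sorted(p for p in Path(data_root).iterdir() if p.is_dir())


def run_narizaka(paths, dry_run=False):
    """Run `narizaka <path>` for each path and return the commands used."""
    commands = [["narizaka", str(p)] for p in paths]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing book
    return commands
```

For example, `run_narizaka(book_dirs("test_data"), dry_run=True)` only prints the command it would run for each book directory.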
## Raw data
{
"_id": null,
"home_page": null,
"name": "narizaka",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "tts, text-to-speech, audio corpus",
"author": null,
"author_email": "Serhiy Stetskovych <patriotyk+narizaka@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/13/8e/f68b20fa82798275ced2162e8507471198d045817e41290e5a11e46c2cd6/narizaka-1.2.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Tool to make high quality text to speech (tts) corpus from audio + text books.",
"version": "1.2.4",
"project_urls": {
"Bug Tracker": "https://github.com/patriotyk/narizaka/issues",
"Homepage": "https://github.com/patriotyk/narizaka"
},
"split_keywords": [
"tts",
" text-to-speech",
" audio corpus"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "da3ae59b201a9ce21775a47c93e41e1be8f7b84d948e6635a5dd675723ac71b0",
"md5": "71c632841ea40a65999466a14f81f33a",
"sha256": "aa42a70815f9444c6265920b4b37a836a6acf2874e3932de8219643ffa50fe65"
},
"downloads": -1,
"filename": "narizaka-1.2.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "71c632841ea40a65999466a14f81f33a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 35528,
"upload_time": "2024-07-27T19:23:49",
"upload_time_iso_8601": "2024-07-27T19:23:49.570668Z",
"url": "https://files.pythonhosted.org/packages/da/3a/e59b201a9ce21775a47c93e41e1be8f7b84d948e6635a5dd675723ac71b0/narizaka-1.2.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "138ef68b20fa82798275ced2162e8507471198d045817e41290e5a11e46c2cd6",
"md5": "041f482d86770c0440bbe512ca1fc290",
"sha256": "ac15c2554bb1cdcce1b48e8e1836c5ce00a7fa98d853e2a212f641aba6903a64"
},
"downloads": -1,
"filename": "narizaka-1.2.4.tar.gz",
"has_sig": false,
"md5_digest": "041f482d86770c0440bbe512ca1fc290",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 34667,
"upload_time": "2024-07-27T19:23:50",
"upload_time_iso_8601": "2024-07-27T19:23:50.844134Z",
"url": "https://files.pythonhosted.org/packages/13/8e/f68b20fa82798275ced2162e8507471198d045817e41290e5a11e46c2cd6/narizaka-1.2.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-27 19:23:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "patriotyk",
"github_project": "narizaka",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "faster-whisper",
"specs": []
},
{
"name": "transformers",
"specs": []
},
{
"name": "ffmpeg",
"specs": []
},
{
"name": "ffmpeg-python",
"specs": []
},
{
"name": "pycairo",
"specs": []
},
{
"name": "PyGObject",
"specs": []
},
{
"name": "regex",
"specs": []
},
{
"name": "fuzzysearch",
"specs": []
},
{
"name": "python-magic",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "torch",
"specs": []
},
{
"name": "torchaudio",
"specs": []
},
{
"name": "auditok",
"specs": []
},
{
"name": "stable-ts",
"specs": [
[
"==",
"2.17.3"
]
]
},
{
"name": "retry2",
"specs": []
},
{
"name": "tqdm",
"specs": []
}
],
"lcname": "narizaka"
}