# ailia AI Speech Python API
!! CAUTION !!
“ailia” IS NOT OPEN SOURCE SOFTWARE (OSS).
As long as the user complies with the conditions stated in the [License Document](https://ailia.ai/license/), the user may use the Software free of charge, but the Software is fundamentally paid software.
## About ailia AI Speech
ailia AI Speech is a library for AI-based speech recognition. It provides a C API for native applications and a C# API well suited to Unity applications; this package provides the Python API. With ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.
## Install from pip
You can install the ailia AI Speech free evaluation package with the following command.
```
pip3 install ailia_speech
```
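After installing, one quick way to confirm that the package can be found is a lookup through the standard library. This is a minimal sketch, assuming the import name is `ailia_speech` as in the usage examples; `is_installed` is a hypothetical helper introduced here for illustration, not part of ailia AI Speech.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when the module cannot be found on the import path.
    return importlib.util.find_spec(module_name) is not None


print("ailia_speech installed:", is_installed("ailia_speech"))
```

The check only locates the module; it does not load any model or validate the license.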
## Install from package
You can install ailia AI Speech from the package with the following commands.
```
python3 bootstrap.py
pip3 install ./
```
## Usage
### Batch mode
In batch mode, the entire audio is transcribed at once.
```python
import ailia_speech
import librosa
import os
import urllib.request

# Load target audio, downloading the demo file if it is not present
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
    urllib.request.urlretrieve(
        "https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
        input_file_path
    )
# Load as a mono waveform; librosa also returns the sampling rate it used
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer: transcribe the whole waveform in a single call
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
    print(text)
```
### Step mode
In step mode, the audio is fed in chunks and transcribed incrementally. The example below feeds one second of audio per step and sets `complete` on the final chunk.
```python
import ailia_speech
import librosa
import os
import urllib.request

# Load target audio, downloading the demo file if it is not present
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
    urllib.request.urlretrieve(
        "https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
        input_file_path
    )
# Load as a mono waveform; librosa also returns the sampling rate it used
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer: feed the waveform one second (sampling_rate samples) at a time
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
total_samples = audio_waveform.shape[0]
for i in range(0, total_samples, sampling_rate):
    # Mark the last chunk so the remaining audio is transcribed
    complete = i + sampling_rate >= total_samples
    chunk = audio_waveform[i:min(total_samples, i + sampling_rate)]
    recognized_text = speech.transcribe_step(chunk, sampling_rate, complete)
    for text in recognized_text:
        print(text)
```
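The chunking logic of the step-mode loop can be isolated into a small generator, which makes the end-of-input flag easier to check on its own. This is a minimal sketch; `iter_chunks` is a hypothetical helper name, not part of ailia AI Speech, and it assumes only that the input supports `len()` and slicing (a list here, a one-dimensional waveform array in practice).

```python
def iter_chunks(samples, chunk_size):
    """Yield (chunk, complete) pairs that cover samples in order.

    complete is True only for the final chunk, mirroring the flag passed
    to transcribe_step in the step-mode example.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(samples)
    for start in range(0, total, chunk_size):
        end = min(total, start + chunk_size)
        # The last chunk is the one that reaches the end of the input.
        yield samples[start:end], end >= total


# A plain list stands in for the waveform here.
for chunk, complete in iter_chunks(list(range(5)), 2):
    print(chunk, complete)
```

In the step-mode example, the loop body would then reduce to calling `speech.transcribe_step(chunk, sampling_rate, complete)` for each pair yielded by `iter_chunks(audio_waveform, sampling_rate)`.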
### Available model types
You can choose among several models, trading off accuracy against speed. LARGE_V3_TURBO is the recommended choice.
```
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
```
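If the model is chosen from a configuration value or a command-line option, a short name can be mapped to the matching constant name. This is a minimal sketch; `model_constant_name` and `MODEL_SUFFIXES` are hypothetical names introduced here, and the suffixes are copied from the constants listed in this section.

```python
# Suffixes taken from the constant names listed in this section.
MODEL_SUFFIXES = (
    "TINY", "BASE", "SMALL", "MEDIUM", "LARGE", "LARGE_V3", "LARGE_V3_TURBO",
)
PREFIX = "AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_"


def model_constant_name(short_name):
    """Map a short name such as "large-v3-turbo" to the full constant name."""
    suffix = short_name.strip().upper().replace("-", "_")
    if suffix not in MODEL_SUFFIXES:
        raise ValueError(f"unknown model type: {short_name!r}")
    return PREFIX + suffix


print(model_constant_name("large-v3-turbo"))
```

With the package installed, `getattr(ailia_speech, model_constant_name("large-v3-turbo"))` would look up the value to pass as `model_type` to `initialize_model`.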
## API specification
https://github.com/axinc-ai/ailia-sdk
Raw data
{
"_id": null,
"home_page": "https://ailia.jp/",
"name": "ailia-speech",
"maintainer": null,
"docs_url": null,
"requires_python": ">3.6",
"maintainer_email": null,
"keywords": null,
"author": "ax Inc.",
"author_email": "contact@axinc.jp",
"download_url": "https://files.pythonhosted.org/packages/1d/db/05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d/ailia_speech-1.3.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "https://ailia.ai/en/license/",
"summary": "ailia AI Speech",
"version": "1.3.2.3",
"project_urls": {
"Homepage": "https://ailia.jp/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "91af345abda84b75a9ac419fcb4471166c6b98e36649e4c62cb6fc42453d33e3",
"md5": "e032c668dbf59b60a3ede69599e64796",
"sha256": "3512189e829cbaa267a554d3b0f7cf697de75a4443b8e3f81acc01d5f412c58d"
},
"downloads": -1,
"filename": "ailia_speech-1.3.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e032c668dbf59b60a3ede69599e64796",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">3.6",
"size": 559490,
"upload_time": "2025-01-05T08:51:27",
"upload_time_iso_8601": "2025-01-05T08:51:27.809503Z",
"url": "https://files.pythonhosted.org/packages/91/af/345abda84b75a9ac419fcb4471166c6b98e36649e4c62cb6fc42453d33e3/ailia_speech-1.3.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1ddb05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d",
"md5": "d8a78fe0cbe22abc5e40e132c7e2bc0d",
"sha256": "bc5ef3d37fff028999fe58f979cd78c866e747426ff377e230df15a97c81f513"
},
"downloads": -1,
"filename": "ailia_speech-1.3.2.3.tar.gz",
"has_sig": false,
"md5_digest": "d8a78fe0cbe22abc5e40e132c7e2bc0d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">3.6",
"size": 555000,
"upload_time": "2025-01-05T08:51:30",
"upload_time_iso_8601": "2025-01-05T08:51:30.874501Z",
"url": "https://files.pythonhosted.org/packages/1d/db/05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d/ailia_speech-1.3.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-05 08:51:30",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "ailia-speech"
}