ailia-speech

Name	ailia-speech
Version	1.3.2.3
home_page	https://ailia.jp/
Summary	ailia AI Speech
upload_time	2025-01-05 08:51:30
maintainer	None
docs_url	None
author	ax Inc.
requires_python	>3.6
license	https://ailia.ai/en/license/
requirements	No requirements were recorded.

# ailia AI Speech Python API

!! CAUTION !!
“ailia” IS NOT OPEN SOURCE SOFTWARE (OSS).
As long as the user complies with the conditions stated in the [License Document](https://ailia.ai/license/), the user may use the Software free of charge, but the Software is basically paid software.

## About ailia AI Speech

ailia AI Speech is a library for AI-based speech recognition. It provides a C API for native applications, as well as a C# API well suited to Unity applications. This package provides the Python API. With ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.

## Install from pip

You can install the ailia AI Speech free evaluation package with the following command.

```
pip3 install ailia_speech
```
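
To confirm that the installation succeeded, one option is to query the installed version from Python. The sketch below is not taken from the ailia documentation: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ailia_speech" is the name used in the pip command above
    version = installed_version("ailia_speech")
    print(version if version is not None else "ailia_speech is not installed")
```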

## Install from package

You can install ailia AI Speech from the package with the following commands.

```
python3 bootstrap.py
pip3 install ./
```

## Usage

### Batch mode

In batch mode, the entire audio is transcribed at once.

```python
import ailia_speech

import librosa

import os
import urllib.request

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
	print(text)
```

### Step mode

In step mode, the audio is input in chunks and transcribed sequentially.

```python
import ailia_speech

import librosa

import os
import urllib.request

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
# Feed the audio one second at a time (sampling_rate samples per chunk)
for i in range(0, audio_waveform.shape[0], sampling_rate):
	# Flag the final chunk so the recognizer knows no more audio follows
	complete = i + sampling_rate >= audio_waveform.shape[0]
	recognized_text = speech.transcribe_step(audio_waveform[i:min(audio_waveform.shape[0], i + sampling_rate)], sampling_rate, complete)
	for text in recognized_text:
		print(text)
```
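
The chunking arithmetic in the step-mode loop can be isolated into a small generator, which makes the last-chunk handling easy to check without loading a model. This is a minimal sketch: `chunk_bounds` is a hypothetical helper name, not part of ailia_speech, and it only reproduces the index logic of the sample.

```python
def chunk_bounds(total_samples, chunk_size):
    """Yield (start, end, complete) for consecutive chunks of a waveform.

    complete is True only for the final chunk, mirroring the flag passed
    to transcribe_step in the step-mode sample.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_samples, chunk_size):
        end = min(total_samples, start + chunk_size)
        complete = start + chunk_size >= total_samples
        yield start, end, complete


# A 5-sample signal split into 2-sample chunks; the last chunk is shorter
for start, end, complete in chunk_bounds(5, 2):
    print(start, end, complete)
```

With these bounds, the loop body would call `speech.transcribe_step(audio_waveform[start:end], sampling_rate, complete)` for each tuple produced by `chunk_bounds(audio_waveform.shape[0], sampling_rate)`.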

### Available model types

You can choose among several models to trade off accuracy against speed. LARGE_V3_TURBO is the recommended choice.

```
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
```
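
If the model is chosen from a configuration value or command-line argument, the constant can be looked up by its suffix. The following is a minimal sketch under assumptions: `resolve_model_type` is a hypothetical helper, and a `SimpleNamespace` with placeholder values stands in for the `ailia_speech` module so that the example runs without the package installed.

```python
from types import SimpleNamespace

# Common prefix of the model type constant names
MODEL_TYPE_PREFIX = "AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_"


def resolve_model_type(module, short_name):
    """Look up a model type constant on module by its suffix, e.g. "tiny"."""
    attribute = MODEL_TYPE_PREFIX + short_name.upper()
    try:
        return getattr(module, attribute)
    except AttributeError:
        raise ValueError("unknown model type: " + short_name) from None


# Stand-in for the real module; the numeric values are placeholders only
fake_module = SimpleNamespace(
    AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY=0,
    AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO=1,
)
print(resolve_model_type(fake_module, "large_v3_turbo"))
```

With the package installed, passing `ailia_speech` itself as `module` would return the real constant, which can then be given as `model_type` to `initialize_model`.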

## API specification

https://github.com/axinc-ai/ailia-sdk

Raw data

{
    "_id": null,
    "home_page": "https://ailia.jp/",
    "name": "ailia-speech",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "ax Inc.",
    "author_email": "contact@axinc.jp",
    "download_url": "https://files.pythonhosted.org/packages/1d/db/05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d/ailia_speech-1.3.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "https://ailia.ai/en/license/",
    "summary": "ailia AI Speech",
    "version": "1.3.2.3",
    "project_urls": {
        "Homepage": "https://ailia.jp/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "91af345abda84b75a9ac419fcb4471166c6b98e36649e4c62cb6fc42453d33e3",
                "md5": "e032c668dbf59b60a3ede69599e64796",
                "sha256": "3512189e829cbaa267a554d3b0f7cf697de75a4443b8e3f81acc01d5f412c58d"
            },
            "downloads": -1,
            "filename": "ailia_speech-1.3.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e032c668dbf59b60a3ede69599e64796",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">3.6",
            "size": 559490,
            "upload_time": "2025-01-05T08:51:27",
            "upload_time_iso_8601": "2025-01-05T08:51:27.809503Z",
            "url": "https://files.pythonhosted.org/packages/91/af/345abda84b75a9ac419fcb4471166c6b98e36649e4c62cb6fc42453d33e3/ailia_speech-1.3.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1ddb05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d",
                "md5": "d8a78fe0cbe22abc5e40e132c7e2bc0d",
                "sha256": "bc5ef3d37fff028999fe58f979cd78c866e747426ff377e230df15a97c81f513"
            },
            "downloads": -1,
            "filename": "ailia_speech-1.3.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d8a78fe0cbe22abc5e40e132c7e2bc0d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">3.6",
            "size": 555000,
            "upload_time": "2025-01-05T08:51:30",
            "upload_time_iso_8601": "2025-01-05T08:51:30.874501Z",
            "url": "https://files.pythonhosted.org/packages/1d/db/05bb544b034f4f8a048f1a063a13ecfab87bf24920659eeae4c09bdb881d/ailia_speech-1.3.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-05 08:51:30",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "ailia-speech"
}
