# malayalam_asr_benchmarking
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
## Objective of the project
<div>
> **Note**
>
> A study to benchmark automatic speech recognition (ASR) models for
> Malayalam. So far, the project has benchmarked Whisper-based Malayalam
> ASR models.
</div>
## Benchmarked Datasets
So far, we have benchmarked on two main datasets:
1. Common Voice 11 Dataset
I have benchmarked on Mozilla’s [Common Voice 11 Malayalam
subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ml/train).
The benchmarking results are available in [this
dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_common_voice_benchmarking).
2. Malayalam Speech Corpus
I have benchmarked on SMC’s [Malayalam Speech Corpus
dataset](https://msc.smc.org.in/). The benchmarking results are available
in [this dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_msc_benchmarking/tree/main).
## Install
``` sh
pip install malayalam_asr_benchmarking
```
Or install it from the GitHub repository:
``` sh
# Requires git (on Ubuntu, install it with: sudo apt install git)
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
```
Or clone the repository and install it locally in editable mode:
``` sh
# Requires git (on Ubuntu, install it with: sudo apt install git)
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
```
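After installing, you can confirm that the package is visible to the current Python interpreter with a check like the one below. This is a minimal sketch, not part of the package: `check_install` is a hypothetical helper, the import name `malayalam_asr_benchmarking` is taken from the usage examples in this README, and the distribution name `malayalam-asr-benchmarking` from the PyPI metadata. `importlib.metadata` needs Python 3.8 or later, although the package itself declares Python >=3.7.
```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def check_install(
    import_name: str = "malayalam_asr_benchmarking",
    dist_name: str = "malayalam-asr-benchmarking",
) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: report whether a package is importable and its installed version.

    Returns (importable, version). version is None when no distribution
    metadata is found, e.g. because the package is not installed.
    """
    # find_spec returns None when the top-level module cannot be located
    importable = importlib.util.find_spec(import_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version
```
Once the install has succeeded, `check_install()` should return `(True, <installed version>)`; `(False, None)` means the interpreter you are running cannot see the package (a common cause is installing into a different virtual environment).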
## Setting up your development environment
I am developing this project with [nbdev](https://nbdev.fast.ai/). Please take some time to read up on how nbdev works,
including its [directives](https://nbdev.fast.ai/explanations/directives.html),
by going through the [walk-through](https://nbdev.fast.ai/tutorials/tutorial.html) and
[tutorials](https://nbdev.fast.ai/tutorials/) on the [nbdev
website](https://nbdev.fast.ai/).
### Step 1: Install Quarto
`nbdev_install_quarto`
Other installation options are described in [Quarto's Get Started
guide](https://quarto.org/docs/get-started/).
### Step 2: Install hooks
`nbdev_install_hooks`
### Step 3: Install our library
`pip install -e '.[dev]'`
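The three steps are meant to run in order from the repository root. If you prefer to script them, the sketch below does that; `SETUP_STEPS` and `setup_dev_env` are hypothetical names, not part of nbdev or this package. It defaults to a dry run that only prints the commands.
```python
import shlex
import subprocess
from typing import List

# The setup steps from this section, in order (run from the repository root).
SETUP_STEPS = [
    "nbdev_install_quarto",     # Step 1: install Quarto
    "nbdev_install_hooks",      # Step 2: install the nbdev git hooks
    "pip install -e '.[dev]'",  # Step 3: editable install with the dev extras
]


def setup_dev_env(dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: print (dry run) or execute the setup steps in order.

    With dry_run=False each command is executed; the first failing command
    raises an exception (e.g. subprocess.CalledProcessError), so later steps
    are skipped.
    """
    # shlex.split turns "pip install -e '.[dev]'" into ['pip', 'install', '-e', '.[dev]']
    commands = [shlex.split(step) for step in SETUP_STEPS]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```
Passing the argument lists to `subprocess.run` without `shell=True` avoids shell-quoting problems with the `'.[dev]'` extras specifier.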
## How to use
``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice
werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```
When this example was run on a machine whose PyTorch build had no CUDA support, it downloaded the model and tokenizer files and then failed with `AssertionError: Torch not compiled with CUDA enabled`, so this function appears to require a CUDA-enabled PyTorch install and a GPU.
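Because a CPU-only PyTorch build triggers this error, it can help to check for a usable GPU before starting a benchmark run. The helper below is a hypothetical sketch, not part of this package; it relies only on `torch.cuda.is_available()` from PyTorch and returns `False` if PyTorch is not installed at all.
```python
import importlib.util


def cuda_available() -> bool:
    """Hypothetical helper: True only if PyTorch is installed and reports a usable CUDA device."""
    # Avoid an ImportError on machines where torch is not installed
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # False for CPU-only PyTorch builds (the setup that raised
    # "Torch not compiled with CUDA enabled") and for machines without a GPU
    return torch.cuda.is_available()


# Example usage (commented out so this snippet runs without a GPU):
# if cuda_available():
#     evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml",
#                                         werlist, cerlist, modelsizelist, timelist)
# else:
#     print("No CUDA device found; install a CUDA-enabled PyTorch build first.")
```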
``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice
werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_faster_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```
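Both functions take four initially empty lists. Going by their names, `werlist` and `cerlist` presumably collect the word error rate (WER) and character error rate (CER) of the evaluated model, while `modelsizelist` and `timelist` collect its size and evaluation time; this is an assumption, not something this README documents. As a reference for what WER and CER measure, here is a minimal, dependency-free sketch of the standard definitions: the Levenshtein edit distance between reference and hypothesis divided by the reference length, over words for WER and over characters for CER. It is not necessarily how this package computes them (libraries such as `jiwer` or Hugging Face `evaluate` are common choices), and it applies no text normalization.
```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two token sequences."""
    # prev is the previous DP row: prev[j] = distance(ref[:i-1], hyp[:j])
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / number of reference characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```
For example, `wer("a b c d", "a x c")` gives 0.5: one substitution plus one deletion over four reference words. Note that `cer` here counts Unicode code points; Malayalam vowel signs and the virama are separate code points, so a code-point CER can differ from one computed over grapheme clusters.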
Raw data
{
"_id": null,
"home_page": "https://github.com/kurianbenoy/malayalam_asr_benchmarking",
"name": "malayalam-asr-benchmarking",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "nbdev jupyter notebook python",
"author": "kurianbenoy",
"author_email": "kurian.bkk@gmail.com",
"download_url": "",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A study to benchmark whisper based ASRs in Malayalam",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://github.com/kurianbenoy/malayalam_asr_benchmarking"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d24f940c98fe4256be138126c05435791d8d1bfd165f6acbbf4d9d2b354d36d8",
"md5": "9e32fd7871c3761ab6c78dec534aebbb",
"sha256": "42127df88e40a2caefdc836cddf3ef216cbd142d42afb762f63db411bd0979f1"
},
"downloads": -1,
"filename": "malayalam_asr_benchmarking-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9e32fd7871c3761ab6c78dec534aebbb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 9094,
"upload_time": "2024-02-10T03:11:08",
"upload_time_iso_8601": "2024-02-10T03:11:08.302622Z",
"url": "https://files.pythonhosted.org/packages/d2/4f/940c98fe4256be138126c05435791d8d1bfd165f6acbbf4d9d2b354d36d8/malayalam_asr_benchmarking-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-10 03:11:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kurianbenoy",
"github_project": "malayalam_asr_benchmarking",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "malayalam-asr-benchmarking"
}