malayalam-asr-benchmarking


| Field | Value |
| --- | --- |
| Name | malayalam-asr-benchmarking |
| Version | 0.0.4 |
| home_page | https://github.com/kurianbenoy/malayalam_asr_benchmarking |
| Summary | A study to benchmark whisper based ASRs in Malayalam |
| upload_time | 2024-02-10 03:11:08 |
| maintainer | |
| docs_url | None |
| author | kurianbenoy |
| requires_python | >=3.7 |
| license | MIT License |
| keywords | nbdev jupyter notebook python |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# malayalam_asr_benchmarking

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Objective of the project

<div>

> **Note**
>
> A study to benchmark automatic speech recognition (ASR) models for
> Malayalam. So far, the project covers Whisper-based Malayalam ASR
> models.

</div>

## Benchmarked Datasets

So far, I have benchmarked mainly on two datasets:

1.  Common Voice 11 Dataset

    I have benchmarked on Mozilla’s [Common Voice 11 Malayalam
    subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ml/train).
    The benchmarking results are published in [this Hugging Face
    dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_common_voice_benchmarking).

2.  Malayalam Speech Corpus

    I have benchmarked on SMC’s [Malayalam Speech Corpus
    dataset](https://msc.smc.org.in/). The benchmarking results are
    published in [this Hugging Face
    dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_msc_benchmarking/tree/main).

## Install

``` sh
pip install malayalam_asr_benchmarking
```

Or install directly from the GitHub repository:

``` sh
# Requires git. If it is missing, install it first (e.g. on Ubuntu: apt install git)
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
```

Or clone the repository and install it locally in editable mode:

``` sh
# Requires git. If it is missing, install it first (e.g. on Ubuntu: apt install git)
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
```
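
After any of the install routes above, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. This is only a sketch: `installed_version` is a hypothetical helper (not part of this package), and `importlib.metadata` is in the standard library from Python 3.8 onward, so on Python 3.7 the `importlib_metadata` backport would be needed instead.

``` python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "malayalam-asr-benchmarking") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints a version string such as the one listed on PyPI, or None if not installed.
print(installed_version())
```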

## Setting up your development environment

I am developing this project with nbdev. Before contributing, please
take some time to read up on how nbdev works, including its
[directives](https://nbdev.fast.ai/explanations/directives.html), by
checking out [the
walkthrough](https://nbdev.fast.ai/tutorials/tutorial.html) and the other
[tutorials](https://nbdev.fast.ai/tutorials/) on the [nbdev
website](https://nbdev.fast.ai/).

### Step 1: Install Quarto

`nbdev_install_quarto`

Other installation options are described in the [Quarto getting-started
guide](https://quarto.org/docs/get-started/).

### Step 2: Install hooks

`nbdev_install_hooks`

### Step 3: Install our library

`pip install -e '.[dev]'`

## How to use

``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```

    Downloading (…)lve/main/config.json: 100%|██████████| 1.03k/1.03k [00:00<00:00, 6.09MB/s]
    Downloading pytorch_model.bin: 100%|██████████| 151M/151M [00:24<00:00, 6.07MB/s]
    Downloading (…)okenizer_config.json: 100%|██████████| 827/827 [00:00<00:00, 2.64MB/s]
    Downloading (…)olve/main/vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 1.14MB/s]
    Downloading (…)olve/main/merges.txt: 100%|██████████| 494k/494k [00:00<00:00, 2.65MB/s]
    Downloading (…)main/normalizer.json: 100%|██████████| 52.7k/52.7k [00:00<00:00, 252kB/s]
    Downloading (…)in/added_tokens.json: 100%|██████████| 2.11k/2.11k [00:00<00:00, 8.53MB/s]
    Downloading (…)cial_tokens_map.json: 100%|██████████| 2.06k/2.06k [00:00<00:00, 5.10MB/s]
    Downloading (…)rocessor_config.json: 100%|██████████| 185k/185k [00:02<00:00, 76.2kB/s]

    AssertionError: Torch not compiled with CUDA enabled
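
The `AssertionError` in the output above is the error PyTorch raises when code requests a CUDA device but the installed PyTorch build has no CUDA support. This suggests, though the README does not state it, that the evaluation function expects a GPU. A minimal pre-flight check, assuming only the standard `torch.cuda.is_available()` call, might look like this (`cuda_is_usable` is a hypothetical helper, not part of this package):

``` python
import importlib.util


def cuda_is_usable() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without torch

    return torch.cuda.is_available()


if cuda_is_usable():
    print("CUDA is available; GPU evaluation should be able to start.")
else:
    print("No usable CUDA device; the evaluation call will likely fail as shown above.")
```

Running this before a long model download saves waiting for the weights only to hit the error afterwards.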

``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_faster_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```
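
The `werlist` and `cerlist` arguments suggest that the evaluation reports word error rate (WER) and character error rate (CER). The README does not show how the package computes them, so the following is only a pure-Python sketch of the usual definitions: the edit distance between reference and hypothesis, divided by the reference length. The function names are illustrative and are not this package's API. Note that this CER counts Unicode code points, including spaces, whereas a Malayalam character as perceived by a reader often spans several code points.

``` python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: code-point edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong word out of four gives a WER of 0.25.
print(wer("a b c d", "a x c d"))
```

Lower values are better for both metrics, and both can exceed 1.0 when the hypothesis is much longer than the reference.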

            

        