speechmos


- Name: speechmos
- Version: 0.0.1.1
- Summary: MOS (Mean Opinion Score) models for evaluating audio quality.
- Upload time: 2023-07-13 09:20:40
- Requires Python: >=3.7
- Keywords: aecmos, dnsmos, plcmos, acoustic echo cancellation, noise suppression, packet loss concealment, audio evaluation, speech evaluation, mean opinion score, mos, audio quality, speech quality

# AECMOS, DNSMOS, PLCMOS

* We release the [AECMOS](https://ieeexplore.ieee.org/document/9747836 "AECMOS: A Speech Quality Assessment Metric for Echo Impairment."), [DNSMOS](https://ieeexplore.ieee.org/document/9746108 "DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors."), and [PLCMOS](https://arxiv.org/abs/2305.15127 "PLCMOS--a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms.") models that we have developed for evaluating audio degradations due to echo, noise, packet loss and other sources.

## Prerequisites
- Python 3.7 or above
- librosa 0.9.1
- numpy 1.21.5
- onnxruntime 1.10.0
- pandas
- tqdm


## Usage:
```python
from speechmos import aecmos, dnsmos, plcmos

aecmos.run(sample, sr, talk_type, **kwargs)

dnsmos.run(sample, sr, **kwargs)

plcmos.run(sample, sr, **kwargs)
``` 

- `sample` is one of the following:
    - For AECMOS: a dictionary of the form `{'lpb': lpb, 'mic': mic, 'enh': enh}` holding the loopback, microphone, and enhanced audio, each given either as an `np.ndarray` or as a path to an audio file in a format supported by `librosa`.
    - For DNSMOS and PLCMOS: an `np.ndarray` or a path to an audio file in a format supported by `librosa`.
    
    All audio should be single channel (mono) audio.  
    Alternatively, `sample` can be a list of items of one of the above types.  

- `sr` denotes the sampling rate in Hz and must be either 16000 or 48000. The scenario-based AECMOS model runs at 48 kHz; all other models, including the scenarioless AECMOS model described below, run at 16 kHz. All audio must be provided at the sampling rate of the model being used.
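
The input rules above can be checked before any model is run. The following is a minimal sketch, not part of `speechmos`: the helper names `as_sample_list` and `check_aecmos_inputs` are hypothetical, and the checks only mirror what this README states (the three dictionary keys and the two accepted sampling rates).

```python
AECMOS_KEYS = ("lpb", "mic", "enh")  # dictionary keys named in this README
SUPPORTED_SR = (16000, 48000)        # sampling rates named in this README


def as_sample_list(sample):
    """Wrap a single sample in a list; return a list unchanged."""
    return sample if isinstance(sample, list) else [sample]


def check_aecmos_inputs(sample, sr):
    """Hypothetical pre-flight check; raises ValueError on malformed input."""
    if sr not in SUPPORTED_SR:
        raise ValueError(f"sr must be one of {SUPPORTED_SR}, got {sr}")
    samples = as_sample_list(sample)
    for index, item in enumerate(samples):
        if not isinstance(item, dict):
            raise ValueError(f"sample {index} must be a dictionary")
        missing = [key for key in AECMOS_KEYS if key not in item]
        if missing:
            raise ValueError(f"sample {index} is missing keys: {missing}")
    return samples
```

Calling `check_aecmos_inputs(sample, sr)` before `aecmos.run(sample, sr, talk_type)` would surface a missing `'enh'` key or an unsupported rate early, rather than inside the model call.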

For AECMOS:
- `talk_type` specifies the scenario, if known: `'st'` (far-end single talk), `'nst'` (near-end single talk), or `'dt'` (double talk). If the scenario is unknown, set `talk_type` to `None` to use the 16 kHz scenarioless model, whose correlation with the ground truth is about 2% lower than that of the scenario-based model.
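
The relationship between `talk_type` and the sampling rate can be written down as a small lookup. This is a sketch based only on the description in this README (scenario-based model at 48 kHz, scenarioless model at 16 kHz); the function name `aecmos_sampling_rate` is hypothetical and is not provided by the package.

```python
VALID_TALK_TYPES = ("st", "nst", "dt")  # scenario codes listed in this README


def aecmos_sampling_rate(talk_type=None):
    """Return the sampling rate this README associates with a talk type.

    None selects the scenarioless model (16 kHz); a known scenario
    selects the scenario-based model (48 kHz).
    """
    if talk_type is None:
        return 16000
    if talk_type not in VALID_TALK_TYPES:
        raise ValueError(
            f"talk_type must be None or one of {VALID_TALK_TYPES}, got {talk_type!r}"
        )
    return 48000
```

Audio would then be loaded or resampled at the returned rate before being passed to `aecmos.run`.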

For DNSMOS: 
- `model_type` controls which DNSMOS model to use: `'dnsmos'` or `'dnsmos_personalized'`. The default is `'dnsmos'`.

Additional arguments:
- `return_df` controls the return type when evaluating a list of samples. With the default, `return_df = True`, a pandas `DataFrame` containing sample information and MOS scores is returned. If set to `False`, a list of dictionaries is returned instead.
- `verbose` controls whether more details are printed on the screen. The default is `verbose = False`.
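
When `return_df = False` is used, the resulting list of dictionaries can be summarized without pandas. The sketch below assumes result dictionaries shaped like the AECMOS output shown later in this README; the `results` list is copied from that output rather than produced by a model call, and `summarize` is a hypothetical helper.

```python
from statistics import mean

# Result dictionaries shaped like the AECMOS output shown in this README.
results = [
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
    {"echo_mos": 3.2400383949279785, "deg_mos": 3.4087774753570557, "talk_type": "dt"},
]


def summarize(results, keys=("echo_mos", "deg_mos")):
    """Average the selected MOS fields over a list of result dictionaries."""
    return {key: mean([row[key] for row in results]) for key in keys}


summary = summarize(results)
print(summary)
```

For anything beyond simple averages, keeping the default `return_df = True` and working with the returned `DataFrame` is likely the more convenient route.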

## Usage examples:

#### AECMOS usage example with `sample` as a dictionary of numpy arrays and unknown `talk_type`.

```python
import librosa
from speechmos import aecmos

lpb, _ = librosa.load("D:/data/example/lpb.wav", sr=16000)
mic, _ = librosa.load("D:/data/example/mic.wav", sr=16000)
enh, _ = librosa.load("D:/data/example/enh.wav", sr=16000)

sample = {'lpb': lpb, 'mic': mic, 'enh': enh}

aecmos.run(sample, sr=16000, verbose=True)
```

Output:
```
Model version aecmos_scenarioless_16kHz.
The model sampling rate is 16000.
{'echo_mos': 4.9999470710754395, 'deg_mos': 3.4854962825775146, 'talk_type': None, 'model_name': 'aecmos_scenarioless_16kHz'}
```


#### AECMOS usage example with `sample` as a list of dictionaries of paths to audio files.

```python
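from speechmos import aecmos

# sample_list: a list of {'lpb': path, 'mic': path, 'enh': path} dictionaries of audio file paths.
aecmos.run(sample_list, sr=48000, talk_type='dt', verbose=True)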
```

Output:
```
Using model aecmos_48kHz to evaluate 3 samples.
Model sampling rate is 48000.
0it [00:00, ?it/s]
1it [00:00,  8.59it/s]
3it [00:00, 25.77it/s]
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
{'lpb_path': 'D:/data/example/lpb.wav', 'mic_path': 'D:/data/example/mic.wav', 'enh_path': 'D:/data/example/enh.wav', 'echo_mos': 3.2400383949279785, 'deg_mos': 3.4087774753570557, 'talk_type': 'dt', 'model_name': 'aecmos_48kHz'}
       echo_mos   deg_mos
count  3.000000  3.000000
mean   3.240038  3.408777
std    0.000000  0.000000
min    3.240038  3.408777
25%    3.240038  3.408777
50%    3.240038  3.408777
75%    3.240038  3.408777
max    3.240038  3.408777
```
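
The `sample_list` used in the example above is not defined in this README. One way to build such a list is sketched below, under the assumption of a hypothetical directory layout with one subdirectory per clip, each containing `lpb.wav`, `mic.wav`, and `enh.wav`. The helper name `build_sample_list` and the layout are illustrative only.

```python
from pathlib import PurePosixPath


def build_sample_list(root, clip_names):
    """Build AECMOS sample dictionaries of file paths.

    Assumes (hypothetically) that each clip lives in root/<clip_name>/
    and holds lpb.wav, mic.wav, and enh.wav.
    """
    root = PurePosixPath(root)
    return [
        {
            "lpb": str(root / name / "lpb.wav"),
            "mic": str(root / name / "mic.wav"),
            "enh": str(root / name / "enh.wav"),
        }
        for name in clip_names
    ]


sample_list = build_sample_list("D:/data", ["example"])
print(sample_list[0]["lpb"])  # D:/data/example/lpb.wav
```

The resulting list can then be passed as the first argument of `aecmos.run`, with `sr` and `talk_type` set as described in the Usage section.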

#### DNSMOS usage example with `sample` as a numpy array:

```python
import librosa
from speechmos import dnsmos

audio, _ = librosa.load("D:/data/example/enh.wav", sr=16000)
dnsmos.run(audio, sr=16000)
```

Output:
```
{'filename': 'D:/data/example/enh.wav',
 'ovrl_mos': 2.2067626609880104,
 'sig_mos': 3.290418848414798,
 'bak_mos': 2.141338429075571,
 'p808_mos': 3.0722866}
```

#### PLCMOS usage example with `sample` as a path to an audio file:

```python
from speechmos import plcmos

plcmos.run("D:/data/example/enh.wav", sr=16000)
```

Output:
```
{'filename': 'D:/data/example/enh.wav',
 'plcmos': 2.5210512320200604,
 'model': 'plcmos_v2'}
```

## Citation:
C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 886-890, doi: 10.1109/ICASSP43922.2022.9746108.

L. Diener, M. Purin, S. Sootla, A. Saabas, R. Aichner and R. Cutler, "PLCMOS--a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms," arXiv preprint arXiv:2305.15127, 2023.

M. Purin, S. Sootla, M. Sponza, A. Saabas and R. Cutler, "AECMOS: A Speech Quality Assessment Metric for Echo Impairment," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 901-905, doi: 10.1109/ICASSP43922.2022.9747836.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "speechmos",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "AEC Challenge Organizers <aec_challenge@microsoft.com>, Marju Purin <marjupurin@microsoft.com>",
    "keywords": "aecmos,dnsmos,plcmos,acoustic echo cancellation,noise suppression,packet loss concealment,audio evaluation,speech evaluation,mean opinion score,MOS,audio quality,speech quality",
    "author": "",
    "author_email": "Marju Purin <marjupurin@microsoft.com>, AEC Challenge Organizers <aec_challenge@microsoft.com>",
    "download_url": "https://files.pythonhosted.org/packages/1e/03/b9f7fb53094b7919feb7a37d0dbc445f20276cff0743f518cd8d2726074a/speechmos-0.0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "MOS (Mean Opinion Score) models for evaluating audio quality.",
    "version": "0.0.1.1",
    "project_urls": {
        "AEC (acoustic echo cancellation)": "https://github.com/microsoft/AEC-Challenge",
        "DNS (deep noise suppression)": "https://github.com/microsoft/DNS-Challenge",
        "PLC (packet loss concealment)": "https://github.com/microsoft/PLC-Challenge"
    },
    "split_keywords": [
        "aecmos",
        "dnsmos",
        "plcmos",
        "acoustic echo cancellation",
        "noise suppression",
        "packet loss concealment",
        "audio evaluation",
        "speech evaluation",
        "mean opinion score",
        "mos",
        "audio quality",
        "speech quality"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "300e08369f3574447acfe78a7678f9a5e2b9c6629888be25a5e2407d616a6c02",
                "md5": "8e2e17753b417b1e60a6e51abc50c9d4",
                "sha256": "31c4c9d3234f6ee10102edff74333014c50006a3f389daf7ecacae34e68ebbf7"
            },
            "downloads": -1,
            "filename": "speechmos-0.0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8e2e17753b417b1e60a6e51abc50c9d4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 9439732,
            "upload_time": "2023-07-13T09:20:37",
            "upload_time_iso_8601": "2023-07-13T09:20:37.718906Z",
            "url": "https://files.pythonhosted.org/packages/30/0e/08369f3574447acfe78a7678f9a5e2b9c6629888be25a5e2407d616a6c02/speechmos-0.0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1e03b9f7fb53094b7919feb7a37d0dbc445f20276cff0743f518cd8d2726074a",
                "md5": "a9e5ccad5faf0df8aff2e753368c75e9",
                "sha256": "f65040b408a5114b808fba8abcbd22aa09dc77e60bea0c2d2b060efebca53dec"
            },
            "downloads": -1,
            "filename": "speechmos-0.0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a9e5ccad5faf0df8aff2e753368c75e9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 9442691,
            "upload_time": "2023-07-13T09:20:40",
            "upload_time_iso_8601": "2023-07-13T09:20:40.616044Z",
            "url": "https://files.pythonhosted.org/packages/1e/03/b9f7fb53094b7919feb7a37d0dbc445f20276cff0743f518cd8d2726074a/speechmos-0.0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-13 09:20:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "microsoft",
    "github_project": "AEC-Challenge",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "speechmos"
}
        