hear21passt

Name: hear21passt
Version: 0.0.26
Home page: https://github.com/kkoutini/passt_hear21
Summary: Passt pretrained model for HEAR 2021 NeurIPS Competition
Upload time: 2024-01-02 11:29:19
Author: Khaled Koutini
Requires Python: >=3.7
License: Apache-2.0
Requirements: pre-commit, pytest, pytest-cov, pytest-env, torch, torchaudio, timm
# PaSST package for HEAR 2021 NeurIPS Challenge: Holistic Evaluation of Audio Representations

This is an implementation of [Efficient Training of Audio Transformers with Patchout](https://arxiv.org/abs/2110.05069) for the HEAR 2021 NeurIPS Challenge: Holistic Evaluation of Audio Representations.

# CUDA version

This implementation was tested with CUDA 11.1 and torch installed as follows:

```shell
pip3 install torch==1.8.1+cu111  torchaudio==0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
```

It should also work with newer versions of CUDA and torch.

# Installation

Install the latest version of this repo:

```shell
pip install hear21passt
```

The models follow the [common API](https://neuralaudio.ai/hear2021-holistic-evaluation-of-audio-representations.html#common-api) of HEAR 21:

```shell
hear-validator --model hear21passt.base.pt hear21passt.base
hear-validator --model noweights.txt hear21passt.base2levelF
hear-validator --model noweights.txt hear21passt.base2levelmel
```

There are three modules available: `hear21passt.base`, `hear21passt.base2level`, and `hear21passt.base2levelmel`:

```python
import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model().cuda()
seconds = 15
audio = torch.ones((3, 32000 * seconds))*0.5
embed, time_stamps = get_timestamp_embeddings(audio, model)
print(embed.shape)
embed = get_scene_embeddings(audio, model)
print(embed.shape)
```

# Getting the Logits/Class Labels

You can get the logits (before the sigmoid activation) for the 527 classes of AudioSet:

```python
from hear21passt.base import load_model

model = load_model(mode="logits").cuda()
# wave_signal: a batch of 32 kHz waveforms, shape (batch, samples)
logits = model(wave_signal)
```

The class label indices can be found [here](https://github.com/qiuqiangkong/audioset_tagging_cnn/blob/master/metadata/class_labels_indices.csv).
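For illustration, here is a minimal sketch of mapping logit indices to label names. It assumes the CSV's `index,mid,display_name` layout; the inline sample rows below are stand-ins for the full 527-row file, and `top_k_labels` is a helper defined here, not part of the package:

```python
import csv
import io

# Illustrative rows in the format of class_labels_indices.csv
# (header: index,mid,display_name); the real file has 527 rows.
sample_csv = """index,mid,display_name
0,/m/09x0r,Speech
1,/m/05zppz,"Male speech, man speaking"
2,/m/02zsn,"Female speech, woman speaking"
"""

# Build an index -> label mapping
labels = {int(row["index"]): row["display_name"]
          for row in csv.DictReader(io.StringIO(sample_csv))}

def top_k_labels(logits, k=2):
    """Return the k highest-scoring (label, logit) pairs."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(labels[i], logits[i]) for i in ranked[:k]]

print(top_k_labels([0.1, 2.3, 1.7]))
```

In practice you would read the real CSV file and pass the model's 527-dimensional logits vector instead of the toy list above.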

You can also use other pre-trained models, for example the model trained with knowledge distillation (KD), `passt_s_kd_p16_128_ap486`:

```python
from hear21passt.base import get_basic_model

model = get_basic_model(mode="logits", arch="passt_s_kd_p16_128_ap486")
logits = model(wave_signal)
```

# Supporting longer clips

For inputs longer than 10 seconds, the `get_scene_embeddings` method computes the average of the embeddings of overlapping 10-second windows.
Depending on the application, it may be useful to use a pre-trained model that can extract embeddings from 20 or 30 seconds of audio without averaging. These variants have time positional encodings pre-trained for 20/30 seconds:

```python
# from version 0.0.18, it's possible to use:
from hear21passt.base20sec import load_model # up to 20 seconds of audio.
# or 
from hear21passt.base30sec import load_model # up to 30 seconds of audio.

model = load_model(mode="logits").cuda()
logits = model(wave_signal)
```
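As a sketch of the averaging scheme described above, overlapping 10-second windows over a longer clip can be laid out as below; the 5-second hop is an assumption for illustration, not necessarily the package's internal value. The scene embedding is then the mean of the per-window embeddings:

```python
def window_starts(n_samples, win=320000, hop=160000):
    """Start indices of overlapping windows covering a signal.

    Defaults assume 32 kHz audio: win = 10 s, hop = 5 s (illustrative values).
    """
    if n_samples <= win:
        return [0]
    starts = list(range(0, n_samples - win + 1, hop))
    if starts[-1] + win < n_samples:   # add a final window to cover the tail
        starts.append(n_samples - win)
    return starts

# e.g. a 25-second clip at 32 kHz
print(window_starts(25 * 32000))  # [0, 160000, 320000, 480000]
```

Each start index yields one 10-second window; embedding all windows and averaging gives the clip-level embedding.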

# Loading other pre-trained models for logits or fine-tuning

Each pre-trained model has a specific frequency/time positional encoding, so it's necessary to select the correct input shape in order to load the model. The important variables are `input_tdim`, `fstride`, and `tstride`, which specify the number of spectrogram time frames, the patch stride over frequency, and the patch stride over time, respectively.

```python
import torch

from hear21passt.base import get_basic_model, get_model_passt

model = get_basic_model(mode="logits")

logits = model(some_wave_signal)

# Examples of other pre-trained models using the same spectrograms

# pre-trained on OpenMIC-18
model.net = get_model_passt(arch="openmic", n_classes=20)
# pre-trained on FSD50K
model.net = get_model_passt(arch="fsd50k", n_classes=200)
# pre-trained on FSD50K without patch overlap (faster)
model.net = get_model_passt(arch="fsd50k-n", n_classes=200, fstride=16, tstride=16)

# The following models were trained on 10-second audio from AudioSet but accept
# longer inputs (20 s or 30 s). They were trained by sampling a 10-second
# sequence of time positional encodings.
model.net = get_model_passt("passt_20sec", input_tdim=2000)
model.net = get_model_passt("passt_30sec", input_tdim=3000)
```
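To build intuition for `fstride`/`tstride`, the size of the patch grid can be estimated with simple stride arithmetic. This is a back-of-the-envelope sketch, not the package's internal computation; the 16×16 patch size is inferred from the "p16" architecture names and the note that stride 16 means no patch overlap:

```python
def patch_grid(n_mels=128, input_tdim=1000, fstride=10, tstride=10, patch=16):
    """Estimate the number of patches along frequency and time."""
    n_f = (n_mels - patch) // fstride + 1        # patches along frequency
    n_t = (input_tdim - patch) // tstride + 1    # patches along time
    return n_f, n_t

print(patch_grid())                        # default strides: overlapping patches
print(patch_grid(fstride=16, tstride=16))  # stride = patch size: no overlap, fewer patches
```

Fewer patches means fewer transformer tokens, which is why the no-overlap `fsd50k-n` variant is faster.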

If you provide the wrong spectrograms, the model may fail silently by producing low-quality embeddings and logits. Make sure you use the correct spectrogram config for the selected pre-trained model.
Models with higher spectrogram resolution need the corresponding spectrogram config:

```python
from hear21passt.base import get_model_passt
from hear21passt.models.preprocess import AugmentMelSTFT

# high-res model pre-trained on AudioSet (hopsize=160)
model.net = get_model_passt("stfthop160", input_tdim=2000)
model.mel = AugmentMelSTFT(n_mels=128, sr=32000, win_length=800, hopsize=160, n_fft=1024,
                           freqm=48, timem=192, htk=False, fmin=0.0, fmax=None, norm=1,
                           fmin_aug_range=10, fmax_aug_range=2000)

# higher-res model pre-trained on AudioSet (hopsize=100)
model.net = get_model_passt("stfthop100", input_tdim=3200)
model.mel = AugmentMelSTFT(n_mels=128, sr=32000, win_length=800, hopsize=100, n_fft=1024,
                           freqm=48, timem=192, htk=False, fmin=0.0, fmax=None, norm=1,
                           fmin_aug_range=10, fmax_aug_range=2000)
```
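To see why these `input_tdim` values differ, note that the number of spectrogram frames is roughly `seconds * sr / hopsize`. This is a sketch of the arithmetic, not package code; both high-res models above still see 10 seconds of audio, just at finer time steps:

```python
def n_frames(seconds, sr=32000, hopsize=320):
    """Approximate number of spectrogram time frames for a clip."""
    return seconds * sr // hopsize

print(n_frames(10, hopsize=160))  # 2000 -> matches input_tdim for "stfthop160"
print(n_frames(10, hopsize=100))  # 3200 -> matches input_tdim for "stfthop100"
```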
