deeprhythm

Name: deeprhythm
Version: 0.0.13
Summary: A fast, accurate Tempo Predictor
Homepage: https://github.com/bleugreen/deeprhythm
Requires Python: >=3.8
Uploaded: 2024-12-07 00:22:55
Requirements: librosa, torch, pandas, numpy, nnAudio, h5py, torchaudio
# DeepRhythm: High-Speed Tempo Prediction

DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction on modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, and Raspbian).

Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM) [1], drastically reducing computation time by avoiding the usual bottlenecks in feature extraction.

[more details here](https://bleu.green/deeprhythm)

## Classification Process

1. Split input audio into 8-second clips `[len_batch, len_audio]`
2. Compute the HCQM of each clip
   1. Compute STFT `[len_batch, stft_bands, len_audio/hop]`
   2. Sum STFT bins into 8 log-spaced bands using filter matrix `[len_batch, 8, len_audio/hop]`
   3. Flatten bands for parallel CQT processing `[len_batch*8, len_audio/hop]`
   4. For each of the six harmonics, compute the CQT `[6, len_batch*8, num_cqt_bins]`
   5. Reshape `[len_batch, num_cqt_bins, 8, 6]`
3. Feed HCQM through CNN `[len_batch, num_classes (256)]`
4. Softmax the outputs to get probabilities
5. Choose the class with the highest probability and convert it to BPM (bpms = `[len_batch]`); a shape-level sketch of these steps follows below
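
For readers tracing the tensor shapes, here is a minimal, runnable sketch of the pipeline above. The band filter, CQT, and CNN are random stand-ins (the real pipeline computes its CQTs with nnAudio [2], one of the package requirements), and every size is illustrative rather than DeepRhythm's actual configuration:

```python
import torch

# Minimal shape-level sketch of the steps above. The band filter, CQT, and CNN
# are random placeholders; all sizes are illustrative, not DeepRhythm's actual
# configuration.
len_batch, sr, clip_sec = 4, 22050, 8
n_fft, hop = 2048, 256
num_cqt_bins, n_harmonics, num_classes = 240, 6, 256

audio = torch.randn(len_batch, sr * clip_sec)            # [len_batch, len_audio]
stft = torch.stft(audio, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft),
                  return_complex=True).abs()             # [len_batch, stft_bands, frames]
band_filter = torch.rand(8, stft.shape[1])               # 8 log-spaced bands (placeholder)
bands = torch.einsum('bft,kf->bkt', stft, band_filter)   # [len_batch, 8, frames]
flat = bands.reshape(-1, bands.shape[-1])                # [len_batch*8, frames]

# One CQT per harmonic; a random projection stands in for nnAudio's CQT layer
cqts = torch.stack([torch.randn(flat.shape[0], num_cqt_bins)
                    for _ in range(n_harmonics)])        # [6, len_batch*8, num_cqt_bins]
hcqm = (cqts.permute(1, 2, 0)                            # [len_batch*8, num_cqt_bins, 6]
            .reshape(len_batch, 8, num_cqt_bins, n_harmonics)
            .permute(0, 2, 1, 3))                        # [len_batch, num_cqt_bins, 8, 6]

logits = hcqm.flatten(1) @ torch.randn(hcqm[0].numel(), num_classes)  # CNN stand-in
bpms = logits.softmax(dim=-1).argmax(dim=-1)             # [len_batch], class index -> bpm
print(hcqm.shape, bpms.shape)
```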

## Benchmarks

| Method                  | Acc1 (%)  | Acc2 (%)  | Avg. Time (s) | Total Time (s) |
| ----------------------- | --------- | --------- | ------------- | -------------- |
| DeepRhythm (cuda)       | **95.91** | 96.54     | **0.021**     | 20.11          |
| DeepRhythm (cpu)        | **95.91** | 96.54     | 0.12          | 115.02         |
| TempoCNN (cnn)          | 84.78     | **97.69** | 1.21          | 1150.43        |
| TempoCNN (fcn)          | 83.53     | 96.54     | 1.19          | 1131.51        |
| Essentia (multifeature) | 87.93     | 97.48     | 2.72          | 2595.64        |
| Essentia (percival)     | 85.83     | 95.07     | 1.35          | 1289.62        |
| Essentia (degara)       | 86.46     | 97.17     | 1.38          | 1310.69        |
| Librosa                 | 66.84     | 75.13     | 0.48          | 460.52         |

- Tested on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
- Acc1 = prediction within +/- 2% of the true BPM
- Acc2 = prediction within +/- 2% of the true BPM or a multiple (e.g. 120 ~= 60); both metrics are sketched below
- Timed from filepath in to BPM out (audio loading, feature extraction, model inference)
- I could only get TempoCNN to run on CPU (it requires CUDA 10)
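
The bullet points translate directly into code. Here is a minimal sketch of both metrics; the notes above don't pin down which multiples Acc2 admits, so this assumes the conventional MIR factor set (2x, 3x, 1/2, 1/3):

```python
def acc1(pred_bpm, true_bpm, tol=0.02):
    """Acc1: prediction within +/- 2% of the annotated tempo."""
    return abs(pred_bpm - true_bpm) <= tol * true_bpm

def acc2(pred_bpm, true_bpm, tol=0.02, factors=(1.0, 2.0, 3.0, 0.5, 1/3)):
    """Acc2: Acc1 up to a tempo multiple (this factor set is an assumption)."""
    return any(acc1(pred_bpm, f * true_bpm, tol) for f in factors)

print(acc1(117, 120), acc2(60, 120))  # False True (60 bpm matches 120 at half time)
```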

## Installation

To install DeepRhythm, ensure you have Python and pip installed. Then run:

```bash
pip install deeprhythm
```

## Usage

### CLI Inference

#### Single

```bash
python -m deeprhythm.infer /path/to/song.wav -cq
> ([bpm], [confidence])
```

Flags:

- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - prints only bpm/conf

#### Batch

To predict the tempo of all songs in a directory, run:

```bash
python -m deeprhythm.batch_infer /path/to/dir
```

This will create a JSONL file mapping each filepath to its predicted BPM (a snippet for reading it follows the flags below).

Flags:

- `-o output_path.jsonl` - provide a custom output path (default `batch_results.jsonl`)
- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - suppress status / log output
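
To consume the results, read the file line by line; each line is a standalone JSON record. The exact keys per record aren't documented above, so this sketch just parses and prints them:

```python
import json

# Each line of batch_infer's output is one JSON record (JSONL format).
with open('batch_results.jsonl') as f:
    for line in f:
        record = json.loads(line)
        print(record)  # inspect a record to see the exact filepath/BPM keys
```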

### Python Inference

To predict the tempo of a song:

```python
from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

tempo = model.predict('path/to/song.mp3')

# to include confidence
tempo, confidence = model.predict('path/to/song.mp3', include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")
```

Audio is loaded with librosa, which supports most audio formats. 

If you have already loaded your audio with librosa, for example to carry out pre-processing steps, you can predict the tempo in the following way:

```python
import librosa
from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

audio, sr = librosa.load('path/to/song.mp3')

# ... other steps for processing the audio ...

tempo = model.predict_from_audio(audio, sr)

# to include confidence
tempo, confidence = model.predict_from_audio(audio, sr, include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")
```

## References

[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.

[2] K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

            
