# DeepRhythm: High-Speed Tempo Prediction
DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction for modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, Raspbian).
Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM), drastically reducing computation time by avoiding the usual bottlenecks encountered in feature extraction.
[more details here](https://bleu.green/deeprhythm)
## Classification Process
1. Split input audio into 8-second clips `[len_batch, len_audio]`
2. Compute the HCQM of each clip
1. Compute STFT `[len_batch, stft_bands, len_audio/hop]`
2. Sum STFT bins into 8 log-spaced bands using filter matrix `[len_batch, 8, len_audio/hop]`
3. Flatten bands for parallel CQT processing `[len_batch*8, len_audio/hop]`
4. For each of the six harmonics, compute the CQT `[6, len_batch*8, num_cqt_bins]`
5. Reshape `[len_batch, num_cqt_bins, 8, 6]`
3. Feed HCQM through CNN `[len_batch, num_classes (256)]`
4. Softmax the outputs to get probabilities
5. Choose the class with the highest probability and convert it to bpm (bpms = `[len_batch]`); the tensor shapes above are sketched in code below
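To make the tensor bookkeeping concrete, here is a minimal shape-level sketch of steps 1-2 in plain PyTorch. The sample rate, FFT size, and band filter matrix are placeholder assumptions for illustration, not the values DeepRhythm actually uses (the real pipeline computes the per-harmonic CQTs with nnAudio):

```python
import torch

# Illustrative parameters (assumptions, not DeepRhythm's actual values)
sr = 22050                  # sample rate
len_audio = 8 * sr          # 8-second clips
n_fft, hop = 2048, 512
n_bands = 8                 # log-spaced bands
len_batch = 4

# 1. Batched clips: [len_batch, len_audio]
clips = torch.randn(len_batch, len_audio)

# 2.1 STFT magnitudes: [len_batch, n_fft//2 + 1, ~len_audio/hop]
stft = torch.stft(clips, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft),
                  return_complex=True).abs()

# 2.2 Sum STFT bins into 8 bands with a fixed filter matrix
#     (random placeholder here) -> [len_batch, 8, ~len_audio/hop]
filters = torch.rand(n_bands, n_fft // 2 + 1)
bands = torch.matmul(filters, stft)

# 2.3 Flatten so every band of every clip goes through the CQT in a
#     single batch -> [len_batch*8, ~len_audio/hop]
flat = bands.reshape(len_batch * n_bands, -1)

# 2.4-2.5 The real pipeline then runs one CQT per harmonic (6 total,
# via nnAudio), stacks the results, and reshapes to
# [len_batch, num_cqt_bins, 8, 6] before the CNN.
print(flat.shape)  # torch.Size([32, 345])
```

Flattening the bands is what lets a single vectorized CQT call cover the whole batch, which is where most of the speedup over per-clip feature extraction comes from.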
## Benchmarks
| Method | Acc1 (%) | Acc2 (%) | Avg. Time (s) | Total Time (s) |
| ----------------------- | --------- | --------- | ------------- | -------------- |
| DeepRhythm (cuda) | **95.91** | 96.54 | **0.021** | 20.11 |
| DeepRhythm (cpu) | **95.91** | 96.54 | 0.12 | 115.02 |
| TempoCNN (cnn) | 84.78 | **97.69** | 1.21 | 1150.43 |
| TempoCNN (fcn) | 83.53 | 96.54 | 1.19 | 1131.51 |
| Essentia (multifeature) | 87.93 | 97.48 | 2.72 | 2595.64 |
| Essentia (percival) | 85.83 | 95.07 | 1.35 | 1289.62 |
| Essentia (degara) | 86.46 | 97.17 | 1.38 | 1310.69 |
| Librosa | 66.84 | 75.13 | 0.48 | 460.52 |
- Tested on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
- Acc1 = prediction within +/- 2% of the annotated bpm
- Acc2 = prediction within +/- 2% of the annotated bpm or a multiple of it (e.g. 120 ~= 60); both checks are sketched in code below
- Timed from filepath in to bpm out (audio loading, feature extraction, model inference)
- I could only get TempoCNN to run on CPU (it requires CUDA 10)
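As a minimal sketch of how these metrics are typically computed (assuming the common MIR convention that Acc2 accepts double, triple, half, and one-third tempi; the exact factor set used in this benchmark may differ):

```python
def acc1(pred_bpm, true_bpm, tol=0.02):
    # prediction within +/- 2% of the annotated tempo
    return abs(pred_bpm - true_bpm) <= tol * true_bpm

def acc2(pred_bpm, true_bpm, tol=0.02):
    # also accept octave-style errors, e.g. predicting 60 for a 120 bpm track
    return any(acc1(pred_bpm, true_bpm * k) for k in (1, 2, 3, 0.5, 1/3))

assert not acc1(60, 120) and acc2(60, 120)
```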
## Installation
To install DeepRhythm, ensure you have Python and pip installed. Then run:
```bash
pip install deeprhythm
```
## Usage
### CLI Inference
#### Single
```bash
python -m deeprhythm.infer /path/to/song.wav -cq
> ([bpm], [confidence])
```
Flags:
- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - print only the bpm/confidence
#### Batch
To predict the tempo of all songs in a directory, run:
```bash
python -m deeprhythm.batch_infer /path/to/dir
```
This will create a JSONL file mapping each filepath to its predicted BPM.
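Each line of the output is one JSON object per file, for example (the field names here are illustrative; the actual keys may differ):

```json
{"filepath": "/path/to/dir/track01.wav", "bpm": 127.0}
```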
Flags:
- `-o output_path.jsonl` - provide a custom output path (default `batch_results.jsonl`)
- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - suppress status/log output
### Python Inference
To predict the tempo of a song:
```python
from deeprhythm import DeepRhythmPredictor
model = DeepRhythmPredictor()
tempo = model.predict('path/to/song.mp3')
# to include confidence
tempo, confidence = model.predict('path/to/song.mp3', include_confidence=True)
print(f"Predicted Tempo: {tempo} BPM")
```
Audio is loaded with librosa, which supports most audio formats.
If you have already loaded your audio with librosa, for example to carry out pre-processing steps, you can predict the tempo directly from the audio array:
```python
import librosa
from deeprhythm import DeepRhythmPredictor
model = DeepRhythmPredictor()
audio, sr = librosa.load('path/to/song.mp3')
# ... other steps for processing the audio ...
tempo = model.predict_from_audio(audio, sr)
# to include confidence
tempo, confidence = model.predict_from_audio(audio, sr, include_confidence=True)
print(f"Predicted Tempo: {tempo} BPM")
```
## References
[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.
[2] K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.