# All-In-One Music Structure Analyzer
[![Visual Demo](https://img.shields.io/badge/Visual-Demo-8A2BE2)](https://taejun.kim/music-dissector/)
[![arXiv](https://img.shields.io/badge/arXiv-2307.16425-B31B1B)](http://arxiv.org/abs/2307.16425/)
[![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-f9f107)](https://huggingface.co/spaces/taejunkim/all-in-one/)
[![PyPI - Version](https://img.shields.io/pypi/v/allin1.svg)](https://pypi.org/project/allin1)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/allin1.svg)](https://pypi.org/project/allin1)
This package provides models for music structure analysis, predicting:
1. Tempo (BPM)
2. Beats
3. Downbeats
4. Functional segment boundaries
5. Functional segment labels (e.g., intro, verse, chorus, bridge, outro)
-----
**Table of Contents**
- [Installation](#installation)
- [Usage for CLI](#usage-for-cli)
- [Usage for Python](#usage-for-python)
- [Visualization & Sonification](#visualization--sonification)
- [Available Models](#available-models)
- [Speed](#speed)
- [Advanced Usage for Research](#advanced-usage-for-research)
- [Concerning MP3 Files](#concerning-mp3-files)
- [Training](#training)
- [Citation](#citation)
## Installation
### 1. Install PyTorch
Visit [PyTorch](https://pytorch.org/) and install the appropriate version for your system.
### 2. Install NATTEN (Required for Linux and Windows; macOS will auto-install)
* **Linux**: Download from [NATTEN website](https://www.shi-labs.com/natten/)
* **macOS**: Auto-installs with `allin1`.
* **Windows**: Build from source:
```shell
pip install ninja # Recommended, not required
git clone https://github.com/SHI-Labs/NATTEN
cd NATTEN
make
```
### 3. Install the package
```shell
pip install git+https://github.com/CPJKU/madmom # install the latest madmom directly from GitHub
pip install allin1 # install this package
```
### 4. (Optional) Install FFmpeg for MP3 support
For Ubuntu:
```shell
sudo apt install ffmpeg
```
For macOS:
```shell
brew install ffmpeg
```
## Usage for CLI
To analyze audio files:
```shell
allin1 your_audio_file1.wav your_audio_file2.mp3
```
Results will be saved in the `./struct` directory by default:
```shell
./struct
├── your_audio_file1.json
└── your_audio_file2.json
```
The analysis results will be saved in JSON format:
```json
{
  "path": "/path/to/your_audio_file.wav",
  "bpm": 100,
  "beats": [ 0.33, 0.75, 1.14, ... ],
  "downbeats": [ 0.33, 1.94, 3.53, ... ],
  "beat_positions": [ 1, 2, 3, 4, 1, 2, 3, 4, 1, ... ],
  "segments": [
    {
      "start": 0.0,
      "end": 0.33,
      "label": "start"
    },
    {
      "start": 0.33,
      "end": 13.13,
      "label": "intro"
    },
    {
      "start": 13.13,
      "end": 37.53,
      "label": "chorus"
    },
    {
      "start": 37.53,
      "end": 51.53,
      "label": "verse"
    },
    ...
  ]
}
```
All available options are as follows:
```shell
$ allin1 -h

usage: allin1 [-h] [-o OUT_DIR] [-v] [--viz-dir VIZ_DIR] [-s] [--sonif-dir SONIF_DIR] [-a] [-e] [-m MODEL] [-d DEVICE] [-k]
              [--demix-dir DEMIX_DIR] [--spec-dir SPEC_DIR]
              paths [paths ...]

positional arguments:
  paths                 Path to tracks

options:
  -h, --help            show this help message and exit
  -o OUT_DIR, --out-dir OUT_DIR
                        Path to a directory to store analysis results (default: ./struct)
  -v, --visualize       Save visualizations (default: False)
  --viz-dir VIZ_DIR     Directory to save visualizations if -v is provided (default: ./viz)
  -s, --sonify          Save sonifications (default: False)
  --sonif-dir SONIF_DIR
                        Directory to save sonifications if -s is provided (default: ./sonif)
  -a, --activ           Save frame-level raw activations from sigmoid and softmax (default: False)
  -e, --embed           Save frame-level embeddings (default: False)
  -m MODEL, --model MODEL
                        Name of the pretrained model to use (default: harmonix-all)
  -d DEVICE, --device DEVICE
                        Device to use (default: cuda if available else cpu)
  -k, --keep-byproducts
                        Keep demixed audio files and spectrograms (default: False)
  --demix-dir DEMIX_DIR
                        Path to a directory to store demixed tracks (default: ./demix)
  --spec-dir SPEC_DIR   Path to a directory to store spectrograms (default: ./spec)
```
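For example, an invocation combining several of the options above (the directory names here are only placeholders) could look like:
```shell
allin1 -o ./results -v --viz-dir ./results/viz -s --sonif-dir ./results/sonif -a your_audio_file.wav
```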
## Usage for Python
Available functions:
- [`analyze()`](#analyze)
- [`load_result()`](#load_result)
- [`visualize()`](#visualize)
- [`sonify()`](#sonify)
### `analyze()`
Analyzes the provided audio files and returns the analysis results.
```python
import allin1
# You can analyze a single file:
result = allin1.analyze('your_audio_file.wav')
# Or multiple files:
results = allin1.analyze(['your_audio_file1.wav', 'your_audio_file2.mp3'])
```
A result is a dataclass instance containing:
```python
AnalysisResult(
  path='/path/to/your_audio_file.wav',
  bpm=100,
  beats=[0.33, 0.75, 1.14, ...],
  beat_positions=[1, 2, 3, 4, 1, 2, 3, 4, 1, ...],
  downbeats=[0.33, 1.94, 3.53, ...],
  segments=[
    Segment(start=0.0, end=0.33, label='start'),
    Segment(start=0.33, end=13.13, label='intro'),
    Segment(start=13.13, end=37.53, label='chorus'),
    Segment(start=37.53, end=51.53, label='verse'),
    Segment(start=51.53, end=64.34, label='verse'),
    Segment(start=64.34, end=89.93, label='chorus'),
    Segment(start=89.93, end=105.93, label='bridge'),
    Segment(start=105.93, end=134.74, label='chorus'),
    Segment(start=134.74, end=153.95, label='chorus'),
    Segment(start=153.95, end=154.67, label='end'),
  ],
)
```
Unlike the CLI, it does not save the results to disk by default. You can save them as follows:
```python
result = allin1.analyze(
'your_audio_file.wav',
out_dir='./struct',
)
```
#### Parameters:
- `paths` : `Union[PathLike, List[PathLike]]`
List of paths or a single path to the audio files to be analyzed.
- `out_dir` : `PathLike` (optional)
Path to the directory where the analysis results will be saved. By default, the results will not be saved.
- `visualize` : `Union[bool, PathLike]` (optional)
Whether to visualize the analysis results or not. If a path is provided, the visualizations will be saved in that directory. Default is False. If True, the visualizations will be saved in './viz'.
- `sonify` : `Union[bool, PathLike]` (optional)
Whether to sonify the analysis results or not. If a path is provided, the sonifications will be saved in that directory. Default is False. If True, the sonifications will be saved in './sonif'.
- `model` : `str` (optional)
Name of the pre-trained model to be used for the analysis. Default is 'harmonix-all'. Please refer to the documentation for the available models.
- `device` : `str` (optional)
Device to be used for computation. Default is 'cuda' if available, otherwise 'cpu'.
- `include_activations` : `bool` (optional)
Whether to include activations in the analysis results or not.
- `include_embeddings` : `bool` (optional)
Whether to include embeddings in the analysis results or not.
- `demix_dir` : `PathLike` (optional)
Path to the directory where the source-separated audio will be saved. Default is './demix'.
- `spec_dir` : `PathLike` (optional)
Path to the directory where the spectrograms will be saved. Default is './spec'.
- `keep_byproducts` : `bool` (optional)
Whether to keep the source-separated audio and spectrograms or not. Default is False.
- `multiprocess` : `bool` (optional)
Whether to use multiprocessing for extracting spectrograms. Default is True.
#### Returns:
- `Union[AnalysisResult, List[AnalysisResult]]`
Analysis results for the provided audio files.
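As an illustration, a call combining several of the documented parameters might look like this (file and directory names are placeholders):
```python
import allin1

result = allin1.analyze(
    'your_audio_file.wav',
    out_dir='./struct',        # save the JSON result
    visualize='./viz',         # also save a visualization to this directory
    sonify=True,               # save a sonification to the default './sonif'
    model='harmonix-all',      # default ensemble model
    device='cuda',             # or 'cpu'
    keep_byproducts=True,      # keep demixed audio and spectrograms
)
print(result.bpm, len(result.beats), len(result.segments))
```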
### `load_result()`
Loads the analysis results from the disk.
```python
result = allin1.load_result('./struct/24k_Magic.json')
```
### `visualize()`
Visualizes the analysis results.
```python
fig = allin1.visualize(result)
fig.show()
```
#### Parameters:
- `result` : `Union[AnalysisResult, List[AnalysisResult]]`
List of analysis results or a single analysis result to be visualized.
- `out_dir` : `PathLike` (optional)
Path to the directory where the visualizations will be saved. By default, the visualizations will not be saved.
#### Returns:
- `Union[Figure, List[Figure]]`
List of figures or a single figure containing the visualizations. `Figure` is a class from `matplotlib.pyplot`.
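Because the return value is a regular `matplotlib` figure, you can also post-process or save it yourself; a minimal sketch (the PNG file name is just a placeholder):
```python
fig = allin1.visualize(result, out_dir='./viz')   # also saves the figure to ./viz
fig.savefig('your_audio_file.viz.png', dpi=200)   # or save it manually via matplotlib
```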
### `sonify()`
Sonifies the analysis results.
It mixes metronome clicks for beats and downbeats, and event sounds for segment boundaries,
into the original audio.
```python
y, sr = allin1.sonify(result)
# y: sonified audio with shape (channels=2, samples)
# sr: sampling rate (=44100)
```
#### Parameters:
- `result` : `Union[AnalysisResult, List[AnalysisResult]]`
List of analysis results or a single analysis result to be sonified.
- `out_dir` : `PathLike` (optional)
Path to the directory where the sonifications will be saved. By default, the sonifications will not be saved.
#### Returns:
- `Union[Tuple[NDArray, float], List[Tuple[NDArray, float]]]`
List of tuples or a single tuple containing the sonified audio and the sampling rate.
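If you prefer to write the returned audio yourself instead of passing `out_dir`, a minimal sketch using the third-party `soundfile` package (an assumed, separately installed dependency) could be:
```python
import soundfile as sf

y, sr = allin1.sonify(result)
# y has shape (channels=2, samples); soundfile expects (samples, channels)
sf.write('your_audio_file.sonif.wav', y.T, int(sr))
```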
## Visualization & Sonification
This package provides simple visualization (`-v` or `--visualize`) and sonification (`-s` or `--sonify`) options for the analysis results.
```shell
allin1 -v -s your_audio_file.wav
```
The visualizations will be saved in the `./viz` directory by default:
```shell
./viz
└── your_audio_file.pdf
```
The sonifications will be saved in the `./sonif` directory by default:
```shell
./sonif
└── your_audio_file.sonif.wav
```
For example, a visualization looks like this:
![Visualization](./assets/viz.png)
You can try it at [Hugging Face Space](https://huggingface.co/spaces/taejunkim/all-in-one).
## Available Models
The models are trained on the [Harmonix Set](https://github.com/urinieto/harmonixset) with 8-fold cross-validation.
For more details, please refer to the [paper](http://arxiv.org/abs/2307.16425).
* `harmonix-all`: (Default) An ensemble model that averages the predictions of the 8 fold models.
* `harmonix-foldN`: A model trained on fold N (0~7). For example, `harmonix-fold0` is trained on fold 0.
By default, the `harmonix-all` model is used. To use a different model, use the `--model` option:
```shell
allin1 --model harmonix-fold0 your_audio_file.wav
```
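The same choice is available from Python through the `model` parameter of `analyze()`:
```python
result = allin1.analyze('your_audio_file.wav', model='harmonix-fold0')
```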
## Speed
With an RTX 4090 GPU and Intel i9-10940X CPU (14 cores, 28 threads, 3.30 GHz),
the `harmonix-all` model processed 10 songs (33 minutes) in 73 seconds.
## Advanced Usage for Research
This package provides researchers with advanced options to extract **frame-level raw activations and embeddings**
without post-processing. These have a resolution of 100 FPS, equivalent to 0.01 seconds per frame.
### CLI
#### Activations
The `--activ` option also saves frame-level raw activations from sigmoid and softmax:
```shell
$ allin1 --activ your_audio_file.wav
```
You can find the activations in the `.npz` file:
```shell
./struct
├── your_audio_file1.json
└── your_audio_file1.activ.npz
```
To load the activations in Python:
```python
>>> import numpy as np
>>> activ = np.load('./struct/your_audio_file1.activ.npz')
>>> activ.files
['beat', 'downbeat', 'segment', 'label']
>>> beat_activations = activ['beat']
>>> downbeat_activations = activ['downbeat']
>>> segment_boundary_activations = activ['segment']
>>> segment_label_activations = activ['label']
```
Details of the activations are as follows:
* `beat`: Raw activations from the **sigmoid** layer for **beat tracking** (shape: `[time_steps]`)
* `downbeat`: Raw activations from the **sigmoid** layer for **downbeat tracking** (shape: `[time_steps]`)
* `segment`: Raw activations from the **sigmoid** layer for **segment boundary detection** (shape: `[time_steps]`)
* `label`: Raw activations from the **softmax** layer for **segment labeling** (shape: `[label_class=10, time_steps]`)
You can access the label names as follows:
```python
>>> allin1.HARMONIX_LABELS
['start',
'end',
'intro',
'outro',
'break',
'bridge',
'inst',
'solo',
'verse',
'chorus']
```
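For example, since `label` holds a softmax distribution over the 10 Harmonix classes at 100 FPS, a rough per-frame label and its timestamp can be derived with plain NumPy (an illustrative sketch, not the package's own post-processing):
```python
import numpy as np
import allin1

activ = np.load('./struct/your_audio_file1.activ.npz')
label_activ = activ['label']                         # shape: [label_class=10, time_steps]
frame_labels = label_activ.argmax(axis=0)            # most likely class index per frame
frame_times = np.arange(len(frame_labels)) / 100.0   # 100 FPS -> seconds

i = 1000  # e.g., the frame at t = 10.0 s
print(allin1.HARMONIX_LABELS[frame_labels[i]], 'around', frame_times[i], 'seconds')
```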
#### Embeddings
This package also provides an option to extract raw embeddings from the model.
```shell
$ allin1 --embed your_audio_file.wav
```
You can find the embeddings in the `.npy` file:
```shell
./struct
├── your_audio_file1.json
└── your_audio_file1.embed.npy
```
To load the embeddings in Python:
```python
>>> import numpy as np
>>> embed = np.load('./struct/your_audio_file1.embed.npy')
```
Each model produces an embedding for each source-separated stem at every time step,
resulting in embeddings of shape `[stems=4, time_steps, embedding_size=24]`:
1. The number of source-separated stems (the order is bass, drums, other, vocals).
2. The number of time steps (frames). The time step is 0.01 seconds (100 FPS).
3. The embedding size of 24.
Using the `--embed` option with the `harmonix-all` ensemble model will stack the embeddings,
saving them with the shape `[stems=4, time_steps, embedding_size=24, models=8]`.
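As an illustration of working with these shapes (a sketch; the stem order is as listed above), you could average the ensemble axis and pick out a single stem:
```python
import numpy as np

embed = np.load('./struct/your_audio_file1.embed.npy')
if embed.ndim == 4:              # harmonix-all: [stems=4, time_steps, 24, models=8]
    embed = embed.mean(axis=-1)  # average the embeddings of the 8 fold models
vocals = embed[3]                # stems are ordered bass, drums, other, vocals
print(vocals.shape)              # (time_steps, 24)
```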
### Python
The Python API `allin1.analyze()` offers the same options as the CLI:
```python
>>> allin1.analyze(
paths='your_audio_file.wav',
include_activations=True,
include_embeddings=True,
)
AnalysisResult(
path='/path/to/your_audio_file.wav',
bpm=100,
beats=[...],
downbeats=[...],
segments=[...],
activations={
'beat': array(...),
'downbeat': array(...),
'segment': array(...),
'label': array(...)
},
embeddings=array(...),
)
```
## Concerning MP3 Files
Due to variations in decoders, MP3 files can have slight offset differences.
I recommend first converting your audio files to WAV format using FFmpeg (as shown below)
and using the WAV files throughout your data processing pipeline.
```shell
ffmpeg -i your_audio_file.mp3 your_audio_file.wav
```
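To convert a whole directory of MP3 files at once, a simple loop in a POSIX shell works:
```shell
for f in *.mp3; do ffmpeg -i "$f" "${f%.mp3}.wav"; done
```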
In this package, audio files are read using [Demucs](https://github.com/facebookresearch/demucs).
To my understanding, Demucs converts MP3 files to WAV using FFmpeg before reading them.
However, using a different MP3 decoder can yield different offsets.
I've observed variations of about 20–40 ms, which is problematic for tasks that require precise timing,
such as beat tracking, where the conventional tolerance is only 70 ms.
Hence, I advise standardizing inputs to the WAV format for all data processing,
ensuring straightforward decoding.
## Training
Please refer to [TRAINING.md](TRAINING.md).
## Citation
If you use this package for your research, please cite the following paper:
```bibtex
@inproceedings{taejun2023allinone,
title={All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio},
author={Kim, Taejun and Nam, Juhan},
booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
year={2023}
}
```