symurbench

- **Name**: symurbench
- **Version**: 1.0.0
- **Summary**: SyMuRBench: Benchmark for symbolic music representations
- **Author**: Petr Strepetov, Dmitrii Kovalev
- **Requires Python**: >=3.10.0
- **License**: MIT License, Copyright (c) 2025 Petr Strepetov and Dmitrii Kovalev
- **Keywords**: artificial intelligence, midi, mir, music
- **Uploaded**: 2025-08-15 16:52:26
            <p align="center">
  <img width="300" src="docs/assets/logo.jpg"/>
</p>

<h1 align="center"><i>SyMuRBench</i></h1>
<p align="center"><i>Benchmark for Symbolic Music Representations</i></p>

[![GitHub Release](https://img.shields.io/github/v/release/Mintas/SyMuRBench)](https://pypi.python.org/pypi/symurbench/)
[![GitHub License](https://img.shields.io/github/license/Mintas/SyMuRBench)](https://github.com/Mintas/SyMuRBench/blob/main/LICENSE)

## 1. Overview

SyMuRBench is a versatile benchmark designed to compare vector representations of symbolic music. We provide standardized test splits from well-known datasets and strongly encourage authors to **exclude files from these splits** when training models to ensure fair evaluation. Additionally, we introduce a novel **score-performance retrieval task** to evaluate the alignment between symbolic scores and their performed versions.

## 2. Tasks Description

| Task Name                     | Source Dataset | Task Type               | # of Classes | # of Files       | Default Metrics                                  |
|-------------------------------|--------------|--------------------------|-------------|------------------|--------------------------------------------------|
| ComposerClassificationASAP    | ASAP         | Multiclass Classification | 7           | 197              | Weighted F1 Score, Balanced Accuracy             |
| GenreClassificationMMD        | MetaMIDI     | Multiclass Classification | 7           | 2,795            | Weighted F1 Score, Balanced Accuracy             |
| GenreClassificationWMTX       | WikiMT-X     | Multiclass Classification | 8           | 985              | Weighted F1 Score, Balanced Accuracy             |
| EmotionClassificationEMOPIA   | Emopia       | Multiclass Classification | 4           | 191              | Weighted F1 Score, Balanced Accuracy             |
| EmotionClassificationMIREX    | MIREX        | Multiclass Classification | 5           | 163              | Weighted F1 Score, Balanced Accuracy             |
| InstrumentDetectionMMD        | MetaMIDI     | Multilabel Classification | 128         | 4,675            | Weighted F1 Score                                |
| ScorePerformanceRetrievalASAP | ASAP         | Retrieval                 | -           | 438 (219 pairs)  | R@1, R@5, R@10, Median Rank                      |

> **Note**: "ScorePerformanceRetrievalASAP" evaluates how well a model retrieves the correct performed version given a symbolic score (and vice versa), using paired score-performance MIDI files.
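The retrieval metrics in the table can be illustrated with a small sketch (a toy computation, not SyMuRBench's internal implementation): given a score-by-performance similarity matrix in which entry (i, i) is the true pair, R@k is the fraction of queries whose true match ranks in the top k, and Median Rank is the median position of the true match.

```python
def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@k and median rank from a square similarity matrix.

    sim[i][j] is the similarity between query i (e.g. a score) and
    candidate j (e.g. a performance); the true match for query i is
    candidate i. Higher similarity means a better match.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the true match: 1 + number of candidates scoring strictly higher.
        rank = 1 + sum(1 for j, s in enumerate(row) if j != i and s > row[i])
        ranks.append(rank)
    recall = {f"R@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    ranks_sorted = sorted(ranks)
    mid = len(ranks_sorted) // 2
    median = (ranks_sorted[mid] if len(ranks_sorted) % 2
              else (ranks_sorted[mid - 1] + ranks_sorted[mid]) / 2)
    return recall, median

# Toy 3x3 example: queries 0 and 2 rank their true match first, query 1 second.
sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.6, 0.1],
    [0.1, 0.2, 0.7],
]
recall, median_rank = retrieval_metrics(sim, ks=(1, 2))
```

In the benchmark the matrix is built from feature vectors of 219 score-performance pairs, and the evaluation is run in both directions (score-to-performance and performance-to-score).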

---

## 3. Baseline Features

As baselines, we provide precomputed features from [**music21**](https://github.com/cuthbertLab/music21) and [**jSymbolic2**](https://github.com/DDMAL/jSymbolic2). A `FeatureExtractor` for music21 is available in `src/symurbench/music21_extractor.py`.

---

## 4. Installation

Install the package via pip:

```bash
pip install symurbench
```

Then download the datasets and (optionally) precomputed features:

```python
from symurbench.utils import load_datasets

output_folder = "symurbench_data"     # Absolute or relative path to save data
load_datasets(
    output_folder=output_folder,
    load_features=True                # Downloads precomputed music21 & jSymbolic features
)
```

---

## 5. Usage Examples

**Example 1: Using Precomputed Features**

Run benchmark on specific tasks using cached music21 and jSymbolic features.

```python
from symurbench.benchmark import Benchmark
from symurbench.feature_extractor import PersistentFeatureExtractor

path_to_music21_features = "symurbench_data/features/music21_full_dataset.parquet"
path_to_jsymbolic_features = "symurbench_data/features/jsymbolic_full_dataset.parquet"

m21_pfe = PersistentFeatureExtractor(
    persistence_path=path_to_music21_features,
    use_cached=True,
    name="music21"
)
jsymb_pfe = PersistentFeatureExtractor(
    persistence_path=path_to_jsymbolic_features,
    use_cached=True,
    name="jSymbolic"
)

benchmark = Benchmark(
    feature_extractors_list=[m21_pfe, jsymb_pfe],
    tasks=[  # Omit this argument to run all available tasks.
        "ComposerClassificationASAP",
        "ScorePerformanceRetrievalASAP"
    ]
)

benchmark.run_all_tasks()
benchmark.display_result(return_ci=True, alpha=0.05)
```

> **Tip**: If `tasks` is omitted, all available tasks are run by default.

*Output Example*

![output](docs/assets/example.png?raw=true "")


**Example 2: Using a Configuration Dictionary**

Run benchmark with custom dataset paths and AutoML configuration.

```python
from symurbench.benchmark import Benchmark
from symurbench.music21_extractor import Music21Extractor
from symurbench.constant import DEFAULT_LAML_CONFIG_PATHS # dict with paths to AutoML configs

multiclass_task_automl_cfg_path = DEFAULT_LAML_CONFIG_PATHS["multiclass"]
print(f"AutoML config path: {multiclass_task_automl_cfg_path}")

config = {
    "ComposerClassificationASAP": {
        "metadata_csv_path":"symurbench_data/datasets/composer_and_retrieval_datasets/metadata_composer_dataset.csv",
        "files_dir_path":"symurbench_data/datasets/composer_and_retrieval_datasets/",
        "automl_config_path":multiclass_task_automl_cfg_path
    }
}

m21_fe = Music21Extractor()

benchmark = Benchmark.init_from_config(
    feature_extractors_list=[m21_fe],
    tasks_config=config
)
benchmark.run_all_tasks()
benchmark.display_result()
```

**Example 3: Using a YAML Configuration File**

Load task configurations from a YAML file (e.g., dataset paths, AutoML config paths).

```python
from symurbench.benchmark import Benchmark
from symurbench.music21_extractor import Music21Extractor
from symurbench.constant import DATASETS_CONFIG_PATH # path to config with datasets paths

print(f"Datasets config path: {DATASETS_CONFIG_PATH}")

m21_fe = Music21Extractor()

benchmark = Benchmark.init_from_config_file(
    feature_extractors_list=[m21_fe],
    tasks_config_path=DATASETS_CONFIG_PATH
)
benchmark.run_all_tasks()
benchmark.display_result()
```

**Example 4: Saving Results to CSV**

Run benchmark and export results to a CSV file using pandas.

```python
from symurbench.benchmark import Benchmark
from symurbench.feature_extractor import PersistentFeatureExtractor
from symurbench.music21_extractor import Music21Extractor

path_to_music21_features = "symurbench_data/features/music21_features.parquet"

m21_pfe = PersistentFeatureExtractor(
    feature_extractor=Music21Extractor(),
    persistence_path=path_to_music21_features,
    use_cached=False,
    name="music21"
)

benchmark = Benchmark(
    feature_extractors_list=[m21_pfe],
    tasks=[
        "ComposerClassificationASAP",
        "ScorePerformanceRetrievalASAP"
    ]
)
benchmark.run_all_tasks()
results_df = benchmark.get_result_df(round_num=3, return_ci=True)
results_df.to_csv("results.csv")
```

> **💡 Tip**: `round_num=3` rounds metrics to 3 decimal places; `return_ci=True` includes confidence intervals in the output.

## 6. Notes & Best Practices

- 🔒 **Avoid data leakage**: Do not include test-set files in your training data, to ensure fair and valid evaluation.
- 🔄 **Reproducibility**: Use fixed random seeds and consistent preprocessing pipelines to make experiments reproducible.
- 📁 **File paths**: Ensure paths in config files are correct and accessible.
- 🧪 **Custom extractors**: Implement your own extractor by inheriting from the base `FeatureExtractor` class and implementing the `extract` method.
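As a rough sketch of the shape such a subclass might take (the actual base-class interface in `symurbench.feature_extractor` may differ, so treat the names and signatures here as assumptions; a stand-in base class is used so the snippet runs on its own):

```python
# Stand-in for symurbench's FeatureExtractor base class; the real
# interface (method names, argument and return types) may differ.
class FeatureExtractor:
    name = "base"

    def extract(self, file_path):
        raise NotImplementedError


class PitchStatsExtractor(FeatureExtractor):
    """Toy extractor: summary statistics over a list of MIDI pitch numbers.

    A real extractor would parse the MIDI file at `file_path`; here that
    step is faked with a trivial loader to keep the sketch self-contained.
    """
    name = "pitch_stats"

    def _load_pitches(self, file_path):
        # Placeholder for actual MIDI parsing (e.g. via music21 or mido).
        return [60, 64, 67, 72]

    def extract(self, file_path):
        pitches = self._load_pitches(file_path)
        return {
            "pitch_mean": sum(pitches) / len(pitches),
            "pitch_range": max(pitches) - min(pitches),
            "n_notes": len(pitches),
        }


features = PitchStatsExtractor().extract("dummy.mid")
```

An extractor written this way can then be passed to `Benchmark` via `feature_extractors_list`, or wrapped in a `PersistentFeatureExtractor` to cache its output, as in Example 4.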

## 7. Citation

If you use SyMuRBench in your research, please cite:

```bibtex
@inproceedings{symurbench2025,
  author    = {Petr Strepetov and Dmitrii Kovalev},
  title     = {SyMuRBench: Benchmark for Symbolic Music Representations},
  booktitle = {Proceedings of the 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice (McGE '25)},
  year      = {2025},
  pages     = {9},
  publisher = {ACM},
  address   = {Dublin, Ireland},
  doi       = {10.1145/3746278.3759392}
}
```

            
