mess-benchmark

Name: mess-benchmark
Version: 0.2
Summary: MESS – Multi-domain Evaluation of Semantic Segmentation
Upload time: 2024-08-25 17:13:57
Requires Python: >=3.8
Keywords: mess, benchmark, zero-shot, evaluation
# [MESS – Multi-domain Evaluation of Semantic Segmentation](https://blumenstiel.github.io/mess-benchmark/)

This is the official toolkit for the MESS benchmark from the NeurIPS 2023 paper "What a MESS: Multi-domain Evaluation of Zero-shot Semantic Segmentation".
Please visit our [website](https://blumenstiel.github.io/mess-benchmark/) or [paper](https://arxiv.org/abs/2306.15521) for more details.

The MESS benchmark enables a holistic evaluation of semantic segmentation models across a variety of domains and datasets.
It comprises 22 datasets covering domains such as medicine, engineering, earth monitoring, biology, and agriculture.
We designed this toolkit to be easy to use with new model architectures, and we invite others to propose new ideas and datasets for future versions.

The website includes a [leaderboard](https://blumenstiel.github.io/mess-benchmark/leaderboard/) with all evaluated models and links to their implementations.

## Usage

To test a new model architecture, install the benchmark with `pip install mess-benchmark`, and follow the steps in [DATASETS.md](DATASETS.md) for downloading and preparing the datasets.
You can register all datasets by running `import mess.datasets`. See [GettingStarted.md](GettingStarted.md) for more details.
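Importing `mess.datasets` makes every benchmark dataset available under a global catalog, so that an evaluation script can look datasets up by name. The following minimal sketch illustrates that registration pattern; the `DatasetCatalog` class, dataset names, and loaders below are illustrative stand-ins, not the actual MESS API.

```python
# Illustrative sketch of the name -> loader registry pattern used by
# segmentation toolkits. All names here are hypothetical examples.

class DatasetCatalog:
    """Minimal registry mapping a dataset name to a lazy loader."""
    _registry = {}

    @classmethod
    def register(cls, name, loader):
        if name in cls._registry:
            raise KeyError(f"dataset {name!r} is already registered")
        cls._registry[name] = loader

    @classmethod
    def get(cls, name):
        # The loader is only called when the dataset is requested.
        return cls._registry[name]()

    @classmethod
    def list(cls):
        return sorted(cls._registry)

# Registering two hypothetical test splits:
DatasetCatalog.register("chase_db1_test", lambda: ["img_01.png", "img_02.png"])
DatasetCatalog.register("cryonuseg_test", lambda: ["tile_a.png"])

print(DatasetCatalog.list())                      # ['chase_db1_test', 'cryonuseg_test']
print(len(DatasetCatalog.get("chase_db1_test")))  # 2
```

Lazy loaders keep registration cheap: all 22 datasets can be registered at import time without reading any image data from disk.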

### Zero-shot semantic segmentation

The current version of the MESS benchmark focuses on zero-shot semantic segmentation, and the toolkit is ready to use for this setting.
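In this setting, evaluating a model on a MESS dataset typically reduces to comparing predicted label maps against ground truth with mean Intersection-over-Union (mIoU). The sketch below shows that metric on tiny synthetic arrays; it is a generic illustration, not the toolkit's own evaluator.

```python
# Hedged sketch: mean IoU between a predicted and a ground-truth label map.
# The arrays are synthetic toy data, not MESS outputs.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class IoU averaged over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:              # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])  # one class-0 pixel mislabelled

print(round(mean_iou(pred, gt, num_classes=2), 3))  # 0.708
```

Here class 0 scores IoU 2/3 and class 1 scores 3/4, giving an mIoU of 17/24 ≈ 0.708.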

### Few-shot and many-shot semantic segmentation

Few-shot and many-shot semantic segmentation are not yet supported by the toolkit but can easily be added based on the provided preprocessing scripts.
Most datasets provide a train/val split that can be used for few-shot or supervised training.
CHASE DB1 and CryoNuSeg do not provide train data themselves; similar datasets are used for training instead (DRIVE and STARE for CHASE DB1, MoNuSeg for CryoNuSeg).
BDD100K, Dark Zurich, iSAID, and UAVid are evaluated on their official validation splits.
Hence, supervised training may require splitting the official train set into separate train and dev splits.
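Carving a held-out dev set out of a train split can be done reproducibly with a fixed random seed. The sketch below is a generic example; the 90/10 ratio and file names are illustrative, not a MESS convention.

```python
# Hedged sketch: deterministic train/dev split of a dataset's file list.
# The dev fraction and file names are illustrative assumptions.
import random

def train_dev_split(files, dev_fraction=0.1, seed=0):
    files = sorted(files)                # deterministic order before shuffling
    rng = random.Random(seed)            # fixed seed -> reproducible split
    rng.shuffle(files)
    n_dev = max(1, int(len(files) * dev_fraction))
    return files[n_dev:], files[:n_dev]  # (train, dev)

train, dev = train_dev_split([f"img_{i:03d}.png" for i in range(20)])
print(len(train), len(dev))              # 18 2
```

Sorting before shuffling makes the split independent of filesystem ordering, so every user with the same seed gets the same dev set.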

The DRAM dataset only provides an unlabelled train set and would require applying style transfer to Pascal VOC to obtain labelled training data.
The WorldFloods train set requires approximately 300 GB of disk space, which may not be feasible for some users.
We therefore propose excluding DRAM and WorldFloods from the few-shot and many-shot settings to simplify the evaluation; we call this reduced setting **MESS-20**.

## License

This code is released under the [MIT License](LICENSE). The evaluated datasets are released under their respective licenses; see [DATASETS.md](DATASETS.md) for details. Most datasets are limited to non-commercial use and require a citation; the corresponding BibTeX entries are provided in [datasets.bib](datasets.bib).

## Acknowledgement

We would like to acknowledge the work of the dataset providers, especially the careful collection and annotation of the datasets. Thank you for making the datasets publicly available!
See [DATASETS.md](DATASETS.md) for more details and links to the datasets. We further thank the authors of the evaluated models for their work and for providing the model weights.

## Citation

Please cite our [paper](https://arxiv.org/abs/2306.15521) if you use the MESS benchmark and send us your results to be included in the [leaderboard](https://blumenstiel.github.io/mess-benchmark/leaderboard/).

```bibtex
@article{MESSBenchmark2023,
  title={{What a MESS: Multi-Domain Evaluation of Zero-shot Semantic Segmentation}},
  author={Blumenstiel, Benedikt and Jakubik, Johannes and Kühne, Hilde and Vössing, Michael},
  journal={Advances in Neural Information Processing Systems},
  year={2023}
}
```
