wildlife-datasets


Name: wildlife-datasets
Version: 1.0.2
Summary: Library for easier access and research of wildlife re-identification datasets
Upload time: 2024-02-28 09:16:00
Requires Python: >=3.8
License: MIT License Copyright (c) 2024 Lukáš Adam and Vojtěch Čermák Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: wildlife, re-identification, datasets
Requirements: No requirements were recorded.
<p align="center">
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/issues"><img src="https://img.shields.io/github/issues/WildlifeDatasets/wildlife-datasets" alt="GitHub issues"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/pulls"><img src="https://img.shields.io/github/issues-pr/WildlifeDatasets/wildlife-datasets" alt="GitHub pull requests"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/graphs/contributors"><img src="https://img.shields.io/github/contributors/WildlifeDatasets/wildlife-datasets" alt="GitHub contributors"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/network/members"><img src="https://img.shields.io/github/forks/WildlifeDatasets/wildlife-datasets" alt="GitHub forks"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/stargazers"><img src="https://img.shields.io/github/stars/WildlifeDatasets/wildlife-datasets" alt="GitHub stars"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/watchers"><img src="https://img.shields.io/github/watchers/WildlifeDatasets/wildlife-datasets" alt="GitHub watchers"></a>
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/blob/main/LICENSE"><img src="https://img.shields.io/github/license/WildlifeDatasets/wildlife-datasets" alt="License"></a>
</p>

<p align="center">
<img src="docs/resources/datasets-logo.png" alt="Wildlife datasets" width="300">
</p>

<div align="center">
  <p align="center">Pipeline for wildlife re-identification, including a dataset zoo, training tools and trained models. It can be used to classify new images in labelled databases and to cluster individuals in unlabelled databases.</p>
  <a href="https://wildlifedatasets.github.io/wildlife-datasets/">Documentation</a>
  ·
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/issues/new?assignees=aerodynamic-sauce-pan&labels=bug&projects=&template=bug_report.md&title=%5BBUG%5D">Report Bug</a>
  ·
  <a href="https://github.com/WildlifeDatasets/wildlife-datasets/issues/new?assignees=aerodynamic-sauce-pan&labels=enhancement&projects=&template=enhancement.md&title=%5BEnhancement%5D">Request Feature</a>
</div>

<br>

| <a href="https://github.com/WildlifeDatasets/wildlife-datasets"><img src="docs/resources/datasets-logo.png" alt="Wildlife datasets" width="200"></a>  | <a href="https://huggingface.co/BVRA/MegaDescriptor-L-384"><img src="docs/resources/megadescriptor-logo.png" alt="MegaDescriptor" width="200"></a> | <a href="https://github.com/WildlifeDatasets/wildlife-tools"><img src="docs/resources/tools-logo.png" alt="Wildlife tools" width="200"></a> |
|:--------------:|:-----------:|:------------:|
| Datasets for identification of individual animals | Trained model for individual re&#x2011;identification  | Tools for training re&#x2011;identification models |

<br>

## Wildlife Re-Identification (Re-ID) Datasets

The aim of the project is to provide a comprehensive overview of datasets for wildlife individual re-identification and an easy-to-use package for developers of machine learning methods. The core functionality includes:

- an overview of 33 publicly available wildlife re-identification datasets.
- utilities to mass-download the datasets, convert them into a unified format and fix some incorrect labels.
- default splits for several machine learning tasks, including the ability to create additional splits.
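To make the idea of a split concrete, here is a minimal standard-library sketch of one split type common in re-identification: an identity-disjoint split, in which no individual appears in both the training and the test set. It is not the package's splitting API; the `(image_path, identity)` record layout, the helper name `disjoint_split` and its `test_fraction` parameter are hypothetical and chosen only for illustration.

```python
import random


def disjoint_split(records, test_fraction=0.5, seed=0):
    """Split (image_path, identity) records so train and test share no identity.

    Hypothetical helper for illustration; not part of wildlife-datasets.
    """
    identities = sorted({identity for _, identity in records})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(identities)
    n_test = round(len(identities) * test_fraction)
    test_ids = set(identities[:n_test])
    # Every image of a test individual goes to test, all others to train.
    train = [r for r in records if r[1] not in test_ids]
    test = [r for r in records if r[1] in test_ids]
    return train, test


records = [
    ("img/a1.jpg", "A"), ("img/a2.jpg", "A"),
    ("img/b1.jpg", "B"), ("img/c1.jpg", "C"),
    ("img/c2.jpg", "C"), ("img/d1.jpg", "D"),
]
train, test = disjoint_split(records, test_fraction=0.5)
print(len(train) + len(test))  # 6: every image lands in exactly one set
```

In a closed-set split, by contrast, every individual in the test set also has images in the training set.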

An introductory example is provided in a [Jupyter notebook](notebooks/introduction.ipynb). The package works naturally with [Wildlife tools](https://github.com/WildlifeDatasets/wildlife-tools), which provides our [MegaDescriptor](https://huggingface.co/BVRA/MegaDescriptor-L-384) model and tools for training neural networks.

## Summary of datasets

An overview of the provided datasets is available in the [documentation](https://wildlifedatasets.github.io/wildlife-datasets/datasets/), while a numerical summary is located in a [Jupyter notebook](notebooks/dataset_descriptions.ipynb). Because the notebook is large, it may be necessary to view it via [nbviewer](https://nbviewer.org/github/WildlifeDatasets/wildlife-datasets/blob/main/notebooks/dataset_descriptions.ipynb).

We include basic characteristics such as the publication year, number of images, number of individuals and dataset time span (the difference between the dates of the last and the first image taken), as well as additional information such as the source, number of poses, inclusion of timestamps, whether the animals were captured in the wild and whether the dataset contains multiple species.
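The time span defined above is simply the latest capture date minus the earliest one. A minimal sketch with the standard library, assuming the capture dates are available as ISO 8601 strings (an assumed input format; the package's own date column may be stored differently):

```python
from datetime import datetime


def time_span(dates):
    """Return the difference between the latest and earliest capture date."""
    parsed = [datetime.fromisoformat(d) for d in dates]
    return max(parsed) - min(parsed)


# Hypothetical capture dates; the order of the input does not matter.
span = time_span(["2014-03-01", "2016-07-15", "2015-01-10"])
print(span.days)  # 867
```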

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="docs/resources/Datasets_Summary_inverted.png">
  <source media="(prefers-color-scheme: light)" srcset="docs/resources/Datasets_Summary.png">
  <img alt="Dataset summary" src="docs/resources/Datasets_Summary.png">
</picture>


## Installation

The package can be installed with pip:
```
pip install wildlife-datasets
```


## Basic functionality

We show an example of downloading, extracting and processing the MacaqueFaces dataset.

```
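from wildlife_datasets import analysis, datasets

# Download and extract the raw MacaqueFaces data into data/MacaqueFaces
datasets.MacaqueFaces.get_data('data/MacaqueFaces')
# Load the extracted data and convert it into the unified format
dataset = datasets.MacaqueFaces('data/MacaqueFaces')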
```

The object `dataset` holds a summary of the dataset in the DataFrame `dataset.df`, whose columns depend on the dataset. Every dataset contains the identity of the individual and the path to each image. This particular dataset also contains the date each image was taken and its contrast. Other datasets store bounding boxes, segmentation masks, the position from which the image was taken, keypoints or other information such as age or gender.

```
dataset.df
```

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="docs/resources/MacaqueFaces_DataFrame_inverted.png">
  <source media="(prefers-color-scheme: light)" srcset="docs/resources/MacaqueFaces_DataFrame.png">
  <img alt="Overview of the MacaqueFaces dataset" src="docs/resources/MacaqueFaces_DataFrame.png">
</picture>
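Since every dataset provides at least an identity and an image path per row, basic statistics such as the number of individuals follow directly from the identity column. With pandas this can be done with `value_counts()` on that column; the standard-library sketch below does the same on a plain list of rows. The column names `identity` and `path` and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows mimicking the unified format: one dict per image,
# each with at least an identity and a path to the image.
rows = [
    {"identity": "M01", "path": "M01/img_001.jpg"},
    {"identity": "M01", "path": "M01/img_002.jpg"},
    {"identity": "M02", "path": "M02/img_001.jpg"},
    {"identity": "M03", "path": "M03/img_001.jpg"},
    {"identity": "M03", "path": "M03/img_002.jpg"},
    {"identity": "M03", "path": "M03/img_003.jpg"},
]

# Count images per individual, analogous to value_counts() on the identity column.
images_per_identity = Counter(row["identity"] for row in rows)

print(len(rows))                           # 6 images
print(len(images_per_identity))            # 3 individuals
print(images_per_identity.most_common(1))  # [('M03', 3)]
```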

The dataset also contains basic metadata, such as the number of individuals, time span, licences and publication year.

```
dataset.metadata
```

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="docs/resources/MacaqueFaces_Metadata_inverted.png">
  <source media="(prefers-color-scheme: light)" srcset="docs/resources/MacaqueFaces_Metadata.png">
  <img alt="Metadata of the MacaqueFaces dataset" src="docs/resources/MacaqueFaces_Metadata.png">
</picture>

This particular dataset already contains cropped images of faces. Other datasets may contain uncropped images with bounding boxes or even segmentation masks.

```
# Plot a grid of sample images from the dataset
dataset.plot_grid()
```

![](docs/resources/MacaqueFaces_Grid.png)

## Additional functionality

For additional functionality, including mass loading, dataset splitting and evaluation metrics, see the [documentation](https://wildlifedatasets.github.io/wildlife-datasets/) or the [notebooks](https://github.com/WildlifeDatasets/wildlife-datasets/tree/main/notebooks).
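Re-identification methods are commonly evaluated by matching each query image against a labelled database and checking whether the closest match has the correct identity. The sketch below illustrates that idea with cosine similarity on small hand-written feature vectors and reports top-1 accuracy. It is a generic illustration, not the evaluation code of this package or of Wildlife tools; the vectors stand in for embeddings that a model such as MegaDescriptor would produce, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict_identities(query_feats, db_feats, db_identities):
    """Assign each query the identity of its most similar database image."""
    predictions = []
    for q in query_feats:
        scores = [cosine(q, d) for d in db_feats]
        best = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(db_identities[best])
    return predictions


def top1_accuracy(predictions, true_identities):
    """Fraction of queries whose predicted identity is correct."""
    correct = sum(p == t for p, t in zip(predictions, true_identities))
    return correct / len(true_identities)


# Toy 2-D features standing in for model embeddings (hypothetical values).
db_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_identities = ["A", "B", "C"]
query_feats = [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
query_identities = ["A", "B", "C"]

preds = predict_identities(query_feats, db_feats, db_identities)
print(preds)                                   # ['A', 'B', 'B']
print(top1_accuracy(preds, query_identities))  # 0.6666666666666666
```

The third query is closest to individual B rather than C, so only two of the three queries are identified correctly.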

## Citation

If you like our package, please cite our [paper](https://openaccess.thecvf.com/content/WACV2024/html/Cermak_WildlifeDatasets_An_Open-Source_Toolkit_for_Animal_Re-Identification_WACV_2024_paper.html). You may also be interested in our [SeaTurtleID](https://www.kaggle.com/datasets/wildlifedatasets/seaturtleidheads) dataset, published in another [paper](https://openaccess.thecvf.com/content/WACV2024/html/Adam_SeaTurtleID2022_A_Long-Span_Dataset_for_Reliable_Sea_Turtle_Re-Identification_WACV_2024_paper.html).

```
@InProceedings{Cermak_2024_WACV,
    author    = {\v{C}erm\'ak, Vojt\v{e}ch and Picek, Luk\'a\v{s} and Adam, Luk\'a\v{s} and Papafitsoros, Kostas},
    title     = {{WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification}},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5953-5963}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "wildlife-datasets",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Luk\u00e1\u0161 Adam <lukas.adam.cr@gmail.com>, Vojt\u011bch \u010cerm\u00e1k <cermak.vojtech@seznam.cz>",
    "keywords": "wildlife,re-identification,datasets",
    "author": "",
    "author_email": "Luk\u00e1\u0161 Adam <lukas.adam.cr@gmail.com>, Vojt\u011bch \u010cerm\u00e1k <cermak.vojtech@seznam.cz>",
    "download_url": "https://files.pythonhosted.org/packages/2f/63/3b6889f25edc774f8219b90713f4dd63eae50a92f3c2d607406f4cca0d43/wildlife-datasets-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024 Luk\u00e1\u0161 Adam and Vojt\u011bch \u010cerm\u00e1k  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "Library for easier access and research of wildlife re-identification datasets",
    "version": "1.0.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/WildlifeDatasets/wildlife-datasets/issues",
        "Documentation": "https://wildlifedatasets.github.io/wildlife-datasets/",
        "Homepage": "https://github.com/WildlifeDatasets/wildlife-datasets"
    },
    "split_keywords": [
        "wildlife",
        "re-identification",
        "datasets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "72f14f905989ccb4652ee0d8ec241d3c470c30c48b6ac91f1590ff365608dbd2",
                "md5": "5ddcd0c6bcd1776627c1ea0c13df2571",
                "sha256": "51eb34a468bf5cbd14983cfc44aaca6f742b8e28ee4ed4788920dac89cf439f2"
            },
            "downloads": -1,
            "filename": "wildlife_datasets-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5ddcd0c6bcd1776627c1ea0c13df2571",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 42204,
            "upload_time": "2024-02-28T09:15:59",
            "upload_time_iso_8601": "2024-02-28T09:15:59.025566Z",
            "url": "https://files.pythonhosted.org/packages/72/f1/4f905989ccb4652ee0d8ec241d3c470c30c48b6ac91f1590ff365608dbd2/wildlife_datasets-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f633b6889f25edc774f8219b90713f4dd63eae50a92f3c2d607406f4cca0d43",
                "md5": "e64a383abfd05ce948ed52a5d3d0aaba",
                "sha256": "0e372d84a8f0c72b91373a8f0c3a4eaef022a2e5e62ef29ac80f91da2006a4c7"
            },
            "downloads": -1,
            "filename": "wildlife-datasets-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "e64a383abfd05ce948ed52a5d3d0aaba",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 39330,
            "upload_time": "2024-02-28T09:16:00",
            "upload_time_iso_8601": "2024-02-28T09:16:00.591375Z",
            "url": "https://files.pythonhosted.org/packages/2f/63/3b6889f25edc774f8219b90713f4dd63eae50a92f3c2d607406f4cca0d43/wildlife-datasets-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-28 09:16:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "WildlifeDatasets",
    "github_project": "wildlife-datasets",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "wildlife-datasets"
}
        