flwr-datasets


* Name: flwr-datasets
* Version: 0.4.0
* Home page: https://flower.ai
* Summary: Flower Datasets
* Upload time: 2024-10-22 12:32:55
* Author: The Flower Authors
* Requires Python: `<4.0,>=3.9`
* License: Apache-2.0
* Keywords: flower, fl, federated learning, federated analytics, federated evaluation, machine learning, dataset

# Flower Datasets

[![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
![Downloads](https://pepy.tech/badge/flwr-datasets)
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.ai/join-slack)

Flower Datasets (`flwr-datasets`) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the `Flower Labs` team that also created Flower: A Friendly Federated Learning Framework.


> [!TIP]
> For complete documentation, including API docs, how-to guides, and tutorials, visit the [Flower Datasets Documentation](https://flower.ai/docs/datasets/). For full FL examples, see the [Flower Examples page](https://github.com/adap/flower/tree/main/examples).

## Installation

For a complete installation guide, visit the [Flower Datasets Documentation](https://flower.ai/docs/datasets/).

```bash
pip install "flwr-datasets[vision]"
```
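
To confirm from Python that the package is visible in the active environment, a small check along these lines may help. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this sketch and is not part of `flwr-datasets`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("flwr-datasets")
    print(f"flwr-datasets: {found or 'not installed'}")
```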

## Overview

The Flower Datasets library supports:
* **downloading datasets** - choose the dataset from Hugging Face's `datasets`,
* **partitioning datasets** - customize the partitioning scheme,
* **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).

Because it uses Hugging Face's `datasets` under the hood, Flower Datasets integrates with the following popular formats/frameworks:
* Hugging Face,
* PyTorch,
* TensorFlow,
* NumPy,
* Pandas,
* JAX,
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented [partitioning schemes](https://flower.ai/docs/datasets/ref-api/flwr_datasets.partitioner.html#module-flwr_datasets.partitioner)**:

* Partitioner (the abstract base class) `Partitioner`
* IID partitioning `IidPartitioner(num_partitions)`
* Dirichlet partitioning `DirichletPartitioner(num_partitions, partition_by, alpha)`
* Distribution partitioning `DistributionPartitioner(distribution_array, num_partitions, num_unique_labels_per_partition, partition_by, preassigned_num_samples_per_label, rescale)`
* InnerDirichlet partitioning `InnerDirichletPartitioner(partition_sizes, partition_by, alpha)`
* Pathological partitioning `PathologicalPartitioner(num_partitions, partition_by, num_classes_per_partition, class_assignment_mode)`
* Natural ID partitioning `NaturalIdPartitioner(partition_by)`
* Size-based partitioning (the abstract base class for partitioners that divide the data based on the number of samples) `SizePartitioner`
* Linear partitioning `LinearPartitioner(num_partitions)`
* Square partitioning `SquarePartitioner(num_partitions)`
* Exponential partitioning `ExponentialPartitioner(num_partitions)`
* more to come in future releases (contributions are welcome).
<p align="center">
  <img src="./doc/source/_static/readme/comparison_of_partitioning_schemes.png" alt="Comparison of partitioning schemes."/>
  <br>
  <em>Comparison of Partitioning Schemes on CIFAR10</em>
</p>

PS: This plot was generated using a library function (see the [flwr_datasets.visualization](https://flower.ai/docs/datasets/ref-api/flwr_datasets.visualization.html) package for more).
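
To make the difference between IID and Dirichlet partitioning more concrete, here is a minimal standard-library sketch of the two ideas. It is not the library's implementation: the function names `iid_partition` and `dirichlet_partition` are hypothetical, the code works on plain index lists rather than Hugging Face datasets, and it omits refinements a real partitioner may apply (such as balancing or minimum partition sizes).

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def iid_partition(num_samples: int, num_partitions: int, seed: int = 42) -> List[List[int]]:
    """Shuffle sample indices and deal them round-robin into partitions."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return [indices[i::num_partitions] for i in range(num_partitions)]


def dirichlet_partition(
    labels: Sequence[int], num_partitions: int, alpha: float, seed: int = 42
) -> List[List[int]]:
    """Split each class across partitions using Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_partitions)]
        total = sum(draws) or 1.0  # guard against an all-zero draw
        start, cumulative = 0, 0.0
        for pid, draw in enumerate(draws):
            cumulative += draw / total
            if pid == num_partitions - 1:
                end = len(idxs)  # last partition takes the remainder
            else:
                end = min(len(idxs), round(cumulative * len(idxs)))
            partitions[pid].extend(idxs[start:end])
            start = max(start, end)
    return partitions


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced classes
    for part in dirichlet_partition(toy_labels, num_partitions=5, alpha=0.3):
        print(len(part), sorted({toy_labels[i] for i in part}))
```

Smaller values of `alpha` concentrate each class in fewer partitions (stronger label skew), while large values approach the IID case.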


## Usage

Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two main methods for preparing the data: `load_partition(partition_id, split)` returns a single partition of a partitioned split, and `load_split(split)` returns a whole split.

Here's a basic quickstart example of how to partition the MNIST dataset:

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# The train split of the MNIST dataset will be partitioned into 100 partitions
partitioner = IidPartitioner(num_partitions=100)
fds = FederatedDataset("ylecun/mnist", partitioners={"train": partitioner})

# Load the first of the 100 partitions of the train split
partition = fds.load_partition(0)

# The test split has no partitioner, so it is loaded whole (e.g. for centralized evaluation)
centralized_data = fds.load_split("test")
```
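
The relationship between the `partitioners` argument, `load_partition`, and `load_split` can be illustrated without downloading anything. The sketch below is a toy stand-in under stated assumptions: `ToyFederatedDataset` and `contiguous_partitions` are hypothetical names invented here, they operate on in-memory lists, and they are not part of `flwr-datasets`.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A partitioning function maps a sample count to a list of index lists.
PartitionFn = Callable[[int], List[List[int]]]


def contiguous_partitions(num_partitions: int) -> PartitionFn:
    """Build a function that cuts indices into near-equal contiguous blocks."""

    def split(num_samples: int) -> List[List[int]]:
        base, extra = divmod(num_samples, num_partitions)
        out, start = [], 0
        for pid in range(num_partitions):
            size = base + (1 if pid < extra else 0)
            out.append(list(range(start, start + size)))
            start += size
        return out

    return split


class ToyFederatedDataset:
    """Hypothetical stand-in: partitions some splits, leaves the others whole."""

    def __init__(self, splits: Dict[str, Sequence], partitioners: Dict[str, PartitionFn]):
        self._splits = splits
        self._partitioners = partitioners
        self._cache: Dict[str, List[List[int]]] = {}

    def load_partition(self, partition_id: int, split: Optional[str] = None) -> list:
        if split is None:
            if len(self._partitioners) != 1:
                raise ValueError("Specify `split` when several splits are partitioned.")
            split = next(iter(self._partitioners))
        if split not in self._cache:
            # Partition lazily, the first time a split is requested.
            self._cache[split] = self._partitioners[split](len(self._splits[split]))
        return [self._splits[split][i] for i in self._cache[split][partition_id]]

    def load_split(self, split: str) -> list:
        return list(self._splits[split])


fds = ToyFederatedDataset(
    {"train": list(range(10)), "test": list(range(100, 104))},
    partitioners={"train": contiguous_partitions(3)},
)
print(fds.load_partition(0))     # [0, 1, 2, 3]
print(fds.load_split("test"))    # [100, 101, 102, 103]
```

Only splits named in `partitioners` are divided; any other split stays available as a single centralized dataset.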

For more details, please refer to the specific how-to guides or tutorials. They showcase customization and more advanced features.

## Future release

Here are a few of the things that we will work on in future releases:

* ✅ Support for more datasets (especially the ones that have user id present).
* ✅ Creation of custom `Partitioner`s.
* ✅ More out-of-the-box `Partitioner`s.
* ✅ Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument.
* ✅ Customization of the dataset splitting before the partitioning.
* ✅ Simplification of the dataset transformation to the popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.

            
