flwr-datasets


Name: flwr-datasets
Version: 0.0.2
Home page: https://flower.dev
Summary: Flower Datasets
Upload time: 2023-11-15 16:35:30
Author: The Flower Authors
Requires Python: >=3.8,<4.0
License: Apache-2.0
Keywords: flower, fl, federated learning, federated analytics, federated evaluation, machine learning, dataset

# Flower Datasets

[![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
![Downloads](https://pepy.tech/badge/flwr-datasets)
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.dev/join-slack)

Flower Datasets (`flwr-datasets`) is a library for quickly and easily creating datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team, which also created Flower: A Friendly Federated Learning Framework.
The Flower Datasets library supports:
* **downloading datasets**: choose a dataset from Hugging Face's `datasets`;
* **partitioning datasets**: customize the partitioning scheme;
* **creating centralized datasets**: leave parts of the dataset unpartitioned (e.g., for centralized evaluation).

Because it uses Hugging Face's `datasets` under the hood, Flower Datasets integrates with the following popular formats and frameworks:
* Hugging Face,
* PyTorch,
* TensorFlow,
* NumPy,
* Pandas,
* JAX,
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
* Partitioner (the abstract base class) `Partitioner`
* IID partitioner `IidPartitioner(num_partitions)`
* Natural ID partitioner `NaturalIdPartitioner`
* Size partitioner (the abstract base class for partitioners that divide the dataset based on the number of samples per partition) `SizePartitioner`
* Linear partitioner `LinearPartitioner`
* Square partitioner `SquarePartitioner`
* Exponential partitioner `ExponentialPartitioner`
* more to come in future releases.
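The listed partitioners differ mainly in how many samples each partition receives. As a rough illustration only, the plain-Python sketch below works on sample indices, so it runs without `flwr-datasets` installed. `iid_partition` and `size_based_partition` are hypothetical helpers, not part of the library, and the size rule (partition `i` gets a share proportional to `f(i + 1)`, with `f(x) = x`, `x ** 2`, or `exp(x)` for the linear, square, and exponential variants) is our assumption rather than the library's documented formula.

```python
import math
import random
from typing import Callable, List


def iid_partition(num_samples: int, num_partitions: int, seed: int = 42) -> List[List[int]]:
    """Hypothetical helper: shuffle indices, then cut them into near-equal shards."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(num_samples, num_partitions)
    partitions, start = [], 0
    for pid in range(num_partitions):
        # The first `extra` partitions receive one additional sample.
        size = base + (1 if pid < extra else 0)
        partitions.append(indices[start:start + size])
        start += size
    return partitions


def size_based_partition(
    num_samples: int, num_partitions: int, size_fn: Callable[[int], float]
) -> List[List[int]]:
    """Hypothetical helper (assumption): partition i gets a share proportional to size_fn(i + 1)."""
    weights = [size_fn(pid + 1) for pid in range(num_partitions)]
    total = sum(weights)
    # Floor each proportional size, then give the rounding remainder to the
    # last partition so that every sample is assigned exactly once.
    sizes = [int(num_samples * w / total) for w in weights]
    sizes[-1] += num_samples - sum(sizes)
    partitions, start = [], 0
    for size in sizes:
        # Contiguous index ranges keep the example easy to inspect.
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions


# Assumed size functions for the three size-based variants listed above.
linear = size_based_partition(100, 4, lambda x: x)
square = size_based_partition(100, 4, lambda x: x ** 2)
exponential = size_based_partition(100, 4, math.exp)

print([len(p) for p in iid_partition(100, 3)])  # [34, 33, 33]
print([len(p) for p in linear])                 # [10, 20, 30, 40]
print([len(p) for p in square])                 # [3, 13, 30, 54]
print([len(p) for p in exponential])            # [3, 8, 23, 66]
```

Because both helpers return index lists, a partition of a real Hugging Face `datasets.Dataset` could be materialized with `dataset.select(indices)`; this sketch makes no claim about how `flwr-datasets` does it internally.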

# Installation

## With pip

Flower Datasets can be installed from PyPI:

```bash
pip install flwr-datasets
```

Install with optional extras:

* for image datasets:

```bash
pip install "flwr-datasets[vision]"
```

* for audio datasets:

```bash
pip install "flwr-datasets[audio]"
```

If you plan to convert the dataset into the format of your ML framework (e.g., PyTorch or TensorFlow), make sure that framework is installed as well.

# Usage

Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning, evaluation, or analytics. It provides two methods for accessing the data: `load_partition(node_id, split)`, which returns a single partition of a partitioned split, and `load_full(split)`, which returns an entire split without partitioning (for example, for centralized evaluation).

Here's a basic quickstart example of how to partition the MNIST dataset:

```python
from flwr_datasets import FederatedDataset

# Partition the train split of the MNIST dataset into 100 partitions
mnist_fds = FederatedDataset(dataset="mnist", partitioners={"train": 100})

# Load the first partition (node_id=0) of the train split
mnist_partition_0 = mnist_fds.load_partition(0, "train")

# Load the whole test split, unpartitioned, e.g. for centralized evaluation
centralized_data = mnist_fds.load_full("test")
```
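To make the behaviour of the two methods concrete, here is a toy, in-memory model of it. `ToyFederatedDataset` is hypothetical and is not how `flwr-datasets` is implemented: the real class downloads data through Hugging Face `datasets` and delegates the splitting to `Partitioner` objects. Treating an integer in `partitioners` as "shuffle with a fixed seed, then cut into near-equal shards" is an assumption made for this sketch.

```python
import random
from typing import Dict, List


class ToyFederatedDataset:
    """Hypothetical in-memory model of FederatedDataset (not the real class).

    Each split is a plain list. Splits listed in `partitioners` are shuffled
    with a fixed seed and cut into near-equal shards; other splits stay whole.
    """

    def __init__(self, splits: Dict[str, list], partitioners: Dict[str, int], seed: int = 42) -> None:
        self._splits = splits
        self._partitioners = partitioners
        self._seed = seed
        # Shards are computed lazily, the first time a split is requested.
        self._shards: Dict[str, List[list]] = {}

    def _get_shards(self, split: str) -> List[list]:
        if split not in self._shards:
            data = list(self._splits[split])
            random.Random(self._seed).shuffle(data)
            num_partitions = self._partitioners[split]
            base, extra = divmod(len(data), num_partitions)
            shards, start = [], 0
            for pid in range(num_partitions):
                size = base + (1 if pid < extra else 0)
                shards.append(data[start:start + size])
                start += size
            self._shards[split] = shards
        return self._shards[split]

    def load_partition(self, node_id: int, split: str) -> list:
        """Return one shard of a partitioned split."""
        if split not in self._partitioners:
            raise ValueError(f"No partitioner configured for split '{split}'")
        return self._get_shards(split)[node_id]

    def load_full(self, split: str) -> list:
        """Return an entire split without partitioning (e.g. for centralized evaluation)."""
        return list(self._splits[split])


# Mirrors the quickstart: 100 partitions of "train", "test" kept centralized.
fds = ToyFederatedDataset(
    splits={"train": list(range(1000)), "test": list(range(1000, 1200))},
    partitioners={"train": 100},
)
partition_0 = fds.load_partition(0, "train")
centralized_data = fds.load_full("test")
print(len(partition_0), len(centralized_data))  # 10 200
```

Caching the shards means that repeated calls with the same `node_id` return the same data, which is what a simulated client expects; the real library's shuffling and caching behaviour may differ.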

For more details, please refer to the how-to guides and tutorials in the documentation (https://flower.dev/docs/datasets). They showcase customization and more advanced features.

# Future releases

Here are a few of the things we are working on (items marked ✅ are already available):

* ✅ Support for more datasets (especially ones that include a user ID).
* ✅ Creation of custom `Partitioner`s.
* ✅ More out-of-the-box `Partitioner`s.
* ✅ Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument.
* ✅ Customization of the dataset splitting before the partitioning.
* Simpler conversion of datasets to popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.

            
