fairness-datasets


 - Name: fairness-datasets
 - Version: 0.4.0
 - Summary: PyTorch dataset wrapper for the
 - Upload time: 2024-04-18 21:31:52
 - Requires Python: >=3.9
 - Keywords: pytorch, dataset, fairness, adult, census income, law school
 - Requirements: No requirements were recorded.
# fairness-datasets
[![PyPI version](https://badge.fury.io/py/fairness-datasets.svg)](https://badge.fury.io/py/fairness-datasets)

PyTorch dataset wrappers for several popular datasets from
fair machine learning research.

The following datasets are wrapped:
 - [Adult (Census Income)](https://archive.ics.uci.edu/dataset/2/adult)
 - [Default](https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients)
 - [Law School](https://eric.ed.gov/?id=ED469370) (data from [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study))
 - [SouthGerman](https://archive.ics.uci.edu/dataset/573/south+german+credit+update)

## Installation
```shell
pip install fairness-datasets
```

## Basic Usage
```python
from torch.utils.data import DataLoader

from fairnessdatasets import Adult

# load the Adult training set, downloading it first if necessary
train_set = Adult(root="datasets", download=True)
# load the test set
test_set = Adult(root="datasets", train=False, download=True)

inputs, target = train_set[0]  # retrieve the first sample of the training set

# iterate over the training set
for inputs, target in train_set:
    ...  # do something with a single sample

# use a PyTorch data loader
loader = DataLoader(test_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for inputs, targets in loader:
        ...  # do something with a batch of samples
```
You can use `Adult(..., raw=True)` to turn off the one-hot encoding
and z-score normalization applied by the `Adult` class by default.
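
As a rough illustration of what `raw=True` switches off, the following sketch applies z-score normalization to a numeric column and one-hot encoding to a categorical column using only the standard library. The helper names `z_score` and `one_hot` and the sample values are hypothetical. How the package computes its statistics is not stated here, so the population standard deviation used below is an assumption.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # a constant column has no spread; map it to zeros
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector with one position per category."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


ages = [25.0, 40.0, 55.0]  # made-up numeric column
workclass = ["Private", "State-gov", "Private"]  # made-up categorical column

normalized_ages = z_score(ages)
encoded_workclass = one_hot(workclass, ["Private", "State-gov"])
```

After both transformations, every feature is a float on a comparable scale, which is typically what a neural network expects as input.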

The remaining dataset classes can be used in the same way as `Adult`.
However, these datasets don't come with a fixed train/test split,
so each dataset instance always contains all of the data.
To create a train/test split, use `random_split`:
```python
import torch
from torch.utils.data import random_split

from fairnessdatasets import Default

dataset = Default(root="datasets", download=True)

rng = torch.Generator().manual_seed(42)  # for reproducible results
# fractional lengths require PyTorch 1.13 or newer
train_set, test_set = random_split(dataset, [0.7, 0.3], generator=rng)
```
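
Conceptually, a seeded random split shuffles the sample indices with a fixed seed and cuts the shuffled list at the requested fractions. The pure-Python sketch below illustrates the idea. `split_indices` is a hypothetical helper, not part of `fairnessdatasets` or PyTorch, and `random_split` may round lengths and draw its permutation differently.

```python
import random


def split_indices(n_samples, fractions, seed=42):
    """Shuffle range(n_samples) with a fixed seed and cut it into parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    parts = []
    start = 0
    for i, fraction in enumerate(fractions):
        if i == len(fractions) - 1:
            end = n_samples  # the last part takes whatever remains
        else:
            end = start + round(n_samples * fraction)
        parts.append(indices[start:end])
        start = end
    return parts


train_idx, test_idx = split_indices(10, [0.7, 0.3])
```

Because the seed is fixed, calling the helper again with the same arguments yields the same partition, which is the property the `generator` argument provides in the `random_split` call.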

## Advanced Usage

Turn off status messages while downloading the dataset:
```python
Adult(root=..., output_fn=None)
```

To log status messages with the `logging` module instead of
printing them to `sys.stdout`, pass a logging function as `output_fn`:
```python
import logging

Adult(root=..., output_fn=logging.info)
```
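
One caveat when routing messages through `logging`: records at `INFO` level are discarded unless the logger's effective level is `INFO` or lower, and Python's root logger defaults to `WARNING`. The sketch below demonstrates this with a hypothetical stand-in function, `fake_download`, that calls `output_fn` with a single message string. That calling convention is an assumption about how the dataset classes use `output_fn`, and the logger name is made up.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def fake_download(output_fn=print):
    """Hypothetical stand-in for a dataset class reporting download status."""
    if output_fn is not None:
        output_fn("Downloading dataset...")


logger = logging.getLogger("fairness_demo")  # hypothetical logger name
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.WARNING)
fake_download(output_fn=logger.info)  # dropped: INFO is below WARNING

logger.setLevel(logging.INFO)
fake_download(output_fn=logger.info)  # recorded by the handler

fake_download(output_fn=None)  # silent: no status output at all
```

With the real dataset classes, the equivalent step would presumably be a call such as `logging.basicConfig(level=logging.INFO)` before passing `logging.info` as `output_fn`.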

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "fairness-datasets",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "PyTorch, Dataset, Fairness, Adult, Census Income, Law School",
    "author": null,
    "author_email": "David Boetius <david.boetius@uni-konstanz.de>",
    "download_url": "https://files.pythonhosted.org/packages/9c/4b/d0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4/fairness_datasets-0.4.0.tar.gz",
    "platform": null,
    "description": "# fairness-datasets\n[![PyPI version](https://badge.fury.io/py/fairness-datasets.svg)](https://badge.fury.io/py/fairness-datasets)\n\nPyTorch dataset wrappers for the several popular datasets from \nfair machine learning research.\n\nThe following datasets are wrapped:\n - [Adult (Census Income)](https://archive.ics.uci.edu/dataset/2/adult).\n - [Default](https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients)\n - [Law School](https://eric.ed.gov/?id=ED469370) (data from [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study))\n - [SouthGerman](https://archive.ics.uci.edu/dataset/573/south+german+credit+update)\n\n## Installation\n```shell\npip install fairness-datasets\n```\n\n## Basic Usage\n```python\nfrom fairnessdatasets import Adult\n\n# load (if necessary, download) the Adult training dataset \ntrain_set = Adult(root=\"datasets\", download=True)\n# load the test set\ntest_set = Adult(root=\"datasets\", train=False, download=True)\n\ninputs, target = train_set[0]  # retrieve the first sample of the training set\n\n# iterate over the training set\nfor inputs, target in iter(train_set):\n    ...  # Do something with a single sample\n\n# use a PyTorch data loader\nfrom torch.utils.data import DataLoader\n\nloader = DataLoader(test_set, batch_size=32, shuffle=True)\nfor epoch in range(100):\n    for inputs, targets in iter(loader):\n        ...  
# Do something with a batch of samples\n```\nYou can use `Adult(..., raw=True)` to turn off the one-hot encoding\nand z-score normalization applied by the `Adult` class by default.\n\nThe remaining dataset classes can be used in the same way as `Adult`.\nHowever, these datasets don't come with a fixed train/test split, \nso that the dataset instances always contain all data.\nTo create a train/test split, use\n```python\nfrom fairnessdatasets import Default\nfrom torch.utils.data import random_split\n\ndataset = Default(root=\"datasets\", download=True)\n\nrng = torch.Generator().manual_seed(42)  # for reproducible results\ntrain_set, test_set = random_split(dataset, [0.7, 0.3], generator=rng)\n```\n\n## Advanced Usage\n\nTurn off status messages while downloading the dataset:\n```python\nAdult(root=..., output_fn=None)\n```\n\nUse the `logging` module for logging status messages while downloading the\ndataset instead of placing the status messages on `sys.stdout`.\n```python\nimport logging\n\nAdult(root=..., output_fn=logging.info)\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "PyTorch dataset wrapper for the",
    "version": "0.4.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/cherrywoods/fairness-datasets/issues",
        "Homepage": "https://github.com/cherrywoods/fairness-datasets",
        "Repository": "https://github.com/cherrywoods/fairness-datasets.git"
    },
    "split_keywords": [
        "pytorch",
        " dataset",
        " fairness",
        " adult",
        " census income",
        " law school"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7d1ab16b824d2ec2bbbe24809b30ce148f77c4510761a104dbe401c00e57e588",
                "md5": "ff61436a2a62d66eece090a00650587f",
                "sha256": "ca2c711f0ca768457bcf5f48cc33ae9a40e10eb77df94e0e16f9c73309b675aa"
            },
            "downloads": -1,
            "filename": "fairness_datasets-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ff61436a2a62d66eece090a00650587f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18265,
            "upload_time": "2024-04-18T21:31:48",
            "upload_time_iso_8601": "2024-04-18T21:31:48.934579Z",
            "url": "https://files.pythonhosted.org/packages/7d/1a/b16b824d2ec2bbbe24809b30ce148f77c4510761a104dbe401c00e57e588/fairness_datasets-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9c4bd0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4",
                "md5": "71b19974c05c66f1f4943527b6f5d7fa",
                "sha256": "bca7dc534f064b9941a1a10083c84c684ef4d34fe6f2b4aa2ed7d33cdf7da2d1"
            },
            "downloads": -1,
            "filename": "fairness_datasets-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "71b19974c05c66f1f4943527b6f5d7fa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 17528,
            "upload_time": "2024-04-18T21:31:52",
            "upload_time_iso_8601": "2024-04-18T21:31:52.371466Z",
            "url": "https://files.pythonhosted.org/packages/9c/4b/d0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4/fairness_datasets-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-18 21:31:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cherrywoods",
    "github_project": "fairness-datasets",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "fairness-datasets"
}
        