# fairness-datasets
[![PyPI version](https://badge.fury.io/py/fairness-datasets.svg)](https://badge.fury.io/py/fairness-datasets)
PyTorch dataset wrappers for several popular datasets from
fair machine learning research.
The following datasets are wrapped:
- [Adult (Census Income)](https://archive.ics.uci.edu/dataset/2/adult)
- [Default](https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients)
- [Law School](https://eric.ed.gov/?id=ED469370) (data from [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study))
- [SouthGerman](https://archive.ics.uci.edu/dataset/573/south+german+credit+update)
## Installation
```shell
pip install fairness-datasets
```
## Basic Usage
```python
from fairnessdatasets import Adult

# load (and, if necessary, download) the Adult training set
train_set = Adult(root="datasets", download=True)
# load the test set
test_set = Adult(root="datasets", train=False, download=True)

inputs, target = train_set[0]  # retrieve the first sample of the training set

# iterate over the training set
for inputs, target in iter(train_set):
    ...  # do something with a single sample

# use a PyTorch data loader
from torch.utils.data import DataLoader

loader = DataLoader(test_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for inputs, targets in iter(loader):
        ...  # do something with a batch of samples
```
You can use `Adult(..., raw=True)` to turn off the one-hot encoding
and z-score normalization applied by the `Adult` class by default.
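For readers unfamiliar with the two default transformations, here is a dependency-free sketch of what they do to one categorical and one numeric column. It illustrates the techniques only and is not the `Adult` class's implementation. In particular, whether the library uses the population or the sample standard deviation, and which split the statistics are computed on, are assumptions this sketch does not settle. The helper names and toy values are made up for illustration.

```python
from statistics import mean, pstdev


def one_hot(values: list[str]) -> list[list[float]]:
    """Encode each categorical value as a 0/1 indicator vector.

    Categories are sorted so that the column order is deterministic.
    """
    categories = sorted(set(values))
    return [[1.0 if value == category else 0.0 for category in categories] for value in values]


def z_score(values: list[float]) -> list[float]:
    """Rescale a numeric column to zero mean and unit (population) standard deviation.

    Assumes the column is not constant; otherwise the standard deviation is zero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(value - mu) / sigma for value in values]


# Made-up toy columns in the style of Adult's "workclass" and "age" attributes.
workclass = ["Private", "State-gov", "Private"]
age = [25.0, 35.0, 45.0]

print(one_hot(workclass))  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(z_score(age))        # roughly [-1.22, 0.0, 1.22]
```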
The remaining dataset classes can be used in the same way as `Adult`.
However, these datasets don't come with a fixed train/test split,
so each dataset instance always contains all of the data.
To create a reproducible train/test split, use `torch.utils.data.random_split` with a seeded generator:
```python
import torch
from torch.utils.data import random_split

from fairnessdatasets import Default

dataset = Default(root="datasets", download=True)

rng = torch.Generator().manual_seed(42)  # fixed seed for a reproducible split
train_set, test_set = random_split(dataset, [0.7, 0.3], generator=rng)
```
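To make the effect of the fractional lengths and the seeded generator concrete, the sketch below splits a range of indices in plain Python. It follows the rounding rule described in the PyTorch documentation for `random_split` (each length is `floor(fraction * len(dataset))`, and any remainder is handed out one at a time, round-robin), but it shuffles with Python's `random.Random` instead of a PyTorch generator, so the concrete indices will differ from those `random_split` returns. `split_indices` is a hypothetical helper, not part of either library.

```python
import math
import random


def split_indices(n: int, fractions: list[float], seed: int) -> list[list[int]]:
    """Shuffle range(n) with a seeded RNG and cut it into len(fractions) parts."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    # Floor each share, then distribute the leftover items round-robin.
    sizes = [math.floor(n * fraction) for fraction in fractions]
    for i in range(n - sum(sizes)):
        sizes[i % len(sizes)] += 1
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed, same permutation
    parts, start = [], 0
    for size in sizes:
        parts.append(indices[start:start + size])
        start += size
    return parts


train_idx, test_idx = split_indices(10, [0.7, 0.3], seed=42)
print(len(train_idx), len(test_idx))  # 7 3
```

Calling it twice with the same seed returns the same partition, which is the property the `manual_seed(42)` line provides in the PyTorch version.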
## Advanced Usage
To turn off the status messages printed while downloading a dataset, pass `output_fn=None`:
```python
Adult(root=..., output_fn=None)
```
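The `output_fn` argument follows a common callback pattern. The sketch below is a hypothetical stand-in (`fetch_with_status` is not part of fairness-datasets, and the file names are only illustrative), assuming `output_fn` is called with one message string per status update and that `None` skips the call entirely. How `Adult` actually formats or invokes its messages is not documented here.

```python
from typing import Callable, Optional


def fetch_with_status(
    files: list[str],
    output_fn: Optional[Callable[[str], None]] = print,
) -> None:
    """Hypothetical downloader that reports progress through output_fn.

    Passing output_fn=None skips every status message.
    """
    for name in files:
        if output_fn is not None:
            output_fn(f"Downloading {name}...")
        # ... the actual download would happen here ...


fetch_with_status(["adult.data", "adult.test"])                  # prints two messages
fetch_with_status(["adult.data", "adult.test"], output_fn=None)  # prints nothing

messages: list[str] = []
fetch_with_status(["adult.data"], output_fn=messages.append)     # collects instead of printing
print(messages)  # ['Downloading adult.data...']
```

Because any one-argument callable works in this pattern, the same hook can print, collect, or forward messages to a logger.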
To send the status messages to the `logging` module instead of printing them to `sys.stdout`,
pass a logging function such as `logging.info` as `output_fn`:
```python
import logging
Adult(root=..., output_fn=logging.info)
```
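One common pitfall with `output_fn=logging.info`: the root logger's default level is `WARNING`, so INFO-level status messages are discarded unless logging has been configured. A minimal configuration, assuming the root logger with a stderr handler is acceptable for your application, could look like this:

```python
import logging

# The root logger defaults to WARNING, so logging.info(...) calls are dropped
# until the level is lowered. force=True (Python 3.8+) replaces any handlers
# that an earlier basicConfig call or another library may have installed.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,
)

logging.info("INFO messages are now written to stderr")

# With fairness-datasets installed, download status messages become visible:
# from fairnessdatasets import Adult
# Adult(root="datasets", download=True, output_fn=logging.info)
```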
Raw data
{
"_id": null,
"home_page": null,
"name": "fairness-datasets",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "PyTorch, Dataset, Fairness, Adult, Census Income, Law School",
"author": null,
"author_email": "David Boetius <david.boetius@uni-konstanz.de>",
"download_url": "https://files.pythonhosted.org/packages/9c/4b/d0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4/fairness_datasets-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "PyTorch dataset wrapper for the",
"version": "0.4.0",
"project_urls": {
"Bug Tracker": "https://github.com/cherrywoods/fairness-datasets/issues",
"Homepage": "https://github.com/cherrywoods/fairness-datasets",
"Repository": "https://github.com/cherrywoods/fairness-datasets.git"
},
"split_keywords": [
"pytorch",
" dataset",
" fairness",
" adult",
" census income",
" law school"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "7d1ab16b824d2ec2bbbe24809b30ce148f77c4510761a104dbe401c00e57e588",
"md5": "ff61436a2a62d66eece090a00650587f",
"sha256": "ca2c711f0ca768457bcf5f48cc33ae9a40e10eb77df94e0e16f9c73309b675aa"
},
"downloads": -1,
"filename": "fairness_datasets-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ff61436a2a62d66eece090a00650587f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 18265,
"upload_time": "2024-04-18T21:31:48",
"upload_time_iso_8601": "2024-04-18T21:31:48.934579Z",
"url": "https://files.pythonhosted.org/packages/7d/1a/b16b824d2ec2bbbe24809b30ce148f77c4510761a104dbe401c00e57e588/fairness_datasets-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9c4bd0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4",
"md5": "71b19974c05c66f1f4943527b6f5d7fa",
"sha256": "bca7dc534f064b9941a1a10083c84c684ef4d34fe6f2b4aa2ed7d33cdf7da2d1"
},
"downloads": -1,
"filename": "fairness_datasets-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "71b19974c05c66f1f4943527b6f5d7fa",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 17528,
"upload_time": "2024-04-18T21:31:52",
"upload_time_iso_8601": "2024-04-18T21:31:52.371466Z",
"url": "https://files.pythonhosted.org/packages/9c/4b/d0d6e6d40adab59773b33141bacfd380f706f0cf908b40d148b60701f8f4/fairness_datasets-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-18 21:31:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cherrywoods",
"github_project": "fairness-datasets",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "fairness-datasets"
}