# image-dataset-loader: Load image datasets as NumPy arrays
[![PyPI](https://img.shields.io/pypi/v/image-dataset-loader.svg)](https://pypi.org/project/image-dataset-loader/)
[![MIT license](https://img.shields.io/badge/license-MIT-brightgreen.svg)](https://opensource.org/licenses/MIT)
## Installation
```bash
pip install image-dataset-loader
```
## Overview
Suppose you have an image dataset in a directory that looks like this:
```
data/
  train/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...
  test/
    cats/
      cat0001.jpg
      cat0002.jpg
      ...
    dogs/
      dog0001.jpg
      dog0002.jpg
      ...
```
You can use the `image_dataset_loader.load` function to load this dataset as NumPy arrays:
```python
import image_dataset_loader
(x_train, y_train), (x_test, y_test) = image_dataset_loader.load('path/to/data', ['train', 'test'])
```
The `x_*` arrays have shape `(instances, rows, cols, channels)` for color images and `(instances, rows, cols)` for grayscale images.
The `y_*` arrays have shape `(instances,)`.
All images in the dataset must have the same shape.
Also, all data subsets (i.e., `train` and `test` in this example) must contain the same set of classes.
Class names are sorted alphabetically and mapped to integer labels in that order.
So, in this example, `cats` and `dogs` will be represented by `0` and `1`, respectively.
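The alphabetical label assignment described above can be illustrated with a small sketch. The helper `class_to_label` is hypothetical and not part of `image_dataset_loader`; it only mirrors the documented behavior and may differ from how the library does it internally.

```python
# Hypothetical helper (not part of image_dataset_loader): mirrors the
# documented rule that class names are sorted alphabetically and then
# numbered from 0.
def class_to_label(class_names):
    return {name: index for index, name in enumerate(sorted(class_names))}


labels = class_to_label(['dogs', 'cats'])
print(labels)  # {'cats': 0, 'dogs': 1}
```

A mapping like this is handy for turning the integer values in the `y_*` arrays back into readable class names.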
You can also load a single data subset. For example:
```python
(x_train, y_train), = image_dataset_loader.load('path/to/data', ['train'])
```
Note that the comma after `(x_train, y_train)` is required, because the function always returns a tuple of tuples.
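To see why the trailing comma matters, here is a minimal sketch that uses a stand-in function in place of the real loader. `fake_load` is hypothetical and returns placeholder strings instead of arrays; only the tuple-of-tuples return shape is taken from the description above.

```python
def fake_load(set_names):
    # Stand-in for image_dataset_loader.load: returns one (x, y) pair per
    # requested subset, wrapped in an outer tuple.
    return tuple((f'x_{name}', f'y_{name}') for name in set_names)


# The trailing comma unpacks the outer 1-tuple, then the inner (x, y) pair.
(x_train, y_train), = fake_load(['train'])

# Equivalent form without the trailing comma: index the outer tuple first.
result = fake_load(['train'])
x_alt, y_alt = result[0]
```

Without the comma, `(x_train, y_train) = fake_load(['train'])` would try to unpack the outer one-element tuple into two names and raise a `ValueError`.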
## API
```python
load(dataset_path, set_names,
     shuffle=True, seed=None,
     x_dtype='uint8', y_dtype='uint32')
```
- **`dataset_path:`** Path to the dataset directory.
- **`set_names:`** List of the data subsets (subdirectories of the dataset directory).
- **`shuffle:`** Whether to shuffle the samples. If `False`, instances will be sorted by file name.
- **`seed:`** Random seed used for shuffling (see the [docs](https://docs.python.org/3/library/random.html#random.seed)).
- **`x_dtype:`** NumPy data type for the X arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).
- **`y_dtype:`** NumPy data type for the Y arrays (see the [docs](https://numpy.org/devdocs/user/basics.types.html)).
- Returns a tuple of `(x, y)` tuples corresponding to `set_names`.
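The interaction between `shuffle` and `seed` can be sketched with the standard library alone. This is an illustration under assumptions, not the library's actual implementation: `shuffled_pairs` is a hypothetical helper, and it assumes that samples and labels are shuffled together with a generator seeded as described in the linked `random.seed` documentation.

```python
import random


def shuffled_pairs(file_names, labels, seed=None):
    """Shuffle file names and labels together so each pair stays aligned."""
    pairs = list(zip(file_names, labels))
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(pairs)
    return pairs


names = ['cat0001.jpg', 'cat0002.jpg', 'dog0001.jpg', 'dog0002.jpg']
labels = [0, 0, 1, 1]

# The same seed reproduces the same order on every call.
first = shuffled_pairs(names, labels, seed=42)
second = shuffled_pairs(names, labels, seed=42)
```

Passing a fixed `seed` is useful when experiments need to be repeatable; leaving it as `None` gives a different order on each run.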
## Raw data
{
"_id": null,
"home_page": "https://github.com/soroushj/image-dataset-loader",
"name": "image-dataset-loader",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.0",
"maintainer_email": null,
"keywords": "datasets, image-datasets",
"author": "Soroush Javadi",
"author_email": "soroush.javadi@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/73/d4/3b899a63589c977afe0a836d823fd53359c2699c352ba8cfac6d371d962b/image-dataset-loader-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Load image datasets as NumPy arrays",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/soroushj/image-dataset-loader"
},
"split_keywords": [
"datasets",
" image-datasets"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9d5ccb9d9769ab2079886e53823c0bda7dad6ad8ebf21ad051703996865bb84d",
"md5": "a21c68467cae2e147ddaec7b01ca237b",
"sha256": "f1972bb1ba46902d3cea44f880a6b00440399d3615e22b7c8941181e07710c89"
},
"downloads": -1,
"filename": "image_dataset_loader-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a21c68467cae2e147ddaec7b01ca237b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.0",
"size": 4235,
"upload_time": "2024-03-23T10:54:10",
"upload_time_iso_8601": "2024-03-23T10:54:10.348135Z",
"url": "https://files.pythonhosted.org/packages/9d/5c/cb9d9769ab2079886e53823c0bda7dad6ad8ebf21ad051703996865bb84d/image_dataset_loader-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "73d43b899a63589c977afe0a836d823fd53359c2699c352ba8cfac6d371d962b",
"md5": "44833b30e665be7b6755390de57e9b0c",
"sha256": "b72abf6dd44357f5a26516abad34396cf5123cd1726fc13de87bf7301afaa31b"
},
"downloads": -1,
"filename": "image-dataset-loader-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "44833b30e665be7b6755390de57e9b0c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.0",
"size": 3781,
"upload_time": "2024-03-23T10:54:12",
"upload_time_iso_8601": "2024-03-23T10:54:12.032825Z",
"url": "https://files.pythonhosted.org/packages/73/d4/3b899a63589c977afe0a836d823fd53359c2699c352ba8cfac6d371d962b/image-dataset-loader-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-23 10:54:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "soroushj",
"github_project": "image-dataset-loader",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "image-dataset-loader"
}