| Field | Value |
| --- | --- |
| Name | files-dataset |
| Version | 0.1.4 |
| Summary | Simple, composable standard for storing datasets as one sample per file |
| upload_time | 2024-08-09 20:48:03 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Files Dataset
> Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.
```bash
pip install files-dataset
```
### Format
A dataset folder looks something like this:
```
my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...
```
`meta.json`:
```json
{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  }
  // you can add other stuff if you want to
}
```
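For illustration only, entries like the ones above can be resolved with the standard library. This is a minimal sketch assuming plain JSON without the `//` comments shown; `list_samples` is a hypothetical helper, not part of `files-dataset`:

```python
import json
import tarfile
from pathlib import Path

def list_samples(root: str) -> dict[str, list[str]]:
    """Map each meta.json key to its member file names (hypothetical sketch)."""
    base = Path(root)
    meta = json.loads((base / 'meta.json').read_text())
    out: dict[str, list[str]] = {}
    for key, spec in meta['files_dataset'].items():
        if spec.get('format') == 'tar':
            # "archive" points at a TAR file: list its members
            with tarfile.open(base / spec['archive']) as tf:
                out[key] = tf.getnames()
        else:
            # "archive" is a glob over loose files
            out[key] = sorted(str(p.relative_to(base)) for p in base.glob(spec['archive']))
    return out
```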
### Usage
```python
import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None

for x in ds.samples('cars', 'trains'):
    x['cars'] # a car image
    x['trains'] # a train image (extracted from the TAR archive)
```
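Conceptually (an assumption about the semantics, not the package's internals), `samples` pairs the i-th file of each requested key, much like zipping per-key file lists:

```python
from typing import Iterator

def iter_samples(files_by_key: dict[str, list[bytes]]) -> Iterator[dict[str, bytes]]:
    # Yield one dict per sample index, pairing the i-th entry of every key
    keys = list(files_by_key)
    for values in zip(*(files_by_key[k] for k in keys)):
        yield dict(zip(keys, values))
```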
A common convenience is:
```python
import files_dataset as fds

datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
    ...
```
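The chaining step can be pictured as flattening the per-dataset sample streams in order, in the spirit of `itertools.chain` (a sketch of the idea, not `fds.chain` itself):

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def chain_streams(streams: Iterable[Iterable[T]]) -> Iterator[T]:
    # Exhaust each stream before moving on to the next one
    return itertools.chain.from_iterable(streams)
```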
And that's it! Simple.
Raw data
```json
{
  "_id": null,
  "home_page": null,
  "name": "files-dataset",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.10",
  "maintainer_email": null,
  "keywords": null,
  "author": null,
  "author_email": "Marcel Claramunt <marcel@moveread.com>",
  "download_url": "https://files.pythonhosted.org/packages/c2/37/faa0fd9fdf0712935041b04432f5dde4329c5b3a0fe051ec6206e7840665/files_dataset-0.1.4.tar.gz",
  "platform": null,
  "description": "# Files Dataset\n\n> Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.\n\n```bash\npip install files-dataset\n```\n\n### Format\n\nA dataset folder looks something like this:\n\n```\nmy-dataset/\n meta.json\n car-images/\n 001.jpg\n 002.jpg\n 003.jpg\n train-images/\n images.tar\n ...\n```\n\n`meta.json`:\n```json\n{\n \"files_dataset\": {\n \"cars\": {\n \"archive\": \"car-images/*.jpg\",\n \"num_files\": 3000 // optionally specify the number of files\n },\n \"trains\": {\n \"archive\": \"train-images/images.tar\",\n \"format\": \"tar\",\n \"num_files\": 10000\n }\n },\n // you can add other stuff if you want to\n}\n```\n\n### Usage\n\n```python\nimport files_dataset as fds\n\nds = fds.Dataset.read('path/to/my-dataset')\nnum_samples = ds.len('cars', 'trains') # int | None\n\nfor x in ds.samples('inputs', 'labels'):\n x['cars'] # the first car image\n x['trains'] # the first train image (extracted from the TAR archive)\n```\n\nA common convenience to use is:\n\n```python\nimport files_dataset as fds\n\ndatasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]\nfor x in fds.chain(datasets, 'trains', 'cars'):\n ...\n```\n\nAnd that's it! Simple.\n",
  "bugtrack_url": null,
  "license": null,
  "summary": "Simple, composable standard for storing datasets as one sample per file",
  "version": "0.1.4",
  "project_urls": {
    "repo": "https://github.com/marciclabas/mltools.git"
  },
  "split_keywords": [],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "8f75c10e02c75502cf8b96d4e759405744ba7bab33dcbe90c7be806c79d285e7",
        "md5": "7b7e07259e9138f79664e1132a828e36",
        "sha256": "4eea6840d57a08565e6d3f781908a9e180c47ad7da52504711910a8b6b5fa1ac"
      },
      "downloads": -1,
      "filename": "files_dataset-0.1.4-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "7b7e07259e9138f79664e1132a828e36",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.10",
      "size": 4689,
      "upload_time": "2024-08-09T20:48:01",
      "upload_time_iso_8601": "2024-08-09T20:48:01.937285Z",
      "url": "https://files.pythonhosted.org/packages/8f/75/c10e02c75502cf8b96d4e759405744ba7bab33dcbe90c7be806c79d285e7/files_dataset-0.1.4-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "c237faa0fd9fdf0712935041b04432f5dde4329c5b3a0fe051ec6206e7840665",
        "md5": "379bcbe5fd498243c03d940e067d5bf8",
        "sha256": "de8c23030a6c3dd5e5b08c2bf612309337b1fd89613d60c0aa6ece65eb658cb9"
      },
      "downloads": -1,
      "filename": "files_dataset-0.1.4.tar.gz",
      "has_sig": false,
      "md5_digest": "379bcbe5fd498243c03d940e067d5bf8",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.10",
      "size": 3863,
      "upload_time": "2024-08-09T20:48:03",
      "upload_time_iso_8601": "2024-08-09T20:48:03.240758Z",
      "url": "https://files.pythonhosted.org/packages/c2/37/faa0fd9fdf0712935041b04432f5dde4329c5b3a0fe051ec6206e7840665/files_dataset-0.1.4.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2024-08-09 20:48:03",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "marciclabas",
  "github_project": "mltools",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": false,
  "requirements": [],
  "lcname": "files-dataset"
}
```