files-dataset

- Name: files-dataset
- Version: 0.1.4
- Summary: Simple, composable standard for storing datasets as one sample per file
- Author: Marcel Claramunt <marcel@moveread.com>
- Requires Python: >=3.10
- License: not specified
- Repository: https://github.com/marciclabas/mltools.git
- Uploaded: 2024-08-09 20:48:03
- Requirements: none recorded
# Files Dataset

> Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.

```bash
pip install files-dataset
```

### Format

A dataset folder looks something like this:

```
my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...
```

`meta.json` (comments are for illustration only; the actual file must be plain JSON):
```jsonc
{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  }
  // you can add other top-level keys if you want to
}
```
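For illustration, a valid `meta.json` can be generated with Python's standard `json` module (written as plain JSON, without the explanatory comments shown above; the tiny `num_files` value here is just for the example):

```python
import json
from pathlib import Path

# Describe each key as a glob or an archive entry, per the format above.
meta = {
    "files_dataset": {
        "cars": {"archive": "car-images/*.jpg", "num_files": 3},
        "trains": {"archive": "train-images/images.tar", "format": "tar", "num_files": 10000},
    }
}

root = Path("my-dataset")
root.mkdir(exist_ok=True)
(root / "meta.json").write_text(json.dumps(meta, indent=2))
```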

### Usage

```python
import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None

for x in ds.samples('cars', 'trains'):
  x['cars'] # the current car image
  x['trains'] # the current train image (read from the TAR archive)
```
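Conceptually, iterating one sample per file over a glob pattern or a TAR archive can be sketched with the standard library alone. This is an illustration of the format, not `files-dataset`'s actual implementation; the file names and byte contents are made up for the example:

```python
import glob, io, os, tarfile, tempfile

tmp = tempfile.mkdtemp()
# Lay out a tiny dataset: two loose .jpg files and a one-member TAR.
for name in ("001.jpg", "002.jpg"):
    with open(os.path.join(tmp, name), "wb") as f:
        f.write(b"car-bytes")
with tarfile.open(os.path.join(tmp, "images.tar"), "w") as tar:
    data = b"train-bytes"
    info = tarfile.TarInfo("t001.jpg")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

def glob_samples(pattern):
    # One sample per matching file, read as raw bytes.
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            yield f.read()

def tar_samples(path):
    # One sample per archive member, streamed without extracting to disk.
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            yield tar.extractfile(member).read()

cars = list(glob_samples(os.path.join(tmp, "*.jpg")))
trains = list(tar_samples(os.path.join(tmp, "images.tar")))
```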

A common convenience is:

```python
import files_dataset as fds

datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
  ...
```
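Judging from the README, `fds.chain` presumably concatenates the sample streams of several datasets back to back, much like `itertools.chain`. A rough stdlib analogy (an assumption about the semantics, not the library's code):

```python
import itertools

def chain_samples(datasets):
    # Yield sample dicts from each dataset in turn, back to back.
    return itertools.chain.from_iterable(datasets)

# Two toy "datasets", each a list of sample dicts keyed like the README's example.
ds_a = [{"cars": b"a1", "trains": b"t1"}, {"cars": b"a2", "trains": b"t2"}]
ds_b = [{"cars": b"b1", "trains": b"t3"}]
combined = list(chain_samples([ds_a, ds_b]))
```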

And that's it! Simple.
