tf-datachain


Name: tf-datachain
Version: 0.1.0
Summary: A local dataset loader based on tf.data input pipeline
Author: Yiming Liu
Upload time: 2023-08-07 13:00:19
Requirements: no requirements were recorded
# tf-datachain

`tf-datachain` is a local dataset loader built on the `tf.data` input pipeline. It handles reading and encoding data directly from your disk and simplifies processing by providing several predefined methods.

## Object Detection

```python
from tf_datachain import ObjectDetection as od
```

Before using `ObjectDetection` functions, you have to define some basic information, such as the image folder path and the class name list.

```python
od.imageFolder = "data/images"

# hard-code class names
od.classNames = ["class1", "class2", "class3"]
# or read them from a CSV file
import pandas as pd
od.classNames = pd.read_csv("class.csv", header=None).iloc[:,0].values.tolist()
```

Then, a ready-to-use `tf.data` input pipeline can be built in three steps:

- Preparation: build the list of files to process without reading their contents.
- Data Loading: load data from the prepared list via `tf.data`.
- Augmentation: shuffle, batch, and resize.

Best practices for loading datasets in different formats are shown below.
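The preparation step only collects file paths; nothing is read from disk until the pipeline runs. A minimal illustration of that idea using the standard library (a hypothetical sketch, independent of tf-datachain's own `prepareAnnotation`):

```python
from pathlib import Path

def prepare_annotation_list(folder, suffix):
    """Collect annotation file paths without opening any of them."""
    return sorted(str(p) for p in Path(folder).glob(f"*{suffix}"))
```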

### Pascal VOC XML Format

```python
import tensorflow as tf
from tf_datachain.utils import split

BATCH_SIZE = 4
# read .xml files within the data/annotations folder,
# then split them with a ratio of 6:2:2
trainDataset, validationDataset, testDataset = split(od.prepareAnnotation("data/annotations", ".xml"), 6, 2, 2)

trainDataset = tf.data.Dataset.from_tensor_slices(trainDataset)
trainDataset = trainDataset.map(lambda data: od.loadData(data, "Pascal VOC XML", "xyxy"), num_parallel_calls=tf.data.AUTOTUNE)
# shuffle, ragged batch, and jittered resize
trainDataset = od.datasetProcessing(trainDataset, BATCH_SIZE, "Jittered Resize", (960, 960), "xyxy")
trainDataset = trainDataset.prefetch(tf.data.AUTOTUNE)
```
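The exact behavior of tf-datachain's `split` is internal to the library, but a ratio split like 6:2:2 can be sketched as follows. `ratio_split` is a hypothetical stand-in, not the library's implementation:

```python
def ratio_split(items, *ratios):
    """Split a list into consecutive chunks proportional to the given ratios."""
    total = sum(ratios)
    n = len(items)
    out, start, acc = [], 0, 0
    for r in ratios[:-1]:
        acc += r
        end = round(n * acc / total)  # cumulative boundary for this chunk
        out.append(items[start:end])
        start = end
    out.append(items[start:])  # the last chunk takes whatever remains
    return out
```

With 10 items and ratios 6:2:2, this yields chunks of 6, 2, and 2 items in order.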

### Visualize Dataset

```python
# visualize a single sample
for data in dataset.take(1):
    visualizeData(data, "xyxy")

# visualize the dataset in a 2x2 grid
visualizeDataset(dataset, "xyxy", rows=2, cols=2)
```


            
