| Field | Value |
|---|---|
| Name | tf-datachain |
| Version | 0.1.0 |
| Summary | A local dataset loader based on tf.data input pipeline |
| Author | Yiming Liu |
| Upload time | 2023-08-07 13:00:19 |
| Docs URL | None |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |
# tf-datachain
`tf-datachain` is a local dataset loader built on the `tf.data` input pipeline. It handles reading and encoding data directly from your disk, and simplifies processing by providing several predefined methods.
## Object Detection
```python
from tf_datachain import ObjectDetection as od
```
Before using the `ObjectDetection` functions, you have to define some basic information, such as the image folder path and the class name list.
```python
od.imageFolder = "data/images"
# hard-code class names
od.classNames = ["class1", "class2", "class3"]
# or read them from csv file
import pandas as pd
od.classNames = pd.read_csv("class.csv", header=None).iloc[:,0].values.tolist()
```
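If pandas is not available, the same one-column file can be read with the standard library's `csv` module. This sketch assumes `class.csv` holds one class name per row in the first column, as above; the inline string stands in for the file:

```python
import csv
import io

# one class name per row, first column — the layout assumed above
csv_text = "class1\nclass2\nclass3\n"

# io.StringIO stands in for open("class.csv") so the sketch is self-contained
with io.StringIO(csv_text) as f:
    classNames = [row[0] for row in csv.reader(f) if row]

print(classNames)  # ['class1', 'class2', 'class3']
```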
Then a ready-to-use `tf.data` input pipeline can be built in three steps:

- Preparation: build the list of files to process without reading their contents.
- Data loading: load data from the prepared list via `tf.data`.
- Augmentation: shuffle, batch, and resize.

Best practices for loading datasets in different formats are shown below.
### Pascal VOC XML Format
```python
import tensorflow as tf
from tf_datachain.utils import split

BATCH_SIZE = 4
# read the .xml files within the data/annotations folder,
# then split them with a ratio of 6:2:2
trainDataset, validationDataset, testDataset = split(od.prepareAnnotation("data/annotations", ".xml"), 6, 2, 2)
trainDataset = tf.data.Dataset.from_tensor_slices(trainDataset)
trainDataset = trainDataset.map(lambda data: od.loadData(data, "Pascal VOC XML", "xyxy"), num_parallel_calls=tf.data.AUTOTUNE)
# shuffle, ragged batch, and jittered resize
trainDataset = od.datasetProcessing(trainDataset, BATCH_SIZE, "Jittered Resize", (960, 960), "xyxy")
trainDataset = trainDataset.prefetch(tf.data.AUTOTUNE)
```
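The `split` helper above partitions the prepared annotation list by ratio. A minimal sketch of such a ratio-based split is shown below; `ratio_split` is a hypothetical stand-in, not the library's actual implementation:

```python
def ratio_split(items, *ratios):
    """Partition items into len(ratios) consecutive slices sized by ratio.

    Hypothetical sketch of a 6:2:2-style split; not tf_datachain's code.
    """
    total = sum(ratios)
    parts, start = [], 0
    for r in ratios[:-1]:
        end = start + round(len(items) * r / total)
        parts.append(items[start:end])
        start = end
    parts.append(items[start:])  # remainder goes to the last slice
    return parts

train, val, test = ratio_split(list(range(10)), 6, 2, 2)
print(len(train), len(val), len(test))  # 6 2 2
```

Splitting before building the `tf.data.Dataset` keeps the train/validation/test boundaries at the file-list level, so no annotation is read twice.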
### Visualize Dataset
```python
# visualize a single sample
for data in dataset.take(1):
    visualizeData(data, "xyxy")

# visualize the dataset in a 2x2 grid
visualizeDataset(dataset, "xyxy", rows=2, cols=2)
```
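The `"xyxy"` argument used throughout refers to corner-coordinate boxes `(x_min, y_min, x_max, y_max)`; another common convention is `"xywh"`, which stores the top-left corner plus width and height. Converting between the two is straightforward (a generic sketch, not a library function):

```python
def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)

print(xyxy_to_xywh((100, 150, 400, 500)))  # (100, 150, 300, 350)
```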