dsdlsdk


Namedsdlsdk JSON
Version 0.1.0.post203 PyPI version JSON
download
home_pagehttps://github.com/opendatalab/dsdl-sdk/tree/dev
Summarymade for opendatalab dsdl-sdk dev branch(ignore other branches).
upload_time2022-12-19 05:57:56
maintainer
docs_urlNone
authordsdl-team
requires_python>=3.8, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*
licenseApache-2.0
keywords odl-cli opendatalab pjlab
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            <div align="center">
  <img src="https://raw.githubusercontent.com/opendatalab/dsdl-sdk/2ae5264a7ce1ae6116720478f8fa9e59556bed41/resources/opendatalab.svg" width="600"/>
  <div>&nbsp;</div>
  <div align="center">
    <a href="https://opendatalab.com/"> OpenDataLab Website</a>
    &nbsp;&nbsp;&nbsp;&nbsp;
  </div>
  <div>&nbsp;</div>
</div>

English | [简体中文](README_zh-CN.md)

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/odl-cli) ](https://pypi.org/project/odl-cli/)[![PyPI](https://img.shields.io/pypi/v/odl-cli)](https://pypi.org/project/odl-cli/) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://github.com/opendatalab/dsdl-sdk/tree/dev-cli/docs)[![dev workflow](https://github.com/opendatalab/dsdl-sdk/actions/workflows/dev.yml/badge.svg?branch=dev)](https://github.com/opendatalab/dsdl-sdk/actions/workflows/dev.yml)[![stage & preview workflow](https://github.com/opendatalab/dsdl-sdk/actions/workflows/preview.yml/badge.svg?branch=dev)](https://github.com/opendatalab/dsdl-sdk/actions/workflows/preview.yml)

[📘Documentation](https://github.com/opendatalab/dsdl-sdk/tree/dev-cli/docs) |

## Introduction

**Data** is the cornerstone of artificial intelligence. The efficiency of data acquisition, exchange, and application directly impacts the advances in technologies and applications. Over the long history of AI, a vast quantity of data sets have been developed and distributed. However, these datasets are defined in very different forms, which incurs significant overhead when it comes to exchange, integration, and utilization -- it is often the case that one needs to develop a new customized tool or script in order to incorporate a new dataset into a workflow.

To overcome such difficulties, we develop **Data Set Description Language (DSDL)**.

### Major features

The design of **DSDL** is driven by three goals, namely *generic*, *portable*, *extensible*. We refer to these three goals together as **GPE**.

* **Generic**

  This language aims to provide a unified representation standard for data in multiple fields of artificial intelligence, rather than being designed for a single field or task. It should be able to express data sets with different modalities and structures in a consistent format.

* **Portable**

  Write once, distribute everywhere. Dataset descriptions can be widely distributed and exchanged, and used in different environments without modification of the source files. The achievement of this goal is crucial for creating an open and thriving ecosystem. To this end, we need to carefully examine the details of the design, and remove unnecessary dependencies on specific assumptions about the underlying facilities or organizations.

* **Extensible**

  One should be able to extend the boundary of expression without modifying the core standard. For a programming language such as C++ or Python, its application boundaries can be significantly extended by libraries or packages, while the core language remains stable over a long period. Such libraries and packages form a rich ecosystem, making the language stay alive for a very long time.

## Installation

**Case a** install it with pip

```bash
pip install dsdl
```

**Case b** install it from source

```shell
git clone https://github.com/opendatalab/dsdl.git
cd dsdl
python setup.py install
```

## Get Started

#### Use dsdl parser to deserialize the Yaml file to Python code
```bash
dsdl parse --yaml demo/coco_demo.yaml
```

#### Modify the configuration & set the directory of media in dataset

Create a configuration file `config.py` with the following contents(for now dsdl only reading from aliyun oss or local is supported):

```python
local = dict(
    type="LocalFileReader",
    working_dir="local path of your media",
)

ali_oss = dict(
    type="AliOSSFileReader",
    access_key_secret="your secret key of aliyun oss",
    endpoint="your endpoint of aliyun oss",
    access_key_id="your access key of aliyun oss",
    bucket_name="your bucket name of aliyun oss",
    working_dir="the relative path of your media dir in the bucket")
```

 In `config.py`, the configuration of how to read the media in a dataset is defined. One should specify the arguments depending on from where to read the media:  

1. read from local: `working_dir` field in `local` should be specified (the directory of local media)    
2. read from aliyun oss: all the field in `ali_oss `should be specified (including `access_key_secret`, `endpoint`, `access_key_id`, `bucket_name`, `working_dir`)  

#### Visualize samples

   ```bash
   dsdl view -y <yaml-name>.yaml -c <config.py> -l ali-oss -n 10 -r -v -f Label BBox Attributes
   ```

The description of each argument is shown below:  

| simplified  argument | argument      | description                                                                                                                                                                                                                                        |
| -------------------- | ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -y                   | `--yaml`      | The path of dsdl yaml file.                                                                                                                                                                                                                        |
| -c                   | `--config`    | The path of  location configuration file.                                                                                                                                                                                                          |
| -l                   | `--location`  | `local` or `ali-oss`,which means read media from local or aliyun oss.                                                                                                                                                                             |
| -n                   | `--num`       | The number of samples to be visualized.                                                                                                                                                                                                            |
| -r                   | `--random`    | Whether to load the samples in a random order.                                                                                                                                                                                                     |
| -v                   | `--visualize` | Whether to visualize the samples or just print the information in console.                                                                                                                                                                         |
| -f                   | `--field`     | The field type to visualize, e.g. `-f BBox`means show the bounding box in samples, `-f Attributes`means show the attributes of a sample in the console . One can specify multiple field types simultaneously, such as `-f Label BBox  Attributes`. |
| -t                   | `--task`      | The task you are working on, for example, `-t detection` is equivalent to `-f Label BBox Polygon Attributes`.                                                                                                                                      |

## Citation

If you find this project useful in your research, please consider cite:

```bibtex
@misc{dsdl2022,
    title={{DSDL}: Data Set Description Language},
    author={DSDL Contributors},
    howpublished = {\url{https://github.com/opendatalab/dsdl}},
    year={2022}
}
```

## License

`DSDL` is released under the [Apache 2.0 license](LICENSE).

## Acknowledgement

* Field & Model Design inspired by [Django ORM](https://www.djangoproject.com/) and [jsonmodels](https://github.com/jazzband/jsonmodels)


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/opendatalab/dsdl-sdk/tree/dev",
    "name": "dsdlsdk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*",
    "maintainer_email": "",
    "keywords": "odl-cli,opendatalab,pjlab",
    "author": "dsdl-team",
    "author_email": "dsdl-team@pjlab.org.cn",
    "download_url": "https://files.pythonhosted.org/packages/d7/e0/3e0dd3a3c9ad3c97a6b1e2a8ab3c6ab48edf71a394ea472ac66bb0bbbe5a/dsdlsdk-0.1.0.post203.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/opendatalab/dsdl-sdk/2ae5264a7ce1ae6116720478f8fa9e59556bed41/resources/opendatalab.svg\" width=\"600\"/>\n  <div>&nbsp;</div>\n  <div align=\"center\">\n    <a href=\"https://opendatalab.com/\"> OpenDataLab Website</a>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n  </div>\n  <div>&nbsp;</div>\n</div>\n\nEnglish | [\u7b80\u4f53\u4e2d\u6587](README_zh-CN.md)\n\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/odl-cli) ](https://pypi.org/project/odl-cli/)[![PyPI](https://img.shields.io/pypi/v/odl-cli)](https://pypi.org/project/odl-cli/) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://github.com/opendatalab/dsdl-sdk/tree/dev-cli/docs)[![dev workflow](https://github.com/opendatalab/dsdl-sdk/actions/workflows/dev.yml/badge.svg?branch=dev)](https://github.com/opendatalab/dsdl-sdk/actions/workflows/dev.yml)[![stage & preview workflow](https://github.com/opendatalab/dsdl-sdk/actions/workflows/preview.yml/badge.svg?branch=dev)](https://github.com/opendatalab/dsdl-sdk/actions/workflows/preview.yml)\n\n[\ud83d\udcd8Documentation](https://github.com/opendatalab/dsdl-sdk/tree/dev-cli/docs) |\n\n## Introduction\n\n**Data** is the cornerstone of artificial intelligence. The efficiency of data acquisition, exchange, and application directly impacts the advances in technologies and applications. Over the long history of AI, a vast quantity of data sets have been developed and distributed. However, these datasets are defined in very different forms, which incurs significant overhead when it comes to exchange, integration, and utilization -- it is often the case that one needs to develop a new customized tool or script in order to incorporate a new dataset into a workflow.\n\nTo overcome such difficulties, we develop **Data Set Description Language (DSDL)**.\n\n### Major features\n\nThe design of **DSDL** is driven by three goals, namely *generic*, *portable*, *extensible*. We refer to these three goals together as **GPE**.\n\n* **Generic**\n\n  This language aims to provide a unified representation standard for data in multiple fields of artificial intelligence, rather than being designed for a single field or task. It should be able to express data sets with different modalities and structures in a consistent format.\n\n* **Portable**\n\n  Write once, distribute everywhere. Dataset descriptions can be widely distributed and exchanged, and used in different environments without modification of the source files. The achievement of this goal is crucial for creating an open and thriving ecosystem. To this end, we need to carefully examine the details of the design, and remove unnecessary dependencies on specific assumptions about the underlying facilities or organizations.\n\n* **Extensible**\n\n  One should be able to extend the boundary of expression without modifying the core standard. For a programming language such as C++ or Python, its application boundaries can be significantly extended by libraries or packages, while the core language remains stable over a long period. Such libraries and packages form a rich ecosystem, making the language stay alive for a very long time.\n\n## Installation\n\n**Case a** install it with pip\n\n```bash\npip install dsdl\n```\n\n**Case b** install it from source\n\n```shell\ngit clone https://github.com/opendatalab/dsdl.git\ncd dsdl\npython setup.py install\n```\n\n## Get Started\n\n#### Use dsdl parser to deserialize the Yaml file to Python code\n```bash\ndsdl parse --yaml demo/coco_demo.yaml\n```\n\n#### Modify the configuration & set the directory of media in dataset\n\nCreate a configuration file `config.py` with the following contents\uff08for now dsdl only reading from aliyun oss or local is supported\uff09\uff1a\n\n```python\nlocal = dict(\n    type=\"LocalFileReader\",\n    working_dir=\"local path of your media\",\n)\n\nali_oss = dict(\n    type=\"AliOSSFileReader\",\n    access_key_secret=\"your secret key of aliyun oss\",\n    endpoint=\"your endpoint of aliyun oss\",\n    access_key_id=\"your access key of aliyun oss\",\n    bucket_name=\"your bucket name of aliyun oss\",\n    working_dir=\"the relative path of your media dir in the bucket\")\n```\n\n In `config.py`, the configuration of how to read the media in a dataset is defined. One should specify the arguments depending on from where to read the media\uff1a  \n\n1. read from local\uff1a `working_dir` field in `local` should be specified (the directory of local media)    \n2. read from aliyun oss\uff1a all the field in `ali_oss `should be specified (including `access_key_secret`, `endpoint`, `access_key_id`, `bucket_name`, `working_dir`)  \n\n#### Visualize samples\n\n   ```bash\n   dsdl view -y <yaml-name>.yaml -c <config.py> -l ali-oss -n 10 -r -v -f Label BBox Attributes\n   ```\n\nThe description of each argument is shown below:  \n\n| simplified  argument | argument      | description                                                                                                                                                                                                                                        |\n| -------------------- | ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| -y                   | `--yaml`      | The path of dsdl yaml file.                                                                                                                                                                                                                        |\n| -c                   | `--config`    | The path of  location configuration file.                                                                                                                                                                                                          |\n| -l                   | `--location`  | `local` or `ali-oss`\uff0cwhich means read media from local or aliyun oss.                                                                                                                                                                             |\n| -n                   | `--num`       | The number of samples to be visualized.                                                                                                                                                                                                            |\n| -r                   | `--random`    | Whether to load the samples in a random order.                                                                                                                                                                                                     |\n| -v                   | `--visualize` | Whether to visualize the samples or just print the information in console.                                                                                                                                                                         |\n| -f                   | `--field`     | The field type to visualize, e.g. `-f BBox`means show the bounding box in samples, `-f Attributes`means show the attributes of a sample in the console . One can specify multiple field types simultaneously, such as `-f Label BBox  Attributes`. |\n| -t                   | `--task`      | The task you are working on, for example, `-t detection` is equivalent to `-f Label BBox Polygon Attributes`.                                                                                                                                      |\n\n## Citation\n\nIf you find this project useful in your research, please consider cite:\n\n```bibtex\n@misc{dsdl2022,\n    title={{DSDL}: Data Set Description Language},\n    author={DSDL Contributors},\n    howpublished = {\\url{https://github.com/opendatalab/dsdl}},\n    year={2022}\n}\n```\n\n## License\n\n`DSDL` is released under the [Apache 2.0 license](LICENSE).\n\n## Acknowledgement\n\n* Field & Model Design inspired by [Django ORM](https://www.djangoproject.com/) and [jsonmodels](https://github.com/jazzband/jsonmodels)\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "made for opendatalab dsdl-sdk dev branch(ignore other branches).",
    "version": "0.1.0.post203",
    "split_keywords": [
        "odl-cli",
        "opendatalab",
        "pjlab"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "866fa82de131e3e30163f93526843d94",
                "sha256": "fd5ef13043d34e2a703f12ddc03be2010a80d05fc70caff88dddf545f3c6b510"
            },
            "downloads": -1,
            "filename": "dsdlsdk-0.1.0.post203-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "866fa82de131e3e30163f93526843d94",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*",
            "size": 202529,
            "upload_time": "2022-12-19T05:57:54",
            "upload_time_iso_8601": "2022-12-19T05:57:54.981302Z",
            "url": "https://files.pythonhosted.org/packages/7b/f2/5d64fa2eaa32c680a87a04b313896fb2c10b1bc59d7383d8b7007d338d1b/dsdlsdk-0.1.0.post203-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "28896e3dabf3d6d0d5b4a6c862b1a17c",
                "sha256": "cfc4c8ebca080a7452e810aec69c56f2e19956a57d3a61e914fa18f8de2165d1"
            },
            "downloads": -1,
            "filename": "dsdlsdk-0.1.0.post203.tar.gz",
            "has_sig": false,
            "md5_digest": "28896e3dabf3d6d0d5b4a6c862b1a17c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*",
            "size": 149795,
            "upload_time": "2022-12-19T05:57:56",
            "upload_time_iso_8601": "2022-12-19T05:57:56.700827Z",
            "url": "https://files.pythonhosted.org/packages/d7/e0/3e0dd3a3c9ad3c97a6b1e2a8ab3c6ab48edf71a394ea472ac66bb0bbbe5a/dsdlsdk-0.1.0.post203.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-19 05:57:56",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "dsdlsdk"
}
        
Elapsed time: 0.01966s