dirschema


* Name: dirschema
* Version: 0.1.0
* Home page: https://materials-data-science-and-informatics.github.io/dirschema
* Summary: Spec and validator for directories, files and metadata based on JSON Schema and regexes.
* Upload time: 2023-05-08 15:25:10
* Author: Anton Pirogov
* Requires Python: >=3.8,<3.11
* License: MIT
* Keywords: jsonschema, validation, directory, structure, fair, metadata

![Project status](https://img.shields.io/badge/project%20status-alpha-%23ff8000)
[
![Docs](https://img.shields.io/badge/read-docs-success)
](https://materials-data-science-and-informatics.github.io/dirschema)
[
![CI](https://img.shields.io/github/actions/workflow/status/Materials-Data-Science-and-Informatics/dirschema/ci.yml?branch=main&label=ci)
](https://github.com/Materials-Data-Science-and-Informatics/dirschema/actions/workflows/ci.yml)
[
![Test Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage_badge.svg)
](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage)
[
![Docs Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/interrogate_badge.svg)
](https://materials-data-science-and-informatics.github.io/dirschema)
[
![PyPIPkgVersion](https://img.shields.io/pypi/v/dirschema)
](https://pypi.org/project/dirschema/)

<!-- --8<-- [start:abstract] -->
# dirschema

<br />
<div>
<img style="center-align: middle;" alt="DirSchema Logo" src="https://user-images.githubusercontent.com/89833997/152970983-267fa89e-9928-4393-a1fa-2a8fe3c6b9ba.png" width=70% height=70% />
&nbsp;&nbsp;
</div>
<br />

A directory structure and metadata linter based on JSON Schema.

[JSON Schema](https://json-schema.org/) is great for validating (files containing) JSON
objects that e.g. contain metadata, but these are only the smallest pieces in the
organization of a whole directory structure, e.g. of some dataset or project.
A dataset of a certain kind might contain various types of data, with each
file requiring different accompanying metadata based on its file type
and/or location.

**DirSchema** combines JSON Schemas and regexes into a solution to enforce structural
dependencies and metadata requirements in directories and directory-like archives.
With it you can, for example, check that:

* only files of a certain type are in a location (e.g. only `jpg` files in directory `img`)
* for each data file there exists a metadata file (e.g. `test.jpg` has `test.jpg_meta.json`)
* each metadata file is valid according to some JSON Schema

If validating these kinds of constraints looks appealing to you, this tool is for you!
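
To make the first two constraints concrete, here is a minimal hand-written sketch in plain Python. It is not DirSchema syntax and not how DirSchema is implemented; the function name `lint_img_dir`, the regex, and the error messages are assumptions made up for illustration. It only shows the kind of rule that a DirSchema specification lets you state declaratively instead of coding by hand. The third constraint (validating metadata against a JSON Schema) is left out because it needs a JSON Schema validator.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical rule set for this sketch: img/ may only contain .jpg files
# and their metadata companions (e.g. test.jpg -> test.jpg_meta.json).
JPG = re.compile(r".+\.jpg")
META_SUFFIX = "_meta.json"


def lint_img_dir(root: Path) -> List[str]:
    """Return a list of problems found in root/img (empty if none)."""
    img_dir = root / "img"
    if not img_dir.is_dir():
        return ["missing directory: img"]
    # Only regular files directly inside img/ are checked; subdirectories are ignored.
    names = sorted(p.name for p in img_dir.iterdir() if p.is_file())
    problems = []
    for name in names:
        if JPG.fullmatch(name):
            # Each data file needs a metadata file next to it.
            if name + META_SUFFIX not in names:
                problems.append(f"img/{name}: missing img/{name}{META_SUFFIX}")
        elif name.endswith(META_SUFFIX) and JPG.fullmatch(name[: -len(META_SUFFIX)]):
            # Metadata companion of a jpg file: allowed.
            continue
        else:
            problems.append(f"img/{name}: only jpg files and their metadata are allowed here")
    return problems
```

Calling `lint_img_dir(Path("/dataset/path"))` returns an empty list when the directory satisfies both rules, which mirrors the "no errors means valid" convention used by the tool.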

**DirSchema features:**

* Built-in support for schemas and metadata stored as JSON or YAML
* Built-in support for checking contents of ZIP and HDF5 archives
* Extensible validation interface for advanced needs beyond JSON Schema
* Both a Python library and a CLI tool to perform the validation

<!-- --8<-- [end:abstract] -->
<!-- --8<-- [start:quickstart] -->

## Installation

```
pip install dirschema
```

## Getting Started

The `dirschema` tool needs as input:

* a DirSchema YAML file (containing a specification), and
* a path to a directory or file (e.g. zip file) that should be checked.

You can run it like this:

```
dirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH
```

If the validation was successful, there will be no output.
Otherwise, the tool will output a list of errors (e.g. invalid metadata, missing files, etc.).
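
If you want to call the CLI from a script, the following sketch wraps the command shown above and treats empty output as success. The helper name `run_dirschema` and its `run` parameter are hypothetical. The sketch also assumes that errors may appear on either stdout or stderr, so it combines both; the exit code is not inspected because its meaning is not described here.

```python
import subprocess
from typing import Callable, List


def run_dirschema(schema: str, target: str, run: Callable = subprocess.run) -> List[str]:
    """Run the dirschema CLI and return its non-empty output lines.

    An empty list means the tool printed nothing, i.e. validation succeeded.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    result = run(
        ["dirschema", schema, target],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str
        check=False,          # do not raise on a non-zero exit code
    )
    # Assumption: error messages may be written to either stream.
    output = (result.stdout or "") + (result.stderr or "")
    return [line for line in output.splitlines() if line.strip()]
```

For example, `run_dirschema("my_dirschema.yaml", "DIRECTORY_OR_ARCHIVE_PATH")` returns `[]` for a valid directory, provided the `dirschema` command is installed and on your `PATH`.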

You can also use `dirschema` from other Python code as a library:

```python
from dirschema.validate import DSValidator

# Returns a dict of errors; it is empty if the validation succeeded.
errors = DSValidator("/path/to/dirschema").validate("/dataset/path")
```

Similarly, the method will return an error dict, which will be empty if the validation succeeded.
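
A small sketch of how the returned dict might be consumed follows. The exact shape of the keys and values is not specified here, so the hypothetical helper `summarize_errors` makes no assumption beyond "it is a mapping" and simply renders each entry as text.

```python
from typing import Any, List, Mapping


def summarize_errors(errors: Mapping[Any, Any]) -> List[str]:
    """Render one line per entry of an error mapping (empty list if valid)."""
    return [f"{key}: {value}" for key, value in errors.items()]


# Usage sketch (requires dirschema to be installed):
#
#     from dirschema.validate import DSValidator
#     errors = DSValidator("/path/to/dirschema").validate("/dataset/path")
#     for line in summarize_errors(errors):
#         print(line)
```

Because an empty dict signals success, a plain `if errors:` check is enough to decide whether a dataset passed.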

<!-- --8<-- [end:quickstart] -->

**You can find more information on using and contributing to this repository in the
[documentation](https://materials-data-science-and-informatics.github.io/dirschema/main).**

<!-- --8<-- [start:citation] -->

## How to Cite

If you want to cite this project in your scientific work,
please use the [citation file](https://citation-file-format.github.io/)
in the [repository](https://github.com/Materials-Data-Science-and-Informatics/dirschema/blob/main/CITATION.cff).

<!-- --8<-- [end:citation] -->
<!-- --8<-- [start:acknowledgements] -->

## Acknowledgements

We kindly thank all
[authors and contributors](https://materials-data-science-and-informatics.github.io/dirschema/latest/credits).

<div>
<img style="vertical-align: middle;" alt="HMC Logo" src="https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png" width=50% height=50% />
&nbsp;&nbsp;
<img style="vertical-align: middle;" alt="FZJ Logo" src="https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png" width=30% height=30% />
</div>
<br />

This project was developed at the Institute for Materials Data Science and Informatics
(IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration
(HMC), an incubator-platform of the Helmholtz Association within the framework of the
Information and Data Science strategic initiative.

<!-- --8<-- [end:acknowledgements] -->


            

Raw data

{
    "_id": null,
    "home_page": "https://materials-data-science-and-informatics.github.io/dirschema",
    "name": "dirschema",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<3.11",
    "maintainer_email": "",
    "keywords": "jsonschema,validation,directory,structure,fair,metadata",
    "author": "Anton Pirogov",
    "author_email": "a.pirogov@fz-juelich.de",
    "download_url": "https://files.pythonhosted.org/packages/d2/5b/7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17/dirschema-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Spec and validator for directories, files and metadata based on JSON Schema and regexes.",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://materials-data-science-and-informatics.github.io/dirschema",
        "Homepage": "https://materials-data-science-and-informatics.github.io/dirschema",
        "Repository": "https://github.com/Materials-Data-Science-and-Informatics/dirschema"
    },
    "split_keywords": [
        "jsonschema",
        "validation",
        "directory",
        "structure",
        "fair",
        "metadata"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ddad378110c2bb5a0f4dc43b67b85d8b1d9aa54398891babd674f5be1953a846",
                "md5": "2409bfabc5e4e91c47b76b593d7785d0",
                "sha256": "efca1a7a2431305b83f6373b1cd4b6a8cf789b047a434cf1174d82353d25bfb0"
            },
            "downloads": -1,
            "filename": "dirschema-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2409bfabc5e4e91c47b76b593d7785d0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<3.11",
            "size": 40944,
            "upload_time": "2023-05-08T15:25:08",
            "upload_time_iso_8601": "2023-05-08T15:25:08.604021Z",
            "url": "https://files.pythonhosted.org/packages/dd/ad/378110c2bb5a0f4dc43b67b85d8b1d9aa54398891babd674f5be1953a846/dirschema-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d25b7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17",
                "md5": "f72a2866e18f652d5b219b92eaf5c3dd",
                "sha256": "f9334259953afd847799a4fd405a4dbefa027b48ce74096329bd7ac03a62250b"
            },
            "downloads": -1,
            "filename": "dirschema-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f72a2866e18f652d5b219b92eaf5c3dd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<3.11",
            "size": 52431,
            "upload_time": "2023-05-08T15:25:10",
            "upload_time_iso_8601": "2023-05-08T15:25:10.754253Z",
            "url": "https://files.pythonhosted.org/packages/d2/5b/7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17/dirschema-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-08 15:25:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Materials-Data-Science-and-Informatics",
    "github_project": "dirschema",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dirschema"
}
        