![Project status](https://img.shields.io/badge/project%20status-alpha-%23ff8000)
[
![Docs](https://img.shields.io/badge/read-docs-success)
](https://materials-data-science-and-informatics.github.io/dirschema)
[
![CI](https://img.shields.io/github/actions/workflow/status/Materials-Data-Science-and-Informatics/dirschema/ci.yml?branch=main&label=ci)
](https://github.com/Materials-Data-Science-and-Informatics/dirschema/actions/workflows/ci.yml)
[
![Test Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage_badge.svg)
](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage)
[
![Docs Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/interrogate_badge.svg)
](https://materials-data-science-and-informatics.github.io/dirschema)
[
![PyPIPkgVersion](https://img.shields.io/pypi/v/dirschema)
](https://pypi.org/project/dirschema/)
<!-- --8<-- [start:abstract] -->
# dirschema
<br />
<div>
<img style="center-align: middle;" alt="DirSchema Logo" src="https://user-images.githubusercontent.com/89833997/152970983-267fa89e-9928-4393-a1fa-2a8fe3c6b9ba.png" width=70% height=70% />
</div>
<br />
A directory structure and metadata linter based on JSON Schema.
[JSON Schema](https://json-schema.org/) is great for validating (files containing) JSON
objects that e.g. contain metadata, but these are only the smallest pieces in the
organization of a whole directory structure, e.g. of some dataset or project.
Datasets of a certain kind might contain various types of data,
with each file requiring different accompanying metadata based on its file type
and/or location.
**DirSchema** combines JSON Schemas and regexes into a solution to enforce structural
dependencies and metadata requirements in directories and directory-like archives.
With it you can, for example, check that:
* only files of a certain type are in a location (e.g. only `jpg` files in directory `img`)
* for each data file there exists a metadata file (e.g. `test.jpg` has `test.jpg_meta.json`)
* each metadata file is valid according to some JSON Schema
If validating these kinds of constraints looks appealing to you, this tool is for you!
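To make these three kinds of constraints concrete, here is a minimal plain-Python sketch of what such checks amount to. It does not use DirSchema and does not show how DirSchema implements them. The function name `check_dataset`, the `_meta.json` suffix convention, and the required `title` key are assumptions chosen for illustration, and the third check is only a stand-in for real JSON Schema validation.

```python
import json
import re
from pathlib import Path
from typing import Dict


def check_dataset(root: Path) -> Dict[str, str]:
    """Return {relative_path: problem}; an empty dict means all checks passed."""
    errors: Dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()

        if rel.endswith("_meta.json"):
            # Check 3 (simplified): metadata must be a JSON object with a "title" key.
            try:
                meta = json.loads(path.read_text(encoding="utf-8"))
            except ValueError:  # covers invalid JSON and undecodable bytes
                errors[rel] = "metadata is not valid JSON"
                continue
            if not isinstance(meta, dict) or "title" not in meta:
                errors[rel] = "metadata lacks required key 'title'"
            continue

        # Check 1: below img/, only jpg data files are allowed.
        if rel.startswith("img/") and not re.fullmatch(r"img/.*\.jpg", rel):
            errors[rel] = "only jpg files are allowed in img/"
            continue

        # Check 2: every data file needs a sibling "<name>_meta.json".
        if not path.with_name(path.name + "_meta.json").is_file():
            errors[rel] = "missing metadata file"
    return errors
```

Writing such rules by hand quickly becomes repetitive, which is the gap a declarative specification combining regexes and JSON Schemas is meant to fill.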
**DirSchema features:**
* Built-in support for schemas and metadata stored as JSON or YAML
* Built-in support for checking contents of ZIP and HDF5 archives
* Extensible validation interface for advanced needs beyond JSON Schema
* Both a Python library and a CLI tool to perform the validation
<!-- --8<-- [end:abstract] -->
<!-- --8<-- [start:quickstart] -->
## Installation
```
pip install dirschema
```
## Getting Started
The `dirschema` tool needs as input:
* a DirSchema YAML file (containing a specification), and
* a path to a directory or file (e.g. zip file) that should be checked.
You can run it like this:
```
dirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH
```
If the validation was successful, there will be no output.
Otherwise, the tool will output a list of errors (e.g. invalid metadata, missing files, etc.).
You can also use `dirschema` from other Python code as a library:
```python
from dirschema.validate import DSValidator
DSValidator("/path/to/dirschema").validate("/dataset/path")
```
Similarly, the method returns an error dict, which is empty if the validation succeeded.
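As a rough sketch of how the returned error dict could be consumed in a script, the following wraps the call shown above and turns the result into printed lines and an exit status. The helper names `format_errors`, `exit_code` and `main` are hypothetical, and the assumption that the dict maps some key (such as a path) to error details is not confirmed by this README; only the `DSValidator(...).validate(...)` call is taken from it.

```python
import sys
from typing import Any, List, Mapping


def format_errors(errors: Mapping[Any, Any]) -> List[str]:
    """Render an error mapping as printable lines (the key/value layout is an assumption)."""
    items = sorted(errors.items(), key=lambda kv: str(kv[0]))
    return [f"{key}: {value}" for key, value in items]


def exit_code(errors: Mapping[Any, Any]) -> int:
    """An empty error dict means the validation succeeded."""
    return 1 if errors else 0


def main(schema_path: str, dataset_path: str) -> int:
    # Imported lazily so the helpers above work without dirschema installed.
    from dirschema.validate import DSValidator

    errors = DSValidator(schema_path).validate(dataset_path)
    for line in format_errors(errors):
        print(line, file=sys.stderr)
    return exit_code(errors)
```

Calling `main("/path/to/dirschema", "/dataset/path")` would then return `0` for a valid dataset and `1` otherwise, under the assumptions stated above.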
<!-- --8<-- [end:quickstart] -->
**You can find more information on using and contributing to this repository in the
[documentation](https://materials-data-science-and-informatics.github.io/dirschema/main).**
<!-- --8<-- [start:citation] -->
## How to Cite
If you want to cite this project in your scientific work,
please use the [citation file](https://citation-file-format.github.io/)
in the [repository](https://github.com/Materials-Data-Science-and-Informatics/dirschema/blob/main/CITATION.cff).
<!-- --8<-- [end:citation] -->
<!-- --8<-- [start:acknowledgements] -->
## Acknowledgements
We kindly thank all
[authors and contributors](https://materials-data-science-and-informatics.github.io/dirschema/latest/credits).
<div>
<img style="vertical-align: middle;" alt="HMC Logo" src="https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png" width=50% height=50% />
<img style="vertical-align: middle;" alt="FZJ Logo" src="https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png" width=30% height=30% />
</div>
<br />
This project was developed at the Institute for Materials Data Science and Informatics
(IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration
(HMC), an incubator-platform of the Helmholtz Association within the framework of the
Information and Data Science strategic initiative.
<!-- --8<-- [end:acknowledgements] -->
Raw data
{
"_id": null,
"home_page": "https://materials-data-science-and-informatics.github.io/dirschema",
"name": "dirschema",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<3.11",
"maintainer_email": "",
"keywords": "jsonschema,validation,directory,structure,fair,metadata",
"author": "Anton Pirogov",
"author_email": "a.pirogov@fz-juelich.de",
"download_url": "https://files.pythonhosted.org/packages/d2/5b/7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17/dirschema-0.1.0.tar.gz",
"platform": null,
"description": "![Project status](https://img.shields.io/badge/project%20status-alpha-%23ff8000)\n[\n![Docs](https://img.shields.io/badge/read-docs-success)\n](https://materials-data-science-and-informatics.github.io/dirschema)\n[\n![CI](https://img.shields.io/github/actions/workflow/status/Materials-Data-Science-and-Informatics/dirschema/ci.yml?branch=main&label=ci)\n](https://github.com/Materials-Data-Science-and-Informatics/dirschema/actions/workflows/ci.yml)\n[\n![Test Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage_badge.svg)\n](https://materials-data-science-and-informatics.github.io/dirschema/main/coverage)\n[\n![Docs Coverage](https://materials-data-science-and-informatics.github.io/dirschema/main/interrogate_badge.svg)\n](https://materials-data-science-and-informatics.github.io/dirschema)\n[\n![PyPIPkgVersion](https://img.shields.io/pypi/v/dirschema)\n](https://pypi.org/project/dirschema/)\n\n<!-- --8<-- [start:abstract] -->\n# dirschema\n\n<br />\n<div>\n<img style=\"center-align: middle;\" alt=\"DirSchema Logo\" src=\"https://user-images.githubusercontent.com/89833997/152970983-267fa89e-9928-4393-a1fa-2a8fe3c6b9ba.png\" width=70% height=70% />\n \n</div>\n<br />\n\nA directory structure and metadata linter based on JSON Schema.\n\n[JSON Schema](https://json-schema.org/) is great for validating (files containing) JSON\nobjects that e.g. contain metadata, but these are only the smallest pieces in the\norganization of a whole directory structure, e.g. 
of some dataset of project.\nWhen working on datasets of a certain kind, they might contain various types of data,\neach different file requiring different accompanying metadata, based on its file type\nand/or location.\n\n**DirSchema** combines JSON Schemas and regexes into a solution to enforce structural\ndependencies and metadata requirements in directories and directory-like archives.\nWith it you can for example check that:\n\n* only files of a certain type are in a location (e.g. only `jpg` files in directory `img`)\n* for each data file there exists a metadata file (e.g. `test.jpg` has `test.jpg_meta.json`)\n* each metadata file is valid according to some JSON Schema\n\nIf validating these kinds of constraints looks appealing to you, this tool is for you!\n\n**Dirschema features:**\n\n* Built-in support for schemas and metadata stored as JSON or YAML\n* Built-in support for checking contents of ZIP and HDF5 archives\n* Extensible validation interface for advanced needs beyond JSON Schema\n* Both a Python library and a CLI tool to perform the validation\n\n<!-- --8<-- [end:abstract] -->\n<!-- --8<-- [start:quickstart] -->\n\n## Installation\n\n```\npip install dirschema\n```\n\n## Getting Started\n\nThe `dirschema` tool needs as input:\n\n* a DirSchema YAML file (containing a specification), and\n* a path to a directory or file (e.g. zip file) that should be checked.\n\nYou can run it like this:\n\n```\ndirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH\n```\n\nIf the validation was successful, there will be no output.\nOtherwise, the tool will output a list of errors (e.g. 
invalid metadata, missing files, etc.).\n\nYou can also use `dirschema` from other Python code as a library:\n\n```python\nfrom dirschema.validate import DSValidator\nDSValidator(\"/path/to/dirschema\").validate(\"/dataset/path\")\n```\n\nSimilarly, the method will return an error dict, which will be empty if the validation succeeded.\n\n<!-- --8<-- [end:quickstart] -->\n\n**You can find more information on using and contributing to this repository in the\n[documentation](https://materials-data-science-and-informatics.github.io/dirschema/main).**\n\n<!-- --8<-- [start:citation] -->\n\n## How to Cite\n\nIf you want to cite this project in your scientific work,\nplease use the [citation file](https://citation-file-format.github.io/)\nin the [repository](https://github.com/Materials-Data-Science-and-Informatics/dirschema/blob/main/CITATION.cff).\n\n<!-- --8<-- [end:citation] -->\n<!-- --8<-- [start:acknowledgements] -->\n\n## Acknowledgements\n\nWe kindly thank all\n[authors and contributors](https://materials-data-science-and-informatics.github.io/dirschema/latest/credits).\n\n<div>\n<img style=\"vertical-align: middle;\" alt=\"HMC Logo\" src=\"https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png\" width=50% height=50% />\n \n<img style=\"vertical-align: middle;\" alt=\"FZJ Logo\" src=\"https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png\" width=30% height=30% />\n</div>\n<br />\n\nThis project was developed at the Institute for Materials Data Science and Informatics\n(IAS-9) of the J\u00fclich Research Center and funded by the Helmholtz Metadata Collaboration\n(HMC), an incubator-platform of the Helmholtz Association within the framework of the\nInformation and Data Science strategic initiative.\n\n<!-- --8<-- [end:acknowledgements] -->\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Spec and validator for directories, files and metadata based on JSON Schema and regexes.",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://materials-data-science-and-informatics.github.io/dirschema",
"Homepage": "https://materials-data-science-and-informatics.github.io/dirschema",
"Repository": "https://github.com/Materials-Data-Science-and-Informatics/dirschema"
},
"split_keywords": [
"jsonschema",
"validation",
"directory",
"structure",
"fair",
"metadata"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ddad378110c2bb5a0f4dc43b67b85d8b1d9aa54398891babd674f5be1953a846",
"md5": "2409bfabc5e4e91c47b76b593d7785d0",
"sha256": "efca1a7a2431305b83f6373b1cd4b6a8cf789b047a434cf1174d82353d25bfb0"
},
"downloads": -1,
"filename": "dirschema-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2409bfabc5e4e91c47b76b593d7785d0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<3.11",
"size": 40944,
"upload_time": "2023-05-08T15:25:08",
"upload_time_iso_8601": "2023-05-08T15:25:08.604021Z",
"url": "https://files.pythonhosted.org/packages/dd/ad/378110c2bb5a0f4dc43b67b85d8b1d9aa54398891babd674f5be1953a846/dirschema-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d25b7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17",
"md5": "f72a2866e18f652d5b219b92eaf5c3dd",
"sha256": "f9334259953afd847799a4fd405a4dbefa027b48ce74096329bd7ac03a62250b"
},
"downloads": -1,
"filename": "dirschema-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "f72a2866e18f652d5b219b92eaf5c3dd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<3.11",
"size": 52431,
"upload_time": "2023-05-08T15:25:10",
"upload_time_iso_8601": "2023-05-08T15:25:10.754253Z",
"url": "https://files.pythonhosted.org/packages/d2/5b/7cc9a7bb63510aa906fafb968d286724b64221ddefadebf5c9c9ee61da17/dirschema-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-08 15:25:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Materials-Data-Science-and-Informatics",
"github_project": "dirschema",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "dirschema"
}