- Name: aind-data-schema
- Version: 0.33.7
- Summary: A library that defines AIND data schema and validates JSON files.
- Upload time: 2024-04-10 22:10:32
- Requires Python: >=3.8
- License: MIT

# aind-data-schema

[![License](https://img.shields.io/badge/license-MIT-brightgreen)](LICENSE)
![Code Style](https://img.shields.io/badge/code%20style-black-black)
[![Documentation Status](https://readthedocs.org/projects/aind-data-schema/badge/?version=latest)](https://aind-data-schema.readthedocs.io/en/latest/?badge=latest)

A library that defines [AIND](https://alleninstitute.org/what-we-do/brain-science/research/allen-institute-neural-dynamics/) data schema and validates JSON files. 

User documentation is available on [readthedocs](https://aind-data-schema.readthedocs.io/en/latest/).

## Overview

This repository contains the schemas needed to ingest and validate metadata that are essential to ensuring [AIND](https://alleninstitute.org/what-we-do/brain-science/research/allen-institute-neural-dynamics/) data collection is completely reproducible. Our general approach is to semantically version core schema classes and include those version numbers in serialized metadata so that we can flexibly evolve the schemas over time without requiring difficult data migrations. In the future, we will provide a browsable list of these classes rendered to [JSON Schema](https://json-schema.org/), including all historic versions.
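As a rough sketch of how a consumer might use the version number embedded in serialized metadata, the snippet below parses a `schema_version` string and compares it with the version some reader code was written against. The helper names `parse_version` and `is_compatible`, and the rule that matching major numbers means compatible, are assumptions made for illustration; they are not part of this library.

```python
import json
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(found: str, expected: str) -> bool:
    """Hypothetical rule: compatible when the major numbers match."""
    return parse_version(found)[0] == parse_version(expected)[0]


# A trimmed-down record carrying the same version field as serialized metadata.
record = json.loads('{"schema_version": "0.5.5", "subject_id": "12345"}')

print(parse_version(record["schema_version"]))
print(is_compatible(record["schema_version"], "0.5.0"))
print(is_compatible(record["schema_version"], "1.0.0"))
```

A stricter rule may be appropriate while the package is pre-1.0, since minor releases can also introduce breaking changes.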

Be aware that this package is still under heavy preliminary development. Expect breaking changes regularly, although we will communicate these through semantic versioning.

A simple example:

```python
import datetime

from aind_data_schema.core.subject import BreedingInfo, Housing, Subject
from aind_data_schema.models.organizations import Organization
from aind_data_schema.models.species import Species

t = datetime.datetime(2022, 11, 22, 8, 43, 0)

s = Subject(
    species=Species.MUS_MUSCULUS,
    subject_id="12345",
    sex="Male",
    date_of_birth=t.date(),
    genotype="Emx1-IRES-Cre;Camk2a-tTA;Ai93(TITL-GCaMP6f)",
    housing=Housing(home_cage_enrichment=["Running wheel"], cage_id="123"),
    background_strain="C57BL/6J",
    source=Organization.AI,
    breeding_info=BreedingInfo(
        breeding_group="Emx1-IRES-Cre(ND)",
        maternal_id="546543",
        maternal_genotype="Emx1-IRES-Cre/wt; Camk2a-tTa/Camk2a-tTA",
        paternal_id="232323",
        paternal_genotype="Ai93(TITL-GCaMP6f)/wt",
    ),
)

s.write_standard_file()  # writes subject.json
```

```json
{
   "describedBy": "https://raw.githubusercontent.com/AllenNeuralDynamics/aind-data-schema/main/src/aind_data_schema/core/subject.py",
   "schema_version": "0.5.5",
   "subject_id": "12345",
   "sex": "Male",
   "date_of_birth": "2022-11-22",
   "genotype": "Emx1-IRES-Cre;Camk2a-tTA;Ai93(TITL-GCaMP6f)",
   "species": {
      "name": "Mus musculus",
      "abbreviation": null,
      "registry": {
         "name": "National Center for Biotechnology Information",
         "abbreviation": "NCBI"
      },
      "registry_identifier": "10090"
   },
   "alleles": [],
   "background_strain": "C57BL/6J",
   "breeding_info": {
      "breeding_group": "Emx1-IRES-Cre(ND)",
      "maternal_id": "546543",
      "maternal_genotype": "Emx1-IRES-Cre/wt; Camk2a-tTa/Camk2a-tTA",
      "paternal_id": "232323",
      "paternal_genotype": "Ai93(TITL-GCaMP6f)/wt"
   },
   "source": {
      "name": "Allen Institute",
      "abbreviation": "AI",
      "registry": {
         "name": "Research Organization Registry",
         "abbreviation": "ROR"
      },
      "registry_identifier": "03cpe7c52"
   },
   "rrid": null,
   "restrictions": null,
   "wellness_reports": [],
   "housing": {
      "cage_id": "123",
      "room_id": null,
      "light_cycle": null,
      "home_cage_enrichment": [
         "Running wheel"
      ],
      "cohoused_subjects": []
   },
   "notes": null
}
```
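Because the output is plain JSON, it can be read back without this library. The sketch below uses only the Python standard library on an excerpt of the record above; the variable names are illustrative, and it performs no schema validation.

```python
import datetime
import json

# Excerpt of the subject.json content shown above.
subject_json = """
{
   "schema_version": "0.5.5",
   "subject_id": "12345",
   "date_of_birth": "2022-11-22",
   "species": {"name": "Mus musculus", "registry_identifier": "10090"},
   "housing": {"cage_id": "123", "home_cage_enrichment": ["Running wheel"]}
}
"""

subject = json.loads(subject_json)

# The date is stored as a YYYY-MM-DD string, so it parses back directly.
date_of_birth = datetime.date.fromisoformat(subject["date_of_birth"])

print(subject["subject_id"], subject["species"]["name"], date_of_birth.year)
```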

## Installing and Upgrading

To install the latest version:
```
pip install aind-data-schema
```

Every merge to the `main` branch is automatically tagged with a new major/minor/patch version and uploaded to PyPI. To upgrade to the latest version:
```
pip install aind-data-schema --upgrade
```
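To confirm which version ended up installed, one option is the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version("aind-data-schema"))
```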

To develop the code, check out this repo and run the following in the cloned directory: 
```
pip install -e .[dev]
```

## Contributing

If you've found a bug in the schemas or would like to make a minor change, open an [Issue](https://github.com/AllenNeuralDynamics/aind-data-schema/issues) on this repository. If you'd like to propose a large change or addition, or generally have a question about how things work, start a new [Discussion](https://github.com/AllenNeuralDynamics/aind-data-schema/discussions)!


### Linters and testing

There are several libraries used to run linters, check documentation, and run tests.

- To run tests locally, navigate to the repository's root directory (`aind-data-schema`) in a terminal and run the following (this will not run any online-only tests):

```
python -m unittest
```

- Please test your changes using the **coverage** library, which will run the tests and log a coverage report:

```
coverage run -m unittest discover && coverage report
```

- To use any of the following tools, install the relevant package (interrogate, flake8, black, isort) with conda or pip, navigate to the relevant directory, and run the command below with the tool's name in place of [command]:

```
[command] -v . 
```

- Use **interrogate** to check that modules, methods, etc. have been documented thoroughly:

```
interrogate .
```

- Use **flake8** to check that code is up to standards (no unused imports, etc.):

```
flake8 .
```

- Use **black** to automatically format the code to conform to PEP 8:

```
black .
```

- Use **isort** to automatically sort import statements:

```
isort .
```
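The four tools above can also be run in one pass. The following is only a sketch of one way to do that from Python; the names `TOOLS`, `build_commands`, and `run_available` are hypothetical and not part of this repository. Note that black and isort rewrite files in place, so `run_available` is defined here but not called.

```python
import shutil
import subprocess
from typing import Dict, List

# Tool names taken from the list above; each is run against a target path.
TOOLS: List[str] = ["interrogate", "flake8", "black", "isort"]


def build_commands(target: str = ".") -> List[List[str]]:
    """Build one command per tool, e.g. ['flake8', '.']."""
    return [[tool, target] for tool in TOOLS]


def run_available(target: str = ".") -> Dict[str, int]:
    """Run each tool found on PATH and return its exit code by tool name."""
    results: Dict[str, int] = {}
    for command in build_commands(target):
        if shutil.which(command[0]) is None:
            print(f"skipping {command[0]}: not installed")
            continue
        results[command[0]] = subprocess.run(command).returncode
    return results


# Only show the commands here; call run_available() to actually execute them.
print(build_commands())
```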

### Pull requests

For internal members, please create a branch. For external members, please fork the repo and open a pull request from the fork. We'll primarily use [Angular](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit) style for commit messages. Roughly, they should follow the pattern:
```
<type>(<scope>): <short summary>
```

where scope (optional) describes the packages affected by the code changes and type (mandatory) is one of:

- **build**: Changes that affect the build system or external dependencies (example scopes: pyproject.toml, setup.py)
- **ci**: Changes to our CI configuration files and scripts (examples: .github/workflows/ci.yml)
- **docs**: Documentation-only changes
- **feat**: A new feature
- **fix**: A bug fix
- **perf**: A code change that improves performance
- **refactor**: A code change that neither fixes a bug nor adds a feature
- **test**: Adding missing tests or correcting existing tests
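
The pattern and type list above can be checked mechanically. Below is a minimal sketch using a regular expression; the names `COMMIT_TYPES`, `COMMIT_PATTERN`, and `check_commit_message` are hypothetical, and the exact rules (for example, requiring a single space after the colon) are assumptions rather than an official specification.

```python
import re

# Allowed types, copied from the list above.
COMMIT_TYPES = ("build", "ci", "docs", "feat", "fix", "perf", "refactor", "test")

# <type>(<scope>): <short summary>, where "(<scope>)" is optional.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()]+)\))?"
    r": (?P<summary>\S.*)$"
)


def check_commit_message(message: str) -> bool:
    """Return True if the first line of the message follows the pattern."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return COMMIT_PATTERN.match(first_line) is not None


print(check_commit_message("feat(subject): add housing field"))
print(check_commit_message("Fixed a bug"))
```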

### Documentation

To generate the rst source files for the documentation, run:

```
sphinx-apidoc -o docs/source/ src
```

Then, to create the documentation HTML files, run:
```
sphinx-build -b html docs/source/ docs/build/html
```

More info on sphinx installation can be found here: https://www.sphinx-doc.org/en/master/usage/installation.html

            
