gen3Dictionary


Name: gen3Dictionary
Version: 2.0.3
upload_time: 2023-11-14 19:41:39
author: CTDS UChicago
requires_python: >=3.9,<4.0
license: Apache-2.0

# Data Dictionary

The data dictionary provides the first level of validation for all data
stored in and generated by the BPA. JSON schemas, written in YAML, define all of the individual
entities (nodes) in the data model, all of the relationships (links) between the nodes, and the
valid key-value pairs that can be used to describe the nodes.
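
As a rough illustration of those three parts, the sketch below writes one node schema as the Python dict a YAML loader would produce. The node name, the field values, and the `summarize` helper are hypothetical and are not copied from the package; the field names `target_type` and `required` on the link are likewise assumptions.

```python
# Hypothetical node schema, shown as the dict a YAML loader would return.
# All names and values are illustrative only.
sample_schema = {
    "id": "sample",
    "category": "biospecimen",
    "links": [
        {
            "name": "cases",
            "backref": "samples",
            "label": "derived_from",
            "target_type": "case",          # assumed field name
            "multiplicity": "many_to_one",
            "required": True,               # assumed field name
        },
    ],
    "required": ["submitter_id"],
    "properties": {
        "submitter_id": {"type": "string"},
        "sample_type": {"enum": ["Blood", "Tissue"]},
    },
}


def summarize(schema):
    """Return the node, the node types it links to, and its property names."""
    return {
        "node": schema["id"],
        "links_to": [link["target_type"] for link in schema["links"]],
        "properties": sorted(schema["properties"]),
    }


summary = summarize(sample_schema)
print(summary)
```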

## Data Dictionary Structure 

The Data Model covers all of the nodes as well as the relationships between
the different types of nodes. All of the nodes in the data model are strongly typed and individually
defined for a specific data type. For example, submitted files can come in different forms, such as
aligned or unaligned reads; within the model we have two separately defined nodes for
`Submitted Unaligned Reads` and `Submitted Aligned Reads`. Defining them separately allows for
faster querying of the data model and gives a clear and concise representation of the data in the BPA.

Beyond node type, a number of extensions further define the nodes within
the data model. Nodes are grouped into categories that represent broad roles for the node, such
as `analysis` or `biospecimen`. Additionally, nodes are defined within their `Program` or `Project`
and have descriptions of their use. All nodes also have a series of `systemProperties`; these
properties are filled in automatically by the system unless otherwise defined by
the user. These basic properties define the node itself, but the node still needs to be placed
into the model.
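
The "filled by the system unless otherwise defined by the user" behavior can be sketched as follows. This is a minimal illustration, not the package's implementation: the property names in `SYSTEM_PROPERTIES`, their default values, and the `fill_system_properties` helper are all assumptions.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical system properties; the real ones are listed per node in the schemas.
SYSTEM_PROPERTIES = ("id", "created_datetime", "state")

# Hypothetical default generators, one per system property.
DEFAULTS = {
    "id": lambda: str(uuid.uuid4()),
    "created_datetime": lambda: datetime.now(timezone.utc).isoformat(),
    "state": lambda: "validated",
}


def fill_system_properties(record, system_properties=SYSTEM_PROPERTIES):
    """Return a copy of `record` with any missing system property filled in.

    Values supplied by the user are left untouched.
    """
    filled = dict(record)
    for name in system_properties:
        if name not in filled:
            filled[name] = DEFAULTS[name]()
    return filled


record = fill_system_properties({"submitter_id": "sample-001", "state": "submitted"})
print(record["state"])  # the user-supplied value is kept
```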

The model itself is represented as a graph. Each schema defines `links`; these links
point from child to parent, with `Program` being the root of the graph. Each link also contains a
`backref` that allows a parent to point back to a child. Other features of the link include a
semantic `label` that describes the relationship between the two nodes, a `multiplicity` property
that describes the numeric relationship from the child to the parent, and a requirement property
that defines whether a node must have that link. Taken together, the nodes and links form the
directed graph of the Data Model.
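
To make the child-to-parent direction and the role of `backref` concrete, here is a small sketch that walks a set of link definitions up to the root and then inverts them. The node types, link values, the `target_type` field name, and both helper functions are hypothetical; the sketch also assumes the links are acyclic.

```python
# Hypothetical link definitions, keyed by the child node type.
LINKS = {
    "project": [
        {"target_type": "program", "backref": "projects",
         "label": "member_of", "multiplicity": "many_to_one"},
    ],
    "case": [
        {"target_type": "project", "backref": "cases",
         "label": "member_of", "multiplicity": "many_to_one"},
    ],
    "sample": [
        {"target_type": "case", "backref": "samples",
         "label": "derived_from", "multiplicity": "many_to_one"},
    ],
}


def path_to_root(node_type, links=LINKS):
    """Follow the first link of each node from child to parent until the root."""
    path = [node_type]
    while links.get(path[-1]):
        path.append(links[path[-1]][0]["target_type"])
    return path


def backrefs(links=LINKS):
    """Map each parent type to {backref name: child type}."""
    result = {}
    for child, child_links in links.items():
        for link in child_links:
            result.setdefault(link["target_type"], {})[link["backref"]] = child
    return result


print(path_to_root("sample"))
print(backrefs()["case"])
```

The root is simply the node type that has no outgoing links, which is `program` in this toy graph.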

## Node Properties and Examples

Each node contains a series of potential key-value pairs (`properties`) that can be used to
characterize the data it represents. Some properties are categorized as `required` or `preferred`.
If a submission lacks a required property, it cannot be accepted. A preferred property can denote
one of two things: the property is being highlighted because it has become more desired by the
community, or the property is being promoted to required. All properties not designated either
`required` or `preferred` are still sought by the BPA, but submissions without them are allowed.
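
The difference between the two designations can be sketched as a presence check: a missing required property rejects the submission, while a missing preferred property is only reported. The `check_presence` helper and the property names used here are hypothetical.

```python
def check_presence(record, required, preferred):
    """Report which required and preferred properties a submission lacks.

    A submission is accepted only when no required property is missing;
    missing preferred properties are reported but do not block acceptance.
    """
    missing_required = [name for name in required if name not in record]
    missing_preferred = [name for name in preferred if name not in record]
    return {
        "accepted": not missing_required,
        "missing_required": missing_required,
        "missing_preferred": missing_preferred,
    }


# Hypothetical property names for illustration.
result = check_presence(
    {"submitter_id": "sample-001"},
    required=["submitter_id"],
    preferred=["sample_type"],
)
print(result)
```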

Property values are validated as well. Legal values are defined in each
property. For the most part these are listed in `enum` categories, although some keys,
such as `submitter_id`, accept any string value as a valid entry. Numeric properties
can have maximum and minimum values to limit valid entries. For examples of what a valid entry
looks like, each node has a mock submission located in the `examples/valid/` directory.
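
The value checks described above (an `enum` of legal values, free-form strings, and numeric bounds) can be sketched with a small hand-rolled validator. This is an illustration only: in practice a JSON Schema validator would do this work, and the property definitions and the `validate_values` helper are hypothetical. `minimum` and `maximum` are the standard JSON Schema keywords for numeric bounds.

```python
# Hypothetical property definitions for illustration.
PROPERTIES = {
    "submitter_id": {"type": "string"},
    "sample_type": {"enum": ["Blood", "Tissue"]},
    "age_at_collection": {"type": "number", "minimum": 0, "maximum": 120},
}


def validate_values(record, properties=PROPERTIES):
    """Return a list of error messages; an empty list means every value is legal."""
    errors = []
    for name, value in record.items():
        rule = properties.get(name)
        if rule is None:
            errors.append(f"{name}: unknown property")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} is not a legal value")
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        if rule.get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a number")
                continue
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{name}: {value} is below the minimum {rule['minimum']}")
            if "maximum" in rule and value > rule["maximum"]:
                errors.append(f"{name}: {value} is above the maximum {rule['maximum']}")
    return errors


good = {"submitter_id": "sample-001", "sample_type": "Blood", "age_at_collection": 42}
bad = {"sample_type": "Saliva", "age_at_collection": -1}
print(validate_values(good))
print(validate_values(bad))
```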

## Contributing

Read how to contribute [here](https://github.com/NCI-GDC/portal-ui/blob/develop/CONTRIBUTING.md).

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "gen3Dictionary",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "CTDS UChicago",
    "author_email": "cdis@uchicago.edu",
    "download_url": "https://files.pythonhosted.org/packages/0e/f9/f77cc03b4757ef1c1ff64a68095824ec3308b8627bc2b03a6a780400c788/gen3dictionary-2.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "",
    "version": "2.0.3",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0ef9f77cc03b4757ef1c1ff64a68095824ec3308b8627bc2b03a6a780400c788",
                "md5": "7a3675d442fa18b0d7b64681515862c9",
                "sha256": "46a704e202a79be96ec08969d28885794d4825b94394103dca08e3637bd6cb82"
            },
            "downloads": -1,
            "filename": "gen3dictionary-2.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "7a3675d442fa18b0d7b64681515862c9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 47902,
            "upload_time": "2023-11-14T19:41:39",
            "upload_time_iso_8601": "2023-11-14T19:41:39.995461Z",
            "url": "https://files.pythonhosted.org/packages/0e/f9/f77cc03b4757ef1c1ff64a68095824ec3308b8627bc2b03a6a780400c788/gen3dictionary-2.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-14 19:41:39",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "gen3dictionary"
}
        