psqlgml
=======

:Version: 0.2.4
:Home page: https://github.com/kulgan/psqlgml
:Author: Rowland Ogwara <r.ogwara@gmail.com>
:Upload time: 2023-07-13 13:54:45
:Requires Python: >=3.6
:License: Apache Software License 2.0
:Keywords: gdcdictionary, psqlgraph, graphml, mocks, testing

|ci|

Summary
-------
Sample data generation is a common step when testing and verifying new and existing features that use the data commons dictionary. Without validation tools, this step can be difficult and error-prone. This project aims to provide tooling that helps with generating and visualizing sample data. It is dictionary agnostic, so it should work with any GDC-compatible dictionary.

Sample data graphs are represented using a customized GraphML_ format that can be written as either JSON or YAML files. This project provides tools for generating this schema from a selected dictionary and for validating data that targets it.

Goals
-----
psqlgml aims to provide the following for projects that make use of psqlgraph_:

1. Test data validation and visualization
2. A test data schema that can be integrated with IDEs for easier test data generation
3. Randomized test data generation based on user requirements
4. Data structures and functions for use in external projects
5. An alternative implementation for loading dictionaries, with better type checking

Requirements
------------
* Python 3.6+
* graphviz_ (used for visualization)

Installation
------------
Install from PyPI:

.. code-block:: bash

    $ pip install psqlgml

Quick Start
-----------
Command Line
++++++++++++
.. code-block:: bash

    # install
    $ pip install psqlgml

    # validate install
    $ psqlgml --help

    # generate internal schema to aid validation
    $ psqlgml generate -v 2.4.0 -n test_dictionary

    # validation
    $ psqlgml validate --help

    # visualize
    $ psqlgml visualize --help

API
+++
.. code-block:: python

    import psqlgml

    # load the default dictionary
    dictionary: psqlgml.Dictionary = psqlgml.load(version="2.3.0")


GML Schema
----------
This is a customized GraphML_ format based on JSON Schema. It represents a graph as a set of nodes and edges, and the schema makes it possible to validate sample data.

.. code-block:: yaml

    unique_field: node_id
    nodes:
      - label: program
        node_id: p_1
        name: SM-KD
      - label: project
        node_id: pr_1
    edges:
      - src: p_1
        dst: pr_1
        label: programs

This example creates two nodes, a ``program`` and a ``project``, each identified by its ``node_id`` value (``node_id`` is declared as the ``unique_field``). The edge connecting them refers to those values through ``src`` and ``dst`` and is labeled ``programs``.
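
Because the format can also be written as JSON, the same graph can be read with Python's standard ``json`` module. The sketch below is illustrative only: it assumes the JSON layout mirrors the YAML keys above (``unique_field``, ``nodes``, ``edges``), and ``index_nodes`` is a hypothetical helper, not part of the psqlgml API.

.. code-block:: python

    import json
    from typing import Any, Dict

    # The YAML example above written as JSON (assumed to be an equivalent layout).
    SAMPLE = """
    {
      "unique_field": "node_id",
      "nodes": [
        {"label": "program", "node_id": "p_1", "name": "SM-KD"},
        {"label": "project", "node_id": "pr_1"}
      ],
      "edges": [
        {"src": "p_1", "dst": "pr_1", "label": "programs"}
      ]
    }
    """


    def index_nodes(graph: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        """Hypothetical helper: map each node's unique_field value to the node."""
        key = graph["unique_field"]
        return {node[key]: node for node in graph["nodes"]}


    graph = json.loads(SAMPLE)
    nodes = index_nodes(graph)
    for edge in graph["edges"]:
        # Resolve both edge endpoints to node labels through the unique field.
        src_label = nodes[edge["src"]]["label"]
        dst_label = nodes[edge["dst"]]["label"]
        print(f"{src_label} -[{edge['label']}]-> {dst_label}")  # program -[programs]-> project

Indexing nodes by the unique field once keeps each edge lookup a single dictionary access.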

Schema Generation
-----------------
psqlgml can generate dictionary-specific schemas through its command-line interface. By default, gdcdictionary_ is assumed, but the options can be changed to work with a different project.

To generate a schema for version 2.4.0 of gdcdictionary:

.. code-block:: bash

    $ psqlgml generate -v 2.4.0 -n gdcdictionary

The generated schema can be used to validate sample data. It can also be registered with IDEs such as PyCharm to get code completion while writing sample data.

Sample Data Validation
----------------------
.. code-block:: bash

    $ psqlgml validate -f sample.yaml --data-dir <resource dir> -d <dictionary name> -v <dictionary version>

The following validations are currently supported:

* JSON Schema Validation
* Duplicate Definition Validation
* Undefined Link Validation
* Association Validation

JSON Schema Validation
++++++++++++++++++++++
Checks that the sample data complies with the dictionary. It catches problems such as:

* properties that are not allowed on a node
* values that are not allowed for a property
* invalid enum values
* invalid or unsupported node types
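
The real check is driven by the JSON schema that psqlgml generates from the dictionary. As a much simplified illustration of the kinds of errors listed above, the sketch below checks nodes against a small hand-written rule table. The ``RULES`` table and the ``check_node`` function are hypothetical and do not reflect any real dictionary's definitions.

.. code-block:: python

    from typing import Any, Dict, List

    # Hypothetical rules standing in for a generated schema. Each label maps
    # property names to a tuple of allowed values (None means any value).
    RULES: Dict[str, Dict[str, Any]] = {
        "program": {"node_id": None, "name": None},
        "project": {"node_id": None, "state": ("open", "legacy")},
    }


    def check_node(node: Dict[str, Any]) -> List[str]:
        """Return human-readable errors for a single node."""
        label = node.get("label")
        if label not in RULES:
            return [f"unsupported node type: {label!r}"]
        allowed = RULES[label]
        errors: List[str] = []
        for prop, value in node.items():
            if prop == "label":
                continue
            if prop not in allowed:
                errors.append(f"{label}: property {prop!r} is not allowed")
            elif allowed[prop] is not None and value not in allowed[prop]:
                errors.append(f"{label}: invalid enum value {value!r} for {prop!r}")
        return errors


    print(check_node({"label": "project", "node_id": "pr_1", "state": "closed"}))
    print(check_node({"label": "sample", "node_id": "s_1"}))

A full JSON Schema validator covers far more (types, required properties, patterns); this sketch only shows where each listed error category comes from.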

Duplicate Definition Validation
+++++++++++++++++++++++++++++++
Raises an error whenever the same unique ID is used for more than one node.
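
A minimal sketch of the idea, assuming the ``unique_field`` / ``nodes`` layout from the GML schema example. ``find_duplicate_ids`` is a hypothetical name, not a psqlgml function.

.. code-block:: python

    from collections import Counter
    from typing import Any, Dict, List


    def find_duplicate_ids(graph: Dict[str, Any]) -> List[str]:
        """Return unique-field values that are defined by more than one node."""
        key = graph["unique_field"]
        counts = Counter(node[key] for node in graph["nodes"])
        return sorted(value for value, count in counts.items() if count > 1)


    graph = {
        "unique_field": "node_id",
        "nodes": [
            {"label": "program", "node_id": "p_1"},
            {"label": "project", "node_id": "pr_1"},
            {"label": "project", "node_id": "pr_1"},  # duplicate definition
        ],
        "edges": [],
    }
    print(find_duplicate_ids(graph))  # ['pr_1']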

Undefined Link Validation
+++++++++++++++++++++++++
Reported as a warning rather than an error, because sample data may legitimately link to nodes that it does not define, for example when appending data to an existing database.
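
The sketch below shows one way such a check could report missing endpoints as warnings instead of failing. It assumes the GML layout shown earlier, uses Python's standard ``warnings`` module, and ``find_undefined_links`` is a hypothetical helper rather than psqlgml's own code.

.. code-block:: python

    import warnings
    from typing import Any, Dict, List


    def find_undefined_links(graph: Dict[str, Any]) -> List[str]:
        """Warn about (and return) edge endpoints that no node in the file defines."""
        key = graph["unique_field"]
        defined = {node[key] for node in graph["nodes"]}
        missing: List[str] = []
        for edge in graph["edges"]:
            for end in ("src", "dst"):
                if edge[end] not in defined:
                    missing.append(edge[end])
                    warnings.warn(f"edge {edge['label']!r} references undefined node {edge[end]!r}")
        return missing


    graph = {
        "unique_field": "node_id",
        "nodes": [{"label": "project", "node_id": "pr_1"}],
        # p_1 is assumed to already exist in the target database.
        "edges": [{"src": "p_1", "dst": "pr_1", "label": "programs"}],
    }
    print(find_undefined_links(graph))  # ['p_1'] (a UserWarning is also emitted)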

Association Validation
++++++++++++++++++++++
Raises an error whenever an edge exists between nodes that the dictionary does not define an edge for.
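
As a rough sketch, this check amounts to comparing each edge's endpoint labels against the edges the dictionary allows. Here a hypothetical ``ALLOWED_EDGES`` set stands in for the dictionary's edge definitions, and ``find_invalid_associations`` is an illustrative name, not a psqlgml function.

.. code-block:: python

    from typing import Any, Dict, List, Set, Tuple

    # Hypothetical whitelist of (source label, destination label, edge label)
    # triples, standing in for the edge definitions a dictionary provides.
    ALLOWED_EDGES: Set[Tuple[str, str, str]] = {
        ("program", "project", "programs"),
    }


    def find_invalid_associations(graph: Dict[str, Any]) -> List[Tuple[str, str, str]]:
        """Return (src label, dst label, edge label) triples that are not allowed."""
        key = graph["unique_field"]
        labels = {node[key]: node["label"] for node in graph["nodes"]}
        invalid: List[Tuple[str, str, str]] = []
        for edge in graph["edges"]:
            src, dst = labels.get(edge["src"]), labels.get(edge["dst"])
            if src is None or dst is None:
                continue  # undefined endpoints are left to the undefined-link check
            triple = (src, dst, edge["label"])
            if triple not in ALLOWED_EDGES:
                invalid.append(triple)
        return invalid


    graph = {
        "unique_field": "node_id",
        "nodes": [
            {"label": "program", "node_id": "p_1"},
            {"label": "project", "node_id": "pr_1"},
            {"label": "case", "node_id": "c_1"},
        ],
        "edges": [
            {"src": "p_1", "dst": "pr_1", "label": "programs"},
            {"src": "c_1", "dst": "p_1", "label": "programs"},  # not in ALLOWED_EDGES
        ],
    }
    print(find_invalid_associations(graph))  # [('case', 'program', 'programs')]

Skipping edges with undefined endpoints keeps this check independent of the undefined-link warning above.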

.. |ci| image:: https://app.travis-ci.com/NCI-GDC/psqlgml.svg?token=5s3bZRahNJnkspYEMwZC&branch=master
    :target: https://app.travis-ci.com/github/NCI-GDC/psqlgml/branches
    :alt: build
.. |action| image:: https://img.shields.io/github/workflow/status/kulgan/psqlgml/psqlgml-ci
    :target: https://github.com/kulgan/psqlgml/actions
    :alt: psqlgml ci
.. _graphviz: https://graphviz.org/
.. _GraphML: http://graphml.graphdrawing.org/primer/graphml-primer.html
.. _gdcdictionary: https://github.com/NCI-GDC/gdcdictionary
.. _psqlgraph: https://github.com/NCI-GDC/psqlgraph
