phenopackets
============

:Name: phenopackets
:Version: 2.0.2.post1
:Home page: https://github.com/phenopackets/phenopacket-schema
:Summary: A python implementation of phenopackets protobuf
:Upload time: 2023-10-18 17:58:59
:Author: Michael Gargano
:License: BSD
:Keywords: phenopackets, clinical
:Requirements: No requirements were recorded.

Phenopacket schema
==================

|Build Status| |Maven Central| |Documentation|

.. |Build Status| image:: https://travis-ci.org/phenopackets/phenopacket-schema.svg?branch=master
  :target: https://travis-ci.org/phenopackets/phenopacket-schema

.. |Maven Central| image:: https://maven-badges.herokuapp.com/maven-central/org.phenopackets/phenopacket-schema/badge.svg
  :target: https://maven-badges.herokuapp.com/maven-central/org.phenopackets/phenopacket-schema

.. |Documentation| image:: https://readthedocs.org/projects/phenopacket-schema/badge/?version=v2
    :target: https://phenopacket-schema.readthedocs.io/en/v2/?badge=v2
    :alt: Documentation Status

The schema was produced as part of the `GA4GH`_ `Clinical Phenotype Data Capture Workstream`_. It merges the existing `GA4GH metadata-schemas`_ work with a more focused model from the `phenopacket-reference-implementation`_.

.. _GA4GH: https://ga4gh.org
.. _Clinical Phenotype Data Capture Workstream: https://ga4gh-cp.github.io/
.. _GA4GH metadata-schemas: https://github.com/ga4gh-metadata/metadata-schemas
.. _phenopacket-reference-implementation: https://github.com/phenopackets/phenopacket-reference-implementation


This is a redefined version of the original phenopacket with a more individual-centric approach. This approach was taken to simplify the code required to represent and manipulate the data, and to better reflect how this kind of data is used day to day.

Documentation
=============

The core documentation can be found at `Documentation`_.

The documentation in this README is primarily for users of the phenopacket-schema Java library.

.. _Documentation: https://phenopacket-schema.readthedocs.io/en/latest

Scope and Purpose
=================
The goal of the phenopacket-schema is to define the phenotypic description of a patient or sample in the context of rare-disease or cancer genomic diagnosis. It aims to make sufficient structured data shareable outside the EHR (Electronic Health Record), so that a clinician or clinical geneticist can capture it at the point of care and share it with other labs, or use it for computational analysis in clinical or research environments.

The schema aims to define a common, limited set of data types which may be composed into more specialised types for data sharing between resources using an agreed-upon common schema (as defined in ``base.proto``).

This common schema has been used to define the 'Phenopacket', which is a catch-all collection of data types specifically focused on representing rare-disease or cancer samples for both initial data capture and analysis. The phenopacket is designed to be both human- and machine-readable, and to interoperate with the HL7 Fast Healthcare Interoperability Resources Specification (aka FHIR®).

Versioning
==========

The library uses semantic versioning. See https://semver.org for details.
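
As a rough illustration of what semantic versioning means for consumers, the sketch below checks whether an installed version should work with code written against a required version: same MAJOR version, and not older. The helper names are hypothetical. The regex only reads the leading ``MAJOR.MINOR.PATCH``, so suffixes such as the ``.post1`` in the PyPI release ``2.0.2.post1`` (a PEP 440 post-release marker, not part of semver) are ignored.

.. code-block:: python

    import re

    # Leading MAJOR.MINOR.PATCH; any suffix (e.g. ".post1") is ignored in this sketch
    _SEMVER_CORE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


    def parse_core(version: str) -> tuple[int, int, int]:
        """Return the (major, minor, patch) triple of a version string."""
        match = _SEMVER_CORE.match(version)
        if match is None:
            raise ValueError(f"not a semantic version: {version!r}")
        major, minor, patch = (int(part) for part in match.groups())
        return major, minor, patch


    def is_backward_compatible(installed: str, required: str) -> bool:
        """True if `installed` should satisfy code written against `required`.

        A MAJOR change signals breaking changes; MINOR and PATCH increases
        are expected to be backward compatible.
        """
        inst = parse_core(installed)
        req = parse_core(required)
        return inst[0] == req[0] and inst >= req

For example, ``is_backward_compatible("2.0.2.post1", "2.0.0")`` is ``True``, while a 1.x installation does not satisfy code written for 2.0.0.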

Email list
==========
There is a low-volume mailing list for announcements about phenopackets at phenopackets@groups.io. More information
about this list is available at https://groups.io/g/phenopackets.


Usage
=====
The Phenopacket schema is defined using `Protobuf`_, which is "a language-neutral, platform-neutral extensible mechanism for serializing structured data". There are two ways to use this library: first, using the ``Phenopacket`` as an exchange mechanism; second, as a schema of basic types on which to build more specialised messages while still allowing easy interoperability with other resources that use the phenopackets schema.
The following sections describe both approaches.

.. _Protobuf: https://developers.google.com/protocol-buffers/

Include phenopackets into your project
--------------------------------------

**Java** users can add phenopackets to their project as a Maven dependency:

.. code:: xml

    <dependency>
        <groupId>org.phenopackets</groupId>
        <artifactId>phenopacket-schema</artifactId>
        <version>${phenopacket-schema.version}</version>
    </dependency>


Using phenopackets in **Python** is also straightforward:

.. code:: bash

    pip install phenopackets


Exchanging Phenopackets directly
--------------------------------
Examples of how phenopackets can be used can be found in the test directory. Apart from the Pedigree, no explicit relationships are defined between the fields of a phenopacket, so it is vital that resources exchanging phenopackets agree on what is valid and on what each field means in relation to the other fields. For example, the ``Phenopacket.genes`` field may be agreed to represent the genes of a gene panel in one context, and a set of candidate genes or a diagnosed causative gene in another.
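
Because such an agreement lives outside the schema, each party has to check it themselves. The sketch below shows one hypothetical way to encode a simple agreement (a list of top-level fields that must be present and non-empty) as a check on the JSON form of a phenopacket. The field list and helper name are illustrative assumptions, not part of the schema; the keys use the lowerCamelCase names that protobuf's JSON mapping produces by default.

.. code-block:: python

    import json

    # Hypothetical agreement between two resources: these top-level JSON fields
    # must be present and non-empty. The choice of fields is illustrative only.
    AGREED_REQUIRED_FIELDS = ("id", "subject", "phenotypicFeatures", "metaData")


    def missing_agreed_fields(phenopacket_json: str,
                              required=AGREED_REQUIRED_FIELDS) -> list[str]:
        """Return the agreed-upon fields that are absent or empty in the JSON."""
        data = json.loads(phenopacket_json)
        return [name for name in required if not data.get(name)]

A receiving resource could reject any phenopacket for which this list is non-empty; a real agreement would also cover what the fields mean, which no structural check can capture.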

JSON/YAML formats
-----------------
A Phenopacket can be transformed between the native binary format and JSON using the ``JsonFormat`` class from the ``protobuf-java-util`` library. Java users will also need to add this library to their pom.xml:

.. code:: xml

    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java-util</artifactId>
        <version>${protobuf.version}</version>
    </dependency>


For **Python**, the equivalent utilities are in the ``protobuf`` package:

.. code:: bash

    pip install protobuf


``protobuf-java-util`` (Java) and ``protobuf`` (Python) contain simple utility methods to perform these transformations, as shown here:

.. code-block:: java

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.dataformat.yaml.YAMLMapper;
    import com.google.protobuf.util.JsonFormat;
    import org.phenopackets.schema.v2.Phenopacket;

    // TestExamples is a test helper from this repository; substitute your own Phenopacket.
    // JsonFormat and Jackson throw checked exceptions, so run this in a method that declares them.

    // Transform a Phenopacket into JSON
    Phenopacket original = TestExamples.rareDiseasePhenopacket();

    String asJson = JsonFormat.printer().print(original);
    System.out.println(asJson);

    // Convert the JSON back to a Phenopacket
    Phenopacket.Builder phenoPacketBuilder = Phenopacket.newBuilder();
    JsonFormat.parser().merge(asJson, phenoPacketBuilder);
    Phenopacket fromJson = phenoPacketBuilder.build();

    // Convert the JSON into YAML (using Jackson)
    JsonNode jsonNodeTree = new ObjectMapper().readTree(asJson);
    String yamlPhenopacket = new YAMLMapper().writeValueAsString(jsonNodeTree);

    // Convert the YAML back into JSON (using Jackson)
    JsonNode yamlNodeTree = new YAMLMapper().readTree(yamlPhenopacket);
    String jsonPhenopacket = new ObjectMapper().writeValueAsString(yamlNodeTree);

    // And finally back into a Java object
    Phenopacket.Builder phenoPacketBuilder2 = Phenopacket.newBuilder();
    JsonFormat.parser().merge(jsonPhenopacket, phenoPacketBuilder2);
    Phenopacket fromJson2 = phenoPacketBuilder2.build();


.. code-block:: python

    from google.protobuf.json_format import Parse, MessageToJson
    from google.protobuf.timestamp_pb2 import Timestamp
    from phenopackets import Phenopacket, Individual, PhenotypicFeature, OntologyClass

    # Parsing phenopackets from JSON: Parse(text, message) fills `message` and returns it
    with open('file.json', 'r') as jsfile:
        phenopacket = Parse(jsfile.read(), Phenopacket())

    # Writing phenopackets to json
    with open('file.json', 'w') as jsfile:
        subject = Individual(id="Zaphod", sex="MALE", date_of_birth=Timestamp(seconds=-123456798))
        phenotypic_features = [PhenotypicFeature(type=OntologyClass(id="HG2G:00001", label="Hoopy")),
                               PhenotypicFeature(type=OntologyClass(id="HG2G:00002", label="Frood"))]

        phenopacket = Phenopacket(id="PPKT:1", subject=subject, phenotypic_features=phenotypic_features)

        # Named json_text so the standard-library `json` module is not shadowed
        json_text = MessageToJson(phenopacket)
        jsfile.write(json_text)
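
Because ``MessageToJson`` produces ordinary JSON, other tools can read a phenopacket without the protobuf classes. The minimal sketch below uses only the standard library to list the ontology ids of the phenotypic features. It assumes the default ``MessageToJson`` settings, which write lowerCamelCase keys (``phenotypicFeatures``); output written with ``preserving_proto_field_name=True`` would use ``phenotypic_features`` instead. The function name is hypothetical.

.. code-block:: python

    import json


    def feature_ids(phenopacket_json: str) -> list[str]:
        """List the ontology ids of all phenotypic features in a phenopacket's JSON.

        Assumes MessageToJson's default lowerCamelCase keys ("phenotypicFeatures").
        """
        data = json.loads(phenopacket_json)
        return [feature["type"]["id"]
                for feature in data.get("phenotypicFeatures", [])
                if "type" in feature]

Applied to the JSON written by the Python example above, this would be expected to return the two ``HG2G`` ids.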


Building new messages from the schema
-------------------------------------
An example of how to do this is included in the `mme.proto`_ file. There, the Matchmaker Exchange (MME) API has been implemented using the phenopackets schema, defining custom messages as required but re-using messages from `base.proto`_ where applicable. Continuing the example above, suppose the single ``Phenopacket.genes`` field is a problem because you wish to record both the gene panels ordered and the candidate genes discovered, in two separate fields. In this case, a new bespoke message could be created using ``Gene`` as a building block.

.. _mme.proto: https://github.com/phenopackets/phenopacket-schema/blob/master/src/test/proto/mme.proto
.. _base.proto: https://github.com/phenopackets/phenopacket-schema/blob/master/src/main/proto/phenopackets/schema/v1/base.proto

Git Submodules
==============
This repo uses `git submodules`_ to import the `VRS protobuf`_ implementation. After cloning or updating, you may need to run the following command
for the build to work correctly:

.. code:: bash

  $ git submodule update --init --recursive


.. _git submodules: https://git-scm.com/book/en/v2/Git-Tools-Submodules
.. _VRS protobuf: https://github.com/ga4gh/vrs-protobuf

Building
========
The project can be built using the `Takari maven wrapper`_, which requires no local Maven installation. The only requirements for the build are a working Java installation and network access.

To build, ``cd`` to the project root and run the wrapper script (``mvnw`` on Unix-like systems, ``mvnw.cmd`` on Windows):

.. code:: bash

    $ ./mvnw clean install


or

.. code:: bash

    $ ./mvnw.cmd clean install


.. _Takari maven wrapper: https://github.com/takari/maven-wrapper

Sign artefacts for release
==========================
There is a ``release-sign-artifacts`` profile for **Java**, which can be triggered with the command:

.. code:: bash

    $ ./mvnw clean install -DperformRelease=true


The **Python** artefacts are released with the ``deploy-python.sh`` script. For a test release, run:

.. code:: bash

    $ bash deploy-python.sh release-test


For a production release, run:

.. code:: bash

    $ bash deploy-python.sh release-prod


Java, Python and C++ artefacts
==============================
Building the project will automatically compile Java, Python and C++ artefacts. The Java jar file can be used directly in any Java project. For Python or C++, the build artefacts can be found in

.. code:: bash

    target/generated-sources/protobuf/python

and

.. code:: bash

    target/generated-sources/protobuf/cpp
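
To try the generated Python modules straight from the build output, without packaging them, one option is to put that directory on ``sys.path``. This is a sketch only: the helper name is hypothetical, it assumes the default Maven output location shown above, and the generated modules still need the ``protobuf`` runtime to be installed.

.. code-block:: python

    import sys
    from pathlib import Path


    def use_generated_python(project_root):
        """Prepend the Maven-generated Python sources to sys.path and return that directory."""
        generated = Path(project_root, "target", "generated-sources", "protobuf", "python")
        if not generated.is_dir():
            # The directory only exists after the Maven build has been run
            raise FileNotFoundError(f"{generated} not found; build the project first")
        if str(generated) not in sys.path:
            sys.path.insert(0, str(generated))
        return generated

Installing the published package with ``pip install phenopackets`` avoids this step entirely.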

Other languages will need to compile the files in ``src/main/proto`` to
their desired language. The protobuf developer site has examples of how
to do this, e.g. `GO`_ or `C#`_. Protobuf also supports a `host of other
languages`_.
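
As a rough sketch of that step, the helper below assembles a ``protoc`` command line for every ``.proto`` file under a source root and can optionally run it. ``--proto_path`` and the ``--cpp_out``, ``--csharp_out``, ``--java_out`` and ``--python_out`` flags are standard ``protoc`` options; ``--go_out`` additionally requires the ``protoc-gen-go`` plugin. The helper names are hypothetical. The phenopackets protos may import definitions from the VRS protobuf submodule, whose include directory is not given here, so it would have to be passed via ``extra_includes``.

.. code-block:: python

    import subprocess
    from pathlib import Path

    # protoc output flags; all but --go_out use generators built into protoc.
    # --go_out also needs the protoc-gen-go plugin on PATH.
    OUT_FLAGS = {
        "cpp": "--cpp_out",
        "csharp": "--csharp_out",
        "go": "--go_out",
        "java": "--java_out",
        "python": "--python_out",
    }


    def protoc_command(proto_root, language, out_dir, extra_includes=()):
        """Build (but do not run) a protoc command for every .proto under proto_root."""
        if language not in OUT_FLAGS:
            raise ValueError(f"unsupported language: {language!r}")
        root = Path(proto_root)
        # protoc maps each input file to a path relative to a --proto_path directory
        protos = sorted(str(p) for p in root.rglob("*.proto"))
        if not protos:
            raise FileNotFoundError(f"no .proto files under {root}")
        cmd = ["protoc", f"--proto_path={root}"]
        # Additional import roots, e.g. wherever the VRS submodule's protos live (assumption)
        cmd += [f"--proto_path={inc}" for inc in extra_includes]
        cmd.append(f"{OUT_FLAGS[language]}={out_dir}")
        return cmd + protos


    def compile_protos(proto_root, language, out_dir, extra_includes=()):
        """Run protoc; assumes protoc (and any needed plugin) is installed and on PATH."""
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        subprocess.run(protoc_command(proto_root, language, out_dir, extra_includes), check=True)

For example, ``compile_protos("src/main/proto", "csharp", "build/csharp")`` run from the project root would generate C# sources, provided every import can be resolved from the given include paths.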

.. _GO: https://developers.google.com/protocol-buffers/docs/gotutorial#compiling-your-protocol-buffers
.. _C#: https://developers.google.com/protocol-buffers/docs/csharptutorial#compiling-your-protocol-buffers
.. _host of other languages: https://github.com/google/protobuf/tree/v3.7.0#protobuf-runtime-installation

            
