airr


Nameairr JSON
Version 1.5.0 PyPI version JSON
download
home_pagehttp://docs.airr-community.org
SummaryAIRR Community Data Representation Standard reference library for antibody and TCR sequencing data.
upload_time2023-08-31 19:11:16
maintainer
docs_urlNone
authorAIRR Community
requires_python
licenseCC BY 4.0
keywords airr bioinformatics sequencing immunoglobulin antibody adaptive immunity t cell b cell bcr tcr
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            Installation
------------------------------------------------------------------------------

Install in the usual manner from PyPI::

    > pip3 install airr --user

Or from the `downloaded <https://github.com/airr-community/airr-standards>`__
source code directory::

    > python3 setup.py install --user


Quick Start
------------------------------------------------------------------------------

Deprecation Notice
^^^^^^^^^^^^^^^^^^^^

The ``load_repertoire``, ``write_repertoire``, and ``validate_repertoire`` functions
have been deprecated for the new generic ``load_airr_data``, ``write_airr_data``, and
``validate_airr_data`` functions. These new functions are backwards compatible with
the Repertoire metadata format but also support the new AIRR objects such as GermlineSet,
RepertoireGroup, GenotypeSet, Cell and Clone. This new format is defined by the DataFile
Schema, which describes a standard set of objects included in a file containing
AIRR Data Model presentations. Currently, the AIRR DataFile does not completely support
Rearrangement, so users should continue using AIRR TSV files and its specific functions.
Also, the ``repertoire_template`` function has been deprecated for the ``Schema.template``
method, which can now be called on any AIRR Schema to create a blank object.

Reading AIRR Data Files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``airr`` package contains functions to read and write AIRR Data
Model files. The file format is either YAML or JSON, and the package provides a
light wrapper over the standard parsers. The file needs a ``json``, ``yaml``, or ``yml``
file extension so that the proper parser is utilized. All of the AIRR objects
are loaded into memory at once and no streaming interface is provided::

    import airr

    # Load the AIRR data
    data = airr.read_airr('input.airr.json')
    # loop through the repertoires
    for rep in data['Repertoire']:
        print(rep)

Why are the AIRR objects, such as Repertoire, GermlineSet, and etc., in a list versus in a
dictionary keyed by their identifier (e.g., ``repertoire_id``)? There are two primary reasons for
this. First, the identifier might not have been assigned yet. Some systems might allow MiAIRR
metadata to be entered but the identifier is assigned to that data later by another process. Without
the identifier, the data could not be stored in a dictionary. Secondly, the list allows the data to
have a default ordering. If you know that the data has a unique identifier then you can quickly
create a dictionary object using a comprehension. For example, with repertoires::

    rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }

another example with germline sets::

    germline_dict = { obj['germline_set_id'] : obj for obj in data['GermlineSet'] }

Writing AIRR Data Files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Writing an AIRR Data File is also a light wrapper over standard YAML or JSON
parsers. Multiple AIRR objects, such as Repertoire, GermlineSet, and etc., can be
written together into the same file. In this example, we use the ``airr`` library ``template``
method to create some blank Repertoire objects, and write them to a file.
As with the read function, the complete list of repertoires are written at once,
there is no streaming interface::

    import airr

    # Create some blank repertoire objects in a list
    data = { 'Repertoire': [] }
    for i in range(5):
        data['Repertoire'].append(airr.schema.RepertoireSchema.template())

    # Write the AIRR Data
    airr.write_airr('output.airr.json', data)

Reading AIRR Rearrangement TSV files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``airr`` package contains functions to read and write AIRR Rearrangement
TSV files as either iterables or pandas data frames. The usage is straightforward,
as the file format is a typical tab delimited file, but the package
performs some additional validation and type conversion beyond using a
standard CSV reader::

    import airr

    # Create an iteratable that returns a dictionary for each row
    reader = airr.read_rearrangement('input.tsv')
    for row in reader: print(row)

    # Load the entire file into a pandas data frame
    df = airr.load_rearrangement('input.tsv')

Writing AIRR Rearrangement TSV files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similar to the read operations, write functions are provided for either creating
a writer class to perform row-wise output or writing the entire contents of
a pandas data frame to a file. Again, usage is straightforward with the ``airr``
output functions simply performing some type conversion and field ordering
operations::

    import airr

    # Create a writer class for iterative row output
    writer = airr.create_rearrangement('output.tsv')
    for row in reader:  writer.write(row)

    # Write an entire pandas data frame to a file
    airr.dump_rearrangement(df, 'file.tsv')

By default, ``create_rearrangement`` will only write the ``required`` fields
in the output file. Additional fields can be included in the output file by
providing the ``fields`` parameter with an array of additional field names::

    # Specify additional fields in the output
    fields = ['new_calc', 'another_field']
    writer = airr.create_rearrangement('output.tsv', fields=fields)

A common operation is to read an AIRR rearrangement file, and then
write an AIRR rearrangement file with additional fields in it while
keeping all of the existing fields from the original file. The
``derive_rearrangement`` function provides this capability::

    import airr

    # Read rearrangement data and write new file with additional fields
    reader = airr.read_rearrangement('input.tsv')
    fields = ['new_calc']
    writer = airr.derive_rearrangement('output.tsv', 'input.tsv', fields=fields)
    for row in reader:
        row['new_calc'] = 'a value'
        writer.write(row)


Validating AIRR data files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``airr`` package can validate AIRR Data Model JSON/YAML files and Rearrangement
TSV files to ensure that they contain all required fields and that the fields types
match the AIRR Schema. This can be done using the ``airr-tools`` command
line program or the validate functions in the library can be called::

    # Validate a rearrangement TSV file
    airr-tools validate rearrangement -a input.tsv

    # Validate an AIRR DataFile
    airr-tools validate airr -a input.airr.json

Combining Repertoire metadata and Rearrangement files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``airr`` package does not currently keep track of which AIRR Data Model files
are associated with which Rearrangement TSV files, though there is ongoing work to define
a standardized manifest, so users will need to handle those
associations themselves. However, in the data, AIRR identifier fields, such as ``repertoire_id``,
form the link between objects in the AIRR Data Model.
The typical usage is that a program is going to perform some
computation on the Rearrangements, and it needs access to the Repertoire metadata
as part of the computation logic. This example code shows the basic framework
for doing that, in this case doing gender specific computation::

    import airr

    # Load AIRR data containing repertoires
    data = airr.read_airr('input.airr.json')

    # Put repertoires in dictionary keyed by repertoire_id
    rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }

    # Create an iteratable for rearrangement data
    reader = airr.read_rearrangement('input.tsv')
    for row in reader:
        # get repertoire metadata with this rearrangement
        rep = rep_dict[row['repertoire_id']]
        
        # check the gender
        if rep['subject']['sex'] == 'male':
            # do male specific computation
        elif rep['subject']['sex'] == 'female':
            # do female specific computation
        else:
            # do other specific computation


            

Raw data

            {
    "_id": null,
    "home_page": "http://docs.airr-community.org",
    "name": "airr",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "AIRR,bioinformatics,sequencing,immunoglobulin,antibody,adaptive immunity,T cell,B cell,BCR,TCR",
    "author": "AIRR Community",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/3f/61/72af365d0855bce94245c4cf0676893fcce3acf05e2b23608f1dfaa0c8cb/airr-1.5.0.tar.gz",
    "platform": null,
    "description": "Installation\n------------------------------------------------------------------------------\n\nInstall in the usual manner from PyPI::\n\n    > pip3 install airr --user\n\nOr from the `downloaded <https://github.com/airr-community/airr-standards>`__\nsource code directory::\n\n    > python3 setup.py install --user\n\n\nQuick Start\n------------------------------------------------------------------------------\n\nDeprecation Notice\n^^^^^^^^^^^^^^^^^^^^\n\nThe ``load_repertoire``, ``write_repertoire``, and ``validate_repertoire`` functions\nhave been deprecated for the new generic ``load_airr_data``, ``write_airr_data``, and\n``validate_airr_data`` functions. These new functions are backwards compatible with\nthe Repertoire metadata format but also support the new AIRR objects such as GermlineSet,\nRepertoireGroup, GenotypeSet, Cell and Clone. This new format is defined by the DataFile\nSchema, which describes a standard set of objects included in a file containing\nAIRR Data Model presentations. Currently, the AIRR DataFile does not completely support\nRearrangement, so users should continue using AIRR TSV files and its specific functions.\nAlso, the ``repertoire_template`` function has been deprecated for the ``Schema.template``\nmethod, which can now be called on any AIRR Schema to create a blank object.\n\nReading AIRR Data Files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``airr`` package contains functions to read and write AIRR Data\nModel files. The file format is either YAML or JSON, and the package provides a\nlight wrapper over the standard parsers. The file needs a ``json``, ``yaml``, or ``yml``\nfile extension so that the proper parser is utilized. All of the AIRR objects\nare loaded into memory at once and no streaming interface is provided::\n\n    import airr\n\n    # Load the AIRR data\n    data = airr.read_airr('input.airr.json')\n    # loop through the repertoires\n    for rep in data['Repertoire']:\n        print(rep)\n\nWhy are the AIRR objects, such as Repertoire, GermlineSet, and etc., in a list versus in a\ndictionary keyed by their identifier (e.g., ``repertoire_id``)? There are two primary reasons for\nthis. First, the identifier might not have been assigned yet. Some systems might allow MiAIRR\nmetadata to be entered but the identifier is assigned to that data later by another process. Without\nthe identifier, the data could not be stored in a dictionary. Secondly, the list allows the data to\nhave a default ordering. If you know that the data has a unique identifier then you can quickly\ncreate a dictionary object using a comprehension. For example, with repertoires::\n\n    rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }\n\nanother example with germline sets::\n\n    germline_dict = { obj['germline_set_id'] : obj for obj in data['GermlineSet'] }\n\nWriting AIRR Data Files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWriting an AIRR Data File is also a light wrapper over standard YAML or JSON\nparsers. Multiple AIRR objects, such as Repertoire, GermlineSet, and etc., can be\nwritten together into the same file. In this example, we use the ``airr`` library ``template``\nmethod to create some blank Repertoire objects, and write them to a file.\nAs with the read function, the complete list of repertoires are written at once,\nthere is no streaming interface::\n\n    import airr\n\n    # Create some blank repertoire objects in a list\n    data = { 'Repertoire': [] }\n    for i in range(5):\n        data['Repertoire'].append(airr.schema.RepertoireSchema.template())\n\n    # Write the AIRR Data\n    airr.write_airr('output.airr.json', data)\n\nReading AIRR Rearrangement TSV files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``airr`` package contains functions to read and write AIRR Rearrangement\nTSV files as either iterables or pandas data frames. The usage is straightforward,\nas the file format is a typical tab delimited file, but the package\nperforms some additional validation and type conversion beyond using a\nstandard CSV reader::\n\n    import airr\n\n    # Create an iteratable that returns a dictionary for each row\n    reader = airr.read_rearrangement('input.tsv')\n    for row in reader: print(row)\n\n    # Load the entire file into a pandas data frame\n    df = airr.load_rearrangement('input.tsv')\n\nWriting AIRR Rearrangement TSV files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSimilar to the read operations, write functions are provided for either creating\na writer class to perform row-wise output or writing the entire contents of\na pandas data frame to a file. Again, usage is straightforward with the ``airr``\noutput functions simply performing some type conversion and field ordering\noperations::\n\n    import airr\n\n    # Create a writer class for iterative row output\n    writer = airr.create_rearrangement('output.tsv')\n    for row in reader:  writer.write(row)\n\n    # Write an entire pandas data frame to a file\n    airr.dump_rearrangement(df, 'file.tsv')\n\nBy default, ``create_rearrangement`` will only write the ``required`` fields\nin the output file. Additional fields can be included in the output file by\nproviding the ``fields`` parameter with an array of additional field names::\n\n    # Specify additional fields in the output\n    fields = ['new_calc', 'another_field']\n    writer = airr.create_rearrangement('output.tsv', fields=fields)\n\nA common operation is to read an AIRR rearrangement file, and then\nwrite an AIRR rearrangement file with additional fields in it while\nkeeping all of the existing fields from the original file. The\n``derive_rearrangement`` function provides this capability::\n\n    import airr\n\n    # Read rearrangement data and write new file with additional fields\n    reader = airr.read_rearrangement('input.tsv')\n    fields = ['new_calc']\n    writer = airr.derive_rearrangement('output.tsv', 'input.tsv', fields=fields)\n    for row in reader:\n        row['new_calc'] = 'a value'\n        writer.write(row)\n\n\nValidating AIRR data files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``airr`` package can validate AIRR Data Model JSON/YAML files and Rearrangement\nTSV files to ensure that they contain all required fields and that the fields types\nmatch the AIRR Schema. This can be done using the ``airr-tools`` command\nline program or the validate functions in the library can be called::\n\n    # Validate a rearrangement TSV file\n    airr-tools validate rearrangement -a input.tsv\n\n    # Validate an AIRR DataFile\n    airr-tools validate airr -a input.airr.json\n\nCombining Repertoire metadata and Rearrangement files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``airr`` package does not currently keep track of which AIRR Data Model files\nare associated with which Rearrangement TSV files, though there is ongoing work to define\na standardized manifest, so users will need to handle those\nassociations themselves. However, in the data, AIRR identifier fields, such as ``repertoire_id``,\nform the link between objects in the AIRR Data Model.\nThe typical usage is that a program is going to perform some\ncomputation on the Rearrangements, and it needs access to the Repertoire metadata\nas part of the computation logic. This example code shows the basic framework\nfor doing that, in this case doing gender specific computation::\n\n    import airr\n\n    # Load AIRR data containing repertoires\n    data = airr.read_airr('input.airr.json')\n\n    # Put repertoires in dictionary keyed by repertoire_id\n    rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }\n\n    # Create an iteratable for rearrangement data\n    reader = airr.read_rearrangement('input.tsv')\n    for row in reader:\n        # get repertoire metadata with this rearrangement\n        rep = rep_dict[row['repertoire_id']]\n        \n        # check the gender\n        if rep['subject']['sex'] == 'male':\n            # do male specific computation\n        elif rep['subject']['sex'] == 'female':\n            # do female specific computation\n        else:\n            # do other specific computation\n\n",
    "bugtrack_url": null,
    "license": "CC BY 4.0",
    "summary": "AIRR Community Data Representation Standard reference library for antibody and TCR sequencing data.",
    "version": "1.5.0",
    "project_urls": {
        "Homepage": "http://docs.airr-community.org"
    },
    "split_keywords": [
        "airr",
        "bioinformatics",
        "sequencing",
        "immunoglobulin",
        "antibody",
        "adaptive immunity",
        "t cell",
        "b cell",
        "bcr",
        "tcr"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3f6172af365d0855bce94245c4cf0676893fcce3acf05e2b23608f1dfaa0c8cb",
                "md5": "071b12cfbbc5984ddcc0a1b25d4e7b6a",
                "sha256": "febc0a881bf46b1a9c29ac6a7089dd733ff9120d114585e75dede26403f68d42"
            },
            "downloads": -1,
            "filename": "airr-1.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "071b12cfbbc5984ddcc0a1b25d4e7b6a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 64378,
            "upload_time": "2023-08-31T19:11:16",
            "upload_time_iso_8601": "2023-08-31T19:11:16.961369Z",
            "url": "https://files.pythonhosted.org/packages/3f/61/72af365d0855bce94245c4cf0676893fcce3acf05e2b23608f1dfaa0c8cb/airr-1.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-31 19:11:16",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "airr"
}
        
Elapsed time: 0.11010s