lineage


:Name: lineage
:Version: 4.4.1
:Home page: https://github.com/apriha/lineage
:Summary: tools for genetic genealogy and the analysis of consumer DNA test results
:Upload time: 2024-07-13 05:20:41
:Author: Andrew Riha
:Requires Python: >=3.8
:License: MIT
:Keywords: dna, genes, genetics, genealogy, snps, chromosomes, genotype, bioinformatics, ancestry

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/lineage_banner.png

|ci| |codecov| |docs| |pypi| |python| |downloads| |license| |black|

lineage
=======
``lineage`` provides a framework for analyzing genotype (raw data) files from direct-to-consumer
(DTC) DNA testing companies, primarily for the purposes of genetic genealogy.

Capabilities
------------
- Find shared DNA and genes between individuals
- Compute centiMorgans (cMs) of shared DNA using a variety of genetic maps (e.g., HapMap Phase II, 1000 Genomes Project)
- Plot shared DNA between individuals
- Find discordant SNPs between child and parent(s)
- Read, write, merge, and remap SNPs for an individual via the `snps <https://github.com/apriha/snps>`_ package

Supported Genotype Files
------------------------
``lineage`` supports all genotype files supported by `snps <https://github.com/apriha/snps>`_.

Installation
------------
``lineage`` is `available <https://pypi.org/project/lineage/>`_ on the
`Python Package Index <https://pypi.org>`_. Install ``lineage`` (and its required
Python dependencies) via ``pip``::

    $ pip install lineage

Also see the `installation documentation <https://lineage.readthedocs.io/en/stable/installation.html>`_.

Dependencies
------------
``lineage`` requires `Python <https://www.python.org>`_ 3.8+ and the following Python packages:

- `numpy <https://numpy.org>`_
- `pandas <https://pandas.pydata.org>`_
- `matplotlib <https://matplotlib.org>`_
- `atomicwrites <https://github.com/untitaker/python-atomicwrites>`_
- `snps <https://github.com/apriha/snps>`_

Examples
--------
Initialize the lineage Framework
````````````````````````````````
Import ``Lineage`` and instantiate a ``Lineage`` object:

>>> from lineage import Lineage
>>> l = Lineage()

Download Example Data
`````````````````````
First, let's set up logging to get some helpful output:

>>> import logging, sys
>>> logger = logging.getLogger()
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(logging.StreamHandler(sys.stdout))
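
If the setup above is re-run in an interactive session, each run attaches another handler and
every message is printed more than once. The following is a minimal sketch of an idempotent
variant that uses only the standard library. The helper name ``enable_info_logging`` is
hypothetical and is not part of ``lineage``:

```python
import io
import logging
import sys


def enable_info_logging(stream=None):
    """Attach one INFO-level stream handler to the root logger (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    # Reuse an existing handler for this stream instead of stacking duplicates.
    for handler in root.handlers:
        if isinstance(handler, logging.StreamHandler) and handler.stream is stream:
            return handler
    handler = logging.StreamHandler(stream)
    root.addHandler(handler)
    return handler


# Demonstrate with an in-memory stream so the output can be inspected.
buffer = io.StringIO()
first = enable_info_logging(buffer)
second = enable_info_logging(buffer)  # second call finds the existing handler
logging.getLogger("demo").info("Downloading example data")
captured = buffer.getvalue()
```

Calling ``enable_info_logging()`` with no argument writes to ``sys.stdout``, which matches the
setup shown above.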

Now we're ready to download some example data from `openSNP <https://opensnp.org>`_:

>>> paths = l.download_example_datasets()
Downloading resources/662.23andme.340.txt.gz
Downloading resources/662.ftdna-illumina.341.csv.gz
Downloading resources/663.23andme.305.txt.gz
Downloading resources/4583.ftdna-illumina.3482.csv.gz
Downloading resources/4584.ftdna-illumina.3483.csv.gz

We'll call these datasets ``User662``, ``User663``, ``User4583``, and ``User4584``.

Load Raw Data
`````````````
Create an ``Individual`` in the context of the ``lineage`` framework to interact with the
``User662`` dataset:

>>> user662 = l.create_individual('User662', ['resources/662.23andme.340.txt.gz', 'resources/662.ftdna-illumina.341.csv.gz'])
Loading SNPs('662.23andme.340.txt.gz')
Merging SNPs('662.ftdna-illumina.341.csv.gz')
SNPs('662.ftdna-illumina.341.csv.gz') has Build 36; remapping to Build 37
Downloading resources/NCBI36_GRCh37.tar.gz
27 SNP positions were discrepant; keeping original positions
151 SNP genotypes were discrepant; marking those as null

Here we created ``user662`` with the name ``User662``. In the process, we merged two raw data
files for this individual. Specifically:

- ``662.23andme.340.txt.gz`` was loaded.
- Then, ``662.ftdna-illumina.341.csv.gz`` was merged. In the process, it was found to have Build
  36. So, it was automatically remapped to Build 37 (downloading the remapping data in the
  process) to match the build of the SNPs already loaded. After this merge, 27 SNP positions and
  151 SNP genotypes were found to be discrepant.
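
To make the merge rules above concrete, here is a small sketch in plain Python. It is not the
``snps`` implementation: the ``merge_genotypes`` function, the dictionary layout, and the choice
to treat allele order as irrelevant are all assumptions made for illustration. It keeps the
original position when positions disagree and sets the genotype to null when genotypes disagree,
as described in the log output:

```python
def merge_genotypes(base, other):
    """Merge two {rsid: (position, genotype)} dicts (illustrative only).

    Returns the merged dict plus the rsids whose positions or genotypes disagreed.
    """
    merged = dict(base)
    discrepant_positions = []
    discrepant_genotypes = []
    for rsid, (pos, genotype) in other.items():
        if rsid not in merged:
            merged[rsid] = (pos, genotype)  # SNP only present in the second file
            continue
        base_pos, base_genotype = merged[rsid]
        if pos != base_pos:
            discrepant_positions.append(rsid)  # keep the original position
        # Assumption: allele order is irrelevant, so "AG" and "GA" agree.
        if (
            base_genotype is not None
            and genotype is not None
            and sorted(base_genotype) != sorted(genotype)
        ):
            discrepant_genotypes.append(rsid)
            merged[rsid] = (base_pos, None)  # mark the conflicting genotype as null
        elif base_genotype is None:
            merged[rsid] = (base_pos, genotype)  # fill a missing call
    return merged, discrepant_positions, discrepant_genotypes


first_file = {"rs1": (100, "AG"), "rs2": (200, "CC"), "rs3": (300, None)}
second_file = {"rs1": (100, "GA"), "rs2": (205, "CT"), "rs3": (300, "TT"), "rs4": (400, "GG")}
merged, bad_positions, bad_genotypes = merge_genotypes(first_file, second_file)
```

In this toy input, ``rs2`` is discrepant in both position and genotype, ``rs3`` has its missing
call filled in, and ``rs4`` is added from the second file.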

``user662`` is represented by an ``Individual`` object, which inherits from ``snps.SNPs``.
Therefore, all of the `properties and methods <https://snps.readthedocs.io/en/stable/snps.html>`_
available to a ``SNPs`` object are available here; for example:

>>> len(user662.discrepant_merge_genotypes)
151
>>> user662.build
37
>>> user662.build_detected
True
>>> user662.assembly
'GRCh37'
>>> user662.count
1006960

As such, SNPs can be saved, remapped, merged, etc. See the
`snps <https://github.com/apriha/snps>`_ package for further examples.

Compare Individuals
```````````````````
Let's create another ``Individual`` for the ``User663`` dataset:

>>> user663 = l.create_individual('User663', 'resources/663.23andme.305.txt.gz')
Loading SNPs('663.23andme.305.txt.gz')

Now we can perform some analysis between the ``User662`` and ``User663`` datasets.

`Find Discordant SNPs <https://lineage.readthedocs.io/en/stable/lineage.html#lineage.Lineage.find_discordant_snps>`_
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
First, let's find discordant SNPs (i.e., SNP data that is not consistent with Mendelian
inheritance):

>>> discordant_snps = l.find_discordant_snps(user662, user663, save_output=True)
Saving output/discordant_snps_User662_User663_GRCh37.csv

All `output files <https://lineage.readthedocs.io/en/stable/output_files.html>`_ are saved to
the output directory (a parameter to ``Lineage``).

This method also returns a ``pandas.DataFrame``, which can be inspected interactively at
the prompt; the same data is available in the CSV file.

>>> len(discordant_snps.loc[discordant_snps['chrom'] != 'MT'])
37

Not counting mtDNA SNPs, there are 37 discordant SNPs between these two datasets.
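
As a rough illustration of what "not consistent with Mendelian inheritance" means, the sketch
below checks a single parent-child pair on an autosome: the child must carry at least one of
the parent's alleles. This is a simplified stand-in and not the logic used by
``find_discordant_snps``; the function name and sample genotypes are made up for the example:

```python
def is_discordant(parent_genotype, child_genotype):
    """True when a child genotype cannot come from this parent under Mendelian inheritance.

    Simplified check for one parent on an autosome: the child must carry at
    least one of the parent's alleles. Missing calls are never flagged.
    """
    if not parent_genotype or not child_genotype:
        return False
    return not set(parent_genotype) & set(child_genotype)


pairs = {
    "rs1": ("AA", "AG"),  # child inherits A: consistent
    "rs2": ("AA", "GG"),  # no allele in common: discordant
    "rs3": ("CT", "TT"),  # child inherits T: consistent
    "rs4": (None, "CC"),  # missing parent call: skipped
}
discordant = [rsid for rsid, (parent, child) in pairs.items() if is_discordant(parent, child)]
```

Only ``rs2`` is flagged here. A small number of discordant SNPs between a true parent and child
is usually attributable to genotyping errors.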

`Find Shared DNA <https://lineage.readthedocs.io/en/stable/lineage.html#lineage.Lineage.find_shared_dna>`_
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
``lineage`` uses the probabilistic recombination rates throughout the human genome from the
`International HapMap Project <https://www.genome.gov/10001688/international-hapmap-project/>`_
and the `1000 Genomes Project <https://www.internationalgenome.org>`_ to compute the shared DNA
(in centiMorgans) between two individuals. Additionally, ``lineage`` denotes when the shared DNA
is shared on either one or both chromosomes in a pair. For example, when siblings share a segment
of DNA on both chromosomes, they inherited the same DNA from their mother and father for that
segment.
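
The distinction between sharing on one chromosome and on both can be sketched with unphased
genotypes. At a single SNP, two individuals are compatible with sharing on both chromosomes when
their genotypes are identical, and with sharing on one chromosome when they have at least one
allele in common; runs of consecutive compatible SNPs are candidate shared segments. The helper
names below are hypothetical and this is a simplification, not ``lineage``'s algorithm:

```python
def sharing_state(genotype_a, genotype_b):
    """Classify one SNP as compatible with sharing on 'both', 'one', or 'none' of a pair."""
    if sorted(genotype_a) == sorted(genotype_b):
        return "both"
    if set(genotype_a) & set(genotype_b):
        return "one"
    return "none"


def compatible_runs(states, allowed):
    """Return (start, end) index pairs of consecutive SNPs whose state is in `allowed`."""
    runs, start = [], None
    for i, state in enumerate(states):
        if state in allowed:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append((start, i - 1))  # the run ended at the previous SNP
            start = None
    if start is not None:
        runs.append((start, len(states) - 1))
    return runs


genotype_pairs = [("AG", "GA"), ("CC", "CC"), ("AA", "AG"), ("AA", "GG"), ("TT", "TT")]
states = [sharing_state(a, b) for a, b in genotype_pairs]
one_chrom_runs = compatible_runs(states, {"one", "both"})
two_chrom_runs = compatible_runs(states, {"both"})
```

Runs compatible with sharing on both chromosomes are always contained within runs compatible
with sharing on one chromosome, which is why the two kinds of segments are reported separately.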

With that background, let's find the shared DNA between the ``User662`` and ``User663`` datasets,
calculating the centiMorgans of shared DNA and plotting the results:

>>> results = l.find_shared_dna([user662, user663], cM_threshold=0.75, snp_threshold=1100)
Downloading resources/genetic_map_HapMapII_GRCh37.tar.gz
Downloading resources/cytoBand_hg19.txt.gz
Saving output/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png
Saving output/shared_dna_one_chrom_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.csv

Notice that the centiMorgan and SNP thresholds for each DNA segment can be tuned. Additionally,
notice that two files were downloaded to facilitate the analysis and plotting; future analyses
will use the downloaded files instead of downloading them again. Finally, notice that a list
of individuals is passed to ``find_shared_dna``. This list can contain an arbitrary number of
individuals, and ``lineage`` will find shared DNA across all individuals in the list (i.e.,
where all individuals share segments of DNA on either one or both chromosomes).
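
To show how the two thresholds interact, the sketch below measures a segment's length in
centiMorgans by linear interpolation on a toy genetic map and then keeps only segments that
exceed both thresholds. The map values, the function names, and the use of strict "greater
than" comparisons are assumptions for illustration; this is not ``lineage``'s implementation:

```python
from bisect import bisect_right


def interpolate_cm(genetic_map, position):
    """Linearly interpolate the cumulative cM value at a base-pair position.

    `genetic_map` is a list of (position, cumulative_cM) tuples with strictly
    increasing positions. Positions outside the map are clamped to its ends.
    """
    positions = [pos for pos, _ in genetic_map]
    if position <= positions[0]:
        return genetic_map[0][1]
    if position >= positions[-1]:
        return genetic_map[-1][1]
    i = bisect_right(positions, position)
    (pos0, cm0), (pos1, cm1) = genetic_map[i - 1], genetic_map[i]
    return cm0 + (cm1 - cm0) * (position - pos0) / (pos1 - pos0)


def keep_segment(segment, genetic_map, cM_threshold=0.75, snp_threshold=1100):
    """Keep a (start, end, snp_count) segment only if it exceeds both thresholds."""
    start, end, snp_count = segment
    length_cm = interpolate_cm(genetic_map, end) - interpolate_cm(genetic_map, start)
    return length_cm > cM_threshold and snp_count > snp_threshold


# Toy map: recombination is twice as frequent per base pair in the first interval.
toy_map = [(1_000_000, 0.0), (2_000_000, 1.0), (4_000_000, 2.0)]
candidates = [
    (1_500_000, 3_000_000, 1500),  # 1.0 cM and enough SNPs
    (2_000_000, 3_000_000, 2000),  # only 0.5 cM
    (1_000_000, 4_000_000, 900),   # 2.0 cM but too few SNPs
]
kept = [seg for seg in candidates if keep_segment(seg, toy_map)]
```

Raising either threshold makes the analysis more conservative: short segments and segments
supported by few SNPs are more likely to match by chance.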

Output is returned as a dictionary whose values are ``pandas.DataFrame`` and ``pandas.Index``
objects, with the following keys:

>>> sorted(results.keys())
['one_chrom_discrepant_snps', 'one_chrom_shared_dna', 'one_chrom_shared_genes', 'two_chrom_discrepant_snps', 'two_chrom_shared_dna', 'two_chrom_shared_genes']

In this example, there are 27 segments of DNA shared on one chromosome:

>>> len(results['one_chrom_shared_dna'])
27

Also, `output files <https://lineage.readthedocs.io/en/stable/output_files.html>`_ are
created; these files are detailed in the documentation and their generation can be disabled with a
``save_output=False`` argument. In this example, the output files consist of a CSV file that
details the shared segments of DNA on one chromosome and a plot that illustrates the shared DNA:

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png

`Find Shared Genes <https://lineage.readthedocs.io/en/stable/lineage.html#lineage.Lineage.find_shared_dna>`_
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The `Central Dogma of Molecular Biology <https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology>`_
states that genetic information flows from DNA to mRNA to proteins: DNA is transcribed into
mRNA, and mRNA is translated into a protein. It's more complicated than this (it's biology
after all), but generally, one mRNA produces one protein, and the stretch of DNA that encodes
that mRNA / protein is considered a gene.

Therefore, it would be interesting to understand not just what DNA is shared between individuals,
but what *genes* are shared between individuals *with the same variations*. In other words,
what genes are producing the *same* proteins? [*]_ Since ``lineage`` can determine the shared DNA
between individuals, it can use that information to determine what genes are also shared on
either one or both chromosomes.

.. [*] In theory, shared segments of DNA should be producing the same proteins, but there are many
 complexities, such as copy number variation (CNV), gene expression, etc.
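
One way to picture how shared segments lead to shared genes is as an interval check. The sketch
below assumes that a gene counts as shared when its transcribed region lies entirely inside a
shared segment on the same chromosome; that rule, the function name, and the gene names and
coordinates are all illustrative assumptions and not taken from ``lineage``:

```python
def genes_in_segments(genes, segments):
    """Return names of genes lying entirely inside any shared segment on the same chromosome.

    `genes` holds (name, chrom, tx_start, tx_end); `segments` holds (chrom, start, end).
    """
    shared = []
    for name, chrom, tx_start, tx_end in genes:
        if any(
            chrom == seg_chrom and seg_start <= tx_start and tx_end <= seg_end
            for seg_chrom, seg_start, seg_end in segments
        ):
            shared.append(name)
    return shared


segments = [("1", 1_000_000, 5_000_000), ("2", 200_000, 900_000)]
genes = [
    ("GENE_A", "1", 1_200_000, 1_250_000),  # inside the chr1 segment
    ("GENE_B", "1", 4_900_000, 5_100_000),  # runs past the segment end
    ("GENE_C", "2", 300_000, 350_000),      # inside the chr2 segment
    ("GENE_D", "3", 300_000, 350_000),      # no shared segment on chr3
]
shared_genes = genes_in_segments(genes, segments)
```

Requiring full containment is the stricter choice: a gene that only partly overlaps a shared
segment, such as ``GENE_B`` here, may differ between the individuals outside the segment.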

For this example, let's create two more ``Individual`` objects for the ``User4583`` and
``User4584`` datasets:

>>> user4583 = l.create_individual('User4583', 'resources/4583.ftdna-illumina.3482.csv.gz')
Loading SNPs('4583.ftdna-illumina.3482.csv.gz')

>>> user4584 = l.create_individual('User4584', 'resources/4584.ftdna-illumina.3483.csv.gz')
Loading SNPs('4584.ftdna-illumina.3483.csv.gz')

Now let's find the shared genes, specifying a
`population-specific <https://www.internationalgenome.org/faq/which-populations-are-part-your-study/>`_
1000 Genomes Project genetic map (e.g., as predicted by `ezancestry <https://github.com/arvkevi/ezancestry>`_!):

>>> results = l.find_shared_dna([user4583, user4584], shared_genes=True, genetic_map="CEU")
Downloading resources/CEU_omni_recombination_20130507.tar
Downloading resources/knownGene_hg19.txt.gz
Downloading resources/kgXref_hg19.txt.gz
Saving output/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png
Saving output/shared_dna_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_dna_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_genes_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_genes_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv

The plot that illustrates the shared DNA is shown below. Note that in addition to outputting the
shared DNA segments on either one or both chromosomes, the shared genes on either one or both
chromosomes are also output.

.. note:: Shared DNA is not computed on the X chromosome with the 1000 Genomes Project genetic
          maps since the X chromosome is not included in these genetic maps.

In this example, there are 15,976 shared genes on both chromosomes transcribed from 36 segments
of shared DNA:

>>> len(results['two_chrom_shared_genes'])
15976
>>> len(results['two_chrom_shared_dna'])
36

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png

Documentation
-------------
Documentation is available `here <https://lineage.readthedocs.io/>`_.

Acknowledgements
----------------
Thanks to Whit Athey, Ryan Dale, Binh Bui, Jeff Gill, Gopal Vashishtha,
`CS50 <https://cs50.harvard.edu>`_, and `openSNP <https://opensnp.org>`_.

``lineage`` incorporates code and concepts generated with the assistance of
`OpenAI's <https://openai.com>`_ `ChatGPT <https://chatgpt.com>`_. ✨

.. https://github.com/rtfd/readthedocs.org/blob/master/docs/badges.rst
.. |ci| image:: https://github.com/apriha/lineage/actions/workflows/ci.yml/badge.svg?branch=master
   :target: https://github.com/apriha/lineage/actions/workflows/ci.yml
.. |codecov| image:: https://codecov.io/gh/apriha/lineage/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/apriha/lineage
.. |docs| image:: https://readthedocs.org/projects/lineage/badge/?version=stable
   :target: https://lineage.readthedocs.io/
.. |pypi| image:: https://img.shields.io/pypi/v/lineage.svg
   :target: https://pypi.python.org/pypi/lineage
.. |python| image:: https://img.shields.io/pypi/pyversions/lineage.svg
   :target: https://www.python.org
.. |downloads| image:: https://pepy.tech/badge/lineage
   :target: https://pepy.tech/project/lineage
.. |license| image:: https://img.shields.io/pypi/l/lineage.svg
   :target: https://github.com/apriha/lineage/blob/master/LICENSE.txt
.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

            
