haplopy


Name: haplopy
Version: 0.1.3
Home page: https://github.com/malmgrek/haplopy
Summary: Haplotype reconstruction from unphased diplotypes
Upload time: 2023-05-03 09:42:33
Maintainer: Stratos Staboulis <stratos@stratokraft.fi>
Author: Stratos Staboulis <stratos@stratokraft.fi>
License: MIT
Keywords: statistics, modeling, population genetics

# HaploPy – Haplotype estimation and phasing in Python

This package contains tools for estimating haplotype (or allele list) frequencies in a population from measurements of unphased genotype data, referred to here as phenotypes.

## Introduction

In layman's terms, a phenotype is an observation of two-allele sets
at multiple gene loci:

``` text
    Aa ––––––– Bb ––––––– Cc
    |          |          |
  locus 1    locus 2    locus 3
```

Note that the above datum doesn't reveal the exact haplotype (allele
sequence) pair behind the phenotype. Possible parent haplotype pairs that could
result in the above phenotype include

``` text
(ABC, abc), (aBC, Abc), (AbC, aBc), (abC, ABc), ...
```

In other words, the mapping from a haplotype pair to a phenotype works as in
the example

``` text
(Abc, aBC) => (Aa, Bb, Cc)
```

and so on. Note that each item in the phenotype is a set of two alleles where the
order doesn't matter. 
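
The mapping and its inverse can be sketched in plain Python. This is only an illustration that does not use haplopy: the helper names `to_phenotype` and `compatible_pairs` are hypothetical, and the sketch assumes that every allele is a single character.

``` python
from itertools import product


def to_phenotype(haplotype_pair):
    """Map a pair of haplotypes to its unphased phenotype.

    The alleles at each locus are sorted, so their order carries no
    information about which haplotype they came from.
    """
    first, second = haplotype_pair
    return tuple("".join(sorted(alleles)) for alleles in zip(first, second))


def compatible_pairs(phenotype):
    """Enumerate the unordered haplotype pairs that yield the phenotype."""
    pairs = set()
    # At each locus, either of the two alleles may sit on the first haplotype.
    for first in product(*phenotype):
        # The second haplotype receives the other allele at every locus.
        second = tuple(
            locus[1] if allele == locus[0] else locus[0]
            for allele, locus in zip(first, phenotype)
        )
        # A frozenset makes (h1, h2) and (h2, h1) the same pair.
        pairs.add(frozenset(["".join(first), "".join(second)]))
    return pairs


phenotype = to_phenotype(("Abc", "aBC"))
pairs = compatible_pairs(("Aa", "Bb", "Cc"))
print(phenotype)
print(sorted(sorted(pair) for pair in pairs))
```

With three heterozygous loci the enumeration yields four pairs when the order of the two haplotypes within a pair is ignored.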

**Problem:** Suppose that we have a set of phenotype observations from a large
population of N individuals. For each individual phenotype, we would like to
estimate the most probable haplotype pair that resulted in the
phenotype. The main ingredient of the solution is the estimation of individual
haplotype frequencies in the population.


## Installation

The package is available on PyPI.

``` shell
pip install haplopy
```

Alternatively, install the development version manually, for example inside a Conda environment:

``` bash
git clone https://github.com/malmgrek/haplopy.git
cd haplopy
pip install -r requirements
pip install -e .
```

To check that the development version installed correctly, run the tests with

``` shell
pytest -v 
```

## Examples

### Estimate haplotype frequencies

Simulate a dataset using prescribed haplotype probabilities and
a multinomial distribution model.

``` python
import haplopy as hp


proba_haplotypes = {
    ("A", "B", "C"): 0.34,
    ("a", "B", "c"): 0.20,
    ("a", "B", "C"): 0.13,
    ("a", "b", "c"): 0.23,
    ("A", "b", "C"): 0.10
}

phenotypes = hp.multinomial.Model(proba_haplotypes).random(100)

fig = hp.plot.plot_haplotypes(proba_haplotypes)
fig = hp.plot.plot_phenotypes(phenotypes)
```

![Original relative haplotype frequencies](./doc/images/hinton-original.png "Original")

![Simulated phenotype observation set](./doc/images/bar.png "Phenotypes")
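
For intuition, the simulation step can be approximated without haplopy. The sketch below assumes that the two haplotypes of each individual are drawn independently from the prescribed distribution. Whether `Model.random` works exactly this way is an assumption, and `simulate_phenotypes` is a hypothetical helper.

``` python
import random


def simulate_phenotypes(proba_haplotypes, n_obs, seed=None):
    """Draw n_obs phenotypes by sampling two haplotypes per individual."""
    rng = random.Random(seed)
    haplotypes = list(proba_haplotypes)
    weights = list(proba_haplotypes.values())
    phenotypes = []
    for _ in range(n_obs):
        # Two independent draws form one diplotype.
        first, second = rng.choices(haplotypes, weights=weights, k=2)
        # Sorting the alleles at each locus discards the phase.
        phenotypes.append(
            tuple("".join(sorted(pair)) for pair in zip(first, second))
        )
    return phenotypes


proba_haplotypes = {
    ("A", "B", "C"): 0.34,
    ("a", "B", "c"): 0.20,
    ("a", "B", "C"): 0.13,
    ("a", "b", "c"): 0.23,
    ("A", "b", "C"): 0.10,
}

sample = simulate_phenotypes(proba_haplotypes, 100, seed=0)
print(sample[:3])
```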

Now pretend that we don't know the underlying haplotype distribution and try to estimate it from the phenotypes.

``` python
model = hp.multinomial.Model().fit(phenotypes)
fig = hp.plot.plot_haplotypes(
    model.proba_haplotypes,
    thres=1.0e-6  # Hide probabilities smaller than this
)
```

![Estimated relative haplotype frequencies](./doc/images/hinton-estimated.png "Estimated")
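
A standard technique for estimating haplotype frequencies from unphased data is the expectation-maximization (EM) algorithm. The following pure-Python sketch illustrates that technique under the assumption that the two haplotypes of an individual are independent draws. It is not taken from haplopy's source, and whether `Model.fit` uses this approach is an assumption. The function names are hypothetical, and missing data is not handled.

``` python
from collections import Counter
from itertools import product


def compatible_diplotypes(phenotype):
    """Return the unordered haplotype pairs that could produce the phenotype."""
    pairs = set()
    for first in product(*phenotype):
        second = tuple(
            locus[1] if allele == locus[0] else locus[0]
            for allele, locus in zip(first, phenotype)
        )
        pairs.add(tuple(sorted([first, second])))
    return sorted(pairs)


def fit_em(phenotypes, n_iter=200):
    """Estimate haplotype frequencies with a basic EM iteration."""
    counts = Counter(phenotypes)
    expansions = {ph: compatible_diplotypes(ph) for ph in counts}
    haplotypes = sorted(
        {h for pairs in expansions.values() for pair in pairs for h in pair}
    )
    # Start from a uniform distribution over all candidate haplotypes.
    proba = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chromosomes = 2 * sum(counts.values())
    for _ in range(n_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for ph, n_obs in counts.items():
            # E-step: posterior weight of each diplotype given the phenotype.
            weights = [
                (1 if h1 == h2 else 2) * proba[h1] * proba[h2]
                for h1, h2 in expansions[ph]
            ]
            total = sum(weights)
            for (h1, h2), weight in zip(expansions[ph], weights):
                expected[h1] += n_obs * weight / total
                expected[h2] += n_obs * weight / total
        # M-step: normalize the expected haplotype counts.
        proba = {h: expected[h] / n_chromosomes for h in haplotypes}
    return proba


observations = (
    [("AA", "BB")] * 4 + [("aa", "bb")] * 4 + [("Aa", "Bb")] * 2
)
estimate = fit_em(observations)
for haplotype, p in estimate.items():
    print(haplotype, round(p, 4))
```

In this toy dataset the homozygous observations are unambiguous, so the iteration should attribute the ambiguous `("Aa", "Bb")` observations to the haplotypes `AB` and `ab` rather than to `Ab` and `aB`.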

### Phenotype phasing

Use an existing model to calculate the probabilities (conditional on the
phenotype) of the different diplotype representations of a given phenotype.

``` python
import haplopy as hp


model = hp.multinomial.Model({
    ("A", "B"): 0.4,
    ("A", "b"): 0.3,
    ("a", "B"): 0.2,
    ("a", "b"): 0.1
})

# A complete phenotype observation
model.calculate_proba_diplotypes(("Aa", "Bb"))
# {(('A', 'B'), ('a', 'b')): 0.4, (('A', 'b'), ('a', 'B')): 0.6}

# A phenotype with some missing SNPs
model.calculate_proba_diplotypes(("A.", ".."))
# {(('A', 'B'), ('A', 'B')): 0.17582417582417584,
#  (('A', 'B'), ('A', 'b')): 0.2637362637362637,
#  (('A', 'B'), ('a', 'B')): 0.17582417582417584,
#  (('A', 'B'), ('a', 'b')): 0.08791208791208792,
#  (('A', 'b'), ('A', 'b')): 0.09890109890109888,
#  (('A', 'b'), ('a', 'B')): 0.13186813186813184,
#  (('A', 'b'), ('a', 'b')): 0.06593406593406592}

```
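
The probabilities above can be reproduced by hand. The sketch below assumes that the two haplotypes of an individual are independent draws from the model, so a diplotype with two different haplotypes has weight 2 p(h1) p(h2) and a homozygous one has weight p(h)^2. This assumption is consistent with the printed output but is not stated by the package, and the helper names are hypothetical.

``` python
from itertools import combinations_with_replacement


def locus_matches(observed, alleles):
    """Check one locus; a '.' in the observation matches any allele."""
    remaining = list(alleles)
    for symbol in observed:
        if symbol == ".":
            continue
        if symbol not in remaining:
            return False
        remaining.remove(symbol)
    return True


def diplotype_posterior(proba_haplotypes, phenotype):
    """Normalize the weights of all diplotypes compatible with the phenotype."""
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(proba_haplotypes), 2):
        if all(
            locus_matches(observed, alleles)
            for observed, alleles in zip(phenotype, zip(h1, h2))
        ):
            # A pair of different haplotypes can arise in two orders.
            multiplicity = 1 if h1 == h2 else 2
            weights[(h1, h2)] = (
                multiplicity * proba_haplotypes[h1] * proba_haplotypes[h2]
            )
    total = sum(weights.values())
    if total == 0:
        raise ValueError("No diplotype is compatible with the phenotype")
    return {pair: weight / total for pair, weight in weights.items()}


proba_haplotypes = {
    ("A", "B"): 0.4,
    ("A", "b"): 0.3,
    ("a", "B"): 0.2,
    ("a", "b"): 0.1,
}

complete = diplotype_posterior(proba_haplotypes, ("Aa", "Bb"))
partial = diplotype_posterior(proba_haplotypes, ("A.", ".."))
for pair, p in complete.items():
    print(pair, round(p, 4))
```

For the complete phenotype the unnormalized weights are 2 × 0.4 × 0.1 = 0.08 and 2 × 0.3 × 0.2 = 0.12, which normalize to the 0.4 and 0.6 shown above.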

Phasing also makes it possible to compute the probabilities of the
admissible phenotypes and to impute missing data:

``` python
model.calculate_proba_phenotypes(("A.", ".."))
# {('AA', 'BB'): 0.17582417582417584,
#  ('AA', 'Bb'): 0.2637362637362637,
#  ('Aa', 'BB'): 0.17582417582417584,
#  ('Aa', 'Bb'): 0.21978021978021978,
#  ('AA', 'bb'): 0.09890109890109888,
#  ('Aa', 'bb'): 0.06593406593406592}

# Imputes with the most probable one
model.impute(("A.", ".."))
# ("AA", "Bb")
```
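
The link between the diplotype and phenotype probabilities can be checked directly: summing the diplotype probabilities over diplotypes that share a phenotype gives the phenotype probabilities, and imputation picks the largest one. The sketch below uses the diplotype probabilities printed earlier, rounded to four decimals, as plain input. It does not call haplopy, and `to_phenotype` is a hypothetical helper.

``` python
from collections import defaultdict

# Diplotype probabilities for ("A.", "..") from the earlier output, rounded
proba_diplotypes = {
    (("A", "B"), ("A", "B")): 0.1758,
    (("A", "B"), ("A", "b")): 0.2637,
    (("A", "B"), ("a", "B")): 0.1758,
    (("A", "B"), ("a", "b")): 0.0879,
    (("A", "b"), ("A", "b")): 0.0989,
    (("A", "b"), ("a", "B")): 0.1319,
    (("A", "b"), ("a", "b")): 0.0659,
}


def to_phenotype(diplotype):
    """Collapse a diplotype into its unphased phenotype."""
    first, second = diplotype
    return tuple("".join(sorted(pair)) for pair in zip(first, second))


# Sum the probabilities of diplotypes that map to the same phenotype.
proba_phenotypes = defaultdict(float)
for diplotype, p in proba_diplotypes.items():
    proba_phenotypes[to_phenotype(diplotype)] += p

# Imputation selects the most probable admissible phenotype.
imputed = max(proba_phenotypes, key=proba_phenotypes.get)
print(imputed)
# ('AA', 'Bb')
```

Only `("Aa", "Bb")` receives contributions from two diplotypes (0.0879 + 0.1319), which is why its probability is larger than either of them alone.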


            
