HaploDynamics


* Name: HaploDynamics
* Version: 0.4b1
* Home page: https://github.com/remytuyeras/HaploDynamics
* Summary: A python library to develop genomic data simulators
* Upload time: 2023-08-25 22:13:05
* Author: Remy Tuyeras
* License: gpl-3.0
* Keywords: simulator, genomics, genomic, microarray, snp chip, vcf, linkage disequilibrium, hardy-weinberg equilibrium

# Presentation of HaploDynamics
**HaploDynamics** (**HaploDX**) is a Python 3+ library that provides a collection of functions for simulating population-specific genomic data. It is part of the Genetic Simulator Resources (GSR) catalog. You can access the GSR catalog by clicking on the image below.

<div style="width: 180px; margin: auto;"><a href="https://surveillance.cancer.gov/genetic-simulation-resources/"><img src="https://surveillance.cancer.gov/gsr/static/img/gsr_tile.jpg" alt="Catalogued on GSR" width="180" height="60" /></a></div>

## Highlights and updates

1. The [documentation](#documentation) has been enhanced with tutorials and performance analyses;

2. Release version ```0.4b*```:
    * **Compose your own mutation model:** the class ```Model``` now lets you create your own mutation model and use it with the generative functions of the HaploDX framework.
        ```python
        import HaploDynamics.Framework as fmx
        #Start your simulation
        model = fmx.Model("tutorial")
        #Initialize the genomic landscape
        model.initiate_landscape(reference = 1.245)
        #Design your own genomic landscape with any allele frequency model
        model.extend_landscape(*(fmx.Model.standard_schema(20) for _ in range(6)))
        #Population and LD parameters
        strength = 1
        population = 0.1
        Npop = 1000
        chrom = "1"
        #Generate the simulation in a VCF file
        model.generate_vcf(strength,population,Npop,chrom)
        ```

    * [HaploDynamics.Framework.Model.initiate_landscape](docs/source/framework-doc.md#haplodynamicsframeworkmodelinitiate_landscape) added;
    * [HaploDynamics.Framework.Model.extend_landscape](docs/source/framework-doc.md#haplodynamicsframeworkmodelextend_landscape) added;
    * [HaploDynamics.Framework.Model.standard_schema](docs/source/framework-doc.md#haplodynamicsframeworkmodelstandard_schema) added;
    * [HaploDynamics.Framework.Model.genotype_schema](docs/source/framework-doc.md#haplodynamicsframeworkmodelgenotype_schema) added;
    * [HaploDynamics.Framework.Model.linkage_disequilibrium](docs/source/framework-doc.md#haplodynamicsframeworkmodellinkage_disequilibrium) added;
    * [HaploDynamics.Framework.Model.cond_genotype_schema](docs/source/framework-doc.md#haplodynamicsframeworkmodelcond_genotype_schema) added;
    * [Documentation for the Framework module](docs/source/framework-doc.md) polished;
    * Various typos and clumsy phrasing have been corrected in the [documentation](#documentation);
    * Loading bar appearance changed:
        ```shell
        $ python myscript.py
        Model.generate_vcf: |████████████████████| 100%
        time (sec.): 0.7510931491851807
        max. mem (MB): 0.11163139343261719
        cur. mem (MB): 0.0834970474243164
        ```
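
The time and memory figures printed under the loading bar can be collected for any call with the standard library. The sketch below makes no claim about how HaploDX gathers them: ```measure``` is a hypothetical helper built on ```time.perf_counter``` and ```tracemalloc```, and the workload is a stand-in for your own simulation call.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, seconds, current_MB, peak_MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    mb = 1024 * 1024
    return result, elapsed, current / mb, peak / mb


# Stand-in workload (hypothetical); replace it with your own call.
result, seconds, cur_mb, max_mb = measure(sum, range(100_000))
print(f"time (sec.): {seconds}")
print(f"max. mem (MB): {max_mb}")
print(f"cur. mem (MB): {cur_mb}")
```
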
  


## Installation

### Installation via ```pip```
Install the HaploDynamics package by using the following command.
```shell
$ pip install HaploDynamics
```
After this, you can import the modules of the library into your script as follows.
```python
import HaploDynamics.HaploDX as hdx
import HaploDynamics.Framework as fmx
```
To upgrade the package to version ```0.4b1```, the latest release at the time of writing, use the following command.
```shell
$ pip install --upgrade HaploDynamics==0.4b1
```
### Manual installation
HaploDynamics uses the [SciPy](https://docs.scipy.org/doc/scipy/reference/stats.html) library for certain calculations. To install SciPy, run the following command, or see SciPy's [installation instructions](https://scipy.org/install/) for more options.
```shell
$ python -m pip install scipy
```
You can download the HaploDynamics package from GitHub by using the following command in a terminal.
```shell
$ git clone https://github.com/remytuyeras/HaploDynamics.git
```
Then, use the ```pwd``` command to get the absolute path leading to the downloaded package.
```shell
$ ls
HaploDynamics
$ cd HaploDynamics/
$ pwd
absolute/path/to/HaploDynamics
```
To import the modules of the library into your script, use the following syntax, replacing ```absolute/path/to/HaploDynamics``` with the path obtained earlier.
```python
import sys
sys.path.insert(1,"absolute/path/to/HaploDynamics")
import HaploDynamics.HaploDX as hdx
import HaploDynamics.Framework as fmx
```
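
Before running a longer script, it can help to confirm that Python can actually locate the package. The following is a minimal sketch using only the standard library; ```is_importable``` is a hypothetical helper, not part of HaploDynamics, and the path is the same placeholder used above.

```python
import importlib.util
import sys


def is_importable(module_name, extra_path=None):
    """Return True if a top-level module can be located.

    If extra_path is given, it is added to sys.path first, mirroring
    the manual installation steps.
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(1, extra_path)
    return importlib.util.find_spec(module_name) is not None


# Replace the placeholder path with the output of `pwd`.
found = is_importable("HaploDynamics", extra_path="absolute/path/to/HaploDynamics")
print("HaploDynamics found:", found)
```

If the check prints ```False```, the path passed to ```sys.path.insert``` is the first thing to verify.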
## Quickstart

The following script generates a VCF file containing simulated diploid genotypes for a population of 1000 individuals with LD-blocks of length 20kb, 5kb, 20kb, 35kb, 30kb and 15kb. 
```python
import HaploDynamics.HaploDX as hdx

simulated_data = hdx.genmatrix([20,5,20,35,30,15],strength=1,population=0.1,Npop=1000)
hdx.create_vcfgz("genomic-data.simulation.v1",*simulated_data)
```
The argument ```strength=1``` forces a high amount of linkage disequilibrium, and the argument ```population=0.1``` increases the likelihood that the simulated population has rare mutations (e.g. to simulate a population profile close to African and South-Asian populations).

More generally, the function ```genmatrix()``` takes the following parameters:

| Parameter | Type | Values |
| :--- | :--- | :--- |
| ```blocks``` | ```list[int]``` | List of positive integers, ideally between 1 and 40 |
| ```strength``` | ```float``` | From -1 (little linkage) to 1 (high linkage) |
| ```population``` | ```float``` | From 0 (more rare mutations) to 1 (fewer rare mutations) |
| ```Npop``` | ```int``` | Positive integer specifying the number of individuals in the genomic matrix |

The time needed to generate each locus in a VCF file tends to grow linearly with the parameter ```Npop```. On average, a genetic variant takes from 0.3 to 1 second to generate when ```Npop=100000``` (this may vary depending on your machine). The estimated time complexity for an average machine is shown below.
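
Because a large simulation can run for a long time, it may be worth checking the arguments against the ranges in the parameter table before calling ```genmatrix()```. The helper below is a hypothetical sketch, not part of HaploDynamics; treating an empty ```blocks``` list as an error is an assumption made here.

```python
def check_genmatrix_args(blocks, strength, population, Npop):
    """Validate arguments against the documented ranges.

    Raises ValueError on a hard violation and returns the block lengths
    that exceed the recommended maximum of 40.
    """
    # Assumption: an empty list of blocks is rejected.
    if not blocks or not all(isinstance(b, int) and b > 0 for b in blocks):
        raise ValueError("blocks must be a non-empty list of positive integers")
    if not -1 <= strength <= 1:
        raise ValueError("strength must lie between -1 and 1")
    if not 0 <= population <= 1:
        raise ValueError("population must lie between 0 and 1")
    if not (isinstance(Npop, int) and Npop > 0):
        raise ValueError("Npop must be a positive integer")
    # "Ideally between 1 and 40" is a recommendation, so only report it.
    return [b for b in blocks if b > 40]


oversized = check_genmatrix_args([20, 5, 20, 35, 30, 15], strength=1, population=0.1, Npop=1000)
print("blocks above the recommended length:", oversized)
```
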

![](img/time_complexity.png) 
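
The linear trend gives a rough way to budget runtime. The sketch below scales the quoted range of 0.3 to 1 second per variant at ```Npop=100000``` proportionally to other population sizes. The function ```estimate_seconds``` and the number of variants are hypothetical, and strict proportionality with no fixed overhead is an assumption.

```python
def estimate_seconds(n_variants, npop, per_variant_at_ref=(0.3, 1.0), ref_npop=100_000):
    """Return a (low, high) runtime estimate in seconds.

    Assumes the cost per variant is proportional to npop, anchored on
    the quoted range at ref_npop individuals.
    """
    scale = npop / ref_npop
    low, high = per_variant_at_ref
    return n_variants * low * scale, n_variants * high * scale


# Hypothetical workload: 500 variants for 20000 individuals.
low, high = estimate_seconds(n_variants=500, npop=20_000)
print(f"estimated runtime: {low:.0f} to {high:.0f} seconds")
```

Timing a small run on your own machine and substituting the measured range for ```per_variant_at_ref``` should give a more reliable figure.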

## Use cases
The following script shows how to display linkage disequilibrium correlations for the simulated data.
```python
import matplotlib.pyplot as plt
import HaploDynamics.HaploDX as hdx

simulated_data = hdx.genmatrix([20,20,20,20,20,20],strength=1,population=0.1,Npop=1000)
hdx.create_vcfgz("genomic-data.simulation.v1",*simulated_data)

rel, m, _ = hdx.LD_corr_matrix(simulated_data[0])
plt.imshow(hdx.display(rel,m))
plt.show()
```
A typical output for the previous script should look as follows.

![](img/simulation_LD_0.png) 
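
To make the notion of a correlation between variants concrete, the following self-contained sketch computes squared Pearson correlations between genotype dosage vectors on toy data. It is a general illustration only: the statistic that ```LD_corr_matrix()``` uses is not stated here, and ```pearson_r2``` and ```r2_matrix``` are hypothetical names.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a variant with no variation has no defined correlation
    return cov * cov / (var_x * var_y)


def r2_matrix(variants):
    """Pairwise squared correlations between all variants."""
    return [[pearson_r2(u, v) for v in variants] for u in variants]


# Toy data: one row per variant, one column per individual,
# entries are alternate-allele counts (0, 1 or 2).
variants = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # copy of the first variant
    [2, 1, 0, 1, 2, 0],  # mirror image of the first variant
    [1, 0, 1, 2, 1, 1],  # uncorrelated with the first variant
]
for row in r2_matrix(variants):
    print([round(value, 2) for value in row])
```

Squaring the correlation makes the copy and the mirror image score the same, which is why both appear as fully linked to the first variant.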

The following script shows how you can control linkage disequilibrium by using LD-blocks of varying lengths. You can display the graph relating distances between pairs of variants to average correlation scores by using the last output of the function ```LD_corr_matrix()```.

```python
import matplotlib.pyplot as plt
import HaploDynamics.HaploDX as hdx

ld_blocks = [5,5,5,10,20,5,5,5,5,5,5,1,1,1,2,2,10,20,40]
strength=1
population=0.1
Npop = 1000
simulated_data = hdx.genmatrix(ld_blocks,strength,population,Npop)
hdx.create_vcfgz("genomic-data.simulation.v1",*simulated_data)

#Correlations
rel, m, dist = hdx.LD_corr_matrix(simulated_data[0])
plt.imshow(hdx.display(rel,m))
plt.show()

#from genetic distances to average correlations
plt.plot([i for i in range(len(dist)-1)],dist[1:])
plt.ylim([0, 1])
plt.show()
```
Typical outputs for the previous script should look as follows.

Correlations            |  genetic distances to average correlations
:-------------------------:|:-------------------------:
![](img/simulation_LD_1.png)  |  ![](img/simulation_dist_1.png)
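
The second plot relates distance to average correlation. One simple way to build such a curve from a square correlation matrix is to average its entries along each off-diagonal, as sketched below. This is an assumption about the general technique, not a description of how ```LD_corr_matrix()``` computes its last output; ```mean_by_distance``` is a hypothetical name, and distance here means the difference between variant indices.

```python
def mean_by_distance(matrix):
    """Average matrix[i][i + d] over all valid i, for each distance d."""
    n = len(matrix)
    return [
        sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(n)
    ]


# Toy correlation matrix in which correlation fades with distance.
corr = [
    [1.0, 0.8, 0.4, 0.0],
    [0.8, 1.0, 0.8, 0.4],
    [0.4, 0.8, 1.0, 0.8],
    [0.0, 0.4, 0.8, 1.0],
]
decay = mean_by_distance(corr)
# Skip distance 0, which compares each variant with itself.
print([round(value, 2) for value in decay[1:]])
```
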

Finally, the following script shows how you can generate large regions of linkage.

```python
import matplotlib.pyplot as plt
import HaploDynamics.HaploDX as hdx

ld_blocks = [1] * 250
strength=1
population=0.1
Npop = 1000
simulated_data = hdx.genmatrix(ld_blocks,strength,population,Npop)
hdx.create_vcfgz("genomic-data.simulation.v1",*simulated_data)

#Correlations
rel, m, dist = hdx.LD_corr_matrix(simulated_data[0])
plt.imshow(hdx.display(rel,m))
plt.show()

#from genetic distances to average correlations
plt.plot([i for i in range(len(dist)-1)],dist[1:])
plt.ylim([0, 1])
plt.show()
```
Typical outputs for the previous script should look as follows.

Correlations            |  genetic distances to average correlations
:-------------------------:|:-------------------------:
![](img/simulation_LD_2.png)  |  ![](img/simulation_dist_2.png)

## To cite this work

Tuyeras, R. (2023). _HaploDynamics: A python library to develop genomic data simulators_ (Version 0.4-beta.1) [Computer software]. [![DOI](https://zenodo.org/badge/609227235.svg)](https://zenodo.org/badge/latestdoi/609227235)

<br/>

# Documentation

* [Documentation for the HaploDX module](docs/source/haplodx-doc.md) 
* [Documentation for the Framework module](docs/source/framework-doc.md)

            
