| Field | Value |
|---|---|
| Name | SCSilicon2 |
| Version | 1.0.1 |
| home_page | https://github.com/xikanfeng2/SCSilicon2 |
| Summary | SCSilicon2: a single-cell genomics simulator which cost-effectively simulates single-cell genomics reads with haplotype-specific copy number annotation. |
| upload_time | 2023-12-31 08:05:18 |
| author | Xikang Feng |
| requires_python | >=3.6 |
# SCSilicon2
SCSilicon2: a single-cell genomics simulator which cost-effectively simulates single-cell genomics reads with haplotype-specific copy number annotation.
## 1. Prerequisites
* Python 3.6 or higher
* pandas>=0.23.4
* matplotlib>=3.0.2
* networkx>=3.2.1
* [wgsim](https://github.com/lh3/wgsim)
The Python packages listed above are installed automatically with SCSilicon2 if they are not already present in your Python environment.
To install wgsim, follow the README of [wgsim](https://github.com/lh3/wgsim), and make sure the `wgsim` command runs from your command line.
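To confirm from Python that `wgsim` is reachable on your `PATH` before starting a simulation, a minimal sketch using the standard library's `shutil.which` is enough. The helper name `check_wgsim` is hypothetical and not part of SCSilicon2.

```python
import shutil
from typing import Optional


def check_wgsim(command: str = "wgsim") -> Optional[str]:
    """Return the full path of the wgsim executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    location = check_wgsim()
    print(f"wgsim found at {location}" if location else "wgsim not found: install it and add it to your PATH")
```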
## 2. Installation
### Creation of a Python virtual environment
We recommend running SCSilicon2 in a virtual environment (this step is optional). The following commands create, activate, and deactivate a Python virtual environment:
```Bash
# create a python virtual env in scsilicon2 folder
python -m venv scsilicon2
# activate the virtual env
source scsilicon2/bin/activate
# deactivate the virtual env
deactivate
```
### Installation with pip
To install with pip, run the following from a terminal:
```Bash
pip install scsilicon2
```
### Installation from GitHub
To clone the repository and install manually, run the following from a terminal:
```Bash
git clone https://github.com/xikanfeng2/SCSilicon2.git
cd SCSilicon2
python setup.py install
```
## 3. Quick start
The following code runs a SCSilicon2 simulation.
```Python
import scsilicon2 as scs
# create SCSilicon2 object: ref_genome and snp_file are required; outdir, clone_no, and cell_no are optional
simulator = scs.SCSilicon2(
    ref_genome='your reference fasta file here',
    snp_file='your snp list file here',
    outdir='your output directory here',
    clone_no=4,
    cell_no=10,
)
# simulate dataset
simulator.sim_dataset()
```
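Since `ref_genome` and `snp_file` are required and section 6.1 asks for `cell_no` to be larger than `clone_no`, it can help to check these conditions before calling `sim_dataset()`. The sketch below is a hypothetical pre-flight helper, not part of SCSilicon2; it only uses the standard library.

```python
import os
from typing import List


def preflight_errors(ref_genome: str, snp_file: str, clone_no: int = 1, cell_no: int = 2) -> List[str]:
    """Return a list of problems found before starting a simulation (empty if none)."""
    errors: List[str] = []
    for label, path in (("ref_genome", ref_genome), ("snp_file", snp_file)):
        if not os.path.isfile(path):
            errors.append(f"{label} not found: {path}")
    # Section 6.1 asks for more cells than clones; at least one cell goes to the normal case.
    if cell_no <= clone_no:
        errors.append(f"cell_no ({cell_no}) must be larger than clone_no ({clone_no})")
    return errors
```

For example, `preflight_errors('example/input/chr22.fa', 'example/input/dbsnp.tsv', clone_no=4, cell_no=10)` returns an empty list when both example files exist.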
## 4. Required input files
1. **A reference genome file in FASTA format.**
Please refer to the example FASTA file `example/input/chr22.fa`.
2. **A list of SNPs.**
The SNPs in this list can be located at arbitrary positions of the genome. Please refer to the example SNP list file `example/input/dbsnp.tsv`.
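To check that a reference file is valid FASTA and to see which sequence names it contains (for example `chr22` in the example file), a minimal standard-library sketch can report each record's length. The helper `fasta_lengths` is hypothetical; SCSilicon2 does not provide it.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (header text up to the first whitespace) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[name] += len(line)
    return lengths
```

A file handle is an iterable of lines, so `with open('example/input/chr22.fa') as fh: print(fasta_lengths(fh))` works directly.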
## 5. Output files of SCSilicon2
The output directory contains three subfolders: `fastq`, `fasta`, and `profile`. The structure of an example output directory with 3 clones and 10 cells is shown below:
```
output
|-fastq
| |-normal_r2.fq
| |-clone2
| | |-cell0_r1.fq
| | |-cell0_r2.fq
| |-normal_r1.fq
| |-clone2_r2.fq
| |-clone1_r1.fq
| |-clone0_r2.fq
| |-clone0
| | |-cell2_r1.fq
| | |-cell3_r2.fq
| | |-cell2_r2.fq
| | |-cell1_r2.fq
| | |-cell1_r1.fq
| | |-cell3_r1.fq
| | |-cell0_r1.fq
| | |-cell0_r2.fq
| |-normal
| | |-cell2_r1.fq
| | |-cell2_r2.fq
| | |-cell1_r2.fq
| | |-cell1_r1.fq
| | |-cell0_r1.fq
| | |-cell0_r2.fq
| |-clone2_r1.fq
| |-clone1
| | |-cell1_r2.fq
| | |-cell1_r1.fq
| | |-cell0_r1.fq
| | |-cell0_r2.fq
| |-clone0_r1.fq
| |-clone1_r2.fq
|-fasta
| |-clone2.fasta
| |-normal_paternal.fasta
| |-clone2_paternal.fasta
| |-clone0.fasta
| |-clone1.fasta
| |-clone0_paternal.fasta
| |-normal.fasta
| |-clone1_paternal.fasta
| |-clone2_maternal.fasta
| |-clone1_maternal.fasta
| |-clone0_maternal.fasta
| |-normal_maternal.fasta
|-profile
| |-changes.csv
| |-tree.pdf
| |-maternal_cnv_matrix.csv
| |-paternal_cnv_matrix.csv
| |-phases.csv
| |-cnv_profile.csv
| |-tree.newick
```
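Downstream tools usually need the read pairs of each cell. Going by the example tree above (one subfolder per clone plus `normal`, each holding `cellN_r1.fq`/`cellN_r2.fq`), a sketch that pairs the files per cell could look like this. The layout is inferred from the example, and `collect_cell_pairs` is a hypothetical helper.

```python
import os
import re
from typing import Dict, Tuple

# Matches per-cell read files such as "cell3_r1.fq" or "cell3_r2.fq".
CELL_FASTQ = re.compile(r"^(cell\d+)_r([12])\.fq$")


def collect_cell_pairs(fastq_dir: str) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Return {(clone, cell): (r1_path, r2_path)} for every cell subfolder in fastq_dir."""
    found: Dict[Tuple[str, str], Dict[str, str]] = {}
    for clone in sorted(os.listdir(fastq_dir)):
        clone_dir = os.path.join(fastq_dir, clone)
        if not os.path.isdir(clone_dir):
            continue  # skip clone-level files such as clone0_r1.fq
        for fname in os.listdir(clone_dir):
            match = CELL_FASTQ.match(fname)
            if match:
                cell, mate = match.groups()
                found.setdefault((clone, cell), {})[mate] = os.path.join(clone_dir, fname)
    # Keep only cells for which both mates are present.
    return {
        key: (mates["1"], mates["2"])
        for key, mates in sorted(found.items())
        if "1" in mates and "2" in mates
    }
```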
* `fasta folder`: stores the FASTA files of each clone and of the normal genome, both combined and split into maternal and paternal haplotypes.
* `fastq folder`: stores the paired-end reads in FASTQ format for each clone (e.g. `clone0_r1.fq`) and for each cell (e.g. `clone0/cell0_r1.fq`).
* `profile folder`: stores the profile files related to the simulation process. The format of each file in this folder is explained below.
1. `changes.csv`: stores the evolution path of each clone. One example is listed below:
|Parent|Child |Haplotype|Type|Segment |Change|
|------|------|---------|----|-----------------------|------|
|normal|clone0|paternal |dup |chr22:500001-1000000 |1->3 |
|normal|clone0|maternal |del |chr22:3500001-4000000 |1->0 |
|normal|clone0|maternal |dup |chr22:4000001-4500000 |1->2 |
|normal|clone0|maternal |dup |chr22:5000001-5500000 |1->2 |
|normal|clone0|maternal |dup |chr22:8000001-8500000 |1->4 |
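Each row of `changes.csv` records a copy-number change of one haplotype in one segment from a parent to a child clone, so a child's copy numbers can be recovered by applying its rows to the parent's. A minimal sketch, assuming the CSV header uses the column names shown in the table (`Segment` as `chrom:start-end`, `Change` as `old->new`) and that segments absent from the parent dictionary have copy number 1; all helper names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (haplotype, segment), e.g. ("paternal", "chr22:500001-1000000")


def load_changes(path: str) -> List[Dict[str, str]]:
    """Read changes.csv into a list of row dicts keyed by the header names."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def parse_change(change: str) -> Tuple[int, int]:
    """Split a change string such as '1->3' into (old, new) copy numbers."""
    old, new = change.split("->")
    return int(old), int(new)


def apply_changes(rows: Iterable[Dict[str, str]], child: str, parent_cn: Dict[Key, int]) -> Dict[Key, int]:
    """Apply all rows whose Child column equals `child` to a copy of the parent's copy numbers."""
    cn = dict(parent_cn)
    for row in rows:
        if row["Child"].strip() != child:
            continue
        key = (row["Haplotype"].strip(), row["Segment"].strip())
        old, new = parse_change(row["Change"].strip())
        current = cn.get(key, 1)  # assumption: unchanged segments have copy number 1
        if current != old:
            raise ValueError(f"{key}: change expects {old}, parent has {current}")
        cn[key] = new
    return cn
```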
2. `cnv_profile.csv`: stores the ground-truth copy numbers of each clone, with each value written as `maternal|paternal`. One example is listed below:
|Chromosome|Start|End|clone0|clone1|clone2|
|----------|-----|---|------|------|------|
|chr22|1|500000|1\|1|3\|1|3\|1|
|chr22|500001|1000000|1\|3|1\|3|3\|5|
|chr22|1000001|1500000|1\|1|3\|2|3\|2|
|chr22|1500001|3000000|1\|1|1\|1|1\|1|
|chr22|3000001|3500000|1\|1|3\|2|3\|2|
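Because each clone column holds a `maternal|paternal` pair, the allele-specific and total copy numbers can be split out directly. A short sketch, assuming the values are written exactly as in the example (for instance `3|1`) and that rows come from something like `csv.DictReader` keyed by the header names; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def split_cn(cell: str) -> Tuple[int, int]:
    """Parse a 'maternal|paternal' value such as '3|1' into (3, 1)."""
    maternal, paternal = cell.strip().split("|")
    return int(maternal), int(paternal)


def total_cn_rows(rows: Iterable[Dict[str, str]], clones: List[str]) -> List[Tuple[str, int, int, Dict[str, int]]]:
    """For each segment row, return (chromosome, start, end, {clone: maternal + paternal})."""
    result = []
    for row in rows:
        totals = {clone: sum(split_cn(row[clone])) for clone in clones}
        result.append((row["Chromosome"], int(row["Start"]), int(row["End"]), totals))
    return result
```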
3. `maternal_cnv_matrix.csv` and `paternal_cnv_matrix.csv`: store the CNV matrix of each clone, separated into the maternal and the paternal haplotype. An example of `maternal_cnv_matrix.csv` is listed below:
|Index|clone0_maternal_cnvs|clone1_maternal_cnvs|clone2_maternal_cnvs|
|------|--------------------|--------------------|--------------------|
|chr22:1-500000|1 |3 |3 |
|chr22:500001-1000000|1 |1 |3 |
|chr22:1000001-1500000|1 |3 |3 |
|chr22:1500001-3000000|1 |1 |1 |
|chr22:3000001-3500000|1 |3 |3 |
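Since both matrices list the same segments with one column per clone, adding them value by value gives each clone's total copy number, which should agree with the sum of the two values in `cnv_profile.csv`. A standard-library sketch, assuming the first column holds the segment label and the clone columns end in `_maternal_cnvs` / `_paternal_cnvs` (the paternal naming is inferred from the maternal example); the helpers are hypothetical.

```python
import csv
from typing import Dict, Iterable

Matrix = Dict[str, Dict[str, int]]  # {segment: {clone: copy_number}}


def read_matrix(lines: Iterable[str], suffix: str) -> Matrix:
    """Parse a CNV matrix CSV; clone column names have `suffix` stripped ('clone0_maternal_cnvs' -> 'clone0')."""
    reader = csv.reader(lines)
    header = next(reader)
    clones = [col[: -len(suffix)] if suffix and col.endswith(suffix) else col for col in header[1:]]
    matrix: Matrix = {}
    for row in reader:
        if row:  # skip blank lines
            matrix[row[0]] = {clone: int(value) for clone, value in zip(clones, row[1:])}
    return matrix


def total_matrix(maternal: Matrix, paternal: Matrix) -> Matrix:
    """Add maternal and paternal copy numbers segment by segment and clone by clone."""
    return {
        segment: {clone: cn + paternal[segment][clone] for clone, cn in clones.items()}
        for segment, clones in maternal.items()
    }
```

With real files, pass an open handle, e.g. `read_matrix(open('profile/maternal_cnv_matrix.csv', newline=''), '_maternal_cnvs')`.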
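4. `phases.csv`: stores the phased SNPs. Each row gives the chromosome, the SNP position, and the `maternal|paternal` genotype. One example is listed below (column names added here for readability):
|Chromosome|Position|Maternal\|Paternal|
|----------|--------|------------------|
|chr22|16578327|1\|0|
|chr22|17307398|1\|0|
|chr22|18025718|1\|0|
|chr22|21416314|0\|1|
|chr22|22418251|1\|0|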
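To see how the SNPs are distributed between the two haplotypes, the genotype column can be tallied. This sketch assumes `1` marks the SNP allele and `0` the reference allele (an interpretation not stated above), and comma-separated rows without a header line; pass `delimiter='\t'` if your file is tab-separated. `haplotype_snp_counts` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable


def haplotype_snp_counts(lines: Iterable[str], delimiter: str = ",") -> Dict[str, int]:
    """Count SNPs carried only by the maternal, only by the paternal, or by both haplotypes."""
    counts = {"maternal": 0, "paternal": 0, "homozygous": 0}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        maternal, paternal = row[2].strip().split("|")
        if maternal == "1" and paternal == "1":
            counts["homozygous"] += 1
        elif maternal == "1":
            counts["maternal"] += 1
        elif paternal == "1":
            counts["paternal"] += 1
    return counts
```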
5. `tree.newick` and `tree.pdf`: the CNV evolution tree in Newick and PDF format.
An example profile folder can be found in the `data/profile` folder.
## 6. `SCSilicon2` object
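All the general parameters for a SCSilicon2 simulation are stored in a `SCSilicon2` object. Let’s create a new one, passing the two required inputs (here the example files from section 4) and keeping the defaults for all other parameters:
```Python
import scsilicon2 as scs

simulator = scs.SCSilicon2(ref_genome='example/input/chr22.fa', snp_file='example/input/dbsnp.tsv')
```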
### 6.1 All parameters in `SCSilicon2` object
* `ref_genome`: str, required<br>
Path to the reference genome FASTA file
* `snp_file`: str, required<br>
Path to the SNP list file
* `outdir`: str, optional, default: './'<br>
The output directory
* `clone_no`: int, optional, default: 1<br>
Number of clones in the random evolution tree
* `cell_no`: int, optional, default: 2<br>
Total number of cells in the simulated dataset. Make sure `cell_no` is larger than `clone_no`; at least one cell is generated for the normal case.
* `max_cnv_tree_depth`: int, optional, default: 4<br>
Maximum depth of the random evolution tree
* `bin_len`: int, optional, default: 500000<br>
Fixed bin length (bp)
* `HEHOratio`: float, optional, default: 0.5<br>
Ratio of heterozygous SNPs
* `cnv_prob_cutoff`: float, optional, default: 0.8<br>
Cutoff probability for a bin to undergo a CNV: a CNV occurs in a bin when a randomly drawn probability is larger than the cutoff
* `clone_coverage`: float, optional, default: 30<br>
Sequencing coverage of each clone's FASTQ files
* `cell_coverage`: float, optional, default: 0.5<br>
Sequencing coverage of each cell in a clone
* `reads_len`: int, optional, default: 150<br>
Read length (bp) in the FASTQ files
* `insertion_size`: int, optional, default: 350<br>
Outer distance (bp) between the two ends of a read pair
* `error_rate`: float, optional, default: 0.02<br>
Per-base sequencing error rate
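The coverage, read-length, insert-size, and error-rate parameters correspond to what wgsim needs to produce paired-end reads (its `-N`, `-1`/`-2`, `-d`, and `-e` options). The sketch below uses the standard coverage relation, read pairs = coverage × genome length / (2 × read length), to build such a command. It only illustrates the arithmetic; how SCSilicon2 itself converts coverage into read counts is an assumption, not taken from its source, and both helpers are hypothetical.

```python
import math
from typing import List


def read_pairs_for_coverage(coverage: float, genome_len: int, reads_len: int = 150) -> int:
    """Smallest number of read pairs with 2 * pairs * reads_len >= coverage * genome_len."""
    return math.ceil(coverage * genome_len / (2 * reads_len))


def wgsim_command(fasta: str, r1: str, r2: str, coverage: float, genome_len: int,
                  reads_len: int = 150, insertion_size: int = 350, error_rate: float = 0.02) -> List[str]:
    """Build (but do not run) a wgsim command line for paired-end reads."""
    pairs = read_pairs_for_coverage(coverage, genome_len, reads_len)
    return [
        "wgsim",
        "-N", str(pairs),            # number of read pairs
        "-1", str(reads_len),        # length of the first read
        "-2", str(reads_len),        # length of the second read
        "-d", str(insertion_size),   # outer distance between the two ends
        "-e", str(error_rate),       # base error rate
        fasta, r1, r2,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. At the default `cell_coverage` of 0.5, a 1 Mb sequence needs only about 1,667 read pairs with 150 bp reads, which is why per-cell FASTQ files are much smaller than the clone-level ones.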
## Cite us
todo
## Help
If you have any questions or need assistance using SCSilicon2, please contact us at fxk@nwpu.edu.cn.