# SCSilicon2

* Version: 1.0.1
* Author: Xikang Feng
* Home page: https://github.com/xikanfeng2/SCSilicon2
* Upload time: 2023-12-31 08:05:18
* Requires Python: >=3.6

SCSilicon2: a single-cell genomics simulator which cost-effectively simulates single-cell genomics reads with haplotype-specific copy number annotation.

## 1. Prerequisites
* Python 3.6 or higher
* pandas>=0.23.4
* matplotlib>=3.0.2
* networkx>=3.2.1
* [wgsim](https://github.com/lh3/wgsim)


The Python packages listed above are installed automatically with SCSilicon2 if they are not already present in your Python environment.

To install wgsim, follow the instructions in the [wgsim README](https://github.com/lh3/wgsim). Make sure the `wgsim` command works from your command line.
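
Because SCSilicon2 expects the `wgsim` command to be available, you may want to check this from Python before starting a long simulation. The sketch below uses only the standard library; the helper name `wgsim_available` is our own and is not part of SCSilicon2.

```python
import shutil


def wgsim_available() -> bool:
    """Return True if an executable named 'wgsim' is found on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which("wgsim") is not None


if __name__ == "__main__":
    if wgsim_available():
        print("wgsim found at", shutil.which("wgsim"))
    else:
        print("wgsim not found; install it and add it to your PATH")
```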

## 2. Installation

### Creation of a Python virtual environment
We recommend running SCSilicon2 inside a virtual environment (this step is optional). You can create, activate, and deactivate a Python virtual environment with the following commands:

```Bash
# create a python virtual env in scsilicon2 folder
python -m venv scsilicon2

# activate the virtual env
source scsilicon2/bin/activate

# deactivate the virtual env
deactivate
```

### Installation with pip
To install with pip, run the following from a terminal:
```Bash
pip install scsilicon2
```

### Installation from GitHub
To clone the repository and install manually, run the following from a terminal:
```Bash
git clone https://github.com/xikanfeng2/SCSilicon2.git
cd SCSilicon2
python setup.py install
```

## 3. Quick start
The following code runs an SCSilicon2 simulation:

```Python
import scsilicon2 as scs

# Create an SCSilicon2 object: ref_genome and snp_file are required;
# outdir, clone_no, and cell_no are optional.
simulator = scs.SCSilicon2(
    ref_genome='your reference fasta file here',
    snp_file='your snp list file here',
    outdir='your output directory here',
    clone_no=4,
    cell_no=10,
)

# simulate dataset
simulator.sim_dataset()
```

## 4. Required input files

1. **A reference genome file in FASTA format.**  
See the example FASTA file `example/input/chr22.fa`.
2. **A list of SNPs.**  
The SNPs in this list can be introduced at arbitrary positions in the genome. See the example SNP list file `example/input/dbsnp.tsv`.
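
Before starting a simulation, it can help to confirm that the reference file parses as FASTA and to see which sequences (chromosomes) it contains and how long they are. The following is a minimal sketch of a dependency-free reader, assuming an uncompressed FASTA file; `fasta_lengths` is a hypothetical helper, not an SCSilicon2 function.

```python
from typing import Dict, Optional


def fasta_lengths(path: str) -> Dict[str, int]:
    """Return {sequence name: sequence length} for an uncompressed FASTA file."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is None:
                raise ValueError(f"{path}: sequence data found before the first header")
            else:
                lengths[name] += len(line)
    return lengths
```

For example, `fasta_lengths('example/input/chr22.fa')` would return a dictionary keyed by the sequence name(s) in the example reference; the exact key depends on that file's header line.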

## 5. Output files of SCSilicon2
The output directory contains three subfolders: `fastq`, `fasta`, and `profile`. The structure of an example output directory (with `clone_no=3` and `cell_no=10`) is shown below:

```
output
 |-fastq
 | |-normal_r1.fq
 | |-normal_r2.fq
 | |-normal
 | | |-cell0_r1.fq
 | | |-cell0_r2.fq
 | | |-cell1_r1.fq
 | | |-cell1_r2.fq
 | | |-cell2_r1.fq
 | | |-cell2_r2.fq
 | |-clone0_r1.fq
 | |-clone0_r2.fq
 | |-clone0
 | | |-cell0_r1.fq
 | | |-cell0_r2.fq
 | | |-cell1_r1.fq
 | | |-cell1_r2.fq
 | | |-cell2_r1.fq
 | | |-cell2_r2.fq
 | | |-cell3_r1.fq
 | | |-cell3_r2.fq
 | |-clone1_r1.fq
 | |-clone1_r2.fq
 | |-clone1
 | | |-cell0_r1.fq
 | | |-cell0_r2.fq
 | | |-cell1_r1.fq
 | | |-cell1_r2.fq
 | |-clone2_r1.fq
 | |-clone2_r2.fq
 | |-clone2
 | | |-cell0_r1.fq
 | | |-cell0_r2.fq
 |-fasta
 | |-normal.fasta
 | |-normal_maternal.fasta
 | |-normal_paternal.fasta
 | |-clone0.fasta
 | |-clone0_maternal.fasta
 | |-clone0_paternal.fasta
 | |-clone1.fasta
 | |-clone1_maternal.fasta
 | |-clone1_paternal.fasta
 | |-clone2.fasta
 | |-clone2_maternal.fasta
 | |-clone2_paternal.fasta
 |-profile
 | |-changes.csv
 | |-cnv_profile.csv
 | |-maternal_cnv_matrix.csv
 | |-paternal_cnv_matrix.csv
 | |-phases.csv
 | |-tree.newick
 | |-tree.pdf
```
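
The per-cell reads follow the naming pattern visible in the tree above, `fastq/<clone>/cell<i>_r1.fq` and `fastq/<clone>/cell<i>_r2.fq`. Assuming that layout, a small helper like the sketch below (hypothetical, not part of SCSilicon2) can collect the read pairs of every cell, for example to pass them to an aligner:

```python
from pathlib import Path
from typing import Dict, Tuple


def cell_fastq_pairs(outdir: str) -> Dict[Tuple[str, str], Tuple[Path, Path]]:
    """Map (clone, cell) to its (R1, R2) FASTQ paths under <outdir>/fastq.

    Assumes the layout fastq/<clone>/<cell>_r1.fq and fastq/<clone>/<cell>_r2.fq.
    Clone-level files such as fastq/clone0_r1.fq are not matched because they
    do not sit inside a clone subdirectory.
    """
    pairs: Dict[Tuple[str, str], Tuple[Path, Path]] = {}
    fastq_dir = Path(outdir) / "fastq"
    for r1 in sorted(fastq_dir.glob("*/*_r1.fq")):
        cell = r1.name[: -len("_r1.fq")]
        r2 = r1.with_name(cell + "_r2.fq")
        if not r2.exists():
            raise FileNotFoundError(f"missing mate file for {r1}: {r2}")
        pairs[(r1.parent.name, cell)] = (r1, r2)
    return pairs
```

For the example output above (`clone_no=3`, `cell_no=10`), it would return 10 entries: 3 normal cells, 4 cells in `clone0`, 2 in `clone1`, and 1 in `clone2`.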

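* `fasta folder`: stores the FASTA files of the normal genome and of each clone, including separate maternal and paternal haplotype files (e.g. `clone0.fasta`, `clone0_maternal.fasta`, and `clone0_paternal.fasta`).

* `fastq folder`: stores the paired-end reads in FASTQ format for each clone (e.g. `clone0_r1.fq` and `clone0_r2.fq`) and for each cell within a clone (e.g. `clone0/cell0_r1.fq` and `clone0/cell0_r2.fq`).

* `profile folder`: stores the profile files generated during the simulation. The format of each file in this folder is described below.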

    1. `changes.csv`: stores the evolution path of each clone. One example is listed below:

        |Parent|Child |Haplotype|Type|Segment                |Change|
        |------|------|---------|----|-----------------------|------|
        |normal|clone0|paternal |dup |chr22:500001-1000000   |1->3  |
        |normal|clone0|maternal |del |chr22:3500001-4000000  |1->0  |
        |normal|clone0|maternal |dup |chr22:4000001-4500000  |1->2  |
        |normal|clone0|maternal |dup |chr22:5000001-5500000  |1->2  |
        |normal|clone0|maternal |dup |chr22:8000001-8500000  |1->4  |
 

    2. `cnv_profile.csv`: stores the ground-truth copy numbers of each clone in `maternal|paternal` format. One example is listed below:

        |Chromosome|Start  |End    |clone0|clone1|clone2|
        |----------|-------|-------|------|------|------|
        |chr22     |1      |500000 |1\|1  |3\|1  |3\|1  |
        |chr22     |500001 |1000000|1\|3  |1\|3  |3\|5  |
        |chr22     |1000001|1500000|1\|1  |3\|2  |3\|2  |
        |chr22     |1500001|3000000|1\|1  |1\|1  |1\|1  |
        |chr22     |3000001|3500000|1\|1  |3\|2  |3\|2  |
 

    3. `maternal_cnv_matrix.csv` and `paternal_cnv_matrix.csv`: store the copy-number matrix of each clone, split into the maternal and the paternal haplotype. An example of `maternal_cnv_matrix.csv` is listed below:

        |Index|clone0_maternal_cnvs|clone1_maternal_cnvs|clone2_maternal_cnvs|
        |------|--------------------|--------------------|--------------------|
        |chr22:1-500000|1                   |3                   |3                   |
        |chr22:500001-1000000|1                   |1                   |3                   |
        |chr22:1000001-1500000|1                   |3                   |3                   |
        |chr22:1500001-3000000|1                   |1                   |1                   |
        |chr22:3000001-3500000|1                   |3                   |3                   |
    
    4. `phases.csv`: stores the haplotype assignment of each SNP in `maternal|paternal` format. One example is listed below:

        |Chromosome|Position|Maternal\|Paternal|
        |----------|--------|------------------|
        |chr22     |16578327|1\|0              |
        |chr22     |17307398|1\|0              |
        |chr22     |18025718|1\|0              |
        |chr22     |21416314|0\|1              |
        |chr22     |22418251|1\|0              |

    5. `tree.newick` and `tree.pdf`: the CNV evolution tree in Newick and PDF format.

    An example profile folder is provided in `data/profile`.
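
Because `cnv_profile.csv` stores each copy number as a `maternal|paternal` string and `changes.csv` stores each change as `before->after`, both need a small parsing step before analysis. The sketch below assumes `cnv_profile.csv` is comma-separated with the columns shown in the example above (`Chromosome`, `Start`, `End`, then one column per clone). It uses only the standard library (pandas would work equally well), and the function names are hypothetical rather than part of SCSilicon2.

```python
import csv
from typing import Dict, List, Tuple

FIXED_COLUMNS = ("Chromosome", "Start", "End")


def parse_allele_cn(value: str) -> Tuple[int, int]:
    """Split a 'maternal|paternal' string such as '3|1' into (3, 1)."""
    maternal, paternal = value.split("|")
    return int(maternal), int(paternal)


def parse_change(value: str) -> Tuple[int, int]:
    """Split a changes.csv 'Change' value such as '1->3' into (1, 3)."""
    before, after = value.split("->")
    return int(before), int(after)


def read_cnv_profile(path: str) -> List[Dict[str, object]]:
    """Read cnv_profile.csv; each clone column becomes a (maternal, paternal) tuple."""
    rows: List[Dict[str, object]] = []
    with open(path, newline="") as handle:
        for record in csv.DictReader(handle):
            row: Dict[str, object] = {
                "Chromosome": record["Chromosome"],
                "Start": int(record["Start"]),
                "End": int(record["End"]),
            }
            for column, value in record.items():
                # Only 'm|p' cells are treated as clone columns; any other extra
                # column (for example a written-out row index) is skipped.
                if column not in FIXED_COLUMNS and isinstance(value, str) and "|" in value:
                    row[column] = parse_allele_cn(value)
            rows.append(row)
    return rows
```

Summing a tuple gives the total copy number of a bin: in the first example row above, `clone1` is `(3, 1)`, i.e. a total copy number of 4.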

## 6. `SCSilicon2` object
All general parameters of an SCSilicon2 simulation are stored in an `SCSilicon2` object. Let’s create a new one (`ref_genome` and `snp_file` are required):

```Python
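import scsilicon2 as scs

simulator = scs.SCSilicon2(ref_genome='your reference fasta file here', snp_file='your snp list file here')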
```

### 6.1 All parameters in `SCSilicon2` object

* `ref_genome`: str, required<br>
    Path to the reference genome file in FASTA format

* `snp_file`: str, required<br>
    Path to the SNP list file

* `outdir`: str, optional, default: './'<br>
    The output directory

* `clone_no`: int, optional, default: 1<br>
    Number of clones in the randomly generated evolution tree

* `cell_no`: int, optional, default: 2<br>
    Total number of cells in the simulated dataset. Make sure `cell_no` is larger than `clone_no`: at least one cell is generated for the normal genome.

* `max_cnv_tree_depth`: int, optional, default: 4<br>
    Maximum depth of the random evolution tree

* `bin_len`: int, optional, default: 500000<br>
    Fixed bin length in base pairs

* `HEHOratio`: float, optional, default: 0.5<br>
    Ratio of heterozygous SNPs

* `cnv_prob_cutoff`: float, optional, default: 0.8<br>
    Cutoff probability for a bin to undergo a CNV: a CNV occurs in a bin when a randomly drawn probability is larger than the cutoff

* `clone_coverage`: float, optional, default: 30<br>
    Sequencing coverage of each clone-level FASTQ file

* `cell_coverage`: float, optional, default: 0.5<br>
    Sequencing coverage of each cell within a clone

* `reads_len`: int, optional, default: 150<br>
    Read length in the FASTQ files

* `insertion_size`: int, optional, default: 350<br>
    Outer distance between the two ends of a read pair

* `error_rate`: float, optional, default: 0.02<br>
    Base error rate
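
Several of these parameters can be related to concrete quantities. The sketch below is illustrative only and rests on assumptions we have not confirmed against the SCSilicon2 source: it splits a chromosome into 1-based, inclusive bins of `bin_len` bp (matching intervals such as `chr22:1-500000` in the profile files), flags a bin for a CNV when a uniform random draw is larger than `cnv_prob_cutoff`, converts a coverage value into a number of read pairs (pairs = coverage × genome length / (2 × read length)), and builds a wgsim command line from `reads_len`, `insertion_size`, and `error_rate`, which correspond to wgsim's `-1`/`-2`, `-d`, and `-e` options. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def make_bins(chrom_len: int, bin_len: int = 500000) -> List[Tuple[int, int]]:
    """Split positions 1..chrom_len into 1-based, inclusive bins of bin_len bp.

    The last bin is shorter when chrom_len is not a multiple of bin_len.
    """
    return [(start, min(start + bin_len - 1, chrom_len))
            for start in range(1, chrom_len + 1, bin_len)]


def draw_cnv_bins(n_bins: int, cnv_prob_cutoff: float = 0.8, seed: int = 0) -> List[bool]:
    """Flag each bin for a CNV when a uniform draw in [0, 1) is larger than the cutoff.

    Assumption: one uniform draw per bin; SCSilicon2 may sample differently.
    """
    rng = random.Random(seed)
    return [rng.random() > cnv_prob_cutoff for _ in range(n_bins)]


def read_pairs_for_coverage(coverage: float, genome_len: int, reads_len: int = 150) -> int:
    """Number of read pairs (2 * reads_len bases each) needed to reach the target coverage."""
    return math.ceil(coverage * genome_len / (2 * reads_len))


def wgsim_command(ref_fa: str, r1_fq: str, r2_fq: str, n_pairs: int,
                  reads_len: int = 150, insertion_size: int = 350,
                  error_rate: float = 0.02) -> List[str]:
    """Build (without running) a paired-end wgsim command line."""
    return [
        "wgsim",
        "-N", str(n_pairs),         # number of read pairs
        "-1", str(reads_len),       # length of the first read
        "-2", str(reads_len),       # length of the second read
        "-d", str(insertion_size),  # outer distance between the two ends
        "-e", str(error_rate),      # base error rate
        "-r", "0",                  # assumption: disable wgsim's own random mutations
        ref_fa, r1_fq, r2_fq,
    ]


if __name__ == "__main__":
    bins = make_bins(1_200_000)
    print(bins)  # [(1, 500000), (500001, 1000000), (1000001, 1200000)]
    print(draw_cnv_bins(len(bins)))
    n_pairs = read_pairs_for_coverage(0.5, 1_200_000)
    print(" ".join(wgsim_command("clone0.fasta", "cell0_r1.fq", "cell0_r2.fq", n_pairs)))
```

With the default `cell_coverage=0.5` and `reads_len=150`, `read_pairs_for_coverage(0.5, 1_000_000)` returns 1667 read pairs for a 1 Mb genome, so a single cell is sequenced far more sparsely than a clone at the default `clone_coverage=30`.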

## Cite us
todo

## Help
If you have any questions or need help using SCSilicon2, please contact us at fxk@nwpu.edu.cn.

            
