fitPhylo


Name: fitPhylo
Version: 1.0
Home page: https://github.com/FangWang-SYSU/fitPhylo.git
Summary: A package for inferring CNA fitness evolutionary trees and CNAs' evolutionary efficiency.
Upload time: 2023-10-12 10:00:01
Author: Xin Wang
Requires Python: >=3.9
License: MIT
Keywords: single cell, phylogenetic tree, CNA, evolutionary efficiency

Description
===========
fitPhylo infers CNA fitness evolutionary trees from multiple metrics, including genome similarity, aneuploid segregation distance, and the absolute distance between two genomes. It also estimates CNAs' evolutionary efficiency (CEE), which enables a quantitative assessment of the efficiency of de novo CNAs.

System requirements and dependencies
====================================
Software package development environment:
    
    macOS
    Python 3.11.3

Installation
============
Optionally, first create a virtual environment for fitPhylo:
```shell
conda create --name fitPhylo_env python=3.9
conda activate fitPhylo_env
```
There are two ways to install `fitPhylo`:
## 1.pip
```shell
pip install fitPhylo
```

## 2.source code
```shell
git clone https://github.com/FangWang-SYSU/fitPhylo.git
cd fitPhylo
python setup.py install
```

To verify the installation, run `fitPhylo --version` on the command line. If the installation succeeded, it prints the version:

    fitPhylo 1.0

Usage
=====
```
usage: fitPhylo [-h] [--version] -I INPUT -O OUTPUT [-p PREFIX] [-r RESOLUTION] [-t HUFFMAN_SPLIT_THRESHOLD] [-n N_NEIGHBORS] [-m MIN_CLONE_SIZE] [-s SCORING] [-R RANDOM_NUM] [-C CANCER_TYPE] [-c CORES] [-d DRAW]

A package for inferring CNA fitness evolutionary trees and CNAs' evolutionary efficiency.

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -I INPUT, --input INPUT
                        single-cell copy number profile.
  -O OUTPUT, --output OUTPUT
                        The output path.
  -p PREFIX, --prefix PREFIX
                        Prefix for output file names.
  -r RESOLUTION, --resolution RESOLUTION
                        Lineage partitioning resolution(default=1).
  -t HUFFMAN_SPLIT_THRESHOLD, --huffman_split_threshold HUFFMAN_SPLIT_THRESHOLD
                        huffman split threshold(default=0.9)
  -n N_NEIGHBORS, --n_neighbors N_NEIGHBORS
                        Number of neighbors for creating affinity matrix in SNF(default=5).
  -m MIN_CLONE_SIZE, --min_clone_size MIN_CLONE_SIZE
                        When min_clone_size is reached, division will no longer continue(default=0.1*cell_number).
  -s SCORING, --scoring SCORING
                        Whether to run Scoring the chromosomal rearrangements.
  -R RANDOM_NUM, --random_num RANDOM_NUM
                        Random number for creating a null distribution.
  -C CANCER_TYPE, --cancer_type CANCER_TYPE
                        Select a cancer type for estimating WGD. The default is all.
  -c CORES, --cores CORES
                        Number of cores required to run copy number variation events.
  -d DRAW, --draw DRAW  Draw tree and CNA heatmap.

Author: wangxin, Email: wangx768@mail2.sysu.edu.cn
```
You can also load fitPhylo in Python:
```python
import fitPhylo as fp
```


Input files
===========

The input file of fitPhylo must be a tab-separated integer copy number profile:

    Each row is a genomic segment,
    the first column is the chromosome,
    the second column is the segment start coordinate,
    the third column is the segment end coordinate,
    and the remaining columns are the integer copy numbers of each cell.

  	chr	start	end	cell_1	cell_2	cell_3 ...
    1	100167143	100220943	2	2	2 ...
    1	100504443	100559237	2	1	2 ...
    1	101395562	101451560	2	3	4 ...

>Connection with [inferCNV](https://github.com/broadinstitute/inferCNV):
> 
> To obtain integer copy numbers, we propose identifying peaks and inferring their intervals, with each interval representing an integer copy number (details in the methods).
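
Before running fitPhylo, it can help to check that a file really follows the layout described above. The following is a minimal sketch using only the standard library; the function name `read_cna_profile` is hypothetical and not part of fitPhylo, and it assumes the header row is exactly `chr`, `start`, `end` followed by one column per cell.

```python
import csv
import io


def read_cna_profile(handle):
    """Parse a tab-separated integer copy number profile.

    Hypothetical helper, not part of fitPhylo. Returns (cells, segments),
    where each segment is (chromosome, start, end, [copy number per cell]).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 4:
        raise ValueError("expected chr, start, end and at least one cell column")
    cells = header[3:]
    segments = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        chrom, start, end = row[0], int(row[1]), int(row[2])
        if start > end:
            raise ValueError(f"line {line_no}: start is greater than end")
        # int() raises ValueError for non-integer text such as "2.5"
        copies = [int(value) for value in row[3:]]
        if any(copy < 0 for copy in copies):
            raise ValueError(f"line {line_no}: negative copy number")
        segments.append((chrom, start, end, copies))
    return cells, segments


# The three example segments shown above
example = (
    "chr\tstart\tend\tcell_1\tcell_2\tcell_3\n"
    "1\t100167143\t100220943\t2\t2\t2\n"
    "1\t100504443\t100559237\t2\t1\t2\n"
    "1\t101395562\t101451560\t2\t3\t4\n"
)
cells, segments = read_cna_profile(io.StringIO(example))
print(cells)
print(segments[2])
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open('./data/test.txt', newline='')`.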

Examples
============

### Run in command line
```shell
#fitPhylo [-h] [--version] -I INPUT -O OUTPUT [-p PREFIX] [-r RESOLUTION] [-t HUFFMAN_SPLIT_THRESHOLD] [-n N_NEIGHBORS] [-m MIN_CLONE_SIZE] [-s SCORING] [-R RANDOM_NUM] [-C CANCER_TYPE] [-c CORES]
mkdir fitPhylo_out
fitPhylo \
    -I ./data/test.txt \
    -O ./fitPhylo_out \
    -p example_ \
    -r 1 \
    -t 1 \
    -n 5 \
    -C ALL \
    -c 8
```
> The `-r` parameter sets the lineage partitioning resolution; a higher value indicates greater precision.
> The `-t` parameter represents the proportion of subtree splitting during the `Huffman process` and takes values between 0 and 1.
> A higher value implies a lower probability of splitting two already merged cells.
> The `-R` parameter affects the estimation of the chromosome rearrangement score; a larger value leads to a longer runtime.
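
When fitPhylo is launched from a script, the command can be assembled as an argument list and checked before it runs. This is a sketch only: `build_fitphylo_command` is a hypothetical helper, it uses only flags listed in the usage message, and the default values for `prefix`, `cancer_type` and `cores` are taken from the example above rather than from the package.

```python
import shlex


def build_fitphylo_command(input_path, output_dir, prefix="example_",
                           resolution=1, split_threshold=0.9,
                           n_neighbors=5, cancer_type="ALL", cores=8):
    """Assemble a fitPhylo command line (hypothetical helper)."""
    # The notes above state that -t takes values between 0 and 1
    if not 0 <= split_threshold <= 1:
        raise ValueError("-t must lie between 0 and 1")
    return [
        "fitPhylo",
        "-I", str(input_path),
        "-O", str(output_dir),
        "-p", prefix,
        "-r", str(resolution),
        "-t", str(split_threshold),
        "-n", str(n_neighbors),
        "-C", cancer_type,
        "-c", str(cores),
    ]


command = build_fitphylo_command("./data/test.txt", "./fitPhylo_out",
                                 split_threshold=1)
print(shlex.join(command))
# To execute the command: subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.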

### Run in Python
```python
# Load the package
import os

import fitPhylo as fp

# 1. Infer the tree from the example CNA profile in the package's data directory
fp.fitPhylo.run(
    cna_dir=os.path.join(fp.__path__[0], 'data', 'exampleCNA.txt'),
    output='fitPhylo_out',
    prefix='example_',
    resolution=1,
    clone_thr=1,
    n_neighbors=5,
    plot_png=False,
    verbose=True,
)

# 2. Score chromosomal events using the results written to fitPhylo_out
fp.fitPhylo.chromosome_event(
    'fitPhylo_out',
    prefix='example_',
    cancer_type='ALL',
    cores=8,
    randome_num=1000,
    verbose=True,
)
```

Output files
============
### 1.cell_info.txt: Cell variation information in trace.
    name	Root_gain_loc	Root_loss_loc	Root_gain_cn	Root_loss_cn	Parent_gain_loc	Parent_loss_loc	Parent_gain_cn	Parent_loss_cn	Mitosis_copy	Mitosis_dd_loc	Mitosis_ad_loc	Mitosis_time	Pseudotime_tree	Mitosis_time_next	aneu_rate	copy_rate	status
    root										1042.0	163.0	20.0	0.0	40.0	0.013		Aneuploidy
    cell_2	160.0	428.0	428.0	160.0	160.0	428.0	160.0	428.0	11617.0	825.0	478.0	31.0	21.0	21.0	0.039	0.951	Aneuploidy
    cell_3	151.0	629.0	629.0	151.0	151.0	629.0	151.0	629.0	11425.0	1801.0	182.0	32.5	19.0	19.0	0.014	0.936	Aneuploidy
    cell_4	51.0	360.0	360.0	51.0	291.0	410.0	291.0	410.0	11504.0	779.0	189.0	25.0	46.0	25.0	0.015	0.942	Aneuploidy
```
[Root|Parent]_[gain|loss]_[loc|cn]: The number of sites (loc) or copies (cn) accumulated as gains or losses relative to the Root or Parent node.
Mitosis_copy: The number of genomic segments with the same copy number state in the current node and its parent node (D_ss).
Mitosis_dd_loc: The number of genomic segments with different copy number states in the current node and its parent node (D_ds).
Mitosis_ad_loc: The number of aneuploidy segregation states between the current node and its parent node (D_as).
Mitosis_time: Branch length of the current node.
Pseudotime_tree: Pseudotime of the current node in the tree.
Mitosis_time_next: Branch length of the next mitosis of the current node.
aneu_rate: Rate of aneuploidy segregation states.
copy_rate: Rate of shared copy number states.
status: The mitotic state of the current cell, inferred from aneu_rate and copy_rate.
```

### 2.all_node_data.txt: Cell CNA profiles, including internal nodes (named with "virtual_").

    	1_977836_977836	1_1200863_1200863 ...
    cell1	1.0	1.0	...
    cell2	2.0	2.0	...
>Integer copy number profile of all nodes in the tree. Rows are cells, columns are genomic segments.

### 3.cell_tree.newick: Single-cell trace file in Newick format.
>Stores the structure of the evolutionary tree, including branch lengths.
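
Because Newick is plain text, branch lengths can be pulled out with a few lines of Python. The sketch below is a minimal regex-based reader, not a full Newick parser: it ignores tree topology and does not handle quoted labels or comments. The tree string and the label `virtual_1` are hypothetical, written to resemble the node names described above.

```python
import re

# Matches "label:length" pairs; labels cannot contain Newick punctuation
NODE_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.eE+-]+)")


def branch_lengths(newick_text):
    """Return {node label: branch length} for every labelled node."""
    return {
        label: float(length)
        for label, length in NODE_PATTERN.findall(newick_text)
    }


# Hypothetical tree with one internal node
tree = "((cell_2:31.0,cell_3:32.5)virtual_1:20.0,cell_4:25.0);"
lengths = branch_lengths(tree)
for label, length in lengths.items():
    print(label, length)
```

For anything beyond a quick inspection, a dedicated phylogenetics library such as Biopython's `Bio.Phylo` is a more robust choice.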

### 4.*re_score.txt
    ,1,2,3,...
    cell_1,0.761,0.800,0.793,...
    cell_2,0.88,0.0,0.636,0.659,...
    cell_3,0.88,0.957,0.783,...
>The level of chromosomal rearrangements. Rows are cells, columns are chromosomes. A value of -1 means that the chromosome has not changed significantly.
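
The -1 sentinel needs to be filtered out before scores are averaged or plotted. The following sketch shows one way to do that with the standard library; `significant_scores` is a hypothetical helper, and the example values are illustrative (the -1 entry was added to show the filtering).

```python
import csv
import io


def significant_scores(handle):
    """Map each cell to {chromosome: score}, dropping -1 entries."""
    reader = csv.reader(handle)
    chromosomes = next(reader)[1:]  # the first header cell is empty
    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cell = row[0]
        scores = [float(value) for value in row[1:]]
        result[cell] = {
            chrom: score
            for chrom, score in zip(chromosomes, scores)
            if score != -1  # -1 marks a chromosome without significant change
        }
    return result


# Illustrative values in the layout shown above
example = (
    ",1,2,3\n"
    "cell_1,0.761,0.800,0.793\n"
    "cell_2,0.88,-1,0.636\n"
)
scores = significant_scores(io.StringIO(example))
print(scores["cell_2"])
```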

### 5.*gradual_score.txt
    ,1,2,3,...
    cell_1,0.001,0.008,0.705,...
    cell_2,0.780,0.005,0.837,...
    cell_3,0.890,0.907,0.463,...
>The gradual score of each chromosome. Rows are cells, columns are chromosomes.

### 6.*re_socre_pvalue.txt
>P-values of chromosomal rearrangements (details in the methods).

### 7.*mode.txt
    ,chr_num,wgd,gradual_num,seismic_num,gradual_score,seismic_score,BFB
    cell_1,19,WGD1,14,5,0.009,0.0149,1368
    cell_2,11,WGD1,7,4,0.001,0.006,234
    cell_3,12,WGD0,6,6,0.002,0.001,209
```
chr_num: The total number of gradual and seismic chromosomes.
wgd: WGD type of the cell (WGD0 is diploid, WGD1 involves a single whole genome duplication, and WGD2 involves multiple whole genome duplications).
gradual_num: The number of gradual chromosomes.
seismic_num: The number of seismic chromosomes.
gradual_score: Average gradual score over the gradual chromosomes.
seismic_score: Average seismic score over the seismic chromosomes.
BFB: The number of aneuploidy segregation states.
```
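
A short summary of this table is often the first thing to look at. The sketch below tallies cells per WGD type and computes, for each cell, the share of affected chromosomes that are seismic. `summarize_modes` and the "seismic fraction" are hypothetical illustrations, not fitPhylo outputs; the input is the three example rows above.

```python
import csv
import io
from collections import Counter


def summarize_modes(handle):
    """Return (cells per WGD type, seismic fraction per cell)."""
    wgd_counts = Counter()
    seismic_fraction = {}
    for row in csv.DictReader(handle):
        cell = row[""]  # the cell ID column has an empty header
        wgd_counts[row["wgd"]] += 1
        chr_num = int(row["chr_num"])
        if chr_num > 0:  # avoid dividing by zero for unaffected cells
            seismic_fraction[cell] = int(row["seismic_num"]) / chr_num
    return wgd_counts, seismic_fraction


example = (
    ",chr_num,wgd,gradual_num,seismic_num,gradual_score,seismic_score,BFB\n"
    "cell_1,19,WGD1,14,5,0.009,0.0149,1368\n"
    "cell_2,11,WGD1,7,4,0.001,0.006,234\n"
    "cell_3,12,WGD0,6,6,0.002,0.001,209\n"
)
wgd_counts, seismic_fraction = summarize_modes(io.StringIO(example))
print(dict(wgd_counts))
print(seismic_fraction["cell_3"])
```

In the example rows, `chr_num` equals `gradual_num + seismic_num`, which matches the column definitions above.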

### 8.*cee_score.txt
    cell_1	0.24
    cell_2	0.19
    cell_3	0.87
```
First column: Cell ID.
Second column: CEE score.
```
### 9.*tree.png
    Optional output, controlled by the '-d' option or the 'plot_png' argument.
    If set to 1, the phylogenetic tree and a heatmap of the CNA profile are drawn.
    If set to 0, nothing is drawn.
>Note that drawing requires the `matplotlib` and `seaborn` packages.


Developer
=========
Fang Wang (fwang9@mdanderson.org), Xin Wang (wangx768@mail2.sysu.edu.cn)

Draft date
==========
Oct. 12, 2023

            
