CLEMENTDNA


- Name: CLEMENTDNA
- Version: 1.0.12
- Home page: https://github.com/Yonsei-TGIL/CLEMENT
- Summary: Genomic decomposition and reconstruction of non-tumor diploid subclones
- Upload time: 2024-05-28 11:41:04
- Author: Young-soo Chung, M.D.
- Requires Python: >=3.6
- License: GPL v3
- Keywords: clement, genomic decomposition
- Requirements: No requirements were recorded.

# CLEMENT
- Genomic decomposition and reconstruction of **non-tumor** diploid subclones (2023)
- CLonal decomposition via Expectation-Maximization algorithm established in Non-Tumor setting
- Supports multiple diploid samples
- Biallelic variants (homozygous, 1/1) can degrade the performance of CLEMENT.

## Overview of CLEMENT workflow and core algorithms
<br/>

![CLEMENT_overview](https://github.com/Yonsei-TGIL/CLEMENT/assets/111939069/e8ff11b3-5fa8-4e2e-b045-47e4da90b01c)
<br/>

## Installation
### Dependencies
- python 3.6.x
- matplotlib 3.5.2
- seaborn 0.11.2
- numpy 1.21.5
- pandas 1.3.4
- scikit-learn 1.0.2
- scipy 1.7.3
- palettable 3.3.0

### Install from GitHub
1. git clone https://github.com/Yonsei-TGIL/CLEMENT.git   
    cd CLEMENT   
    pip3 install .   

2. pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git    

### Install from PyPI
3. pip3 install CLEMENTDNA   

## Version update
1.0.11 (Jan 1st, 2024)

## Input format
As of version 1.0.4, CLEMENT supports only standardized TSV input. Example input files are provided in the _"example"_ directory.
- 1st column: mutation ID (CHR_POS is recommended)
- 2nd column: label (answer), if available. If the label (answer) is unknown, set it to 0.
- 3rd column: **Depth1,Alt1,Depth2,Alt2....,Depth_n,Alt_n** (comma-separated, no spaces permitted)
- 4th column: **BQ1,BQ2....,BQ_n** (comma-separated, no spaces permitted). If absent, CLEMENT uses a default BQ of 20.
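
The column layout above can be checked before running CLEMENT with a small parser. The following is a minimal sketch, not part of CLEMENT: the function name `parse_clement_row`, the made-up rows, and the assumption that the file has no header line are all illustrative. The VAF (Alt / Depth) is computed only as a convenience for inspection.

```python
import csv
import io


def parse_clement_row(fields, default_bq=20):
    """Split one TSV row into per-sample depth, alt, VAF, and BQ lists."""
    mutation_id, label = fields[0], fields[1]
    counts = [int(x) for x in fields[2].split(",")]
    if len(counts) % 2 != 0:
        raise ValueError(f"{mutation_id}: expected Depth,Alt pairs, got {fields[2]!r}")
    # The 3rd column interleaves values: Depth1,Alt1,Depth2,Alt2,...
    depths, alts = counts[0::2], counts[1::2]
    # The 4th column is optional; fall back to the documented default BQ of 20.
    if len(fields) > 3 and fields[3].strip():
        bqs = [float(x) for x in fields[3].split(",")]
    else:
        bqs = [float(default_bq)] * len(depths)
    vafs = [alt / depth if depth > 0 else 0.0 for depth, alt in zip(depths, alts)]
    return {"id": mutation_id, "label": label, "depths": depths,
            "alts": alts, "vafs": vafs, "bqs": bqs}


# Made-up two-sample rows (not taken from the "example" directory).
example_tsv = "chr1_12345\t0\t100,25,80,0\t30,28\nchr2_6789\t0\t60,30,90,9\n"
rows = [parse_clement_row(f) for f in csv.reader(io.StringIO(example_tsv), delimiter="\t")]
for row in rows:
    print(row["id"], row["depths"], row["alts"], row["vafs"], row["bqs"])
```

The number of samples follows from the number of Depth,Alt pairs in the 3rd column, which is consistent with the tool detecting it automatically.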

## Running
### command line interface
	CLEMENT [OPTIONS]   


### options

	(Mandatory) These options specify the user's input and output
		--INPUT_TSV		Input data in TSV format. The tool automatically detects the number of samples
		--CLEMENT_DIR 		Directory where the outputs of CLEMENT are saved

	These options control whether the user's input is downsized
		--RANDOM_PICK 		Set this value to downsize the input. To disable downsizing, set -1. (default: -1)
	
	These options select the likelihood model
		--MODEL 		Model for TP, FN in E-step. (default: betabinomial)
		--CONSTANT		Constant multiplier for alpha and beta in the beta-binomial distribution. (default: 1)

	These options adjust the E-M algorithm parameters
		--NUM_CLONE_TRIAL_START 	Minimum number of expected clusters (initial value of K) (default: 3)
		--NUM_CLONE_TRIAL_END 		Maximum number of expected clusters (final value of K) (default: 5)
		--TRIAL_NO 			Number of trials for each candidate cluster number. Values over 15 are NOT recommended (default: 5)
		--FP_PRIOR FP_PRIOR   		Prior of false positive (FP). Recommendation: <= 0.1. (default: 0.01)
		--TN_PRIOR TN_PRIOR   		Prior of true negative (TN). Recommendation: > 0.99. (default: 0.99)
		--KMEANS_CLUSTERNO		Number of initial K-means clusters. Recommendation: 5-8 for one sample, 8-15 for more samples (default: 8)
		--MIN_CLUSTER_SIZE		The minimum acceptable cluster size. Recommendation: 1-3% of the total number of variants (default: 9)

	Other options
		--MODE			Selection of clustering method. "Hard": hard clustering only,  "Both": both hard and soft (fuzzy) clustering (default: "Both")
		--MAKEONE_STRICT  	1: strict, 2: lenient, 3: most lenient (default: 1)
		--SCORING		True: compare with the answer set, False: visualization only (default: False)
		

	Miscellaneous
		--FONT_FAMILY		Font family used in the plots (default: "arial")
		--VISUALIZATION		Whether to produce an image at every E-M step (default: True)
		--IMAGE_FORMAT		Image format of the plots (default: jpg)
		--VERBOSE		0: no record,  1: simplified record,  2: verbose record (default: 2)
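
As an illustration of how these options fit together, the sketch below assembles a command line from Python and derives a `--MIN_CLUSTER_SIZE` candidate from the 1-3% recommendation. It only builds and prints the command; it does not run CLEMENT. The helper names `min_cluster_size_range` and `build_clement_command` and the file names `input.txt` and `out` are hypothetical.

```python
import shlex


def min_cluster_size_range(n_variants, low_pct=1, high_pct=3):
    """Return the 1-3% range of the total variant count as whole numbers."""
    low = max(1, (n_variants * low_pct) // 100)          # rounded down
    high = max(1, -(-(n_variants * high_pct) // 100))    # rounded up
    return low, high


def build_clement_command(input_tsv, clement_dir, **options):
    """Build an argument list; option names are passed through as --NAME VALUE."""
    cmd = ["CLEMENT", "--INPUT_TSV", str(input_tsv), "--CLEMENT_DIR", str(clement_dir)]
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    return cmd


# For 500 variants, 1-3% corresponds to 5-15 variants per cluster.
low, high = min_cluster_size_range(500)
cmd = build_clement_command(
    "input.txt", "out",          # hypothetical paths
    NUM_CLONE_TRIAL_START=2,
    NUM_CLONE_TRIAL_END=6,
    MIN_CLUSTER_SIZE=low,
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once CLEMENT is installed and the paths point to real files.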


### output

**${CLEMENT_DIR}"/result"**
- **CLEMENT_decision**		_CLEMENT's best recommendation among hard and soft clustering._
- **CLEMENT_hard_1st**  	_CLEMENT's best decomposition by hard clustering._
- **CLEMENT_hard.gapstatistics.txt** 	_Selection of the optimal K in hard clustering based on gap* statistics._
- **CLEMENT_soft_1st** 	_CLEMENT's best decomposition by soft (fuzzy) clustering._
- **membership.txt** 	_Membership assignment of all variants to each cluster._
- **membership_count.txt** 	_Count matrix of the membership assignment to each cluster._
- **mixture.txt** 	_Centroid of each cluster._
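
After a run, the result directory can be checked against the list above. This is a rough sketch under assumptions: it treats each listed name as a prefix of a file or directory name directly inside `result`, which may not match the actual layout, and the soft-clustering outputs may be absent when `--MODE` is "Hard". The function name `missing_outputs` is hypothetical, and the demo uses a temporary directory rather than real CLEMENT output.

```python
import tempfile
from pathlib import Path

EXPECTED_OUTPUTS = [
    "CLEMENT_decision",
    "CLEMENT_hard_1st",
    "CLEMENT_hard.gapstatistics.txt",
    "CLEMENT_soft_1st",
    "membership.txt",
    "membership_count.txt",
    "mixture.txt",
]


def missing_outputs(clement_dir):
    """List expected names with no matching entry in <clement_dir>/result."""
    result_dir = Path(clement_dir) / "result"
    if not result_dir.is_dir():
        return list(EXPECTED_OUTPUTS)
    present = [entry.name for entry in result_dir.iterdir()]
    # Prefix match, so e.g. "CLEMENT_decision.<suffix>" would also count.
    return [name for name in EXPECTED_OUTPUTS
            if not any(entry.startswith(name) for entry in present)]


# Demo on a throwaway directory containing only two of the listed files.
with tempfile.TemporaryDirectory() as tmp:
    result = Path(tmp) / "result"
    result.mkdir()
    (result / "membership.txt").touch()
    (result / "mixture.txt").touch()
    demo_missing = missing_outputs(tmp)
print(demo_missing)
```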

## Example
	DIR=[YOUR_DIRECTORY]

	# Example 1
	CLEMENT \
		--INPUT_TSV ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1/1.txt" \
		--CLEMENT_DIR ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1" \
		--NUM_CLONE_TRIAL_START 1 \
		--NUM_CLONE_TRIAL_END 5
  
	# Example 2
	CLEMENT \
		--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt" \
		--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4"  \
		--NUM_CLONE_TRIAL_START 2 \
		--NUM_CLONE_TRIAL_END 6 \
		--RANDOM_PICK 500
	
		


![example1](https://github.com/Yonsei-TGIL/CLEMENT/assets/56012432/a5a6beb2-e0ac-44ad-8a5a-1b9aa4480010)
![example2](https://github.com/Yonsei-TGIL/CLEMENT/assets/56012432/3ee2c4a3-4627-40a3-80e6-666a981a6c20)
<br/>

## Contact
	goldpm1@yuhs.ac



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Yonsei-TGIL/CLEMENT",
    "name": "CLEMENTDNA",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "CLEMENT, genomic decomposition",
    "author": "Young-soo Chung, M.D.",
    "author_email": "goldpm1@yuhs.ac",
    "download_url": "https://files.pythonhosted.org/packages/48/36/5aba1dd40aa5428dfe274a7b66b8822edc6846f9f885f77ed5a91f1a1a57/CLEMENTDNA-1.0.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL v3",
    "summary": "Genomic decomposition and reconstruction of non-tumor diploid subclones",
    "version": "1.0.12",
    "project_urls": {
        "Download": "https://github.com/Yonsei-TGIL/CLEMENT.git",
        "Homepage": "https://github.com/Yonsei-TGIL/CLEMENT"
    },
    "split_keywords": [
        "clement",
        " genomic decomposition"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ecad1009fa99565a3444459a76454fe2897ab8f46a53e536bdfb9a7703bc5121",
                "md5": "bebde17f040f65930d1ac2f1b520926b",
                "sha256": "d6a8825864bee8553160c171408997278a8bbab4dbfe61631cf9fc33f11eb0ff"
            },
            "downloads": -1,
            "filename": "CLEMENTDNA-1.0.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bebde17f040f65930d1ac2f1b520926b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 170010,
            "upload_time": "2024-05-28T11:39:01",
            "upload_time_iso_8601": "2024-05-28T11:39:01.490081Z",
            "url": "https://files.pythonhosted.org/packages/ec/ad/1009fa99565a3444459a76454fe2897ab8f46a53e536bdfb9a7703bc5121/CLEMENTDNA-1.0.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "48365aba1dd40aa5428dfe274a7b66b8822edc6846f9f885f77ed5a91f1a1a57",
                "md5": "b4ec038148b25409d000701994984c01",
                "sha256": "4f48977805c6c681519ac60e178c709756b6d0d218b1a9559cc54a843c701e99"
            },
            "downloads": -1,
            "filename": "CLEMENTDNA-1.0.12.tar.gz",
            "has_sig": false,
            "md5_digest": "b4ec038148b25409d000701994984c01",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 27797685,
            "upload_time": "2024-05-28T11:41:04",
            "upload_time_iso_8601": "2024-05-28T11:41:04.157582Z",
            "url": "https://files.pythonhosted.org/packages/48/36/5aba1dd40aa5428dfe274a7b66b8822edc6846f9f885f77ed5a91f1a1a57/CLEMENTDNA-1.0.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-28 11:41:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Yonsei-TGIL",
    "github_project": "CLEMENT",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "clementdna"
}
        