# CLEMENT
- Genomic decomposition and reconstruction of **non-tumor** diploid subclones (2023)
- **CL**onal decomposition via **E**xpectation-**M**aximization algorithm **E**stablished in **N**on-**T**umor setting
- Supports multiple diploid samples
- Biallelic (homozygous, 1/1) variants can degrade CLEMENT's performance.
## Overview of CLEMENT workflow and core algorithms
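As the name says, CLEMENT decomposes variants into clones with an Expectation-Maximization (EM) algorithm. The snippet below is a generic sketch of that idea, not CLEMENT's implementation: it fits a plain binomial mixture to per-variant alt/depth counts, whereas CLEMENT by default uses a beta-binomial model (`--MODEL`) together with FP/TN priors, which are omitted here. The function name `binomial_mixture_em` and the simulated data are hypothetical. The responsibility matrix is the soft (fuzzy) membership, and its row-wise argmax gives a hard clustering, which mirrors the hard/soft distinction of `--MODE`.

```python
import numpy as np
from scipy.special import logsumexp


def binomial_mixture_em(alt, depth, k, n_iter=200, tol=1e-8):
    """Fit a k-component binomial mixture to alt/depth read counts by EM.

    Hypothetical illustration only; CLEMENT's own E-step (beta-binomial,
    FP/TN priors) is not reproduced. Returns (VAFs, weights, responsibilities).
    """
    alt = np.asarray(alt, dtype=float)
    depth = np.asarray(depth, dtype=float)
    # Initialise component VAFs at evenly spaced quantiles of the observed VAFs.
    p = np.clip(np.quantile(alt / depth, (np.arange(k) + 0.5) / k), 1e-6, 1 - 1e-6)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * binomial likelihood) for every (variant, component).
        # The binomial coefficient is the same for all components, so it cancels.
        log_r = (np.log(weights)
                 + alt[:, None] * np.log(p)
                 + (depth - alt)[:, None] * np.log1p(-p))
        log_norm = logsumexp(log_r, axis=1, keepdims=True)
        resp = np.exp(log_r - log_norm)  # soft memberships; each row sums to 1
        # M-step: re-estimate mixing weights and depth-weighted component VAFs.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        weights = nk / nk.sum()
        p = (resp * alt[:, None]).sum(axis=0) / np.maximum(
            (resp * depth[:, None]).sum(axis=0), 1e-12)
        p = np.clip(p, 1e-6, 1 - 1e-6)
        # Stop once the log-likelihood (up to a constant) stops improving.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return p, weights, resp


# Simulated example: two clones at VAF 0.1 and 0.4, 200 variants each, depth 100.
rng = np.random.default_rng(1)
depth = np.full(400, 100)
alt = np.concatenate([rng.binomial(100, 0.1, 200), rng.binomial(100, 0.4, 200)])
vafs, weights, resp = binomial_mixture_em(alt, depth, k=2)
hard_labels = resp.argmax(axis=1)  # hard clustering; `resp` itself is the soft one
print(np.round(np.sort(vafs), 3))
```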
## Installation
### Dependencies
- python 3.6.x
- matplotlib 3.5.2
- seaborn 0.11.2
- numpy 1.21.5
- pandas 1.3.4
- scikit-learn 1.0.2
- scipy 1.7.3
- palettable 3.3.0
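### Install from GitHub
1. Clone the repository and install it locally:
```bash
git clone https://github.com/Yonsei-TGIL/CLEMENT.git
cd CLEMENT
pip3 install .
```
2. Or install directly from GitHub with pip:
```bash
pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git
```
### Install from PyPI
3. Or install the `CLEMENTDNA` package from PyPI:
```bash
pip3 install CLEMENTDNA
```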
## Version update
1.0.11 (Jan 1st, 2024)
## Input format
As of version 1.0.4, CLEMENT supports only a standardized TSV input. Example input files are provided in the _"example"_ directory.
- 1st column: mutation ID (CHR_POS is recommended)
- 2nd column: label (answer), if known. If the label is unknown, set it to 0.
- 3rd column: **Depth1,Alt1,Depth2,Alt2,...,Depth_n,Alt_n**, the total depth and alternate read count of each sample. Values must be comma-separated with no spaces.
- 4th column: **BQ1,BQ2,...,BQ_n**, the base quality of each sample. Values must be comma-separated with no spaces. If this column is absent, CLEMENT uses a default BQ of 20.
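The column layout above can be produced with a small script. This is a minimal sketch, assuming tab-separated fields and no header line (compare with the files in the _"example"_ directory before relying on it); `make_clement_row` and `write_clement_tsv` are hypothetical helpers, not part of CLEMENT.

```python
import csv
import io


def make_clement_row(mutation_id, label, depths, alts, bqs=None):
    """Build one CLEMENT input row (hypothetical helper).

    depths, alts and bqs hold one value per sample, in the same sample order.
    """
    if len(depths) != len(alts):
        raise ValueError("depths and alts need one entry per sample")
    # 3rd column interleaves the samples: Depth1,Alt1,Depth2,Alt2,...
    counts = ",".join(f"{d},{a}" for d, a in zip(depths, alts))
    row = [mutation_id, str(label), counts]
    if bqs is not None:  # optional 4th column; CLEMENT uses BQ 20 when it is absent
        if len(bqs) != len(depths):
            raise ValueError("bqs needs one entry per sample")
        row.append(",".join(str(q) for q in bqs))
    return row


def write_clement_tsv(rows, handle):
    """Write rows tab-separated, without a header line (assumed format)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# One variant observed in two samples, label unknown (0), explicit base qualities.
buf = io.StringIO()
write_clement_tsv([make_clement_row("chr1_12345", 0, [125, 130], [30, 12], [20, 20])], buf)
print(buf.getvalue(), end="")
```

To write a real file, pass a handle from `open("input.tsv", "w", newline="")` instead of the in-memory buffer.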
## Running
### command line interface
```bash
CLEMENT [OPTIONS]
```
### options
```
(Mandatory) Input and output
    --INPUT_TSV                Input file in TSV format. The number of samples is detected automatically.
    --CLEMENT_DIR              Directory where CLEMENT saves its outputs.

Downsizing the input
    --RANDOM_PICK              Downsize the input by randomly picking this number of variants. Set -1 to keep all variants (default: -1).

Likelihood model
    --MODEL                    Model for TP and FN in the E-step (default: betabinomial)
    --CONSTANT                 Constant multiplier for alpha and beta in the beta-binomial distribution (default: 1)

E-M algorithm parameters
    --NUM_CLONE_TRIAL_START    Minimum number of clusters to try (start of the K range) (default: 3)
    --NUM_CLONE_TRIAL_END      Maximum number of clusters to try (end of the K range) (default: 5)
    --TRIAL_NO                 Number of trials for each candidate K. Values above 15 are not recommended (default: 5)
    --FP_PRIOR                 Prior of false positive (FP). Recommended: <= 0.1 (default: 0.01)
    --TN_PRIOR                 Prior of true negative (TN). Recommended: > 0.99 (default: 0.99)
    --KMEANS_CLUSTERNO         Number of initial K-means clusters. Recommended: 5-8 for a single sample, 8-15 for multiple samples (default: 8)
    --MIN_CLUSTER_SIZE         Minimum acceptable cluster size. Recommended: 1-3% of the total number of variants (default: 9)

Other options
    --MODE                     Clustering method. "Hard": hard clustering only; "Both": hard and soft (fuzzy) clustering (default: "Both")
    --MAKEONE_STRICT           1: strict, 2: lenient, 3: most lenient (default: 1)
    --SCORING                  True: compare the results with the answer set; False: visualization only (default: False)

Miscellaneous
    --FONT_FAMILY              Font family used in the plots (default: "arial")
    --VISUALIZATION            Whether to produce an image at every E-M step (default: True)
    --IMAGE_FORMAT             Image format of the plots (default: jpg)
    --VERBOSE                  0: no log, 1: simplified log, 2: verbose log (default: 2)
```
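`--CONSTANT` multiplies both alpha and beta of the beta-binomial model selected by `--MODEL`. How CLEMENT derives alpha and beta themselves is not documented here, so the sketch below only illustrates the general effect of such a multiplier, using illustrative values (depth 100, alpha 1, beta 3) that are assumptions: scaling both shape parameters by the same constant keeps the expected alt count fixed (25 here) while the variance shrinks from 390 toward the binomial value 18.75, i.e. a larger constant means less overdispersion. `alt_count_moments` is a hypothetical helper built on `scipy.stats.betabinom`.

```python
from scipy.stats import betabinom


def alt_count_moments(depth, alpha, beta, constant=1.0):
    """Mean and variance of the alt-read count under BetaBinomial(depth, c*alpha, c*beta)."""
    dist = betabinom(depth, constant * alpha, constant * beta)
    return float(dist.mean()), float(dist.var())


depth, alpha, beta = 100, 1.0, 3.0  # illustrative values: mean VAF alpha/(alpha+beta) = 0.25
for c in (1, 10, 100):
    mean, var = alt_count_moments(depth, alpha, beta, constant=c)
    print(f"constant={c:>3}: mean={mean:.2f}, variance={var:.2f}")
# A binomial with the same mean has variance depth * 0.25 * 0.75 = 18.75.
```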
### output
Results are written to **`${CLEMENT_DIR}/result`**:
- **CLEMENT_decision** _CLEMENT's best recommendation among hard and soft clustering._
- **CLEMENT_hard_1st** _CLEMENT's best decomposition by hard clustering._
- **CLEMENT_hard.gapstatistics.txt** _Selection of the optimal K in hard clustering based on the gap statistic._
- **CLEMENT_soft_1st** _CLEMENT's best decomposition by soft (fuzzy) clustering._
- **membership.txt** _Membership assignment of every variant to a cluster._
- **membership_count.txt** _Count matrix of the membership assignments per cluster._
- **mixture.txt** _Centroid of each cluster._
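`CLEMENT_hard.gapstatistics.txt` reports the choice of K by the gap statistic. For orientation only, here is a generic sketch of the standard gap statistic (Tibshirani, Walther and Hastie) with scikit-learn K-means on VAFs and a uniform reference over the data range; CLEMENT's exact reference distribution and selection rule are not documented here and may differ. `gap_statistic` and `choose_k` are hypothetical names, and the simulated VAFs are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans


def _log_wk(x, k, seed):
    """log of the pooled within-cluster sum of squares for a k-means fit."""
    return np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x).inertia_)


def gap_statistic(x, k_max, n_refs=10, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k for k = 1..k_max, with standard errors s_k."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # one feature per sample, e.g. a VAF
    rng = np.random.default_rng(seed)
    lo, hi = x.min(axis=0), x.max(axis=0)
    ks = np.arange(1, k_max + 1)
    gaps, sks = [], []
    for k in ks:
        log_w = _log_wk(x, k, seed)
        # Reference datasets drawn uniformly over the bounding box of the data.
        ref = np.array([_log_wk(rng.uniform(lo, hi, size=x.shape), k, seed)
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_w)
        sks.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    return ks, np.array(gaps), np.array(sks)


def choose_k(ks, gaps, sks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); falls back to the largest k."""
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return int(ks[i])
    return int(ks[-1])


# Three tight simulated clones at VAF 0.10, 0.25 and 0.45 (100 variants each).
rng = np.random.default_rng(0)
vaf = np.concatenate([rng.normal(m, 0.005, 100) for m in (0.10, 0.25, 0.45)])
ks, gaps, sks = gap_statistic(vaf, k_max=5)
print(choose_k(ks, gaps, sks))
```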
## Example
```bash
DIR=[YOUR_DIRECTORY]

# Example 1
CLEMENT \
    --INPUT_TSV ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1/1.txt" \
    --CLEMENT_DIR ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1" \
    --NUM_CLONE_TRIAL_START 1 \
    --NUM_CLONE_TRIAL_END 5

# Example 2
CLEMENT \
    --INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt" \
    --CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4" \
    --NUM_CLONE_TRIAL_START 2 \
    --NUM_CLONE_TRIAL_END 6 \
    --RANDOM_PICK 500
```
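To script runs like these from Python (for example, looping over many input files), a command can be assembled as an argument list. This is a minimal sketch: `build_clement_command` and `recommended_min_cluster_size` are hypothetical helpers, and they only emit the flags listed in the options section, rejecting anything else to catch typos. The second helper turns the "1-3% of the total number of variants" recommendation for `--MIN_CLUSTER_SIZE` into an integer range.

```python
import shlex

# Optional flags documented in this README (besides --INPUT_TSV / --CLEMENT_DIR).
DOCUMENTED_OPTIONS = {
    "RANDOM_PICK", "MODEL", "CONSTANT", "NUM_CLONE_TRIAL_START", "NUM_CLONE_TRIAL_END",
    "TRIAL_NO", "FP_PRIOR", "TN_PRIOR", "KMEANS_CLUSTERNO", "MIN_CLUSTER_SIZE", "MODE",
    "MAKEONE_STRICT", "SCORING", "FONT_FAMILY", "VISUALIZATION", "IMAGE_FORMAT", "VERBOSE",
}


def build_clement_command(input_tsv, clement_dir, **options):
    """Return an argv list for the CLEMENT CLI (hypothetical helper).

    Keyword names map to flags, e.g. num_clone_trial_end=6 -> --NUM_CLONE_TRIAL_END 6.
    """
    cmd = ["CLEMENT", "--INPUT_TSV", str(input_tsv), "--CLEMENT_DIR", str(clement_dir)]
    for name, value in options.items():
        flag = name.upper()
        if flag not in DOCUMENTED_OPTIONS:
            raise ValueError(f"undocumented CLEMENT option: {flag}")
        cmd += ["--" + flag, str(value)]
    return cmd


def recommended_min_cluster_size(n_variants):
    """Integer range covering 1-3% of n_variants (recommendation for --MIN_CLUSTER_SIZE)."""
    low = max(1, -(-n_variants // 100))  # ceil(1%), at least 1
    high = max(low, (3 * n_variants) // 100)  # floor(3%)
    return low, high


# Same settings as Example 2, with relative paths for brevity.
cmd = build_clement_command(
    "M1-8_M2-4_input.txt", "M1-8_M2-4",
    num_clone_trial_start=2, num_clone_trial_end=6, random_pick=500,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` then runs CLEMENT, assuming the `CLEMENT` executable is on your PATH.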


<br/>
## Contact
goldpm1@yuhs.ac
## Package information
- PyPI package: `CLEMENTDNA` 1.0.12 (uploaded 2024-05-28)
- Author: Young-soo Chung, M.D. (goldpm1@yuhs.ac)
- License: GPL v3
- Requires Python >= 3.6
- Homepage: https://github.com/Yonsei-TGIL/CLEMENT