| Name | ENT3C |
| Version | 2.2.2 |
| home_page | None |
| Summary | Compute similarity between genomic contact matrices with "Entropy 3C" |
| upload_time | 2025-08-12 09:33:53 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | None |
| keywords | hi-c, micro-c, similarity, entropy |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
ENT3C is a method for quantifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.
For a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along a matrix diagonal to generate a characteristic signal. Similarity is defined as the Pearson correlation between the respective entropy signals of two contact matrices.
https://github.com/X3N1A/ENT3C
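
The idea described above can be sketched with NumPy. This is an illustration of the general technique, not ENT3C's actual implementation: the function names `von_neumann_entropy`, `entropy_signal` and `similarity` are hypothetical, and the handling of constant rows (NaN correlations set to 0) and of near-zero eigenvalues is an assumption made for this sketch.

```python
import numpy as np


def von_neumann_entropy(corr: np.ndarray) -> float:
    """Entropy S = -sum(l * log(l)) over the eigenvalues l of rho = corr / n."""
    n = corr.shape[0]
    rho = corr / n  # a correlation matrix has trace n, so rho has trace 1
    eigvals = np.linalg.eigvalsh(rho)  # symmetric input, real eigenvalues
    eigvals = eigvals[eigvals > 1e-12]  # assumption: treat 0 * log(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def entropy_signal(matrix: np.ndarray, sub_size: int, shift: int) -> np.ndarray:
    """Slide a sub_size x sub_size window along the diagonal in steps of shift."""
    n_bins = matrix.shape[0]
    signal = []
    for start in range(0, n_bins - sub_size + 1, shift):
        sub = matrix[start:start + sub_size, start:start + sub_size]
        # Pearson transform of the rows; constant rows yield NaN, set to 0 here.
        corr = np.nan_to_num(np.corrcoef(sub))
        np.fill_diagonal(corr, 1.0)
        signal.append(von_neumann_entropy(corr))
    return np.array(signal)


def similarity(signal_a: np.ndarray, signal_b: np.ndarray) -> float:
    """Pearson correlation between two entropy signals of equal length."""
    return float(np.corrcoef(signal_a, signal_b)[0, 1])


# Toy symmetric "contact matrix" (random values, not real Hi-C data).
rng = np.random.default_rng(0)
toy = rng.random((20, 20))
toy = toy + toy.T
toy_signal = entropy_signal(toy, sub_size=8, shift=3)
```

The entropy is bounded between 0 (all rows perfectly correlated) and `log(sub_size)` (all rows uncorrelated), which is why the signal tracks local pattern complexity along the diagonal.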
# Installation
1) Create and activate a Python virtual environment:
```
python3.11 -m venv .ent3c_venv
source .ent3c_venv/bin/activate
```
2) install ENT3C:
```
pip install ENT3C
```
# Usage
* CLI (python) usage:
```
Usage:
ENT3C <command> --config=<path/to/config.json> [options]
Commands:
get_entropy Generates entropy output file <entropy_out_FN> .
get_similarity Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.
run_all Generates <entropy_out_FN> and <similarity_out_FN>.
compare_groups Compare signal groups (requires --group1 and --group2 options)
Global Options:
--config=<path> Path to config JSON file (required for all commands)
<compare_groups> Options:
--group1=<GROUP> First group name, must correspond to what comes before _BR* in config file.
--group2=<GROUP> Second group name, must correspond to what comes before _BR* in config file.
Examples:
ENT3C run_all --config=configs/myconfig.json
ENT3C get_entropy --config=configs/myconfig.json
ENT3C get_similarity --config=configs/myconfig.json
ENT3C compare_groups --config=configs/myconfig.json --group1=H1-hESC --group2=K562
```
* Alternatively, run ENT3C from Python:
```
import ENT3C

# Group names must match the text before _BR* in the config file's short names.
group1, group2 = "H1-hESC", "K562"

ENT3C_OUT = ENT3C.run_get_entropy("config/myconfig.json")
Similarity = ENT3C.run_get_similarity("config/myconfig.json")
ENT3C_OUT, Similarity = ENT3C.run_all("config/myconfig.json")
EUCLIDEAN = ENT3C.run_compare_groups("config/myconfig.json", group1, group2)
```
* All ENT3C parameters are defined in a JSON config file such as ```config/config.json```. Examples can be found in the ```config``` directory.
* Parameters defined in ```<config_file>```:
1) The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices ```SUB_M_SIZE_FIX```.
* ```"SUB_M_SIZE_FIX": <integer>``` $\dots$ fixed submatrix dimension.
* The submatrix dimension can either be fixed directly via ```SUB_M_SIZE_FIX``` or, alternatively, derived from ```CHRSPLIT```; in the latter case ```SUB_M_SIZE_FIX``` is computed internally to fit the desired number of partitions of the contact matrix.
```PHI=1+floor((N-SUB_M_SIZE)./phi)```
where ```N``` is the size of the input contact matrix, ```phi``` is the window shift, ```PHI``` is the number of evaluated submatrices (consequently the number of data points in $S$).
* ```"CHRSPLIT": <integer>``` $\dots$ number of submatrices into which the contact matrix is partitioned. If ```CHRSPLIT``` is specified, set ``"SUB_M_SIZE_FIX": null``; otherwise set ``"CHRSPLIT": null``.
2) ```"DATA_PATH": </path/to/data> ``` $\dots$ input data path.
3) ```"FILES"``` $\dots$ input files, listed as pairs in the format ```[<COOL_FILENAME>, <SHORT_NAME>]```:
```
"FILES": [
"ENCSR079VIJ.BioRep1.40kb.cool",
"G401_BR1",
"ENCSR079VIJ.BioRep2.40kb.cool",
"G401_BR2"]
```
* Any biological replicates must be indicated in ```<SHORT_NAME>``` using the suffix ```_BR%d```.
* **Note:** ENT3C also takes ```mcool``` files as input.
4) ```"OUT_DIR": "<desired_output_directory_name>"``` $\dots$ output directory. ```OUT_DIR``` will be concatenated with ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.
5) ```"OUT_PREFIX": "<desired_output_prefix_>"``` $\dots$ prefix for output files.
6) ```"Resolution": "<integer,integer,...>"```, e.g. ```"40e3,100e3"``` $\dots$ resolutions to be evaluated.
7) ```"ChrNr": "<integer,integer,...>"```, e.g. ```"15,16,17,18,19,20,21,22,X"``` $\dots$ chromosome numbers to be evaluated.
8) ```"NormM": <0|1>``` $\dots$ whether to balance the input contact matrices. If set to 1, the balancing weights stored in the cooler are applied, and ENT3C expects them in the dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.
9) ```"WEIGHTS_NAME": "<name_of_weights>"``` $\dots$ name of dataset in cooler containing normalization weights.
10) ```"phi": <integer>``` $\dots$ window shift, i.e. the number of bins between the starts of consecutive submatrices.
11) ```"PHI_MAX": <integer>``` $\dots$ maximum number of submatrices, i.e. number of data points in the entropy signal $S$. If set, $\varphi$ is increased until $\Phi \approx \Phi\_{\max}$.
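
As a rough illustration of how these parameters fit together, the following sketch builds a config in Python and serializes it to JSON. Only the key names come from the list above; every value is hypothetical, the helper functions `file_pairs`, `group_of` and `check_submatrix_options` are not part of ENT3C, and the "exactly one of `SUB_M_SIZE_FIX` and `CHRSPLIT`" check is one reading of the rule stated above. The example configs in the ```config``` directory remain the authoritative reference for accepted value types.

```python
import json
import re

# Hypothetical values; only the key names are taken from the parameter list.
config = {
    "SUB_M_SIZE_FIX": 500,  # fixed submatrix dimension
    "CHRSPLIT": None,       # serialized as null because SUB_M_SIZE_FIX is set
    "DATA_PATH": "/path/to/data",
    "FILES": [
        "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
        "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2",
    ],
    "OUT_DIR": "my_output",
    "OUT_PREFIX": "G401_40kb",
    "Resolution": "40e3",
    "ChrNr": "2,4,X",
    "NormM": 0,
    "WEIGHTS_NAME": "weight",
    "phi": 1,
    "PHI_MAX": 1000,
}


def file_pairs(files):
    """Split the flat FILES list into (cool_filename, short_name) pairs."""
    if len(files) % 2:
        raise ValueError("FILES must alternate filename and short name")
    return list(zip(files[0::2], files[1::2]))


def group_of(short_name):
    """Return the group name, i.e. the text before the _BR<number> suffix."""
    match = re.fullmatch(r"(.+)_BR(\d+)", short_name)
    return match.group(1) if match else short_name


def check_submatrix_options(cfg):
    """Assumed rule: exactly one of SUB_M_SIZE_FIX and CHRSPLIT is non-null."""
    if (cfg["SUB_M_SIZE_FIX"] is None) == (cfg["CHRSPLIT"] is None):
        raise ValueError("set exactly one of SUB_M_SIZE_FIX and CHRSPLIT")


check_submatrix_options(config)
groups = sorted({group_of(name) for _, name in file_pairs(config["FILES"])})
config_text = json.dumps(config, indent=2)  # write this text to myconfig.json
print(groups)
```

The names collected in `groups` are the values that `--group1` and `--group2` would have to match when calling `compare_groups`.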
# Output files:
1) ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv``` $\dots$ contains all pairwise comparisons. The columns ```Sample1``` and ```Sample2``` contain the short names specified in ```FILES```, and the column ```Q``` contains the corresponding similarity score.
```
Resolution  ChrNr  Sample1    Sample2    Q
40000       2      HFFc6_BR3  A549_BR2   0.6132789056404898
40000       2      HFFc6_BR3  LNCap_BR2  0.3126805134567409
40000       2      HFFc6_BR3  LNCap_BR1  0.4221187669214683
40000       2      HFFc6_BR3  HFFc6_BR2  0.9632461160758761
.           .      .          .          .
.           .      .          .          .
.           .      .          .          .
```
2) ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_OUT.csv``` $\dots$ ENT3C output table.
```
Name      ChrNr  Resolution  n    PHI  phi  binNrStart  binNrEND  START   END       S
G401_BR1  2      40000       500  918  6    0           499       0       20000000  3.7896426915562462
G401_BR1  2      40000       500  918  6    6           505       240000  20240000  3.789044181663418
G401_BR1  2      40000       500  918  6    12          511       480000  20480000  3.7918253959272032
.         .      .           .    .    .    .           .         .       .         .
.         .      .           .    .    .    .           .         .       .         .
.         .      .           .    .    .    .           .         .       .         .
```
Each row corresponds to an evaluated submatrix with the fields ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, the submatrix dimension ```n```, the number of submatrices ```PHI=1+floor((N-SUB_M_SIZE)./phi)``` and the window shift ```phi```. ```binNrStart``` and ```binNrEND``` are the start and end bins of the submatrix, ```START``` and ```END``` are the corresponding genomic coordinates, and ```S``` is the computed von Neumann entropy.
- Example of output generated for ```ENT3C get_entropy --config=config/myconfig.json```:
- ```EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf```
- Unbalanced 40 kb contact matrices for even chromosomes across 5 cell lines; ```SUB_M_SIZE_FIX``` was 500:
<figure>
<img src="OUTPUT/PYTHON/EvenChromosomes_NoWeights_40kb_ENT3C_signals.png" style="max-width:70%;"
alt="ENT3C python Output">
</figure>
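
The window count and the coordinate columns of the ```ENT3C_OUT.csv``` table can be reproduced with a few lines of Python. This is a sketch rather than ENT3C's own code: `n_windows` and `window_rows` are hypothetical helpers, the bin count `N = 6002` is an assumed value chosen so that `PHI` matches the example rows (the table itself does not state `N`), and the rule `END = (binNrEND + 1) * Resolution` is inferred from the three rows shown.

```python
def n_windows(n_bins, sub_size, shift):
    """PHI = 1 + floor((N - SUB_M_SIZE) / phi)."""
    return 1 + (n_bins - sub_size) // shift


def window_rows(n_bins, sub_size, shift, resolution):
    """Yield (binNrStart, binNrEND, START, END) for each submatrix."""
    for k in range(n_windows(n_bins, sub_size, shift)):
        first = k * shift              # first bin of the k-th window
        last = first + sub_size - 1    # last bin, inclusive
        yield first, last, first * resolution, (last + 1) * resolution


# Assumed N = 6002 bins; n = 500, phi = 6 and 40 kb resolution as in the table.
rows = list(window_rows(n_bins=6002, sub_size=500, shift=6, resolution=40_000))
for row in rows[:3]:
    print(row)
```

With these inputs the first three tuples agree with the bin and coordinate columns of the example rows, and the number of windows equals the `PHI` value of 918.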
3) ```<OUT_DIR>/<OUT_PREFIX>_Eucl_<group1>vs<group2>.csv``` $\dots$ Euclidean distance between the average z-scores of $S$ over ```<group1>``` and ```<group2>```:
(here group1=HFFc6, group2=G401)
```
Resolution ChrNr START END meanS_Euclidean
40000 6 62360000 82360000 3.3625023926723685
40000 6 62120000 82120000 3.3546076641065095
40000 6 61880000 81880000 3.3441925121710026
```
- Example of the first page of output generated for ```ENT3C compare_groups --config=config/myconfig.json --group1=HFFc6 --group2=G401```
- ```EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf```
<figure>
<img src="OUTPUT/PYTHON/EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.png" style="max-width:60%;"
alt="ENT3C python Output">
</figure>
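
One possible reading of "Euclidean distance between average z-scores of S" is sketched below. It is not taken from the ENT3C source: the helpers `zscore`, `group_mean` and `windowwise_distance` are hypothetical, the use of the population standard deviation is an assumption, and the toy signals are made-up numbers. For a single value per window, the Euclidean distance reduces to an absolute difference.

```python
from statistics import fmean, pstdev


def zscore(signal):
    """Standardize one entropy signal (assumes the signal is not constant)."""
    mu, sd = fmean(signal), pstdev(signal)
    return [(s - mu) / sd for s in signal]


def group_mean(signals):
    """Average the z-scored replicate signals window by window."""
    z = [zscore(s) for s in signals]
    return [fmean(column) for column in zip(*z)]


def windowwise_distance(group1, group2):
    """Per-window distance between the two group averages."""
    a, b = group_mean(group1), group_mean(group2)
    return [abs(x - y) for x, y in zip(a, b)]


# Made-up entropy signals: two replicates per group, three windows each.
group1 = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
group2 = [[3.0, 2.0, 1.0], [6.0, 4.0, 2.0]]
distances = windowwise_distance(group1, group2)
print(distances)
```

Windows with large distances are those where the two groups' entropy signals diverge most after standardization.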
Raw data
{
"_id": null,
"home_page": null,
"name": "ENT3C",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "Hi-C, micro-C, similarity, entropy",
"author": null,
"author_email": "Xenia Lainscsek <108679125+X3N1A@users.noreply.github.com>",
"download_url": "https://files.pythonhosted.org/packages/fe/ab/c1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a/ent3c-2.2.2.tar.gz",
"platform": null,
"description": "ENT3C is a method for qunatifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.\nFor a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along a matrix diagonal to generate a characteristic signal. Similarity is defined as the Pearson correlation between the respective entropy signals of two contact matrices.\n\nhttps://github.com/X3N1A/ENT3C\n\n\n## Installation\n\n1) generate and activate python environment \n\t\n\t```\n\tpython3.11 -m venv .ent3c_venv\n\n\tsource .ent3c_venv/bin/activate\n\t```\n\n2) install ENT3C:\n\n\t```\n\tpip install ENT3C\n\t```\n\n# Usage \n\n* CLI (python) usage:\n\n\t```\n\tUsage:\n \tENT3C <command> --config=<path/to/config.json> [options]\n\n \tCommands:\n get_entropy Generates entropy output file <entropy_out_FN> .\n get_similarity Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.\n run_all Generates <entropy_out_FN> and <similarity_out_FN>.\n compare_groups Compare signal groups (requires --group1 and --group2 options)\n\n \tGlobal Options:\n --config=<path> Path to config JSON file (required for all commands)\n\n \t<compare_groups> Options:\n \t--group1=<GROUP> First group name, must correspond to what comes before _BR* in config file.\n \t--group2=<GROUP> Second group name, must correspond to what comes before _BR* in config file.\n\n\t\tExamples:\n ENT3C run_all --config=configs/myconfig.json\n ENT3C get_entropy --config=configs/myconfig.json\n ENT3C get_similarity --config=configs/myconfig.json\n ENT3C compare_groups --config=configs/myconfig.json --group1=H1-hESC --group2=K562\n\t```\n\n* alternatively run ENT3C in python as:\n\n\t```\n\timport ENT3C\n\n\tENT3C_OUT = ENT3C.run_get_entropy(\"config/myconfig.json\")\n\n\tSimilarity = 
ENT3C.run_get_similarity(\"config/myconfig.json\")\n\n\tENT3C_OUT, Similarity = ENT3C.run_all(\"config/myconfig.json\")\n\n\tEUCLIDEAN = ENT3C.run_compare_groups(\"config/myconfig.json\",group1,group2)\n\n\t```\n\n* all ENT3C parameters are defined in .json files ```config/config.json```. Examples can be found in ```config``` directory.\n\n* Paremeters defined in <config_file>: \n\n\t1) The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices ```SUB_M_SIZE_FIX```. \n\n\t\t* ```\"SUB_M_SIZE_FIX\": <integer>``` $\\dots$ fixed submatrix dimension.\n\n\t\t\t* ```SUB_M_SIZE_FIX``` can be either be fixed by or alternatively, one can specify ```CHRSPLIT```; in this case ```SUB_M_SIZE_FIX``` will be computed internally to fit the number of desired times the contact matrix is to be paritioned into. \n\n\t\t\t```PHI=1+floor((N-SUB_M_SIZE)./phi)```\n\n\t\t\twhere ```N``` is the size of the input contact matrix, ```phi``` is the window shift, ```PHI``` is the number of evaluated submatrices (consequently the \tnumber of data points in $S$).\n\n\t\t* ```\"CHRSPLIT\": <integer>``` $\\dots$ number of submatrices into which the contact matrix is partitioned into. If specified, then ``\"SUB_M_SIZE_FIX\": null`` otherwise ``\"CHRSPLIT\": null``. \n\n\t2) ```\"DATA_PATH\": </path/to/data> ``` $\\dots$ input data path. \n\n\t3) input files in format: ```[<COOL_FILENAME>, <SHORT_NAME>]```\n\t\t``` \n\t\t\"FILES\": [\n\t\t\t\"ENCSR079VIJ.BioRep1.40kb.cool\",\n\t\t\t\"G401_BR1\",\n\t\t\t\"ENCSR079VIJ.BioRep2.40kb.cool\",\n\t\t\t\"G401_BR2\"]\n\t\t``` \n\t\t* Any biological replicates must be indicated in <SHORT_NAME> using the suffix \"_BR%d\".\n\n\t\t* **Note:** ENT3C also takes ```mcool``` files as input. \n\n\t4) ```\"`OUT_DIR\": \"<desired_output_directory_name>\"``` $\\dots$ output directory. 
```OUT_DIR``` will be concatenated with ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.\n\n\t5) ```\"OUT_PREFIX\": \"<desired_output_prefix_>\"``` $\\dots$ prefix for output files.\n\n\t6) ```\"Resolution\": \"<integer,integer,...>\" e.g. \"40e3,100e3\"``` $\\dots$ resolutions to be evaluated. \n\n\t7) ```\"ChrNr\": \"<integer,integer,...>\" \"15,16,17,18,19,20,21,22,X\"``` $\\dots$ chromosome numbers to be evaluated.\n\n\t8) ```\"NormM\": <0|1>``` $\\dots$ input contact matrices can be balanced. If ```NormM: 1```, balancing weights in cooler are applied. If set to 1, ENT3C expects weights to be in dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.\n\n\t9) ```\"WEIGHTS_NAME\": \"<name_of_weights>\"``` $\\dots$ name of dataset in cooler containing normalization weights.\n\n\t10) ```\"phi\": <integer>``` $\\dots$ number of bins to the next matrix.\n\n\t11) ```\"PHI_MAX\": <integer>``` $\\dots$ number of submatrices; i.e. number of data points in entropy signal $S$. If set, $\\varphi$ is increased until $\\Phi \\approx \\Phi\\_{\\max}$.\n\n\n# Output files:\n\n1) ```<OUT_DIR>/<OUTPUT_PREFIX>_ENT3C_similarity.csv``` $\\dots$ will contain all combinations of comparisons. The second two columns contain the short names specified in ```FILES``` and the third column ```Q``` the corresponding similarity score. \n\t```\n\tResolution\tChrNr\tSample1\tSample2\tQ\n\t40000\t2\tHFFc6_BR3\tA549_BR2\t0.6132789056404898\n\t40000\t2\tHFFc6_BR3\tLNCap_BR2\t0.3126805134567409\n\t40000\t2\tHFFc6_BR3\tLNCap_BR1\t0.4221187669214683\n\t40000\t2\tHFFc6_BR3\tHFFc6_BR2\t0.9632461160758761\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t```\n\n2) ```<OUT_DIR>/<OUTPUT_PREFIX>_ENT3C_OUT.csv``` $\\dots$ ENT3C output table. 
\n\n\t```\n\tName\tChrNr\tResolution\tn\tPHI\tphi\tbinNrStart\tbinNrEND\tSTART\tEND\tS\n\tG401_BR1\t2\t40000\t500\t918\t6\t0\t499\t0\t20000000\t3.7896426915562462\n\tG401_BR1\t2\t40000\t500\t918\t6\t6\t505\t240000\t20240000\t3.789044181663418\n\tG401_BR1\t2\t40000\t500\t918\t6\t12\t511\t480000\t20480000\t3.7918253959272032\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t```\n\n\tEach row corresponds to an evaluated submatrix with fields ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, the sub-matrix dimension ```sub_m_dim```, ```PHI=1+floor((N-SUB_M_SIZE)./phi)```, ```binNrStart``` and ```binNrEnd``` correspond to the start and end bin of the submatrix, ```START``` and ```END``` are the corresponding genomic coordinates and ```S``` is the computed von Neumann entropy.\n\n\n\t- Example of output generated for ```ENT3C get_entropy --config=config/myconfig.json```:\n\t\t- ```EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf```\n\t\t- unbalanced 40kb contact matrices for even chromosomes across 5 cell lines. 
```SUB_MATRIX_SIZE``` was 500:\n<figure>\n <img src=\"OUTPUT/PYTHON/EvenChromosomes_NoWeights_40kb_ENT3C_signals.png\" style=\"max-width:70%;\"\n alt=\"ENT3C python Output\">\n</figure>\n\n\n3) ```<OUT_DIR>/<OUTPUT_PREFIX>_Eucl_<group1>vs<group2>.csv``` $\\dots$ Euclidean distance between average z-scores of S over ```<group1>``` and ```<group2>```:\n\t(here group1=HFFc6, group2=G401)\n\n\t```\n\tResolution\tChrNr\tSTART\tEND\tmeanS_Euclidean\n\t40000\t6\t62360000\t82360000\t3.3625023926723685\n\t40000\t6\t62120000\t82120000\t3.3546076641065095\n\t40000\t6\t61880000\t81880000\t3.3441925121710026\n\t```\n\n\t- Example of first page of output generated for ```ENT3C compare_groups --config=config/myconfig.json --group1 = HFFc6 group2 = \"G401\"```\n\t\t- ```EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf```\n\n<figure>\n <img src=\"OUTPUT/PYTHON/EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.png\" style=\"max-width:60%;\"\n alt=\"ENT3C python Output\">\n</figure>\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Compute similarity between genomic contact matrices with \"Entropy 3C\" ",
"version": "2.2.2",
"project_urls": {
"Repository": "https://github.com/X3N1A/ENT3C"
},
"split_keywords": [
"hi-c",
" micro-c",
" similarity",
" entropy"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "23f6fdccfeb2885a67d528f9609c49777aa3c95bf9abda8f90a80f9c54a70777",
"md5": "998ae1bc83f4c9543c49e333401b6e7a",
"sha256": "260dbc84edd0eb5cb3e16872eb594ae493f04bbc1a1e01ceb108e5621737aca1"
},
"downloads": -1,
"filename": "ent3c-2.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "998ae1bc83f4c9543c49e333401b6e7a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 30700,
"upload_time": "2025-08-12T09:33:52",
"upload_time_iso_8601": "2025-08-12T09:33:52.375364Z",
"url": "https://files.pythonhosted.org/packages/23/f6/fdccfeb2885a67d528f9609c49777aa3c95bf9abda8f90a80f9c54a70777/ent3c-2.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "feabc1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a",
"md5": "6ca819988b1bed838ff3fad3f55ee573",
"sha256": "a0d597d79e89d7c6d8dd473b204dd2ed5e9e77a724bfdc9cdaf3397f1f16d95a"
},
"downloads": -1,
"filename": "ent3c-2.2.2.tar.gz",
"has_sig": false,
"md5_digest": "6ca819988b1bed838ff3fad3f55ee573",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 32761,
"upload_time": "2025-08-12T09:33:53",
"upload_time_iso_8601": "2025-08-12T09:33:53.751979Z",
"url": "https://files.pythonhosted.org/packages/fe/ab/c1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a/ent3c-2.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-12 09:33:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "X3N1A",
"github_project": "ENT3C",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ent3c"
}