| Field | Value |
| --- | --- |
| Name | ENT3C |
| Version | 2.0.6 |
| home_page | None |
| Summary | Compute similarity between genomic contact matrices with "Entropy 3C" |
| upload_time | 2025-07-22 15:27:21 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | None |
| keywords | hi-c, micro-c, similarity, entropy |
ENT3C is a method for quantifying the similarity of micro-C/Hi-C-derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.
For a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along the matrix diagonal to generate a characteristic entropy signal. The similarity of two contact matrices is defined as the Pearson correlation between their respective entropy signals.
https://github.com/X3N1A/ENT3C
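The core computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not ENT3C's implementation: the function names (`von_neumann_entropy`, `entropy_signal`, `similarity`) are hypothetical, and the `log1p` transform of the counts, the treatment of undefined (NaN) correlations as 0 and the eigenvalue cutoff are assumptions. Each submatrix is Pearson-transformed into a correlation matrix $P$, normalized to a unit-trace density matrix $\rho = P/n$, and its von Neumann entropy $S = -\sum_i \lambda_i \ln \lambda_i$ is computed from the eigenvalues $\lambda_i$ of $\rho$.

```python
import numpy as np


def von_neumann_entropy(submatrix: np.ndarray) -> float:
    """Von Neumann entropy of the Pearson correlation matrix of `submatrix`."""
    # Pearson transform: correlate every row of the submatrix with every other row.
    P = np.corrcoef(submatrix)
    # Zero-variance rows produce NaN correlations; treat them as uncorrelated (assumption).
    P = np.nan_to_num(P)
    np.fill_diagonal(P, 1.0)
    # Normalize to unit trace so that rho can be read as a density matrix.
    rho = P / P.shape[0]
    # rho is symmetric, so the Hermitian eigensolver applies.
    eigvals = np.linalg.eigvalsh(rho)
    # Discard zero and round-off eigenvalues (0 * ln 0 is taken as 0).
    eigvals = eigvals[eigvals > 1e-12]
    return float(-np.sum(eigvals * np.log(eigvals)))


def entropy_signal(M: np.ndarray, sub_m_size: int, phi: int) -> np.ndarray:
    """Entropy of each sub_m_size x sub_m_size window slid along the diagonal in steps of phi."""
    N = M.shape[0]
    n_windows = 1 + (N - sub_m_size) // phi  # PHI
    logM = np.log1p(M)  # assumed count transform
    signal = []
    for k in range(n_windows):
        start = k * phi
        window = logM[start:start + sub_m_size, start:start + sub_m_size]
        signal.append(von_neumann_entropy(window))
    return np.array(signal)


def similarity(S1: np.ndarray, S2: np.ndarray) -> float:
    """Q: Pearson correlation between two entropy signals of equal length."""
    return float(np.corrcoef(S1, S2)[0, 1])


# Toy example: two noisy versions of the same symmetric "contact matrix".
rng = np.random.default_rng(0)
base = rng.poisson(5.0, size=(200, 200)).astype(float)
base = base + base.T
noise_a = rng.poisson(1.0, size=base.shape)
noise_b = rng.poisson(1.0, size=base.shape)
A = base + noise_a + noise_a.T
B = base + noise_b + noise_b.T
S_A = entropy_signal(A, sub_m_size=20, phi=5)  # PHI = 1 + (200 - 20) // 5 = 37
S_B = entropy_signal(B, sub_m_size=20, phi=5)
print(len(S_A), similarity(S_A, S_B))
```

Because $\rho$ has unit trace and non-negative eigenvalues, $S$ ranges from 0 (all rows perfectly correlated) to $\ln n$ (all rows mutually uncorrelated).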
## Requirements
* create and activate a Python virtual environment
```
python3.12 -m venv .ent3c_venv
source .ent3c_venv/bin/activate
```
* from the repository root, install ENT3C and its dependencies (declared in ```pyproject.toml```):
```
pip install .
```
## Running ENT3C
### Command-Line Usage
* run ENT3C directly from the terminal:
```
ENT3C <get_entropy|get_similarity|run_all> --config-file=/path/to/config_file/<config.json>
```
* ```get_entropy``` computes the entropy signals of the contact matrices specified in ```<config.json>```. Output: ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv```
* ```get_similarity``` computes pairwise similarities from ```<config.json>``` and the existing ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv```. Output: ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv```
* ```run_all``` runs both steps and generates both ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv``` and ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv```.
### Python API Usage
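```python
import ENT3C

# entropy signals -> OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv
ENT3C.run_get_entropy("config/config.json")
# similarities from the entropy table -> OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv
ENT3C.run_get_similarity("config/config.json")
# or both steps in one call
ENT3C.run_all("config/config.json")
```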
## Parameters and configuration files of ENT3C
* The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices ```SUB_M_SIZE_FIX```.
* ```SUB_M_SIZE_FIX``` can either be set directly or, alternatively, computed internally from ```CHRSPLIT```, the desired number of submatrices into which the contact matrix is partitioned. The number of evaluated submatrices is

$$\Phi = 1 + \left\lfloor \frac{N - n}{\varphi} \right\rfloor,$$

where $N$ is the dimension (number of bins) of the input contact matrix, $n$ is the submatrix dimension (```SUB_M_SIZE```), $\varphi$ (```phi```) is the window shift and $\Phi$ (```PHI```) is the number of evaluated submatrices, i.e. the number of data points in $S$.
* All implementations (```ENT3C.py```, ```ENT3C.jl``` and ```ENT3C.m```) use a configuration file in JSON format.
* An example can be found in ```config/config.json```.
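The arithmetic linking $N$, the submatrix dimension, ```CHRSPLIT```, ```phi``` and ```PHI_MAX``` can be sketched as follows. Only the $\Phi$ formula is taken from this README; how ENT3C derives the submatrix size from ```CHRSPLIT``` (assumed here: ```N // CHRSPLIT```) and how it increases ```phi``` to respect ```PHI_MAX``` (assumed here: the smallest ```phi``` with $\Phi \le$ ```PHI_MAX```) are assumptions, and the helper names are hypothetical.

```python
def num_submatrices(N: int, sub_m_size: int, phi: int) -> int:
    """PHI = 1 + floor((N - SUB_M_SIZE) / phi); // is floor division for ints."""
    return 1 + (N - sub_m_size) // phi


def sub_m_size_from_chrsplit(N: int, chrsplit: int) -> int:
    """ASSUMPTION: submatrix dimension = N // CHRSPLIT (ENT3C's rounding may differ)."""
    return N // chrsplit


def shift_for_phi_max(N: int, sub_m_size: int, phi_max: int, phi: int = 1) -> int:
    """ASSUMPTION: increase phi until PHI no longer exceeds PHI_MAX."""
    while num_submatrices(N, sub_m_size, phi) > phi_max:
        phi += 1
    return phi


N = 6000                                      # hypothetical number of bins
n = sub_m_size_from_chrsplit(N, chrsplit=10)  # CHRSPLIT = 10
phi = shift_for_phi_max(N, n, phi_max=1000)   # PHI_MAX = 1000, starting from phi = 1
print(n, phi, num_submatrices(N, n, phi))     # 600 6 901
```

With a hypothetical $N = 6000$ bins, ```CHRSPLIT = 10``` and ```PHI_MAX = 1000```, the sketch yields $n = 600$, $\varphi = 6$ and $\Phi = 901$, the same combination of ```n```, ```phi``` and ```PHI``` that appears in the example ```_ENT3C_OUT.csv``` table further below (```phi = 5``` would still give $\Phi = 1081 > 1000$).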
**ENT3C parameters defined in ```config/config.json```**
1) ```"DATA_PATH": "DATA"``` $\dots$ input data path.
2) ```"FILES"``` $\dots$ input files, given as a flat list alternating ```<COOL_FILENAME>``` and ```<SHORT_NAME>```:
```
"FILES": [
"ENCSR079VIJ.BioRep1.40kb.cool",
"G401_BR1",
"ENCSR079VIJ.BioRep2.40kb.cool",
"G401_BR2"]
```
- ENT3C also accepts ```mcool``` files as input.

⚠ If comparing biological replicate samples, mark them with the suffix ```_BR#``` (e.g. ```G401_BR1```, ```G401_BR2```) in ```<SHORT_NAME>``` ⚠

3) ```"OUT_DIR": "OUTPUT/"``` $\dots$ output directory. An implementation-specific subdirectory is appended, giving e.g. ```OUTPUT/PYTHON/```, ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.
4) ```"OUT_PREFIX": "40kb"``` $\dots$ prefix for output files.
5) ```"Resolution": "40e3,100e3"``` $\dots$ comma-separated resolutions (bin sizes in bp) to be evaluated.
6) ```"ChrNr": "15,16,17,18,19,20,21,22,X"``` $\dots$ comma-separated chromosomes to be evaluated.
7) ```"NormM": 0``` $\dots$ whether to balance the input contact matrices. If ```"NormM": 1```, the cooler balancing weights are applied; ENT3C then expects them in the dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.
8) ```"WEIGHTS_NAME": "weight"``` $\dots$ name of the cooler dataset containing the balancing weights.
9) ```"SUB_M_SIZE_FIX": null``` $\dots$ fixed submatrix dimension; if ```null```, it is computed from ```CHRSPLIT```.
10) ```"CHRSPLIT": 10``` $\dots$ number of submatrices into which the contact matrix is partitioned.
11) ```"phi": 1``` $\dots$ window shift, i.e. the number of bins between the starts of consecutive submatrices.
12) ```"PHI_MAX": 1000``` $\dots$ maximum number of submatrices, i.e. of data points in the entropy signal $S$.
If set, $\varphi$ is increased until $\Phi \approx \Phi_{\max}$.
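A configuration with the keys documented above can be read with the standard library alone. The sketch below is an assumption about how the string-valued fields might be interpreted (for example, ```"40e3"``` parsed as the integer 40000), not ENT3C's own parser; ```parse_config``` and ```replicate_groups``` are hypothetical helpers. It pairs the flat ```FILES``` list into ```(<COOL_FILENAME>, <SHORT_NAME>)``` tuples and groups short names carrying the ```_BR#``` suffix by sample.

```python
import json
import re

# Hypothetical config, using the example values documented above.
config_text = """
{
  "DATA_PATH": "DATA",
  "FILES": ["ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
            "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2"],
  "OUT_DIR": "OUTPUT/",
  "OUT_PREFIX": "40kb",
  "Resolution": "40e3,100e3",
  "ChrNr": "15,16,17,18,19,20,21,22,X",
  "NormM": 0,
  "WEIGHTS_NAME": "weight",
  "SUB_M_SIZE_FIX": null,
  "CHRSPLIT": 10,
  "phi": 1,
  "PHI_MAX": 1000
}
"""

BR_SUFFIX = re.compile(r"_BR(\d+)$")


def parse_config(text):
    cfg = json.loads(text)  # JSON null becomes None
    files = cfg["FILES"]
    # FILES alternates <COOL_FILENAME>, <SHORT_NAME>.
    pairs = list(zip(files[0::2], files[1::2]))
    # Comma-separated strings; "40e3" is parsed as a float, then cast to int.
    resolutions = [int(float(r)) for r in cfg["Resolution"].split(",")]
    chromosomes = cfg["ChrNr"].split(",")  # kept as strings because of "X"
    return cfg, pairs, resolutions, chromosomes


def replicate_groups(pairs):
    # Group short names by the sample name before "_BR#"; others are skipped.
    groups = {}
    for _, short_name in pairs:
        match = BR_SUFFIX.search(short_name)
        if match:
            groups.setdefault(short_name[:match.start()], []).append(short_name)
    return groups


cfg, pairs, resolutions, chromosomes = parse_config(config_text)
print(resolutions)              # [40000, 100000]
print(replicate_groups(pairs))  # {'G401': ['G401_BR1', 'G401_BR2']}
```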
## Output files:
* ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv``` $\dots$ contains all pairwise comparisons. Columns ```Sample1``` and ```Sample2``` contain the short names specified in ```FILES```, and column ```Q``` the corresponding similarity score.
```
cat OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_similarity.csv | head
Resolution	ChrNr	Sample1	Sample2	Q
40000	2	HFFc6_BR2	A549_BR2	0.5584659814117208
40000	2	HFFc6_BR2	G401_BR2	0.6594518933893059
40000	2	HFFc6_BR2	HFFc6_BR1	0.8473530463515314
...
```
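To post-process the similarity table, for example to separate replicate comparisons from comparisons between different samples, a standard-library sketch like the following could be used. It assumes the file is tab-separated and that replicates follow the ```_BR#``` naming convention; ```sample_of``` and ```split_by_replicate``` are hypothetical helpers, and the inline data are the three example rows shown above.

```python
import csv
import io
import re

# The three example rows shown above (assumed tab-separated).
example = (
    "Resolution\tChrNr\tSample1\tSample2\tQ\n"
    "40000\t2\tHFFc6_BR2\tA549_BR2\t0.5584659814117208\n"
    "40000\t2\tHFFc6_BR2\tG401_BR2\t0.6594518933893059\n"
    "40000\t2\tHFFc6_BR2\tHFFc6_BR1\t0.8473530463515314\n"
)


def sample_of(short_name):
    # Strip a trailing "_BR#" to recover the sample name.
    return re.sub(r"_BR\d+$", "", short_name)


def split_by_replicate(handle):
    replicate, non_replicate = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        q = float(row["Q"])
        same_sample = sample_of(row["Sample1"]) == sample_of(row["Sample2"])
        target = replicate if same_sample else non_replicate
        target.append((row["Sample1"], row["Sample2"], q))
    return replicate, non_replicate


rep, non_rep = split_by_replicate(io.StringIO(example))
print([(a, b) for a, b, _ in rep])  # [('HFFc6_BR2', 'HFFc6_BR1')]
```

For a real output file, pass an open handle instead, e.g. ```open(path, newline="")```.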
* ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv``` $\dots$ ENT3C output table containing the entropy signals.
```
cat OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_OUT.csv | head
Name	ChrNr	Resolution	n	PHI	phi	binNrStart	binNrEnd	START	END	S
G401_BR1	2	40000	600	901	6	0	599	0	24000000	4.067424893091131
G401_BR1	2	40000	600	901	6	6	605	240000	24240000	4.06198007393338
G401_BR1	2	40000	600	901	6	12	611	480000	24480000	4.055473536905049
G401_BR1	2	40000	600	901	6	18	617	720000	24720000	4.048004132456738
...
```
Each row corresponds to one evaluated submatrix. The fields are ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, the submatrix dimension ```n```, ```PHI``` (the number of evaluated submatrices, $\Phi = 1 + \lfloor (N - n)/\varphi \rfloor$), the window shift ```phi```, ```binNrStart``` and ```binNrEnd``` (the start and end bins of the submatrix), ```START``` and ```END``` (the corresponding genomic coordinates) and ```S``` (the computed von Neumann entropy).
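The bin and coordinate columns of the example rows are consistent with 0-based bins, windows advancing by ```phi``` bins, an inclusive end bin and ```END``` marking the right edge of the last bin. The sketch below reproduces the four example rows from ```n```, ```phi``` and ```Resolution```; the relation is inferred from those rows rather than taken from the ENT3C source, and ```window_rows``` is a hypothetical helper.

```python
def window_rows(n, phi, resolution, count):
    """Yield (binNrStart, binNrEnd, START, END) for the first `count` submatrices."""
    for k in range(count):
        bin_start = k * phi          # windows advance by phi bins
        bin_end = bin_start + n - 1  # end bin is inclusive
        # Bins are 0-based; END is the right edge of the last bin.
        yield bin_start, bin_end, bin_start * resolution, (bin_end + 1) * resolution


for row in window_rows(n=600, phi=6, resolution=40000, count=4):
    print(*row)
# 0 599 0 24000000
# 6 605 240000 24240000
# 12 611 480000 24480000
# 18 617 720000 24720000
```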
Raw data
{
"_id": null,
"home_page": null,
"name": "ENT3C",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "Hi-C, micro-C, similarity, entropy",
"author": null,
"author_email": "Xenia Lainscsek <108679125+X3N1A@users.noreply.github.com>",
"download_url": "https://files.pythonhosted.org/packages/0f/c1/be769cc58f67493e2373801e771642f6b232fc8d977b51442ea4b660b086/ent3c-2.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Compute similarity between genomic contact matrices with \"Entropy 3C\" ",
"version": "2.0.6",
"project_urls": {
"Repository": "https://github.com/X3N1A/ENT3C"
},
"split_keywords": [
"hi-c",
" micro-c",
" similarity",
" entropy"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f9ff0aae38951c4ba6520690389343d44f5d7b68ac9dcad1b30f69b535ffa648",
"md5": "09ea245ed0837942974d1c6a497996ca",
"sha256": "04adf0fdc022846e11112e22733801ad782fe67172308934965a0495b14ae3a4"
},
"downloads": -1,
"filename": "ent3c-2.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "09ea245ed0837942974d1c6a497996ca",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 24647,
"upload_time": "2025-07-22T15:27:20",
"upload_time_iso_8601": "2025-07-22T15:27:20.275257Z",
"url": "https://files.pythonhosted.org/packages/f9/ff/0aae38951c4ba6520690389343d44f5d7b68ac9dcad1b30f69b535ffa648/ent3c-2.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0fc1be769cc58f67493e2373801e771642f6b232fc8d977b51442ea4b660b086",
"md5": "605c21d0db0f963b91d48fbbf91f0dd1",
"sha256": "b0b200b63c5130da28893d707ac9578abede622cade398e44d1dcdde3ae08a58"
},
"downloads": -1,
"filename": "ent3c-2.0.6.tar.gz",
"has_sig": false,
"md5_digest": "605c21d0db0f963b91d48fbbf91f0dd1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 27505,
"upload_time": "2025-07-22T15:27:21",
"upload_time_iso_8601": "2025-07-22T15:27:21.403154Z",
"url": "https://files.pythonhosted.org/packages/0f/c1/be769cc58f67493e2373801e771642f6b232fc8d977b51442ea4b660b086/ent3c-2.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-22 15:27:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "X3N1A",
"github_project": "ENT3C",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ent3c"
}