ENT3C


* Name: ENT3C
* Version: 2.0.6
* Summary: Compute similarity between genomic contact matrices with "Entropy 3C"
* Author: Xenia Lainscsek
* Upload time: 2025-07-22 15:27:21
* Requires Python: >=3.11
* Keywords: hi-c, micro-c, similarity, entropy

ENT3C is a method for quantifying the similarity of micro-C/Hi-C-derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.
For a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along a matrix diagonal to generate a characteristic signal. Similarity is defined as the Pearson correlation between the respective entropy signals of two contact matrices.
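
The idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the normalisation of the Pearson matrix by its dimension, and the random test matrix are all chosen for the example, and any preprocessing of the contact matrix (for instance handling of empty bins) is omitted.

```python
import numpy as np


def von_neumann_entropy(sub):
    """Entropy of one square submatrix (illustrative, not the ENT3C code)."""
    P = np.corrcoef(sub)               # Pearson correlation between rows
    rho = P / P.shape[0]               # assumption: scale to unit trace
    lam = np.linalg.eigvalsh(rho)      # eigenvalues of the symmetric matrix
    lam = lam[lam > 1e-12]             # drop zero/negative numerical noise
    return float(-np.sum(lam * np.log(lam)))


def entropy_signal(M, n, phi):
    """Slide an n x n window along the diagonal of M in steps of phi bins."""
    N = M.shape[0]
    starts = range(0, N - n + 1, phi)  # 1 + floor((N - n) / phi) windows
    return np.array([von_neumann_entropy(M[s:s + n, s:s + n]) for s in starts])


def similarity(S1, S2):
    """Pearson correlation between two entropy signals of equal length."""
    return float(np.corrcoef(S1, S2)[0, 1])


# Hypothetical symmetric "contact matrix" made of random numbers.
rng = np.random.default_rng(0)
A = rng.random((40, 40))
M = A + A.T
S = entropy_signal(M, n=10, phi=3)
print(len(S), similarity(S, S))
```

Each entry of the signal is bounded above by the logarithm of the window size, which is reached only when the rows of the window are completely uncorrelated.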

https://github.com/X3N1A/ENT3C


## Requirements

* Create and activate a Python environment (Python >= 3.11 is required):
	
	```
	python3.12 -m venv .ent3c_venv

	source .ent3c_venv/bin/activate
	```

* Install ENT3C and its dependencies, as declared in ```pyproject.toml```, from the repository root: 

	```
	pip install .
	```

  

## Running ENT3C

### Command-Line Usage 
* Run ENT3C directly from the terminal: 

```
ENT3C <get_entropy|get_similarity|run_all> --config-file=/path/to/config_file/<config.json>
```
	
* The ```get_entropy``` subcommand generates a data frame with entropy values according to ```<config.json>```. Output: ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv```
	
* The ```get_similarity``` subcommand generates a data frame with similarities according to ```<config.json>``` and the entropy table ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv```. Output: ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv```
	
* The ```run_all``` subcommand generates both data frames, ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_OUT.csv``` and ```OUTPUT/PYTHON/<OUT_PREFIX>_ENT3C_similarity.csv```. 

### Python API 

```
import ENT3C
ENT3C.run_get_entropy("config/config.json")
ENT3C.run_get_similarity("config/config.json")
ENT3C.run_all("config/config.json")
```

## Parameters and configuration files of ENT3C

* The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices, ```SUB_M_SIZE_FIX```. 

	* ```SUB_M_SIZE_FIX``` can either be set directly or, alternatively, one can specify ```CHRSPLIT```; in that case the submatrix dimension is computed internally so that the contact matrix is partitioned into the desired number of submatrices. 

	  ```PHI=1+floor((N-SUB_M_SIZE)./phi)```

	  where ```N``` is the size of the input contact matrix, ```SUB_M_SIZE``` is the submatrix dimension, ```phi``` is the window shift, and ```PHI``` is the number of evaluated submatrices (and consequently the number of data points in $S$).

* All implementations (```ENT3C.py```, ```ENT3C.jl``` and ```ENT3C.m```) use a configuration file in JSON format. 
	* An example can be found in ```config/config.json```.

**ENT3C parameters defined in ```config/config.json```**
1) ```"DATA_PATH": "DATA"``` $\dots$ input data path. 

2) ```"FILES"``` $\dots$ input files as a flat list of ```<COOL_FILENAME>, <SHORT_NAME>``` pairs:
``` 
"FILES": [
	"ENCSR079VIJ.BioRep1.40kb.cool",
	"G401_BR1",
	"ENCSR079VIJ.BioRep2.40kb.cool",
	"G401_BR2"]
``` 
- ENT3C also takes ```mcool``` files as input.

⚠ If comparing biological replicates, please ensure they are indicated as ```_BR%d``` (e.g. ```_BR1```, ```_BR2```) in the ```<SHORT_NAME>``` ⚠

3) ```"OUT_DIR": "OUTPUT/"``` $\dots$ output directory. An implementation-specific subdirectory is appended to ```OUT_DIR```, e.g. ```OUTPUT/PYTHON/```, ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.

4) ```"OUT_PREFIX": "40kb"``` $\dots$ prefix for output files.

5) ```"Resolution": "40e3,100e3"``` $\dots$ resolutions to be evaluated. 

6) ```"ChrNr": "15,16,17,18,19,20,21,22,X"``` $\dots$ chromosome numbers to be evaluated.

7) ```"NormM": 0``` $\dots$ whether the input contact matrices are balanced. If ```"NormM": 1```, the balancing weights stored in the cooler are applied; ENT3C then expects the weights in the dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.

8) ```"WEIGHTS_NAME": "weight"``` $\dots$ name of the dataset in the cooler containing the normalization weights.

9) ```"SUB_M_SIZE_FIX": null``` $\dots$ fixed submatrix dimension.

10) ```"CHRSPLIT": 10``` $\dots$ number of submatrices into which the contact matrix is partitioned.

11) ```"phi": 1``` $\dots$ window shift, i.e. the number of bins between the start of one submatrix and the start of the next.

12) ```"PHI_MAX": 1000``` $\dots$ maximum number of submatrices, i.e. the number of data points in the entropy signal $S$. 
If set, $\varphi$ is increased until $\Phi \approx \Phi_{\max}$.
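
As a convenience, a configuration file with the parameters listed above can be written from Python. The sketch below assumes that the file is a flat JSON object with exactly these keys; the output file name ```config_example.json``` is hypothetical, and the ```config/config.json``` shipped with the repository remains the authoritative template. A single resolution is used here because the example inputs are single-resolution ```.cool``` files.

```python
import json
import tempfile
from pathlib import Path

# Keys and example values are taken from the parameter list above.
config = {
    "DATA_PATH": "DATA",
    "FILES": [
        "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
        "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2",
    ],
    "OUT_DIR": "OUTPUT/",
    "OUT_PREFIX": "40kb",
    "Resolution": "40e3",
    "ChrNr": "15,16,17,18,19,20,21,22,X",
    "NormM": 0,
    "WEIGHTS_NAME": "weight",
    "SUB_M_SIZE_FIX": None,   # written as JSON null; CHRSPLIT is used instead
    "CHRSPLIT": 10,
    "phi": 1,
    "PHI_MAX": 1000,
}

# Hypothetical location; a temporary directory keeps the example side-effect free.
out_path = Path(tempfile.mkdtemp()) / "config_example.json"
out_path.write_text(json.dumps(config, indent=4))

# Read the file back to confirm it is valid JSON.
loaded = json.loads(out_path.read_text())
print(sorted(loaded))
```

The resulting path can then be passed to the command-line tool via ```--config-file``` or to ```ENT3C.run_all```.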

## Output files:
* ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv``` $\dots$ contains all pairwise comparisons. The columns ```Sample1``` and ```Sample2``` contain the short names specified in ```FILES```, and the column ```Q``` contains the corresponding similarity score.  
```
cat OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_similarity.csv  | head
Resolution	ChrNr	Sample1	Sample2	Q
40000	2	HFFc6_BR2	A549_BR2	0.5584659814117208
40000	2	HFFc6_BR2	G401_BR2	0.6594518933893059
40000	2	HFFc6_BR2	HFFc6_BR1	0.8473530463515314
.	.	.	.	.
.	.	.	.	.
.	.	.	.	.
```
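
The similarity table can be post-processed with the Python standard library alone. The sketch below assumes the file is tab-separated, as in the excerpt above; the inline text merely stands in for the real file, and the variable names are illustrative.

```python
import csv
import io

# Stand-in for the contents of <OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv.
text = (
    "Resolution\tChrNr\tSample1\tSample2\tQ\n"
    "40000\t2\tHFFc6_BR2\tA549_BR2\t0.5584659814117208\n"
    "40000\t2\tHFFc6_BR2\tG401_BR2\t0.6594518933893059\n"
    "40000\t2\tHFFc6_BR2\tHFFc6_BR1\t0.8473530463515314\n"
)

# Parse the rows and convert the similarity score to a float.
rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
for row in rows:
    row["Q"] = float(row["Q"])

# Find the most similar pair among the parsed comparisons.
best = max(rows, key=lambda row: row["Q"])
print(best["Sample1"], best["Sample2"], round(best["Q"], 3))
```

In this excerpt the highest score belongs to the two biological replicates of the same cell line.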

* ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_OUT.csv``` $\dots$ ENT3C output table. 

```
cat OUTPUT/PYTHON/EvenChromosomes_NoWeights_ENT3C_OUT.csv  | head
Name	ChrNr	Resolution	n	PHI	phi	binNrStart	binNrEnd	START	END	S
G401_BR1	2	40000	600	901	6	0	599	0	24000000	4.067424893091131
G401_BR1	2	40000	600	901	6	6	605	240000	24240000	4.06198007393338
G401_BR1	2	40000	600	901	6	12	611	480000	24480000	4.055473536905049
G401_BR1	2	40000	600	901	6	18	617	720000	24720000	4.048004132456738
.	.	.	.	.	.	.	.	.	.	.
.	.	.	.	.	.	.	.	.	.	.
.	.	.	.	.	.	.	.	.	.	.
```
Each row corresponds to an evaluated submatrix with the fields ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, the submatrix dimension ```n```, the number of submatrices ```PHI=1+floor((N-SUB_M_SIZE)./phi)```, the window shift ```phi```, ```binNrStart``` and ```binNrEnd``` (the start and end bin of the submatrix), ```START``` and ```END``` (the corresponding genomic coordinates) and ```S``` (the computed von Neumann entropy).
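
The relation between ```n```, ```phi``` and ```PHI``` can be checked with a short calculation. The sketch below is only an illustration: the matrix size ```N = 6000```, the rule ```n = N // CHRSPLIT``` and the rule of raising ```phi``` by one bin until ```PHI``` no longer exceeds ```PHI_MAX``` are assumptions made for the example, chosen because they are consistent with the sample rows above; the package may compute these values differently.

```python
def n_windows(N, n, phi):
    """PHI = 1 + floor((N - n) / phi), the number of evaluated submatrices."""
    return 1 + (N - n) // phi


def choose_phi(N, n, phi=1, phi_max=None):
    """Assumed rule: increase phi by one bin until PHI <= phi_max."""
    while phi_max is not None and n_windows(N, n, phi) > phi_max:
        phi += 1
    return phi


N = 6000            # hypothetical number of bins in the contact matrix
n = N // 10         # assumption: CHRSPLIT = 10 gives n = 600
phi = choose_phi(N, n, phi=1, phi_max=1000)
PHI = n_windows(N, n, phi)

# Bin and coordinate ranges of the first two windows at 40 kb resolution.
resolution = 40_000
windows = [(i * phi, i * phi + n - 1) for i in range(2)]
coords = [(a * resolution, (b + 1) * resolution) for a, b in windows]

print(n, phi, PHI)  # 600 6 901
print(windows)      # [(0, 599), (6, 605)]
print(coords)       # [(0, 24000000), (240000, 24240000)]
```

Under these assumptions the computed values reproduce the ```n```, ```PHI```, ```phi```, bin and coordinate columns of the first two sample rows.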

            
