ENT3C


* Name: ENT3C
* Version: 2.2.2
* Summary: Compute similarity between genomic contact matrices with "Entropy 3C"
* Upload time: 2025-08-12 09:33:53
* Requires Python: >=3.11
* Keywords: hi-c, micro-c, similarity, entropy

ENT3C is a method for quantifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.
For a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along a matrix diagonal to generate a characteristic signal. Similarity is defined as the Pearson correlation between the respective entropy signals of two contact matrices.
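
The description above can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the function names, the handling of constant rows, and the absence of any preprocessing of the contact matrix are assumptions made for illustration only. It computes the von Neumann entropy of each Pearson-transformed submatrix along the diagonal and takes the Pearson correlation of two resulting signals as their similarity.

```python
import numpy as np


def von_neumann_entropy(corr: np.ndarray) -> float:
    """Entropy of rho = corr / n, computed from the eigenvalues of rho."""
    n = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr / n)  # rho has unit trace if diag(corr) == 1
    eigvals = eigvals[eigvals > 1e-12]      # treat 0 * log(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def entropy_signal(matrix: np.ndarray, sub_m_size: int, phi: int) -> np.ndarray:
    """Slide a sub_m_size x sub_m_size window along the diagonal in steps of phi."""
    n_bins = matrix.shape[0]
    signal = []
    for start in range(0, n_bins - sub_m_size + 1, phi):
        sub = matrix[start:start + sub_m_size, start:start + sub_m_size]
        corr = np.corrcoef(sub)        # Pearson correlation between rows
        corr = np.nan_to_num(corr)     # assumption: constant rows correlate with nothing
        np.fill_diagonal(corr, 1.0)    # keep the unit diagonal of a correlation matrix
        signal.append(von_neumann_entropy(corr))
    return np.array(signal)


def similarity(signal_a: np.ndarray, signal_b: np.ndarray) -> float:
    """Pearson correlation between two entropy signals of equal length."""
    return float(np.corrcoef(signal_a, signal_b)[0, 1])


# Toy data: two noisy copies of the same random symmetric "contact matrix".
rng = np.random.default_rng(0)
base = rng.random((40, 40))
base = base + base.T
noise = rng.random((40, 40))
other = base + 0.05 * (noise + noise.T)

s_a = entropy_signal(base, sub_m_size=10, phi=2)
s_b = entropy_signal(other, sub_m_size=10, phi=2)
print(len(s_a), similarity(s_a, s_b))
```

For an `n` x `n` identity correlation matrix the entropy is `log(n)`, its maximum; for a matrix of perfectly correlated rows it is 0.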

https://github.com/X3N1A/ENT3C


# Installation

1) Create and activate a Python virtual environment:
	
	```
	python3.11 -m venv .ent3c_venv

	source .ent3c_venv/bin/activate
	```

2) Install ENT3C:

	```
	pip install ENT3C
	```

# Usage 

* CLI (python) usage:

	```
	Usage:
	    ENT3C <command> --config=<path/to/config.json> [options]

	Commands:
	    get_entropy        Generates entropy output file <entropy_out_FN>.
	    get_similarity     Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.
	    run_all            Generates <entropy_out_FN> and <similarity_out_FN>.
	    compare_groups     Compares signal groups (requires --group1 and --group2).

	Global Options:
	    --config=<path>    Path to config JSON file (required for all commands).

	<compare_groups> Options:
	    --group1=<GROUP>   First group name; must match what comes before _BR* in the config file.
	    --group2=<GROUP>   Second group name; must match what comes before _BR* in the config file.

	Examples:
	    ENT3C run_all --config=config/myconfig.json
	    ENT3C get_entropy --config=config/myconfig.json
	    ENT3C get_similarity --config=config/myconfig.json
	    ENT3C compare_groups --config=config/myconfig.json --group1=H1-hESC --group2=K562
	```

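* Alternatively, run ENT3C from Python:

	```
	import ENT3C

	# Compute the entropy signals only.
	ENT3C_OUT = ENT3C.run_get_entropy("config/myconfig.json")

	# Compute the similarity scores from the entropy output.
	Similarity = ENT3C.run_get_similarity("config/myconfig.json")

	# Compute entropy signals and similarity scores in one call.
	ENT3C_OUT, Similarity = ENT3C.run_all("config/myconfig.json")

	# Compare two groups; names must match what comes before _BR* in the config file.
	group1, group2 = "H1-hESC", "K562"
	EUCLIDEAN = ENT3C.run_compare_groups("config/myconfig.json", group1, group2)
	```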

* All ENT3C parameters are defined in a JSON config file, e.g. ```config/config.json```. Examples can be found in the ```config``` directory.

* Parameters defined in <config_file>: 

	1) The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices, ```SUB_M_SIZE_FIX```. 

		* ```"SUB_M_SIZE_FIX": <integer>``` $\dots$ fixed submatrix dimension.

			* The submatrix dimension can either be set directly with ```SUB_M_SIZE_FIX``` or derived from ```CHRSPLIT```; in the latter case it is computed internally so that the contact matrix is partitioned into the requested number of submatrices. The number of evaluated submatrices is

			```PHI = 1 + floor((N - SUB_M_SIZE) / phi)```

			where ```N``` is the size of the input contact matrix, ```SUB_M_SIZE``` is the submatrix dimension, ```phi``` is the window shift, and ```PHI``` is the number of evaluated submatrices (and therefore the number of data points in $S$).

		* ```"CHRSPLIT": <integer>``` $\dots$ number of submatrices into which the contact matrix is partitioned. Set exactly one of the two: if ```CHRSPLIT``` is specified, use ``"SUB_M_SIZE_FIX": null``; otherwise use ``"CHRSPLIT": null``. 

	2) ```"DATA_PATH": "</path/to/data>"``` $\dots$ input data path. 

	3) ```"FILES"``` $\dots$ input files, listed as alternating ```<COOL_FILENAME>, <SHORT_NAME>``` entries:
		``` 
		"FILES": [
			"ENCSR079VIJ.BioRep1.40kb.cool",
			"G401_BR1",
			"ENCSR079VIJ.BioRep2.40kb.cool",
			"G401_BR2"]
		``` 
		* Biological replicates must be indicated in <SHORT_NAME> with the suffix "_BR%d", e.g. ```G401_BR1``` and ```G401_BR2```.

		* **Note:** ENT3C also takes ```mcool``` files as input. 

	4) ```"OUT_DIR": "<desired_output_directory_name>"``` $\dots$ output directory. ```OUT_DIR``` will be concatenated with ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.

	5) ```"OUT_PREFIX": "<desired_output_prefix_>"``` $\dots$ prefix for output files.

	6) ```"Resolution": "<integer,integer,...>" e.g. "40e3,100e3"``` $\dots$ resolutions to be evaluated. 

	7) ```"ChrNr": "<integer,integer,...>" e.g. "15,16,17,18,19,20,21,22,X"``` $\dots$ chromosome numbers to be evaluated.

	8) ```"NormM": <0|1>``` $\dots$ whether to balance the input contact matrices. If ```"NormM": 1```, the balancing weights stored in the cooler are applied; ENT3C then expects them in the dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.

	9) ```"WEIGHTS_NAME": "<name_of_weights>"``` $\dots$ name of dataset in cooler containing normalization weights.

	10) ```"phi": <integer>``` $\dots$ window shift, i.e. the number of bins between the start of one submatrix and the start of the next.

	11) ```"PHI_MAX": <integer>``` $\dots$ maximum number of submatrices, i.e. of data points in the entropy signal $S$. If set, $\varphi$ (```phi```) is increased until $\Phi \approx \Phi\_{\max}$.
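
The parameter list above can be summarised in a short Python sketch that assembles a config and works through the window arithmetic. All values are hypothetical, and `check_config`, `n_windows` and `shift_for_phi_max` are illustrative helpers, not part of ENT3C. In particular, raising `phi` one bin at a time until `PHI <= PHI_MAX` is only one plausible reading of the `PHI_MAX` rule, and `N = 6002` is a made-up matrix size.

```python
import json


def n_windows(n_bins: int, sub_m_size: int, phi: int) -> int:
    """PHI = 1 + floor((N - SUB_M_SIZE) / phi)."""
    return 1 + (n_bins - sub_m_size) // phi


def shift_for_phi_max(n_bins: int, sub_m_size: int, phi: int, phi_max: int) -> int:
    """Assumed PHI_MAX rule: raise phi one bin at a time until PHI <= PHI_MAX."""
    while n_windows(n_bins, sub_m_size, phi) > phi_max:
        phi += 1
    return phi


def check_config(cfg: dict) -> None:
    """Checks derived from the parameter descriptions (not ENT3C's own validation)."""
    if (cfg["SUB_M_SIZE_FIX"] is None) == (cfg["CHRSPLIT"] is None):
        raise ValueError("set exactly one of SUB_M_SIZE_FIX and CHRSPLIT; the other must be null")
    if len(cfg["FILES"]) % 2 != 0:
        raise ValueError("FILES must alternate <COOL_FILENAME>, <SHORT_NAME>")


# Hypothetical example values; adjust paths and names to your own data.
config = {
    "SUB_M_SIZE_FIX": 500,
    "CHRSPLIT": None,  # serialised as JSON null
    "DATA_PATH": "/path/to/data",
    "FILES": [
        "ENCSR079VIJ.BioRep1.40kb.cool", "G401_BR1",
        "ENCSR079VIJ.BioRep2.40kb.cool", "G401_BR2",
    ],
    "OUT_DIR": "my_output",
    "OUT_PREFIX": "example",
    "Resolution": "40e3",
    "ChrNr": "2",
    "NormM": 0,
    "WEIGHTS_NAME": "weight",
    "phi": 1,
    "PHI_MAX": 1000,
}

check_config(config)
print(json.dumps(config, indent=4))

# With a hypothetical N = 6002 bins, phi is raised from 1 to 6, giving PHI = 918.
phi = shift_for_phi_max(6002, config["SUB_M_SIZE_FIX"], config["phi"], config["PHI_MAX"])
print(phi, n_windows(6002, config["SUB_M_SIZE_FIX"], phi))
```

The printed JSON can be saved to a file and passed to `--config`; whether every key shown here is required is not stated above, so the shipped examples in the ```config``` directory remain the reference.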


# Output files:

1) ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_similarity.csv``` $\dots$ contains all pairwise comparisons. The columns ```Sample1``` and ```Sample2``` hold the short names specified in ```FILES```, and ```Q``` holds the corresponding similarity score.  
	```
	Resolution	ChrNr	Sample1	Sample2	Q
	40000	2	HFFc6_BR3	A549_BR2	0.6132789056404898
	40000	2	HFFc6_BR3	LNCap_BR2	0.3126805134567409
	40000	2	HFFc6_BR3	LNCap_BR1	0.4221187669214683
	40000	2	HFFc6_BR3	HFFc6_BR2	0.9632461160758761
	...
	```

2) ```<OUT_DIR>/<OUT_PREFIX>_ENT3C_OUT.csv``` $\dots$ ENT3C output table. 

	```
	Name	ChrNr	Resolution	n	PHI	phi	binNrStart	binNrEND	START	END	S
	G401_BR1	2	40000	500	918	6	0	499	0	20000000	3.7896426915562462
	G401_BR1	2	40000	500	918	6	6	505	240000	20240000	3.789044181663418
	G401_BR1	2	40000	500	918	6	12	511	480000	20480000	3.7918253959272032
	...
	```

	Each row corresponds to one evaluated submatrix. The fields are ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, ```n``` (the submatrix dimension), ```PHI``` (the number of evaluated submatrices, ```PHI = 1 + floor((N - SUB_M_SIZE) / phi)```), ```phi``` (the window shift), ```binNrStart``` and ```binNrEND``` (the start and end bin of the submatrix), ```START``` and ```END``` (the corresponding genomic coordinates) and ```S``` (the computed von Neumann entropy).


	- Example of output generated for ```ENT3C get_entropy --config=config/myconfig.json```:
		- ```EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf```
		- Unbalanced 40 kb contact matrices for even chromosomes across 5 cell lines; ```SUB_M_SIZE_FIX``` was 500:
<figure>
    <img src="OUTPUT/PYTHON/EvenChromosomes_NoWeights_40kb_ENT3C_signals.png" style="max-width:70%;"
         alt="ENT3C python Output">
</figure>
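
As a quick sanity check of the ENT3C output table, the sketch below parses the three sample rows shown earlier and verifies how the columns relate to one another. It assumes the file is tab-separated, as the sample suggests despite the `.csv` extension; `read_entropy_table` and `window_is_consistent` are illustrative helpers, not part of ENT3C.

```python
import csv
import io

# The three sample rows of the output table, tab-separated.
SAMPLE = (
    "Name\tChrNr\tResolution\tn\tPHI\tphi\tbinNrStart\tbinNrEND\tSTART\tEND\tS\n"
    "G401_BR1\t2\t40000\t500\t918\t6\t0\t499\t0\t20000000\t3.7896426915562462\n"
    "G401_BR1\t2\t40000\t500\t918\t6\t6\t505\t240000\t20240000\t3.789044181663418\n"
    "G401_BR1\t2\t40000\t500\t918\t6\t12\t511\t480000\t20480000\t3.7918253959272032\n"
)

INT_FIELDS = ("Resolution", "n", "PHI", "phi", "binNrStart", "binNrEND", "START", "END")


def read_entropy_table(handle):
    """Parse the table into dicts; ChrNr stays a string because it may be 'X'."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        rec["S"] = float(rec["S"])
        rows.append(rec)
    return rows


def window_is_consistent(row) -> bool:
    """Check window width and the bin-to-coordinate mapping of one row."""
    width_ok = row["binNrEND"] - row["binNrStart"] + 1 == row["n"]
    start_ok = row["START"] == row["binNrStart"] * row["Resolution"]
    end_ok = row["END"] == (row["binNrEND"] + 1) * row["Resolution"]
    return width_ok and start_ok and end_ok


rows = read_entropy_table(io.StringIO(SAMPLE))
print(all(window_is_consistent(r) for r in rows))

# Consecutive windows start phi bins apart.
print([b["binNrStart"] - a["binNrStart"] for a, b in zip(rows, rows[1:])])
```

For a real file, replace `io.StringIO(SAMPLE)` with an open file handle.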


3) ```<OUT_DIR>/<OUT_PREFIX>_Eucl_<group1>vs<group2>.csv``` $\dots$ Euclidean distance between the average z-scores of $S$ over ```<group1>``` and ```<group2>``` (here ```group1=HFFc6```, ```group2=G401```):

	```
	Resolution	ChrNr	START	END	meanS_Euclidean
	40000	6	62360000	82360000	3.3625023926723685
	40000	6	62120000	82120000	3.3546076641065095
	40000	6	61880000	81880000	3.3441925121710026
	```

	- Example of the first page of the output generated by ```ENT3C compare_groups --config=config/myconfig.json --group1=HFFc6 --group2=G401```:
		- ```EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf```

<figure>
    <img src="OUTPUT/PYTHON/EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.png" style="max-width:60%;"
         alt="ENT3C python Output">
</figure>
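
The similarity table (output 1) lends itself to simple post-processing. The sketch below separates comparisons between biological replicates of the same group from all other comparisons, using the convention that the group name is whatever precedes `_BR<number>`. It reuses the sample rows shown for the similarity table and assumes a tab-separated file; `group_of` and `split_by_replicate` are hypothetical helper names.

```python
import csv
import io
import re

# Sample rows of the similarity table, tab-separated.
SAMPLE = (
    "Resolution\tChrNr\tSample1\tSample2\tQ\n"
    "40000\t2\tHFFc6_BR3\tA549_BR2\t0.6132789056404898\n"
    "40000\t2\tHFFc6_BR3\tLNCap_BR2\t0.3126805134567409\n"
    "40000\t2\tHFFc6_BR3\tLNCap_BR1\t0.4221187669214683\n"
    "40000\t2\tHFFc6_BR3\tHFFc6_BR2\t0.9632461160758761\n"
)


def group_of(short_name: str) -> str:
    """Strip the replicate suffix, e.g. 'HFFc6_BR3' -> 'HFFc6'."""
    return re.sub(r"_BR\d+$", "", short_name)


def split_by_replicate(handle):
    """Return (Q values for replicate pairs, Q values for all other pairs)."""
    replicate, other = [], []
    for rec in csv.DictReader(handle, delimiter="\t"):
        same_group = group_of(rec["Sample1"]) == group_of(rec["Sample2"])
        (replicate if same_group else other).append(float(rec["Q"]))
    return replicate, other


replicate_q, other_q = split_by_replicate(io.StringIO(SAMPLE))
print("replicates:", replicate_q)
print("non-replicates:", other_q)
```

In these sample rows the single replicate pair scores higher than every comparison between different cell lines.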


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ENT3C",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "Hi-C, micro-C, similarity, entropy",
    "author": null,
    "author_email": "Xenia Lainscsek <108679125+X3N1A@users.noreply.github.com>",
    "download_url": "https://files.pythonhosted.org/packages/fe/ab/c1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a/ent3c-2.2.2.tar.gz",
    "platform": null,
    "description": "ENT3C is a method for qunatifying the similarity of micro-C/Hi-C derived chromosomal contact matrices. It is based on the von Neumann entropy<sup>1</sup> and recent work on entropy quantification of Pearson correlation matrices<sup>2</sup>.\nFor a contact matrix, ENT3C records the change in local pattern *complexity* of smaller Pearson-transformed submatrices along a matrix diagonal to generate a characteristic signal. Similarity is defined as the Pearson correlation between the respective entropy signals of two contact matrices.\n\nhttps://github.com/X3N1A/ENT3C\n\n\n## Installation\n\n1) generate and activate python environment \n\t\n\t```\n\tpython3.11 -m venv .ent3c_venv\n\n\tsource .ent3c_venv/bin/activate\n\t```\n\n2) install ENT3C:\n\n\t```\n\tpip install ENT3C\n\t```\n\n# Usage \n\n* CLI (python) usage:\n\n\t```\n\tUsage:\n    \tENT3C <command> --config=<path/to/config.json> [options]\n\n    \tCommands:\n            get_entropy        Generates entropy output file <entropy_out_FN> .\n            get_similarity           Generates similarity output file <similarity_out_FN> from <entropy_out_FN>.\n            run_all            Generates <entropy_out_FN> and <similarity_out_FN>.\n            compare_groups     Compare signal groups (requires --group1 and --group2 options)\n\n    \tGlobal Options:\n            --config=<path>    Path to config JSON file (required for all commands)\n\n    \t<compare_groups> Options:\n        \t--group1=<GROUP>        First group name, must correspond to what comes before _BR* in config file.\n        \t--group2=<GROUP>        Second group name, must correspond to what comes before _BR* in config file.\n\n\t\tExamples:\n            ENT3C run_all --config=configs/myconfig.json\n            ENT3C get_entropy --config=configs/myconfig.json\n            ENT3C get_similarity --config=configs/myconfig.json\n            ENT3C compare_groups --config=configs/myconfig.json --group1=H1-hESC --group2=K562\n\t```\n\n* 
alternatively run ENT3C in python as:\n\n\t```\n\timport ENT3C\n\n\tENT3C_OUT = ENT3C.run_get_entropy(\"config/myconfig.json\")\n\n\tSimilarity = ENT3C.run_get_similarity(\"config/myconfig.json\")\n\n\tENT3C_OUT, Similarity = ENT3C.run_all(\"config/myconfig.json\")\n\n\tEUCLIDEAN = ENT3C.run_compare_groups(\"config/myconfig.json\",group1,group2)\n\n\t```\n\n* all ENT3C parameters are defined in .json files ```config/config.json```. Examples can be found in ```config``` directory.\n\n* Paremeters defined in <config_file>: \n\n\t1) The main ENT3C parameter affecting the final entropy signal $S$ is the dimension of the submatrices ```SUB_M_SIZE_FIX```. \n\n\t\t* ```\"SUB_M_SIZE_FIX\": <integer>``` $\\dots$ fixed submatrix dimension.\n\n\t\t\t* ```SUB_M_SIZE_FIX``` can be either be fixed by or alternatively, one can specify ```CHRSPLIT```; in this case ```SUB_M_SIZE_FIX``` will be computed internally to fit the number of desired times the contact matrix is to be paritioned into. \n\n\t\t\t```PHI=1+floor((N-SUB_M_SIZE)./phi)```\n\n\t\t\twhere ```N``` is the size of the input contact matrix, ```phi``` is the window shift, ```PHI``` is the number of evaluated submatrices (consequently the \tnumber of data points in $S$).\n\n\t\t* ```\"CHRSPLIT\": <integer>``` $\\dots$ number of submatrices into which the contact matrix is partitioned into. If specified, then ``\"SUB_M_SIZE_FIX\": null`` otherwise ``\"CHRSPLIT\": null``. \n\n\t2) ```\"DATA_PATH\": </path/to/data> ``` $\\dots$ input data path. \n\n\t3) input files in format: ```[<COOL_FILENAME>, <SHORT_NAME>]```\n\t\t``` \n\t\t\"FILES\": [\n\t\t\t\"ENCSR079VIJ.BioRep1.40kb.cool\",\n\t\t\t\"G401_BR1\",\n\t\t\t\"ENCSR079VIJ.BioRep2.40kb.cool\",\n\t\t\t\"G401_BR2\"]\n\t\t``` \n\t\t* Any biological replicates must be indicated in <SHORT_NAME> using the suffix \"_BR%d\".\n\n\t\t* **Note:** ENT3C also takes ```mcool``` files as input. \n\n\t4) ```\"`OUT_DIR\": \"<desired_output_directory_name>\"``` $\\dots$ output directory. 
```OUT_DIR``` will be concatenated with ```OUTPUT/JULIA/``` or ```OUTPUT/MATLAB/```.\n\n\t5) ```\"OUT_PREFIX\": \"<desired_output_prefix_>\"``` $\\dots$ prefix for output files.\n\n\t6) ```\"Resolution\": \"<integer,integer,...>\" e.g. \"40e3,100e3\"``` $\\dots$ resolutions to be evaluated. \n\n\t7) ```\"ChrNr\": \"<integer,integer,...>\" \"15,16,17,18,19,20,21,22,X\"``` $\\dots$ chromosome numbers to be evaluated.\n\n\t8) ```\"NormM\": <0|1>``` $\\dots$ input contact matrices can be balanced. If ```NormM: 1```, balancing weights in cooler are applied. If set to 1, ENT3C expects weights to be in dataset ```/resolutions/<resolution>/bins/<WEIGHTS_NAME>```.\n\n\t9) ```\"WEIGHTS_NAME\": \"<name_of_weights>\"``` $\\dots$ name of dataset in cooler containing normalization weights.\n\n\t10) ```\"phi\": <integer>``` $\\dots$ number of bins to the next matrix.\n\n\t11) ```\"PHI_MAX\": <integer>``` $\\dots$ number of submatrices; i.e. number of data points in entropy signal $S$. If set, $\\varphi$ is increased until $\\Phi \\approx \\Phi\\_{\\max}$.\n\n\n# Output files:\n\n1) ```<OUT_DIR>/<OUTPUT_PREFIX>_ENT3C_similarity.csv``` $\\dots$ will contain all combinations of comparisons. The second two columns contain the short names specified in ```FILES``` and the third column ```Q``` the corresponding similarity score.  \n\t```\n\tResolution\tChrNr\tSample1\tSample2\tQ\n\t40000\t2\tHFFc6_BR3\tA549_BR2\t0.6132789056404898\n\t40000\t2\tHFFc6_BR3\tLNCap_BR2\t0.3126805134567409\n\t40000\t2\tHFFc6_BR3\tLNCap_BR1\t0.4221187669214683\n\t40000\t2\tHFFc6_BR3\tHFFc6_BR2\t0.9632461160758761\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t```\n\n2) ```<OUT_DIR>/<OUTPUT_PREFIX>_ENT3C_OUT.csv``` $\\dots$ ENT3C output table. 
\n\n\t```\n\tName\tChrNr\tResolution\tn\tPHI\tphi\tbinNrStart\tbinNrEND\tSTART\tEND\tS\n\tG401_BR1\t2\t40000\t500\t918\t6\t0\t499\t0\t20000000\t3.7896426915562462\n\tG401_BR1\t2\t40000\t500\t918\t6\t6\t505\t240000\t20240000\t3.789044181663418\n\tG401_BR1\t2\t40000\t500\t918\t6\t12\t511\t480000\t20480000\t3.7918253959272032\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t.\t\t.\t.\t\t.\t.\t.\t\t.\t\t.\t\t.\t\t.\n\t```\n\n\tEach row corresponds to an evaluated submatrix with fields ```Name``` (the short name specified in ```FILES```), ```ChrNr```, ```Resolution```, the sub-matrix dimension ```sub_m_dim```, ```PHI=1+floor((N-SUB_M_SIZE)./phi)```, ```binNrStart``` and ```binNrEnd``` correspond to the start and end bin of the submatrix, ```START``` and ```END``` are the corresponding genomic coordinates and ```S``` is the computed von Neumann entropy.\n\n\n\t- Example of output generated for ```ENT3C get_entropy --config=config/myconfig.json```:\n\t\t- ```EvenChromosomes_NoWeights_40kb_ENT3C_signals.pdf```\n\t\t- unbalanced 40kb contact matrices for even chromosomes across 5 cell lines. 
```SUB_MATRIX_SIZE``` was 500:\n<figure>\n    <img src=\"OUTPUT/PYTHON/EvenChromosomes_NoWeights_40kb_ENT3C_signals.png\" style=\"max-width:70%;\"\n         alt=\"ENT3C python Output\">\n</figure>\n\n\n3) ```<OUT_DIR>/<OUTPUT_PREFIX>_Eucl_<group1>vs<group2>.csv``` $\\dots$ Euclidean distance between average z-scores of S over ```<group1>``` and ```<group2>```:\n\t(here group1=HFFc6, group2=G401)\n\n\t```\n\tResolution\tChrNr\tSTART\tEND\tmeanS_Euclidean\n\t40000\t6\t62360000\t82360000\t3.3625023926723685\n\t40000\t6\t62120000\t82120000\t3.3546076641065095\n\t40000\t6\t61880000\t81880000\t3.3441925121710026\n\t```\n\n\t- Example of first page of output generated for ```ENT3C compare_groups --config=config/myconfig.json --group1 = HFFc6 group2 = \"G401\"```\n\t\t- ```EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.pdf```\n\n<figure>\n    <img src=\"OUTPUT/PYTHON/EvenChromosomes_NoWeights_Eucl_40kb_HFFc6vsG401.png\" style=\"max-width:60%;\"\n         alt=\"ENT3C python Output\">\n</figure>\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Compute similarity between genomic contact matrices with \"Entropy 3C\" ",
    "version": "2.2.2",
    "project_urls": {
        "Repository": "https://github.com/X3N1A/ENT3C"
    },
    "split_keywords": [
        "hi-c",
        " micro-c",
        " similarity",
        " entropy"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "23f6fdccfeb2885a67d528f9609c49777aa3c95bf9abda8f90a80f9c54a70777",
                "md5": "998ae1bc83f4c9543c49e333401b6e7a",
                "sha256": "260dbc84edd0eb5cb3e16872eb594ae493f04bbc1a1e01ceb108e5621737aca1"
            },
            "downloads": -1,
            "filename": "ent3c-2.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "998ae1bc83f4c9543c49e333401b6e7a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 30700,
            "upload_time": "2025-08-12T09:33:52",
            "upload_time_iso_8601": "2025-08-12T09:33:52.375364Z",
            "url": "https://files.pythonhosted.org/packages/23/f6/fdccfeb2885a67d528f9609c49777aa3c95bf9abda8f90a80f9c54a70777/ent3c-2.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "feabc1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a",
                "md5": "6ca819988b1bed838ff3fad3f55ee573",
                "sha256": "a0d597d79e89d7c6d8dd473b204dd2ed5e9e77a724bfdc9cdaf3397f1f16d95a"
            },
            "downloads": -1,
            "filename": "ent3c-2.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "6ca819988b1bed838ff3fad3f55ee573",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 32761,
            "upload_time": "2025-08-12T09:33:53",
            "upload_time_iso_8601": "2025-08-12T09:33:53.751979Z",
            "url": "https://files.pythonhosted.org/packages/fe/ab/c1b6cd61f78241eb7d71efd3c45c983de8a155828479dced9426ec73fc0a/ent3c-2.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-12 09:33:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "X3N1A",
    "github_project": "ENT3C",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ent3c"
}
        