Name | approximate-cluster-identities |
Version | 0.1.6 |
home_page | None |
Summary | A package to calculate and visualise approximate cluster identities for a large number of short nucleotide sequences using minimizers. |
upload_time | 2024-04-03 14:52:30 |
maintainer | None |
docs_url | None |
author | Daniel Anderson |
requires_python | None |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Approximate Cluster Identities (ACI)
A Python package to visualise the approximate within- and between-cluster identities of a large number of short sequences whose clusters were assigned by e.g. mmseqs2, cd-hit or panaroo.
# Installation
```
pip install approximate-cluster-identities
```
# Usage
```
aci -h
Create visualisations of approximate between and within cluster nucleotide identities for short sequences.

positional arguments:
  input_fasta           Input FASTA file of all sequences.
  input_json            Input JSON file with cluster assignments ({<sequence header>: <cluster assignment>}).

optional arguments:
  -h, --help            show this help message and exit
  --clusterGML CLUSTERGML
                        Output path of GML clustering file to view with Cytoscape or similar.
  --distanceTable DISTANCETABLE
                        Output path of CSV of distances (may take a long time).
  --clusterPlot CLUSTERPLOT
                        Output path of jointplot to visualise between and within cluster identities.
  --kmerSize KMERSIZE   Kmer size (default: 9).
  --windowSize WINDOWSIZE
                        Minimiser window size (default: 20).
  --threshold THRESHOLD
                        Jaccard similarity threshold (default: 0.9).
  --threads THREADS     Threads for sketching and jaccard distance calculations (default: 1).
  --shorter             Assess identity relative to the shorter sequence.
```
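The help text describes ```input_json``` as a mapping of ```{<sequence header>: <cluster assignment>}```. A minimal sketch of writing such a file is shown here. The headers, cluster labels and file name are hypothetical, and how headers must match the FASTA records (for example, whether the full description line is used) is not stated in the help text, so check that against your own data.

```python
import json


def write_cluster_json(assignments, path):
    """Write a {sequence header: cluster assignment} mapping to a JSON file."""
    with open(path, "w") as handle:
        json.dump(assignments, handle, indent=2)


# Hypothetical sequence headers and cluster labels, for illustration only.
assignments = {
    "seq1": "cluster_A",
    "seq2": "cluster_A",
    "seq3": "cluster_B",
}
write_cluster_json(assignments, "clusters.json")
```

Assuming the positional arguments are given in the order listed in the help text, the resulting file could then be passed along with the FASTA file, e.g. ```aci sequences.fasta clusters.json --clusterPlot cluster_plot.png``` (file names are placeholders).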
# Methods
We calculate sequence identities from pairwise Jaccard distances between the minimizer sets of each sequence. Each minimizer is a *k*-mer of length ```--kmerSize```, and 1 *k*-mer is sampled from each position of a window that slides across the sequence and contains a total of ```--windowSize``` *k*-mers. Increasing ```--windowSize``` decreases the number of minimizers per sequence, which reduces the sensitivity of the identity calculations but increases the speed of the programme. This tool is designed to give you an idea of how variable a large number of short sequences are within and between clusters, to help you choose an appropriate sequence clustering tool and its parameters.
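The idea can be illustrated with a small, self-contained sketch. This is not the package's implementation: it assumes the minimizer of a window is the lexicographically smallest *k*-mer (real tools often use a hash ordering or canonical *k*-mers), and the ```containment_shorter``` function is only one plausible reading of ```--shorter```. All function names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k=9, w=20):
    """Set of minimizers: the smallest k-mer in each window of w consecutive k-mers.

    Assumption: "smallest" means lexicographically smallest.
    """
    all_kmers = kmers(seq, k)
    if not all_kmers:
        return set()
    if len(all_kmers) <= w:
        # The sequence is shorter than one full window: a single minimizer.
        return {min(all_kmers)}
    return {min(all_kmers[i:i + w]) for i in range(len(all_kmers) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two minimizer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment_shorter(a, b):
    """Shared minimizers relative to the smaller set (assumed meaning of --shorter)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Toy sequences, for illustration only.
seq_a = "ACGTACGTTGCAACGTAGCTAGCTAGGCTA"
seq_b = "ACGTACGTTGCAACGTAGCTAGCTAGGCTT"
sketch_a = minimizers(seq_a, k=5, w=4)
sketch_b = minimizers(seq_b, k=5, w=4)
similarity = jaccard(sketch_a, sketch_b)
```

With a window size of 1 every *k*-mer is its own minimizer, so the sketch is the full *k*-mer set; larger windows keep fewer *k*-mers, which is why they trade sensitivity for speed.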
# Example output
Example cluster plots for data in ```test/``` using ```--windowSize 1``` and ```--windowSize 100```.
### Window size = 1
#### Mean identities
![Mean identities](images/cluster_distances_window_1.mean.png)
#### Mode identities
![Mode identities](images/cluster_distances_window_1.mode.png)
#### Median identities
![Median identities](images/cluster_distances_window_1.median.png)
#### Range identities
![Range of identities](images/cluster_distances_window_1.range.png)
### Window size = 100
#### Mean identities
![Mean identities](images/cluster_distances_window_100.mean.png)
#### Mode identities
![Mode identities](images/cluster_distances_window_100.mode.png)
#### Median identities
![Median identities](images/cluster_distances_window_100.median.png)
#### Range identities
![Range of identities](images/cluster_distances_window_100.range.png)
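The plots above summarise identities per cluster pair as a mean, mode, median and range. As a rough sketch of what such a summary involves (not the package's own code), the function below groups pairwise identities by cluster pair and computes the four statistics. It assumes that "range" means maximum minus minimum and that identities are rounded to two decimal places before taking the mode; both are assumptions, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, mode


def summarise_identities(identities, clusters):
    """Summarise pairwise identities for each pair of clusters.

    identities: {(header_a, header_b): identity between 0 and 1}
    clusters:   {header: cluster label (string)}
    """
    grouped = defaultdict(list)
    for (header_a, header_b), identity in identities.items():
        # Sort the labels so (A, B) and (B, A) land in the same group.
        key = tuple(sorted((clusters[header_a], clusters[header_b])))
        grouped[key].append(identity)
    return {
        key: {
            "mean": mean(values),
            # Rounding before the mode is an assumption made for this sketch.
            "mode": mode([round(v, 2) for v in values]),
            "median": median(values),
            "range": max(values) - min(values),
        }
        for key, values in grouped.items()
    }


# Toy identities and cluster labels, for illustration only.
identities = {("s1", "s2"): 0.9, ("s1", "s3"): 0.5, ("s2", "s3"): 0.7}
clusters = {"s1": "A", "s2": "A", "s3": "B"}
summary = summarise_identities(identities, clusters)
```

Here the key ```("A", "A")``` holds the within-cluster summary for cluster A, and ```("A", "B")``` holds the between-cluster summary.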
Raw data
{
"_id": null,
"home_page": null,
"name": "approximate-cluster-identities",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Daniel Anderson",
"author_email": "danp.anderson@outlook.com",
"download_url": "https://files.pythonhosted.org/packages/ba/8e/f9ffb756d79b4914963434bd9b08a916776ba2f7c1091ebd615ad7999fb1/approximate_cluster_identities-0.1.6.tar.gz",
"platform": null,
"description": "# Approximate Cluster Identities (ACI)\n\nA python package to visualise the approximate within and between cluster identities of a large number of short sequences as assigned by e.g. mmseqs2, cd-hit or panaroo.\n\n# Installation\n```\npip install approximate-cluster-identities\n```\n\n# Usage\n\n```\naci -h\n\nCreate visualisations of approximate between and within cluster nucleotide identities for short sequences.\n\npositional arguments:\n input_fasta Input FASTA file of all sequences.\n input_json Input JSON file with cluster assignments ({<sequence header>: <cluster assignment>}).\n\noptional arguments:\n -h, --help show this help message and exit\n --clusterGML CLUSTERGML\n Output path of GML clustering file to view with Cytoscape or similar.\n --distanceTable DISTANCETABLE\n Output path of CSV of distances (may take a long time).\n --clusterPlot CLUSTERPLOT\n Output path of jointplot to visualise between and within cluster identities.\n --kmerSize KMERSIZE Kmer size (default: 9).\n --windowSize WINDOWSIZE\n Minimiser window size (default: 20).\n --threshold THRESHOLD\n Jaccard similarity threshold (default: 0.9).\n --threads THREADS Threads for sketching and jaccard distance calculations (default: 1).\n --shorter Assess identity relative to the shorter sequence.\n```\n\n# Methods\n\nWe calculate sequence identities by pairwise calculation of jaccard distances using minimizers of size ```--kmerSize``` where 1 *k*-mer is sampled from a window that slides across each sequence, each containing a total of ```--windowSize``` *k*-mers. Increasing ```--windowSize``` will decrease the number of minimizers per sequence, decreasing the sensitivity of the identity calculations but increasing the speed of the programme. 
This tool is designed to give you an idea of how variable a large number of short sequences are within and between clusters to choose an appropriate sequencing clustering tool and its parameters.\n\n# Example output\n\nExample cluster plots for data in ```test/``` using ```--windowSize 1``` and ```--windowSize 100```.\n\n### Window size = 1\n\n#### Mean identities\n![Mean identities](images/cluster_distances_window_1.mean.png)\n#### Mode identities\n![Mode identities](images/cluster_distances_window_1.mode.png)\n#### Median identities\n![Median identities](images/cluster_distances_window_1.median.png)\n#### Range identities\n![Range of identities](images/cluster_distances_window_1.range.png)\n\n### Window size = 100\n\n#### Mean identities\n![Mean identities](images/cluster_distances_window_100.mean.png)\n#### Mode identities\n![Mode identities](images/cluster_distances_window_100.mode.png)\n#### Median identities\n![Median identities](images/cluster_distances_window_100.median.png)\n#### Range identities\n![Range of identities](images/cluster_distances_window_100.range.png)\n",
"bugtrack_url": null,
"license": null,
"summary": "A package to calculate and visualise approximate cluster identities for a large number of short nucleotide sequences using minimizers.",
"version": "0.1.6",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0b5744751b79e374862be3b83ccedf7241311a3b0f598f7718bef001f8191768",
"md5": "92ed51affd2c41e2e6e987c60d78d614",
"sha256": "8e2717b24cca4f2a59463ab51c7f16e0cca43261e9ecb1a3b8d76d6425970407"
},
"downloads": -1,
"filename": "approximate_cluster_identities-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "92ed51affd2c41e2e6e987c60d78d614",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 10249,
"upload_time": "2024-04-03T14:52:29",
"upload_time_iso_8601": "2024-04-03T14:52:29.657014Z",
"url": "https://files.pythonhosted.org/packages/0b/57/44751b79e374862be3b83ccedf7241311a3b0f598f7718bef001f8191768/approximate_cluster_identities-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ba8ef9ffb756d79b4914963434bd9b08a916776ba2f7c1091ebd615ad7999fb1",
"md5": "87bb4716d6554c56c77bed006a53bc50",
"sha256": "484110664cfb5983c56c392a1dc060f88dbf5bbd5e082fef639bc597056afdf8"
},
"downloads": -1,
"filename": "approximate_cluster_identities-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "87bb4716d6554c56c77bed006a53bc50",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9170,
"upload_time": "2024-04-03T14:52:30",
"upload_time_iso_8601": "2024-04-03T14:52:30.703712Z",
"url": "https://files.pythonhosted.org/packages/ba/8e/f9ffb756d79b4914963434bd9b08a916776ba2f7c1091ebd615ad7999fb1/approximate_cluster_identities-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-03 14:52:30",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "approximate-cluster-identities"
}