# plasnet
Python package for clustering, typing, visualisation and exploration of plasmid networks.
[![Python CI](https://github.com/leoisl/plasnet/actions/workflows/ci.yaml/badge.svg)](https://github.com/leoisl/plasnet/actions/workflows/ci.yaml/badge.svg)
![coverage badge](./coverage.svg)
[![PyPI](https://img.shields.io/pypi/v/plasnet)](https://pypi.org/project/plasnet/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/plasnet)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
## TLDR
`plasnet` allows you to cluster, type and visualise plasmids given their pairwise evolutionary distances, which are computed upstream.
### split
The first `plasnet` command is `split`. It creates a plasmid graph given a list of plasmids and their pairwise distances.
It then splits this graph into communities. Communities are groups of plasmids that are roughly connected to each other.
This command allows you to view the full plasmid graph and also the isolated communities.
[Click here to view an example of the full plasmid graph from the latest version on an example dataset](https://leoisl.github.io/plasnet/split_out/visualisations/single_graph/single_graph.html)
[Click here to view an example of the isolated communities from the latest version on an example dataset](https://leoisl.github.io/plasnet/split_out/visualisations/communities/index.html)
### type
This command allows you to refine the communities defined in the `split` command into types or subcommunities.
The idea is that you can use a more precise distance function to type the communities than the one used to split the graph.
The different types or subcommunities will have different colours in the visualisation.
[Click here to view an example of the typed communities from the latest version on an example dataset](https://leoisl.github.io/plasnet/type_out/visualisations/communities/index.html)
[Click here to view an example of the typed isolated subcommunities from the latest version on an example dataset](https://leoisl.github.io/plasnet/type_out/visualisations/subcommunities/index.html)
### add-sample-hits
This command allows you to add sample-hit annotations on top of the subcommunities or types previously identified
by the `type` command. With it you can explore in more detail the subcommunities that several different samples hit
and check whether they are, for example, sharing plasmids.
[Click here to view an example of two samples hitting a subcommunity](https://leoisl.github.io/plasnet/sample_hits_out/visualisations/sample_graphs/graphs/community_1_subcommunity_40.html)
## Installation
```
pip install plasnet
```
## Usage
### General usage
```
Usage: plasnet [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  add-sample-hits  Add sample hits annotations on top of previously...
  split            Creates and split a plasmid graph into communities
  type             Type the communities of a previously split plasmid...
```
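
The three subcommands are meant to be chained: `split` writes `objects/communities.pkl`, which `type` consumes, and `type` writes `objects/subcommunities.pkl`, which `add-sample-hits` consumes. The sketch below only assembles the three command lines from the usage strings documented in this README. The input file names and the `out` directory are hypothetical placeholders, and `build_pipeline_commands` is an illustrative helper, not part of `plasnet`.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(plasmids, split_distances, type_distances, sample_hits, out_root):
    """Return the split, type and add-sample-hits invocations as argument lists.

    Only positional arguments documented in the usage text are used.
    All input paths are supplied by the caller (hypothetical names here).
    """
    out_root = Path(out_root)
    split_out = out_root / "split_out"
    type_out = out_root / "type_out"
    hits_out = out_root / "sample_hits_out"

    # Pickle locations as described in the type and add-sample-hits help text.
    communities_pkl = split_out / "objects" / "communities.pkl"
    subcommunities_pkl = type_out / "objects" / "subcommunities.pkl"

    return [
        ["plasnet", "split", str(plasmids), str(split_distances), split_out.as_posix()],
        ["plasnet", "type", communities_pkl.as_posix(), str(type_distances), type_out.as_posix()],
        ["plasnet", "add-sample-hits", subcommunities_pkl.as_posix(), str(sample_hits), hits_out.as_posix()],
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    commands = build_pipeline_commands(
        "plasmids.tsv", "split_distances.tsv", "type_distances.tsv", "sample_hits.tsv", "out"
    )
    for command in commands:
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the input files exist.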
### split subcommand
```
Usage: plasnet split [OPTIONS] PLASMIDS DISTANCES OUTPUT_DIR

  Creates and split a plasmid graph into communities

Options:
  -d, --distance-threshold FLOAT  Distance threshold
  -b, --bh-connectivity INTEGER   Minimum number of connections a plasmid need
                                  to be considered a hub plasmid
  -e, --bh-neighbours-edge-density FLOAT
                                  Maximum number of edge density between hub
                                  plasmid neighbours to label the plasmid as
                                  hub
  -p, --output-plasmid-graph      Also outputs the full, unsplit, plasmid
                                  graph
  --plasmids-metadata PATH        Plasmids metadata text file.
  --help                          Show this message and exit.
Creates and split a plasmid graph into communities.
The plasmid graph is defined by plasmid and distance files.
The plasmid file is a tab-separated file with one column describing all plasmids in the dataset.
Example of such file:
plasmid
AP024796.1
AP024825.1
CP012142.1
CP014494.1
CP019149.1
CP021465.1
CP022675.1
CP024687.1
CP026642.1
CP027485.1
The distances file is a tab-separated file with 3 columns: plasmid_1, plasmid_2, distance.
plasmid_1 and plasmid_2 are plasmid names, and distance is a float between 0 and 1.
The distance threshold is the maximum distance value for two plasmids to be considered connected.
Example of such file:
plasmid_1 plasmid_2 distance
AP024796.1 AP024825.1 0.8
AP024796.1 CP012142.1 0.5
AP024796.1 CP014494.1 0.3
AP024796.1 CP019149.1 0.0
AP024796.1 CP021465.1 0.0
AP024796.1 CP022675.1 1.0
AP024796.1 CP024687.1 0.0
AP024796.1 CP026642.1 0.5
AP024796.1 CP027485.1 0.8
```
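
To make the role of the distance threshold concrete, here is a minimal sketch that reads the example files above and groups plasmids by connectivity. It assumes that two plasmids are linked when their distance is at or below the threshold, and it uses a hypothetical threshold of 0.5. It deliberately ignores hub plasmids (`--bh-connectivity`, `--bh-neighbours-edge-density`), so the groups it returns are plain connected components and not necessarily the communities `plasnet split` would report. The function names are illustrative and not part of `plasnet`.

```python
import csv
import io

# Example inputs copied from the help text above (tab-separated).
PLASMIDS = [
    "AP024796.1", "AP024825.1", "CP012142.1", "CP014494.1", "CP019149.1",
    "CP021465.1", "CP022675.1", "CP024687.1", "CP026642.1", "CP027485.1",
]

DISTANCES_TSV = (
    "plasmid_1\tplasmid_2\tdistance\n"
    "AP024796.1\tAP024825.1\t0.8\n"
    "AP024796.1\tCP012142.1\t0.5\n"
    "AP024796.1\tCP014494.1\t0.3\n"
    "AP024796.1\tCP019149.1\t0.0\n"
    "AP024796.1\tCP021465.1\t0.0\n"
    "AP024796.1\tCP022675.1\t1.0\n"
    "AP024796.1\tCP024687.1\t0.0\n"
    "AP024796.1\tCP026642.1\t0.5\n"
    "AP024796.1\tCP027485.1\t0.8\n"
)


def read_distances(handle):
    """Parse a distances file into (plasmid_1, plasmid_2, distance) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["plasmid_1"], row["plasmid_2"], float(row["distance"])) for row in reader]


def connected_groups(plasmids, distances, threshold):
    """Group plasmids linked by edges with distance <= threshold (assumed rule)."""
    neighbours = {plasmid: set() for plasmid in plasmids}
    for first, second, distance in distances:
        if distance <= threshold:
            neighbours[first].add(second)
            neighbours[second].add(first)

    groups, seen = [], set()
    for start in plasmids:
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    distances = read_distances(io.StringIO(DISTANCES_TSV))
    for group in connected_groups(PLASMIDS, distances, threshold=0.5):
        print(group)
```

With this threshold, the three plasmids at distance 0.8 or 1.0 from `AP024796.1` end up alone, and the remaining seven form one group.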
### type subcommand
```
Usage: plasnet type [OPTIONS] COMMUNITIES_PICKLE DISTANCES OUTPUT_DIR

  Type the communities of a previously split plasmid graph into subcommunities
  or types

Options:
  -d, --distance-threshold FLOAT  Distance threshold
  --small-subcommunity-size-threshold INTEGER
                                  Subcommunities with size up to this
                                  parameter will be joined to neighbouring
                                  larger subcommunities
  --help                          Show this message and exit.
Type the communities of a previously split plasmid graph into subcommunities or types.
This typing is based on running an asynchronous label propagation algorithm on the previously identified communities.
This algorithm is implemented in the networkx library, and relies on a given distance file.
This distance file should be a more precise and careful distance function than the one used to split the graph into communities.
For example, you could use gene jaccard distance to split the graph and the DCJ-indel distance to type the communities.
See https://github.com/iqbal-lab-org/pling for a tool to compute gene jaccard and DCJ-indel distances.
The first file, describing the communities, is a pickle file (.pkl) that can be found in <split_out_dir>/objects/communities.pkl,
where <split_out_dir> is the output dir of the split command.
The distances file is a tab-separated file with 3 columns: plasmid_1, plasmid_2, distance.
plasmid_1 and plasmid_2 are plasmid names, and distance is a float number.
The distance threshold is the maximum distance value for two plasmids to be considered connected.
Example of such file:
plasmid_1 plasmid_2 distance
AP024796.1 AP024825.1 4
AP024796.1 CP012142.1 10
AP024796.1 CP014494.1 20
AP024796.1 CP019149.1 1
AP024796.1 CP021465.1 0
AP024796.1 CP022675.1 50
AP024796.1 CP024687.1 1000
AP024796.1 CP026642.1 20
AP024796.1 CP027485.1 1
```
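
The help text above says typing relies on asynchronous label propagation as implemented in networkx. The sketch below is a simplified, standard-library reimplementation of that idea for illustration only. It is not the networkx code, it is not how `plasnet` calls it, and it does not model `--small-subcommunity-size-threshold`. The plasmid names `p1` to `p6`, the distances and the threshold of 4 are hypothetical, and edges are assumed to exist when the distance is at or below the threshold.

```python
import random
from collections import Counter

# Hypothetical DCJ-indel-like distances between six toy plasmids.
TOY_DISTANCES = [
    ("p1", "p2", 1), ("p1", "p3", 2), ("p2", "p3", 0),
    ("p4", "p5", 3), ("p4", "p6", 1), ("p5", "p6", 4),
    ("p3", "p4", 50),  # far apart: dropped by the threshold below
]


def build_neighbours(distances, threshold):
    """Adjacency sets keeping only pairs with distance <= threshold (assumed rule)."""
    neighbours = {}
    for first, second, distance in distances:
        neighbours.setdefault(first, set())
        neighbours.setdefault(second, set())
        if distance <= threshold:
            neighbours[first].add(second)
            neighbours[second].add(first)
    return neighbours


def async_label_propagation(neighbours, seed=0, max_rounds=100):
    """Simplified asynchronous label propagation.

    Every node starts with its own label. Nodes are visited in random order and
    each adopts the most frequent label among its neighbours, seeing updates
    made earlier in the same round. A node keeps its label if it is tied for best.
    """
    rng = random.Random(seed)
    labels = {node: node for node in neighbours}
    nodes = sorted(neighbours)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not neighbours[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[other] for other in neighbours[node])
            best = max(counts.values())
            candidates = sorted(label for label, count in counts.items() if count == best)
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break

    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    neighbours = build_neighbours(TOY_DISTANCES, threshold=4)
    print(async_label_propagation(neighbours, seed=1))
```

On this toy input the threshold separates the two triangles, so each ends up with a single label. On a connected graph the result can depend on the visiting order, and therefore on the seed.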
### add-sample-hits subcommand
```
Usage: plasnet add-sample-hits [OPTIONS] SUBCOMMUNITIES_PICKLE SAMPLE_HITS
                               OUTPUT_DIR

  Add sample hits annotations on top of previously identified subcommunities
  or types

Options:
  --help  Show this message and exit.
Add sample hits annotations on top of previously identified subcommunities or types.
The first file, describing the subcommunities, is a pickle file (.pkl) that can be found in <type_out_dir>/objects/subcommunities.pkl,
where <type_out_dir> is the output dir of the type command.
The sample-hits file is a tab-separated file with 2 columns: sample, plasmid.
These columns are self-explanatory and identify the plasmids present in each sample.
Example of such file:
sample plasmid
cpe001_trim_ill NZ_CP006799.1
cpe001_trim_ill NZ_CP028929.1
cpe002_trim_ill NZ_CP079159.1
cpe005_trim_ill NZ_CP006799.1
cpe005_trim_ill NZ_CP079676.1
cpe010_trim_ill NZ_CP028929.1
cpe020_trim_ill NZ_CP006799.1
cpe020_trim_ill NZ_CP079676.1
cpe021_trim_ill NZ_CP006799.1
```
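
Before running `add-sample-hits`, it can help to check which samples hit the same plasmids. The sketch below parses the example sample-hits file above and lists, for each pair of samples, the plasmids they have in common. It looks only at exact plasmid sharing and knows nothing about subcommunities, so it is a sanity check on the input rather than a reproduction of what `plasnet` visualises. The function names are illustrative and not part of `plasnet`.

```python
import csv
import io
from itertools import combinations

# Example sample-hits file copied from the help text above (tab-separated).
SAMPLE_HITS_TSV = (
    "sample\tplasmid\n"
    "cpe001_trim_ill\tNZ_CP006799.1\n"
    "cpe001_trim_ill\tNZ_CP028929.1\n"
    "cpe002_trim_ill\tNZ_CP079159.1\n"
    "cpe005_trim_ill\tNZ_CP006799.1\n"
    "cpe005_trim_ill\tNZ_CP079676.1\n"
    "cpe010_trim_ill\tNZ_CP028929.1\n"
    "cpe020_trim_ill\tNZ_CP006799.1\n"
    "cpe020_trim_ill\tNZ_CP079676.1\n"
    "cpe021_trim_ill\tNZ_CP006799.1\n"
)


def plasmids_by_sample(handle):
    """Map each sample name to the set of plasmids it hits."""
    hits = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        hits.setdefault(row["sample"], set()).add(row["plasmid"])
    return hits


def shared_plasmids(hits):
    """Return {(sample_a, sample_b): sorted shared plasmids} for pairs that share any."""
    shared = {}
    for first, second in combinations(sorted(hits), 2):
        common = hits[first] & hits[second]
        if common:
            shared[(first, second)] = sorted(common)
    return shared


if __name__ == "__main__":
    hits = plasmids_by_sample(io.StringIO(SAMPLE_HITS_TSV))
    for pair, plasmids in shared_plasmids(hits).items():
        print(pair, plasmids)
```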
Raw data
{
"_id": null,
"home_page": "https://github.com/leoisl/plasnet",
"name": "plasnet",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.9",
"maintainer_email": null,
"keywords": "Plasmids, Networks, Graphs, Clustering, Visualisation, Exploration",
"author": "Leandro Lima",
"author_email": "leandro@ebi.ac.uk",
"download_url": "https://files.pythonhosted.org/packages/07/80/fe665c382ede4feb4cdea17d329a36aa806580e5eb7c68af80c970905413/plasnet-0.6.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Clustering, visualising and exploring plasmid networks",
"version": "0.6.0",
"project_urls": {
"Homepage": "https://github.com/leoisl/plasnet",
"Repository": "https://github.com/leoisl/plasnet"
},
"split_keywords": [
"plasmids",
" networks",
" graphs",
" clustering",
" visualisation",
" exploration"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ee90f3c1ea6d42b9fc29d24650d21b548a427c7b1feeca2656b362100cfd390b",
"md5": "6c35803981f861aff2f20c3d1f231874",
"sha256": "9aff661839cab8ffc49ae716e84e803bd66a9931c30e48c35b6000e320bac8ad"
},
"downloads": -1,
"filename": "plasnet-0.6.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6c35803981f861aff2f20c3d1f231874",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.9",
"size": 631991,
"upload_time": "2024-08-19T09:37:08",
"upload_time_iso_8601": "2024-08-19T09:37:08.316846Z",
"url": "https://files.pythonhosted.org/packages/ee/90/f3c1ea6d42b9fc29d24650d21b548a427c7b1feeca2656b362100cfd390b/plasnet-0.6.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0780fe665c382ede4feb4cdea17d329a36aa806580e5eb7c68af80c970905413",
"md5": "d38bf99b1bca266201f1ced9ed2125d2",
"sha256": "468929006126262331b14ad874900f92dfcd139ab26166a87102e7725a0bb6a3"
},
"downloads": -1,
"filename": "plasnet-0.6.0.tar.gz",
"has_sig": false,
"md5_digest": "d38bf99b1bca266201f1ced9ed2125d2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.9",
"size": 616534,
"upload_time": "2024-08-19T09:37:09",
"upload_time_iso_8601": "2024-08-19T09:37:09.561891Z",
"url": "https://files.pythonhosted.org/packages/07/80/fe665c382ede4feb4cdea17d329a36aa806580e5eb7c68af80c970905413/plasnet-0.6.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-19 09:37:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "leoisl",
"github_project": "plasnet",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "plasnet"
}