Name: plasnet
Version: 0.5.1
Home page: https://github.com/leoisl/plasnet
Summary: Clustering, visualising and exploring plasmid networks
Upload time: 2024-02-27 07:24:59
Author: Leandro Lima
Requires Python: >=3.9,<3.13
License: MIT
Keywords: plasmids, networks, graphs, clustering, visualisation, exploration

# plasnet

Python package for clustering, typing, visualisation and exploration of plasmid networks.

[![Python CI](https://github.com/leoisl/plasnet/actions/workflows/ci.yaml/badge.svg)](https://github.com/leoisl/plasnet/actions/workflows/ci.yaml)
![coverage badge](./coverage.svg)
[![PyPI](https://img.shields.io/pypi/v/plasnet)](https://pypi.org/project/plasnet/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/plasnet)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

## TLDR

`plasnet` allows you to cluster, type and visualise plasmids given their evolutionary distance, computed upstream.

### split

The first `plasnet` command is `split`. It creates a plasmid graph given a list of plasmids and their pairwise distances.
It then splits this graph into communities. Communities are groups of plasmids that are roughly connected to each other.
This command allows you to view the full plasmid graph as well as the isolated communities.

[Click here to view an example of the full plasmid graph from the latest version on an example dataset](https://leoisl.github.io/plasnet/split_out/visualisations/single_graph/single_graph.html)

[Click here to view an example of the isolated communities from the latest version on an example dataset](https://leoisl.github.io/plasnet/split_out/visualisations/communities/index.html)
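To make the idea of "a graph built from pairwise distances" concrete, here is a minimal standard-library sketch. It is not `plasnet`'s implementation: it assumes that two plasmids are linked when their distance is at or below the threshold (check the direction against the `--distance-threshold` help text for your version), and it uses plain connected components as a stand-in for communities, ignoring the hub-plasmid handling that `split` also performs. The function names and the toy data are illustrative only.

```python
import csv
import io

# Toy distances table in the three-column layout expected by `plasnet split`.
DISTANCES_TSV = (
    "plasmid_1\tplasmid_2\tdistance\n"
    "AP024796.1\tAP024825.1\t0.8\n"
    "AP024796.1\tCP012142.1\t0.5\n"
    "AP024796.1\tCP014494.1\t0.3\n"
    "AP024825.1\tCP012142.1\t0.2\n"
)


def build_graph(plasmids, distances_tsv, threshold):
    """Build an adjacency map; ASSUMPTION: an edge exists when distance <= threshold."""
    graph = {plasmid: set() for plasmid in plasmids}
    reader = csv.DictReader(io.StringIO(distances_tsv), delimiter="\t")
    for row in reader:
        if float(row["distance"]) <= threshold:
            graph.setdefault(row["plasmid_1"], set()).add(row["plasmid_2"])
            graph.setdefault(row["plasmid_2"], set()).add(row["plasmid_1"])
    return graph


def connected_components(graph):
    """Group nodes that can reach each other (a simplified stand-in for communities)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


plasmids = ["AP024796.1", "AP024825.1", "CP012142.1", "CP014494.1", "CP019149.1"]
graph = build_graph(plasmids, DISTANCES_TSV, threshold=0.5)
components = connected_components(graph)
```

With this toy input the 0.8 pair is dropped, four plasmids end up in one group, and `CP019149.1`, which has no retained edge, forms a group of its own.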

### type

This command allows you to refine the communities defined in the `split` command into types or subcommunities.
The idea is that you can use a more precise distance function to type the communities than the one used to split the graph.
The different types or subcommunities will have different colours in the visualisation.

[Click here to view an example of the typed communities from the latest version on an example dataset](https://leoisl.github.io/plasnet/type_out/visualisations/communities/index.html)

[Click here to view an example of the typed isolated subcommunities from the latest version on an example dataset](https://leoisl.github.io/plasnet/type_out/visualisations/subcommunities/index.html)
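The usage section states that typing relies on an asynchronous label propagation algorithm from networkx. The following is a small standard-library sketch of that general technique, written only to show how labels spread between neighbours; it is not the networkx code and not `plasnet`'s pipeline. The graph, the function name and the `seed` and `max_rounds` parameters are assumptions made for the illustration.

```python
import random
from collections import Counter


def label_propagation(graph, seed=0, max_rounds=100):
    """Asynchronous label propagation on an adjacency map (illustrative sketch)."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every node starts with its own label
    nodes = sorted(graph)
    for _ in range(max_rounds):
        changed = False
        rng.shuffle(nodes)  # visit nodes in a random order each round
        for node in nodes:
            if not graph[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[neighbour] for neighbour in graph[node])
            top = max(counts.values())
            best = sorted(label for label, count in counts.items() if count == top)
            # Updates are applied immediately, which is what makes this asynchronous.
            if labels[node] not in best:
                labels[node] = rng.choice(best)
                changed = True
        if not changed:
            break
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return sorted(groups.values(), key=sorted)


# Hypothetical community made of two tightly linked groups of plasmids.
community = {
    "p1": {"p2", "p3"}, "p2": {"p1", "p3"}, "p3": {"p1", "p2"},
    "p4": {"p5", "p6"}, "p5": {"p4", "p6"}, "p6": {"p4", "p5"},
}
subcommunities = label_propagation(community)
```

On this toy graph each triangle settles on a single label, giving two subcommunities. On denser real graphs the outcome of label propagation can depend on the random visiting order, so results may vary between runs unless the randomness is fixed.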

### add-sample-hits

This command adds sample-hit annotations on top of the subcommunities or types previously identified
by the `type` command. With it you can explore in more detail the subcommunities hit by several different
samples and check whether, for example, the samples share plasmids.

[Click here to view an example of two samples hitting a subcommunity](https://leoisl.github.io/plasnet/sample_hits_out/visualisations/sample_graphs/graphs/community_1_subcommunity_40.html)
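As a rough illustration of the "are these samples sharing plasmids?" question, the sketch below reads a two-column sample/plasmid table (the layout described in the usage section) and reports which pairs of samples have plasmids in common. It works on the table alone and does not touch `plasnet`'s pickle files or visualisations; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# A few rows in the sample-hits layout: sample, plasmid (tab-separated).
SAMPLE_HITS_TSV = (
    "sample\tplasmid\n"
    "cpe001_trim_ill\tNZ_CP006799.1\n"
    "cpe001_trim_ill\tNZ_CP028929.1\n"
    "cpe002_trim_ill\tNZ_CP079159.1\n"
    "cpe005_trim_ill\tNZ_CP006799.1\n"
    "cpe005_trim_ill\tNZ_CP079676.1\n"
)


def plasmids_per_sample(tsv_text):
    """Map each sample to the set of plasmids it hits."""
    hits = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        hits[row["sample"]].add(row["plasmid"])
    return dict(hits)


def shared_plasmids(hits):
    """Return the plasmids shared by each pair of samples, skipping pairs with none."""
    shared = {}
    for sample_a, sample_b in combinations(sorted(hits), 2):
        common = hits[sample_a] & hits[sample_b]
        if common:
            shared[(sample_a, sample_b)] = common
    return shared


hits = plasmids_per_sample(SAMPLE_HITS_TSV)
shared = shared_plasmids(hits)
```

Here only `cpe001_trim_ill` and `cpe005_trim_ill` overlap, through `NZ_CP006799.1`.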


## Installation

```
pip install plasnet
```

## Usage

### General usage

```
Usage: plasnet [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  add-sample-hits  Add sample hits annotations on top of previously...
  split            Creates and split a plasmid graph into communities
  type             Type the communities of a previously split plasmid...
```

### split subcommand

```
Usage: plasnet split [OPTIONS] PLASMIDS DISTANCES OUTPUT_DIR

  Creates and split a plasmid graph into communities

Options:
  -d, --distance-threshold FLOAT  Distance threshold
  -b, --bh-connectivity INTEGER   Minimum number of connections a plasmid need
                                  to be considered a hub plasmid
  -e, --bh-neighbours-edge-density FLOAT
                                  Maximum number of edge density between hub
                                  plasmid neighbours to label the plasmid as
                                  hub
  -p, --output-plasmid-graph      Also outputs the full, unsplit, plasmid
                                  graph
  --plasmids-metadata PATH        Plasmids metadata text file.
  --help                          Show this message and exit.

  Creates and split a plasmid graph into communities.
  The plasmid graph is defined by plasmid and distance files.

  The plasmid file is a tab-separated file with one column describing all plasmids in the dataset.
  Example of such file:
  plasmid
  AP024796.1
  AP024825.1
  CP012142.1
  CP014494.1
  CP019149.1
  CP021465.1
  CP022675.1
  CP024687.1
  CP026642.1
  CP027485.1

  The distances file is a tab-separated file with 3 columns: plasmid_1, plasmid_2, distance.
  plasmid_1 and plasmid_2 are plasmid names, and distance is a float between 0 and 1.
  The distance threshold is the minimum distance value for two plasmids to be considered connected.
  Example of such file:
  plasmid_1       plasmid_2       distance
  AP024796.1      AP024825.1      0.8
  AP024796.1      CP012142.1      0.5
  AP024796.1      CP014494.1      0.3
  AP024796.1      CP019149.1      0.0
  AP024796.1      CP021465.1      0.0
  AP024796.1      CP022675.1      1.0
  AP024796.1      CP024687.1      0.0
  AP024796.1      CP026642.1      0.5
  AP024796.1      CP027485.1      0.8
```
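The two `--bh-*` options describe a hub plasmid by its number of connections and by the edge density among its neighbours. One possible reading of that description is sketched below: a node counts as a hub when it has at least a minimum number of neighbours and those neighbours are sparsely connected to each other. This is an interpretation of the help text, not `plasnet`'s code, and the threshold values used in the example are made up rather than the tool's defaults.

```python
from itertools import combinations


def neighbour_edge_density(graph, node):
    """Fraction of possible edges between the neighbours of `node` that exist."""
    neighbours = sorted(graph[node])
    if len(neighbours) < 2:
        return 0.0
    possible = len(neighbours) * (len(neighbours) - 1) / 2
    present = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return present / possible


def is_hub(graph, node, min_connections, max_edge_density):
    """ASSUMED rule: enough connections, and neighbours that rarely link to each other."""
    return (
        len(graph[node]) >= min_connections
        and neighbour_edge_density(graph, node) <= max_edge_density
    )


# A star: the centre touches four plasmids that are not connected to each other.
star = {
    "centre": {"a", "b", "c", "d"},
    "a": {"centre"}, "b": {"centre"}, "c": {"centre"}, "d": {"centre"},
}
# A clique: every plasmid is connected to every other one.
names = ["v", "w", "x", "y", "z"]
clique = {n: {m for m in names if m != n} for n in names}

star_centre_is_hub = is_hub(star, "centre", min_connections=4, max_edge_density=0.1)
clique_member_is_hub = is_hub(clique, "v", min_connections=4, max_edge_density=0.1)
```

Under this reading the centre of the star is labelled a hub, while a member of the clique is not, because its neighbours are fully connected to each other.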

### type subcommand

```
Usage: plasnet type [OPTIONS] COMMUNITIES_PICKLE DISTANCES OUTPUT_DIR

  Type the communities of a previously split plasmid graph into subcommunities
  or types

Options:
  -d, --distance-threshold FLOAT  Distance threshold
  --small-subcommunity-size-threshold INTEGER
                                  Subcommunities with size up to this
                                  parameter will be joined to neighbouring
                                  larger subcommunities
  --help                          Show this message and exit.

  Type the communities of a previously split plasmid graph into subcommunities or types.
  This typing is based on running an asynchronous label propagation algorithm on the previously identified communities.
  This algorithm is implemented in the networkx library, and relies on a given distance file.
  This distance file should be a more precise and careful distance function than the one used to split the graph into communities.
  For example, you could use gene jaccard distance to split the graph and the DCJ-indel distance to type the communities.
  See https://github.com/iqbal-lab-org/pling for a tool to compute gene jaccard and DCJ-indel distances. 

  The first file, describing the communities, is a pickle file (.pkl) that can be found in <split_out_dir>/objects/communities.pkl,
  where <split_out_dir> is the output dir of the split command.

  The distances file is a tab-separated file with 3 columns: plasmid_1, plasmid_2, distance.
  plasmid_1 and plasmid_2 are plasmid names, and distance is a float number.
  The distance threshold is the minimum distance value for two plasmids to be considered connected.
  Example of such file:
  plasmid_1       plasmid_2       distance
  AP024796.1      AP024825.1      4
  AP024796.1      CP012142.1      10
  AP024796.1      CP014494.1      20
  AP024796.1      CP019149.1      1
  AP024796.1      CP021465.1      0
  AP024796.1      CP022675.1      50
  AP024796.1      CP024687.1      1000
  AP024796.1      CP026642.1      20
  AP024796.1      CP027485.1      1
```

### add-sample-hits subcommand

```
Usage: plasnet add-sample-hits [OPTIONS] SUBCOMMUNITIES_PICKLE SAMPLE_HITS
                               OUTPUT_DIR

  Add sample hits annotations on top of previously identified subcommunities
  or types

Options:
  --help  Show this message and exit.

  Add sample hits annotations on top of previously identified subcommunities or types.

  The first file, describing the subcommunities, is a pickle file (.pkl) that can be found in <type_out_dir>/objects/subcommunities.pkl,
  where <type_out_dir> is the output dir of the type command.

  The sample-hits file is a tab-separated file with 2 columns: sample, plasmid.
  These columns are self-explanatory and identifies the plasmids present in each sample.
  Example of such file:
  sample              plasmid
  cpe001_trim_ill     NZ_CP006799.1
  cpe001_trim_ill     NZ_CP028929.1
  cpe002_trim_ill     NZ_CP079159.1
  cpe005_trim_ill     NZ_CP006799.1
  cpe005_trim_ill     NZ_CP079676.1
  cpe010_trim_ill     NZ_CP028929.1
  cpe020_trim_ill     NZ_CP006799.1
  cpe020_trim_ill     NZ_CP079676.1
  cpe021_trim_ill     NZ_CP006799.1
```
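The three subcommands chain together through the pickle files mentioned above: `split` writes `objects/communities.pkl`, which `type` reads, and `type` writes `objects/subcommunities.pkl`, which `add-sample-hits` reads. The sketch below only assembles the argument lists for such a run, using the positional arguments from the usage messages; the helper name, the input file names and the output directory names are placeholders chosen for the example.

```python
from pathlib import Path


def pipeline_commands(plasmids, split_distances, type_distances, sample_hits, out_root):
    """Build the argument lists for split -> type -> add-sample-hits (nothing is executed)."""
    out_root = Path(out_root)
    split_out = out_root / "split_out"
    type_out = out_root / "type_out"
    hits_out = out_root / "sample_hits_out"
    return [
        ["plasnet", "split", str(plasmids), str(split_distances), str(split_out)],
        [
            "plasnet", "type",
            str(split_out / "objects" / "communities.pkl"),
            str(type_distances),
            str(type_out),
        ],
        [
            "plasnet", "add-sample-hits",
            str(type_out / "objects" / "subcommunities.pkl"),
            str(sample_hits),
            str(hits_out),
        ],
    ]


commands = pipeline_commands(
    plasmids="plasmids.tsv",
    split_distances="jaccard_distances.tsv",
    type_distances="dcj_distances.tsv",
    sample_hits="sample_hits.tsv",
    out_root="results",
)
```

Once `plasnet` is installed, each list can be run in order, for example with `subprocess.run(command, check=True)`, so that a failing step stops the chain.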
            
