clustree

* Name: clustree
* Version: 0.2.1
* Home page: https://github.com/ben-j-barlow/clustree
* Summary: Visualize relationship between clusterings at different resolutions
* Upload time: 2023-05-11 11:43:29
* Author: Ben Barlow
* Requires Python: >=3.9,<4.0
* License: GPL-3.0-or-later
* Keywords: clustering, visualization, visualisation, clustering-trees, cluster-trees

# clustree

## Status

**Functionality: Implemented**

* Directed graph representing the clustree. Each node is drawn as an image, and node information is encoded by a border surrounding the image.
* Loading: Data is provided directly as a DataFrame or through a path to a CSV file. Images are provided through a path to their parent directory.
* Appearance: Edge and node color can each be set by the number of samples passing through the edge or node, by cluster resolution `K`, or to a fixed color. Node color can also be set from a column in the data combined with an aggregate function. Coloring by a column or by number of samples uses a continuous colormap; the other options produce discrete colors.
* Layout: The Reingold-Tilford algorithm is used for node positioning. It is not recommended for kk > 12 due to a memory bottleneck in the igraph dependency.
* Legend: shows the meaning of node and edge colors.
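
The "number of samples passing through an edge or node" can be illustrated with a small sketch. It assumes cluster memberships are given as one list per resolution (as in the Glossary's example) and counts, for each pair of consecutive resolutions, how many samples move from cluster `k` at resolution `K` to each cluster at resolution `K+1`. The helpers `node_sample_counts` and `edge_sample_counts` are hypothetical and not part of the clustree API; they only show the counting idea, not how clustree computes it internally.

```python
from collections import Counter


def node_sample_counts(membership):
    """Number of samples in each cluster at a single resolution."""
    return dict(Counter(membership))


def edge_sample_counts(parent, child):
    """Count samples flowing from each cluster at resolution K (parent)
    to each cluster at resolution K+1 (child).

    Returns a dict mapping (parent_cluster, child_cluster) -> sample count.
    """
    if len(parent) != len(child):
        raise ValueError("memberships must cover the same samples")
    return dict(Counter(zip(parent, child)))


# Hypothetical memberships for 5 samples at K=1, K=2 and K=3.
k1 = [1, 1, 1, 1, 1]
k2 = [1, 1, 2, 2, 2]
k3 = [1, 2, 3, 3, 3]

print(node_sample_counts(k2))      # {1: 2, 2: 3}
print(edge_sample_counts(k1, k2))  # {(1, 1): 2, (1, 2): 3}
print(edge_sample_counts(k2, k3))  # {(1, 1): 1, (1, 2): 1, (2, 3): 3}
```

Each key of `edge_sample_counts` corresponds to one directed edge between resolutions, and its value is the quantity a 'samples' edge colormap would encode.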


**Functionality: To Add**

* Legend: explain edge transparency.
* Layout: Bespoke implementation of Reingold-Tilford algorithm to overcome dependency's memory bottleneck.

## Usage

### Installation

Install the package with pip:

```
pip install clustree
```

### Quickstart

The powerhouse function of the library is `clustree`. Use

```
from clustree import clustree
```

to import the function. A detailed description of the parameters is provided below.
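
A minimal end-to-end call might look like the sketch below. It writes a small CSV with one cluster-membership column per resolution and passes its path as `data`. The column naming scheme (`K1`, `K2`, `K3` with `prefix="K"`), the `images` directory, the output file name and the helper names `write_memberships` and `draw` are assumptions for illustration; the expected layout of the images directory is not described here.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical cluster memberships for 5 samples at resolutions K=1..3.
# Assumption: columns are named <prefix><K>, e.g. K1, K2, K3 for prefix="K".
rows = [
    {"K1": 1, "K2": 1, "K3": 1},
    {"K1": 1, "K2": 1, "K3": 2},
    {"K1": 1, "K2": 2, "K3": 3},
    {"K1": 1, "K2": 2, "K3": 3},
    {"K1": 1, "K2": 2, "K3": 3},
]


def write_memberships(path: Path) -> Path:
    """Write the memberships to a CSV file that can be passed as `data`."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["K1", "K2", "K3"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def draw(csv_path: Path, images_dir: Path, output_path: Path):
    """Call clustree; requires the package and a directory of images."""
    from clustree import clustree  # imported lazily so this file runs without it

    return clustree(
        data=csv_path,
        prefix="K",
        images=images_dir,
        output_path=output_path.resolve(),  # absolute; must end in .png if given
        node_color="samples",  # continuous colormap over samples per node
        edge_color="samples",
        orientation="vertical",
    )


csv_path = write_memberships(Path(tempfile.mkdtemp()) / "clusters.csv")
# graph = draw(csv_path, Path("images"), Path("clustree.png"))  # hypothetical paths
```

Per the signature, `clustree` returns a `DiGraph` (presumably from networkx), so the graph can be inspected after drawing.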

```python
from pathlib import Path
from typing import Callable, Literal, Optional, Union

import matplotlib as mpl
from networkx import DiGraph


def clustree(
    data: Union[Path, str],
    prefix: str,
    images: Union[Path, str],
    output_path: Optional[Union[Path, str]] = None,
    draw: bool = True,
    node_color: str = "prefix",
    node_color_aggr: Optional[Union[Callable, str]] = None,
    node_cmap: Union[mpl.colors.Colormap, str] = "inferno",
    edge_color: str = "samples",
    edge_cmap: Union[mpl.colors.Colormap, str] = "viridis",
    orientation: Literal["vertical", "horizontal"] = "vertical",
    layout_reingold_tilford: Optional[bool] = None,
    min_cluster_number: Literal[0, 1] = 1,
    border_size: float = 0.05,
    figsize: Optional[tuple[float, float]] = None,
    arrows: Optional[bool] = None,
    node_size: float = 300,
    node_size_edge: Optional[float] = None,
    dpi: float = 500,
    kk: Optional[int] = None,
) -> DiGraph:
    ...  # implementation omitted; parameters are described below
```

* `data` : Path to a CSV file, or a DataFrame object.
* `prefix` : String prefix identifying the columns that contain clustering information.
* `images` : Path to the directory containing the images.
* `output_path` : Absolute path at which to save the clustree drawing. If a file extension is supplied, it must be `.png`. If None, the output is not written to file.
* `draw` : Whether to draw the clustree. Defaults to True. If False and `output_path` is supplied, this setting is overridden.
* `node_color` : For a continuous colormap, use 'samples' or the name of a metadata column to color nodes by. For discrete colors, use 'prefix' to color by resolution, or specify a fixed color (see the Specifying colors Matplotlib tutorial: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, defaults to the value of `prefix`, so nodes are colored by resolution.
* `node_color_aggr` : If `node_color` is a column name, a function (or a string naming a function) used to aggregate that column over the samples in each cluster.
* `node_cmap` : If `node_color` is 'samples' or a column name, the colormap to use (see the Colormap Matplotlib tutorial: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
* `edge_color` : For a continuous colormap, use 'samples'. For discrete colors, use 'prefix' to color by resolution, or specify a fixed color (see the Specifying colors Matplotlib tutorial: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, defaults to 'samples'.
* `edge_cmap` : If `edge_color` is 'samples', the colormap to use (see the Colormap Matplotlib tutorial: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
* `orientation` : Orientation of the clustree drawing. Defaults to 'vertical'.
* `layout_reingold_tilford` : Whether to use the Reingold-Tilford algorithm for node positioning. Defaults to True if kk <= 12, False otherwise. Setting True is not recommended when kk > 12 due to a memory bottleneck in the igraph dependency.
* `min_cluster_number` : Cluster numbers take values (0, ..., K-1) or (1, ..., K). Use 0 for the former and 1 for the latter. Defaults to None, in which case the minimum cluster number is found automatically.
* `border_size` : Border width as a proportion of image width. Defaults to 0.05.
* `figsize` : Passed to matplotlib to set the figure size. Defaults to (kk/2, kk/2), clipped to a minimum of (3, 3) and a maximum of (10, 10).
* `arrows` : Whether to add arrows to graph edges. Removing arrows avoids the visual issue of arrows overlapping nodes. Defaults to True.
* `node_size` : Size of nodes in the clustree drawing. Passed directly to `networkx.draw_networkx_nodes`. Defaults to 300.
* `node_size_edge` : Controls where edges start and end. Passed directly to `networkx.draw_networkx_edges`.
* `dpi` : Resolution of the output, in dots per inch, if saved to file. Defaults to 500.
* `kk` : Custom depth of the clustree graph, i.e. the highest cluster resolution `K` shown.
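
The documented defaults for `figsize` and `layout_reingold_tilford` both depend on `kk`. The sketch below restates them as plain Python so the clipping rule is easy to check; `default_figsize` and `default_layout_reingold_tilford` are hypothetical helpers written from the parameter descriptions, not functions exported by clustree.

```python
def default_figsize(kk: int) -> tuple[float, float]:
    """(kk/2, kk/2), clipped to a minimum of 3 and a maximum of 10 per side."""
    side = min(max(kk / 2, 3.0), 10.0)
    return (side, side)


def default_layout_reingold_tilford(kk: int) -> bool:
    """Reingold-Tilford layout is used by default only when kk <= 12."""
    return kk <= 12


for kk in (4, 12, 30):
    print(kk, default_figsize(kk), default_layout_reingold_tilford(kk))
# 4 (3.0, 3.0) True
# 12 (6.0, 6.0) True
# 30 (10.0, 10.0) False
```

Passing `figsize` or `layout_reingold_tilford` explicitly overrides these defaults.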

## Glossary

* *cluster resolution*: Upper case `K`. For example, at cluster resolution `K=2` data is clustered into 2 distinct clusters.
* *cluster number*: Lower case `k`. For example, at cluster resolution 2 data is clustered into 2 distinct clusters `k=1` and `k=2`.
* *kk*: highest value of `K` (cluster resolution) shown in clustree.
* *cluster membership*: The association between data points and cluster numbers for fixed cluster resolution. For example, `[1, 1, 2, 2, 2]` would mean the first 2 data points belong to cluster number `1` and the following 3 data points belong to cluster number `2`.
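
In terms of the glossary, coloring nodes by a metadata column amounts to aggregating that column over the samples in each cluster of a given cluster membership. The sketch below assumes the aggregate is a plain Python callable such as `statistics.mean` or `max`; how clustree resolves a function supplied as a string is not shown. `aggregate_by_cluster` and the `age` column are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Sequence


def aggregate_by_cluster(
    membership: Sequence[int],
    values: Sequence[float],
    aggr: Callable[[list], float] = mean,
) -> dict[int, float]:
    """Aggregate a per-sample metadata column within each cluster."""
    if len(membership) != len(values):
        raise ValueError("membership and values must have the same length")
    groups: dict[int, list] = defaultdict(list)
    for cluster, value in zip(membership, values):
        groups[cluster].append(value)
    return {cluster: aggr(vals) for cluster, vals in groups.items()}


# Glossary example: first 2 samples in cluster 1, next 3 in cluster 2.
membership = [1, 1, 2, 2, 2]
age = [20, 30, 40, 50, 60]  # hypothetical metadata column

print(aggregate_by_cluster(membership, age))
print(aggregate_by_cluster(membership, age, max))
```

The resulting per-cluster values would then be mapped onto the node colormap (`node_cmap`).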
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ben-j-barlow/clustree",
    "name": "clustree",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "clustering,visualization,visualisation,clustering-trees,cluster-trees",
    "author": "Ben Barlow",
    "author_email": "ben-j-barlow.1@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/40/32/7da6e2c5ad94915b09847984cb1c441a8b1284e50fde257e23a743617c42/clustree-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0-or-later",
    "summary": "Visualize relationship between clusterings at different resolutions",
    "version": "0.2.1",
    "project_urls": {
        "Homepage": "https://github.com/ben-j-barlow/clustree",
        "Repository": "https://github.com/ben-j-barlow/clustree"
    },
    "split_keywords": [
        "clustering",
        "visualization",
        "visualisation",
        "clustering-trees",
        "cluster-trees"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4ef50f4c458c357ba2cfd245a5a9db58b12899a64a1afa236fa5b7e86cfc824a",
                "md5": "363556ccc50cf33abb66beb6e7347a58",
                "sha256": "0845225a41a008e6f26535546c3090e03055daaea865691122b56b00b69c0e26"
            },
            "downloads": -1,
            "filename": "clustree-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "363556ccc50cf33abb66beb6e7347a58",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 25475,
            "upload_time": "2023-05-11T11:43:27",
            "upload_time_iso_8601": "2023-05-11T11:43:27.307029Z",
            "url": "https://files.pythonhosted.org/packages/4e/f5/0f4c458c357ba2cfd245a5a9db58b12899a64a1afa236fa5b7e86cfc824a/clustree-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40327da6e2c5ad94915b09847984cb1c441a8b1284e50fde257e23a743617c42",
                "md5": "c87374c4b7d562cf41373310c2b55016",
                "sha256": "4d8d26279c65bc15ec11c3c1111ab6922eb32ed759aa3ac4d027328bfacfe346"
            },
            "downloads": -1,
            "filename": "clustree-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c87374c4b7d562cf41373310c2b55016",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 24889,
            "upload_time": "2023-05-11T11:43:29",
            "upload_time_iso_8601": "2023-05-11T11:43:29.771475Z",
            "url": "https://files.pythonhosted.org/packages/40/32/7da6e2c5ad94915b09847984cb1c441a8b1284e50fde257e23a743617c42/clustree-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-11 11:43:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ben-j-barlow",
    "github_project": "clustree",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "clustree"
}
        