nimble-raccoon


- Name: nimble-raccoon
- Version: 0.1.1
- Summary: A slim and fast implementation of the RACCOON clustering library.
- Upload time: 2023-12-11 03:28:15
- Requires Python: >=3.10
- License: GPL-3.0
- Keywords: clustering, dimensionality-reduction, iterative, hierarchical-clustering, multi-scale, scale-adaptive, snn, louvain
- Requirements: No requirements were recorded.
            
# nimble raccoon 

A slim and fast reimplementation of [RACCOON](https://github.com/shlienlab/raccoon)
(Resolution-Adaptive Coarse-to-fine Clusters OptimizatiON), an iterative clustering library for Python 3.


It relies on [faiss](https://github.com/facebookresearch/faiss) for fast nearest-neighbor searches and better memory management,
but offers fewer features than the original: only `cosine` and `euclidean` distances, Louvain as the clustering algorithm, and grid search.

## Installation

`nimble-raccoon` can be installed with `pip` using the following command:

    pip install nimble-raccoon 
    
 <!-- For the GPU-enable version, clone this repo and install add the `-gpu` flag -->

## Usage

To identify clusters and build the hierarchy, initialize a `Raccoon` object
with the chosen parameters and then call it on the `input_table` object, a pandas dataframe.
    
    import numpy as np
    from functools import partial

    from nimbloon import Raccoon

You can define custom functions to dynamically adapt the search range
to the size of the dataset.

    def half_sqrt_range(x, num_elements=5):
        sq = np.sqrt(x)
        return np.linspace(sq/2, sq, num_elements, dtype=int)

    rc_args = {'metric': 'cosine',
               'scale': False,
               'cumulative_variance': [.75, .8, .9, .95, .99],
               'clustering_parameter': np.logspace(-2, 1.5, 10),
               'n_neighbors': partial(half_sqrt_range, num_elements=3),
               'target_dimensions': 12,
               'min_cluster_size': 25,
               'max_neighbors': 100,
               'silhouette_threshold': 0.,
               'max_depth': 5}
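To make the dynamic range concrete, the sketch below evaluates the same `half_sqrt_range` helper on two dataset sizes. The final clipping step is only an illustration of why a ceiling such as `max_neighbors` is useful; how the library enforces that cap internally is an assumption and is not taken from its source.

```python
import numpy as np
from functools import partial


def half_sqrt_range(x, num_elements=5):
    # Same helper as above: evenly spaced integers in [sqrt(x)/2, sqrt(x)].
    sq = np.sqrt(x)
    return np.linspace(sq / 2, sq, num_elements, dtype=int)


n_neighbors = partial(half_sqrt_range, num_elements=3)

small = n_neighbors(400)      # sqrt(400) = 20
large = n_neighbors(40000)    # sqrt(40000) = 200

# Illustration only (assumption): clip the candidates to a ceiling of 100,
# the value used for max_neighbors in the example settings.
capped = np.minimum(large, 100)

print(small.tolist())   # [10, 15, 20]
print(large.tolist())   # [100, 150, 200]
print(capped.tolist())  # [100, 100, 100]
```

The candidate values grow with the square root of the population, so large datasets quickly produce neighbor counts that are expensive to search without a cap.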

Once everything is set up, you can instantiate the `Raccoon` object.

    rc = Raccoon(out_path='./rc_output', **rc_args)

Then call it on the input dataset to build the cluster hierarchy.

    rc_labels, rc_tree = rc(input_table)
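If you need a dataframe to experiment with, the following is a minimal sketch that builds a synthetic `input_table`. It assumes samples are rows and features are columns, which matches the orientation described for the output but is not stated explicitly for the input. The sample names, feature names, and blob layout are all made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 50 features, two shifted Gaussian blobs.
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
blob_b = rng.normal(loc=3.0, scale=1.0, size=(100, 50))

input_table = pd.DataFrame(
    np.vstack([blob_a, blob_b]),
    index=[f"sample_{i}" for i in range(200)],
    columns=[f"feature_{j}" for j in range(50)],
)

print(input_table.shape)  # (200, 50)

# With the package installed and `rc` instantiated as above,
# the call would then be:
# rc_labels, rc_tree = rc(input_table)
```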

Available parameters are:

- `metric`: the distance metric; currently only `cosine` and `euclidean` are supported.
- `scale`: whether to scale features at every iteration.
- `cumulative_variance`: the cumulative variance threshold used to remove
    low-information features with tSVD. Can be a single `float` or a `list` of `float`.
- `clustering_parameter`: the range of resolutions for the Louvain clustering
    algorithm. Can be a single `float` or a `list` of `float`.
- `n_neighbors`: the number of nearest neighbors to use across the search.
    Can be a single `int` or a `list` of `int`. Can also be a function, in which case it is applied to the input set size,
    adapting this parameter at each iteration.
- `target_dimensions`: the dimensionality of the target space after
    applying UMAP.
- `min_cluster_size`: the minimum cluster size; a cluster identified
    with a size below this threshold will not be split further.
- `max_neighbors`: the maximum number of neighbors; useful when `n_neighbors` is
    dynamic and population dependent, to avoid excessively costly operations on large datasets.
- `silhouette_threshold`: the minimum silhouette score a
    partition must reach to be accepted.
- `max_depth`: the maximum depth of the cluster hierarchy.
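Because several parameters accept lists, the number of candidate settings can grow quickly. The sketch below counts the combinations implied by the example settings, assuming a plain Cartesian grid over the three list-valued parameters. Whether the library evaluates every combination or prunes some of them is an assumption not confirmed here.

```python
from itertools import product

cumulative_variance = [0.75, 0.8, 0.9, 0.95, 0.99]
# Ten log-spaced resolutions between 10**-2 and 10**1.5,
# the same values as np.logspace(-2, 1.5, 10).
clustering_parameter = [10 ** (-2 + 3.5 * i / 9) for i in range(10)]
# Example candidates, e.g. what half_sqrt_range yields for 400 samples.
n_neighbors = [10, 15, 20]

grid = list(product(cumulative_variance, clustering_parameter, n_neighbors))

print(len(grid))  # 5 * 10 * 3 = 150
```

Shortening any one of the lists reduces the grid proportionally, which is the simplest way to trade search coverage for run time.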

The output is a one-hot-encoded dataframe, with samples as rows and cluster labels as columns, and an anytree object describing the
hierarchical relationships among clusters.
Both are also saved automatically to disk in the `out_path` folder.
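A one-hot table like this can be queried with ordinary pandas operations. The sketch below uses a small hand-made stand-in for `rc_labels`. The cluster names and the convention that a sample is flagged in both a parent cluster and its child are assumptions made for illustration, not taken from the library's actual output.

```python
import pandas as pd

# Hypothetical stand-in for rc_labels: names and memberships are made up.
rc_labels = pd.DataFrame(
    {
        "0":   [1, 1, 1, 0, 0],
        "1":   [0, 0, 0, 1, 1],
        "0_0": [1, 1, 0, 0, 0],
        "0_1": [0, 0, 1, 0, 0],
    },
    index=["s1", "s2", "s3", "s4", "s5"],
)

# Number of samples in each cluster.
cluster_sizes = rc_labels.sum(axis=0)

# Samples belonging to a given cluster.
members_0_0 = rc_labels.index[rc_labels["0_0"] == 1].tolist()

# All clusters, at every level, that a given sample belongs to.
clusters_of_s3 = rc_labels.columns[rc_labels.loc["s3"] == 1].tolist()

print(cluster_sizes.to_dict())  # {'0': 3, '1': 2, '0_0': 2, '0_1': 1}
print(members_0_0)              # ['s1', 's2']
print(clusters_of_s3)           # ['0', '0_1']
```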

The tSVD and UMAP steps can each be skipped by setting `cumulative_variance` to `1` or `target_dimensions` to `None`, respectively.
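As a small sketch, the variants below derive settings dictionaries that skip one or both reduction steps. The base dictionary is abbreviated from the example settings, and the variable names are illustrative only.

```python
# Base settings, abbreviated from the example above.
rc_args = {
    "metric": "cosine",
    "cumulative_variance": [0.75, 0.8, 0.9, 0.95, 0.99],
    "target_dimensions": 12,
}

# Skip tSVD only, skip UMAP only, or skip both.
no_tsvd = {**rc_args, "cumulative_variance": 1}
no_umap = {**rc_args, "target_dimensions": None}
no_reduction = {**rc_args, "cumulative_variance": 1, "target_dimensions": None}
```

Each variant can then be unpacked into the constructor in the same way as `rc_args`.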

## Citation

If you are using this library for your work, please cite the original RACCOON publication.

> Comitani, F., Nash, J.O., Cohen-Gogo, S. et al. Diagnostic classification of childhood cancer using multiscale transcriptomics. Nat Med 29, 656–666 (2023). https://doi.org/10.1038/s41591-023-02221-x

## Contributions

Contributions are always welcome. If you would like to help us improve this library, please fork the main branch and make sure the pytest suite passes after your changes.

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "nimble-raccoon",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "clustering,dimensionality-reduction,iterative,hierarchical-clustering,multi-scale,scale-adaptive,snn,louvain",
    "author": "",
    "author_email": "Federico Comitani <federico.comitani@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/12/93/15916710b02fdfbb6a41584c6360b677231676e8de8683dcdf7fcd3245d5/nimble-raccoon-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "A slim and fast implementation of the RACCOON clustering library.",
    "version": "0.1.1",
    "project_urls": {
        "repository": "https://github.com/fcomitani/nimble-raccoon"
    },
    "split_keywords": [
        "clustering",
        "dimensionality-reduction",
        "iterative",
        "hierarchical-clustering",
        "multi-scale",
        "scale-adaptive",
        "snn",
        "louvain"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "682a34e2b897f7339d044b7f3a72e07492950a5ee76437180b0560125cda4427",
                "md5": "5df7efaffc697c47903b656a2be90435",
                "sha256": "056604f5f872f832c7198e04ddbb235e35d190f29766e04ce7fa56db7c0b6d43"
            },
            "downloads": -1,
            "filename": "nimble_raccoon-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5df7efaffc697c47903b656a2be90435",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 24483,
            "upload_time": "2023-12-11T03:28:13",
            "upload_time_iso_8601": "2023-12-11T03:28:13.583340Z",
            "url": "https://files.pythonhosted.org/packages/68/2a/34e2b897f7339d044b7f3a72e07492950a5ee76437180b0560125cda4427/nimble_raccoon-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "129315916710b02fdfbb6a41584c6360b677231676e8de8683dcdf7fcd3245d5",
                "md5": "bd9090ad311b32c4af79d0be5ec90482",
                "sha256": "fdfff58f33cb899c1402966224dc9b2f9c1887b93faa64bd395ea9c432884f25"
            },
            "downloads": -1,
            "filename": "nimble-raccoon-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bd9090ad311b32c4af79d0be5ec90482",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 24734,
            "upload_time": "2023-12-11T03:28:15",
            "upload_time_iso_8601": "2023-12-11T03:28:15.575137Z",
            "url": "https://files.pythonhosted.org/packages/12/93/15916710b02fdfbb6a41584c6360b677231676e8de8683dcdf7fcd3245d5/nimble-raccoon-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-11 03:28:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fcomitani",
    "github_project": "nimble-raccoon",
    "github_not_found": true,
    "lcname": "nimble-raccoon"
}