mosaic-clustering


- Name: mosaic-clustering
- Version: 0.4.1
- Home page: https://github.com/moldyn/feature_selection
- Summary: Correlation based feature selection for MD data
- Upload time: 2023-04-14 09:14:02
- Author: braniii
- Requires Python: >=3.8
- License: BSD 3-Clause License
- Keywords: feature selection, md analysis

<div align="center">
  
  <img class="lightmode" style="width: 400px;" src="https://github.com/moldyn/MoSAIC/blob/main/docs/logo_large_light.svg?raw=true#gh-light-mode-only" />

  <p>
    <a href="https://github.com/wemake-services/wemake-python-styleguide" alt="wemake-python-styleguide">
        <img src="https://img.shields.io/badge/style-wemake-000000.svg" /></a>
    <a href="https://beartype.rtfd.io" alt="bear-ified">
        <img src="https://raw.githubusercontent.com/beartype/beartype-assets/main/badge/bear-ified.svg" /></a>
    <a href="https://pypi.org/project/mosaic-clustering" alt="PyPI">
        <img src="https://img.shields.io/pypi/v/mosaic-clustering" /></a>
    <a href="https://anaconda.org/conda-forge/mosaic-clustering" alt="conda version">
        <img src="https://img.shields.io/conda/vn/conda-forge/mosaic-clustering" /></a>
    <a href="https://pepy.tech/project/mosaic-clustering" alt="Downloads">
        <img src="https://pepy.tech/badge/mosaic-clustering" /></a>
    <a href="https://github.com/moldyn/MoSAIC/actions/workflows/pytest.yml" alt="GitHub Workflow Status">
        <img src="https://img.shields.io/github/actions/workflow/status/moldyn/MoSAIC/pytest.yml?branch=main"></a>
    <a href="https://codecov.io/gh/moldyn/MoSAIC" alt="Code coverage">
        <img src="https://codecov.io/gh/moldyn/MoSAIC/branch/main/graph/badge.svg?token=KNWDAUXIGI" /></a>
    <a href="https://github.com/moldyn/MoSAIC/actions/workflows/codeql.yml" alt="CodeQL">
        <img src="https://github.com/moldyn/MoSAIC/actions/workflows/codeql.yml/badge.svg?branch=main" /></a>
    <a href="https://img.shields.io/pypi/pyversions/mosaic-clustering" alt="PyPI - Python Version">
        <img src="https://img.shields.io/pypi/pyversions/mosaic-clustering" /></a>
    <a href="https://moldyn.github.io/MoSAIC" alt="Docs">
        <img src="https://img.shields.io/badge/MkDocs-Documentation-brightgreen" /></a>
    <a href="https://doi.org/10.1021/acs.jctc.2c00337" alt="doi">
        <img src="https://img.shields.io/badge/doi-10.1021%2Facs.jctc.2c00337-blue" /></a>
    <a href="https://arxiv.org/abs/2204.02770" alt="arXiv">
        <img src="https://img.shields.io/badge/arXiv-2204.02770-red" /></a>
    <a href="https://github.com/moldyn/MoSAIC/blob/main/LICENSE" alt="License">
        <img src="https://img.shields.io/github/license/moldyn/MoSAIC" /></a>
  </p>

  <p>
    <a href="https://moldyn.github.io/MoSAIC">Docs</a> •
    <a href="#features">Features</a> •
    <a href="#installation">Installation</a> •
    <a href="#usage">Usage</a> •
    <a href="https://moldyn.github.io/MoSAIC/faq">FAQ</a>
  </p>
</div>

# Molecular Systems Automated Identification of Cooperativity
MoSAIC is an unsupervised method for correlation analysis that automatically detects collective motion in MD simulation data while simultaneously identifying uncorrelated coordinates as noise.
Hence, it can be used as a feature selection scheme for Markov state modeling or simply to obtain a detailed picture of the key coordinates driving a biomolecular process. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.
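
To illustrate what "block-diagonal form" means here, the following minimal sketch builds a toy data set with two groups of correlated coordinates plus one noise coordinate, computes the absolute Pearson correlation matrix with NumPy, and reorders it by a given grouping. The grouping is hard-coded for illustration only; MoSAIC obtains it from Leiden community detection, which this sketch does not implement. All names (`toy_data`, `groups`, `noisy`) are hypothetical and not part of the package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_samples = 5000

# Two hidden collective motions drive two groups of coordinates.
motion_a = rng.normal(size=n_samples)
motion_b = rng.normal(size=n_samples)


def noisy(signal):
    """Return the signal with a small amount of independent noise added."""
    return signal + 0.3 * rng.normal(size=n_samples)


# Columns 0, 2, 4 follow motion A; columns 1, 3 follow motion B;
# column 5 is pure noise. Shape: (n_samples, n_features).
toy_data = np.column_stack([
    noisy(motion_a), noisy(motion_b), noisy(motion_a),
    noisy(motion_b), noisy(motion_a), rng.normal(size=n_samples),
])

# Absolute Pearson correlation between all pairs of coordinates.
corr = np.abs(np.corrcoef(toy_data, rowvar=False))

# Hard-coded grouping (assumption); MoSAIC would detect it automatically.
groups = [[0, 2, 4], [1, 3], [5]]
order = np.concatenate(groups)

# Reordering rows and columns gathers each group into a diagonal block.
block_diagonal = corr[np.ix_(order, order)]
print(np.round(block_diagonal, 2))
```

In the reordered matrix, the highly correlated coordinates appear as dense blocks along the diagonal, while the noise coordinate stays isolated in a block of its own.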

The method was published in:
> **Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins**  
> G. Diez, D. Nagel, and G. Stock,  
> *J. Chem. Theory Comput.* **2022** 18 (8), 5079-5088,  
> doi: [10.1021/acs.jctc.2c00337](https://pubs.acs.org/doi/10.1021/acs.jctc.2c00337)

If you use this software package, please cite the above-mentioned paper.

## Features
- Intuitive usage via [module](#module---inside-a-python-script) and via the command line ([CI](#ci---usage-directly-from-the-command-line))
- Sklearn-style API for fast integration into your Python workflow
- No magic, only a single parameter, which can be optimized via cross-validation
- Extensive [documentation](https://moldyn.github.io/MoSAIC) and a detailed discussion in the publication


## Installation
The package is called `mosaic-clustering` and is available via [PyPI](https://pypi.org/project/mosaic-clustering) or [conda](https://anaconda.org/conda-forge/mosaic-clustering). To install it, simply call:
```bash
python3 -m pip install --upgrade mosaic-clustering
```
or
```bash
conda install -c conda-forge mosaic-clustering
```

or, for the latest development version:
```bash
# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git
```

To use the deprecated `UMAPSimilarity` or the `mosaic umap` module, install the package with the `umap` extra (`extras_require='umap'`):
```bash
python3 -m pip install --upgrade 'mosaic-clustering[umap]'
```

### Shell Completion
If you use the `bash`, `zsh`, or `fish` shell, click offers an easy way to enable shell completion; see the [docs](https://click.palletsprojects.com/en/8.0.x/shell-completion) for details.
For bash, add the following line to your `~/.bashrc`:
```bash
eval "$(_MOSAIC_COMPLETE=bash_source mosaic)"
```

## Usage
In general, the module can be called either via its entry point `$ MoSAIC` or via `$ python -m mosaic`. The latter is preferred because it ensures that the desired Python environment is used. Shell completion, however, requires the entry point.

### CI - Usage Directly from the Command Line
The module provides a rich command-line interface (CI) built with [click](https://click.palletsprojects.com). Each module and submodule comes with detailed help, which can be accessed by running:
```bash
$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.4.1

  Molecular systems automated identification of collective motion, is
  a correlation based feature selection framework for MD data.
  Copyright (c) 2021-2023, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
```
For more details on a submodule, specify one of the commands
listed above.

A simple workflow example for clustering the input file `input_file` using
correlation and Leiden with CPM and the default resolution parameter:
```bash
# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix
```
This generates the similarity matrix in `output_similarity`,
the plotted result in `output_clustering.matrix.pdf`, the raw data of
the matrix in `output_clustering.matrix`, and a file in which each
row lists the indices of one cluster.
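
As a rough sketch of how the cluster file could be read back for further processing, the helper below parses one cluster per row. The exact layout of the file (whitespace-separated integer indices, optional header lines starting with `#`) is an assumption made for this illustration and should be checked against the actual output; `read_clusters` is a hypothetical helper, not part of MoSAIC.

```python
import io


def read_clusters(handle):
    """Parse one cluster per row from an open text file."""
    clusters = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and header lines (assumed to start with '#').
        if not line or line.startswith('#'):
            continue
        # Assumption: indices are integers separated by whitespace.
        clusters.append([int(idx) for idx in line.split()])
    return clusters


# Stand-in for the real output file; the content is made up for illustration.
example = io.StringIO('# clusters\n0 2 4\n1 3\n5\n')
clusters = read_clusters(example)

for number, members in enumerate(clusters):
    print(f'cluster {number}: {len(members)} coordinates -> {members}')
```

For a real run, the `io.StringIO` object would be replaced by the opened cluster file, for example `with open(path) as handle:`.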

### Module - Inside a Python Script
```python
import mosaic
import numpy as np

# Load file
# X is np.ndarray of shape (n_samples, n_features)
# ('input_file' is a placeholder path to a plain-text data file)
X = np.loadtxt('input_file')

# Compute similarity matrix
sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)

# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
clustered_X = clust.matrix_
...
```
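
One possible way to turn the clustering result into a feature selection is sketched below. It assumes that `clust.clusters_` can be treated as a sequence of index collections, one per cluster, and it uses a simple size threshold to discard small clusters as noise. Both the helper `select_features` and the `min_size` criterion are hypothetical choices for illustration, not the procedure recommended by the authors.

```python
import numpy as np


def select_features(X, clusters, min_size=2):
    """Keep columns of X that belong to clusters with at least min_size members."""
    kept = [
        np.asarray(cluster, dtype=int)
        for cluster in clusters
        if len(cluster) >= min_size
    ]
    if not kept:
        # No cluster is large enough: return an empty selection.
        return X[:, :0], np.array([], dtype=int)
    indices = np.concatenate(kept)
    return X[:, indices], indices


# Toy stand-ins; in practice X is the input data and clusters = clust.clusters_.
X = np.arange(12, dtype=float).reshape(2, 6)
clusters = [[0, 2, 4], [1, 3], [5]]

X_selected, indices = select_features(X, clusters, min_size=2)
print(indices)
print(X_selected.shape)
```

The selected columns keep the cluster order, so coordinates belonging to the same cluster stay adjacent in `X_selected`.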

            

        