* Name: sketchKH
* Version: 0.1.2
* Home page: https://github.com/CompCy-lab/SketchKH
* Summary: Distribution-based sketching of single-cell samples
* Upload time: 2024-12-27 18:01:18
* Author: CompCy Lab
* Requires Python: >=3.6, <3.10
* License: MIT
* Keywords: cytometry, single-cell bioinformatics, clinical prediction, compression, computational biology

# SketchKH
Distribution-Informed Sketching with Kernel Herding 

## Overview
SketchKH provides functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Each sketch selects a small, representative subset of cells from a profiled sample so that all major immune cell types and their relative frequencies are well represented.
* Please see our paper for more information (ACM-BCB 2022): https://arxiv.org/abs/2207.00584
* Updated: December 27, 2024

![Sketching via KH Overview](https://github.com/CompCy-lab/SketchKH/blob/main/sketch_overview.png?raw=True)
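
To make the idea in the overview concrete, here is a minimal NumPy sketch of kernel herding on random Fourier features. It is an illustration under stated assumptions, not the package's implementation: the function names `random_fourier_features` and `kernel_herding`, the number of features (256), the cosine feature map, and the toy two-cluster data are all hypothetical choices made for this example. Selection is done without replacement so that each cell appears at most once.

```python
import numpy as np

def random_fourier_features(X, num_features=256, gamma=1.0, seed=0):
    """Map X (cells x features) to random Fourier features of a Gaussian kernel.

    Assumption for this sketch: frequencies are drawn from N(0, gamma^2).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)          # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def kernel_herding(phi, num_subsamples):
    """Greedily pick rows of phi whose running mean tracks the mean of all rows."""
    mu = phi.mean(axis=0)                 # mean embedding of the full sample
    w = mu.copy()                         # herding weight vector
    available = np.ones(phi.shape[0], dtype=bool)
    selected = []
    for _ in range(num_subsamples):
        scores = phi @ w
        scores[~available] = -np.inf      # never pick the same cell twice
        i = int(np.argmax(scores))
        selected.append(i)
        available[i] = False
        w += mu - phi[i]                  # standard herding update
    return np.array(selected)

# Toy data: a large cluster (900 cells) and a small one (100 cells).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(900, 2)),
    rng.normal(3.0, 0.3, size=(100, 2)),
])

phi = random_fourier_features(X, gamma=1.0)
idx = kernel_herding(phi, num_subsamples=50)

# Distance between the sketch's mean embedding and the full sample's.
embedding_error = np.linalg.norm(phi[idx].mean(axis=0) - phi.mean(axis=0))
print("fraction of sketch from the small cluster:", np.mean(idx >= 900))
print("mean-embedding error:", embedding_error)
```

The printed fraction can be compared with the true 10% share of the small cluster to see how well the sketch preserves relative frequencies on this toy example.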

## Installation
Dependencies
* Python >= 3.6, < 3.10
* anndata >= 0.7.6
* numpy >= 1.22.4
* scipy >= 1.7.1
* numba
* joblib
* tqdm_joblib

Install the package with `pip`:
```
pip install sketchKH
```

Alternatively, clone the git repository:
```
git clone https://github.com/CompCy-lab/SketchKH.git
```

## Example usage
To perform sketching, first read a preprocessed `.h5ad` file into an AnnData object (`adata`). The dataset contains multiple profiled single-cell samples, and sketching selects a limited set of cells from each one. We refer to each profiled sample as a *sample-set*.

```python
import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
```

Then sketch the data, keeping 500 cells per sample-set:
```python

# Inputs
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: key in adata.obs whose values identify the sample-set of each cell
# sample_set_inds: (alternative to sample_set_key) list of arrays, each holding the cell indices of one sample-set
# gamma: scale parameter for the standard deviation of the normal distribution used to draw random Fourier frequencies
# frequency_seed: random seed for drawing the random Fourier frequencies
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel jobs
# ----------------------------

# Returns
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples*sample-sets x features)
# ----------------------------
from sketchKH import sketch

kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
```
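
The `sample_set_inds` input is described above as a list of index arrays, one per sample-set. Below is a minimal sketch of building such a list with NumPy from a vector of per-cell labels (for example, the values of `adata.obs['FCS_File']`). The helper name `indices_by_sample_set` and the toy labels are hypothetical and not part of sketchKH, and it is an assumption that `sketch` accepts positional integer indices in exactly this form.

```python
import numpy as np

def indices_by_sample_set(labels):
    """Return one array of row positions per unique label, in sorted label order."""
    labels = np.asarray(labels)
    return [np.flatnonzero(labels == name) for name in np.unique(labels)]

# Toy per-cell labels standing in for a column such as adata.obs['FCS_File'].
labels = np.array(['a.fcs', 'b.fcs', 'a.fcs', 'c.fcs', 'b.fcs', 'a.fcs'])
sample_set_inds = indices_by_sample_set(labels)

for name, inds in zip(np.unique(labels), sample_set_inds):
    print(name, inds)
```

Grouping by position rather than by label value keeps the result usable for indexing the rows of the data matrix directly.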

            
