* Name: sketchKH
* Version: 0.1.1
* Home page: https://github.com/CompCy-lab/SketchKH
* Summary: Distribution-based sketching of single-cell samples
* Upload time: 2024-02-14 17:27:14
* Author: CompCy Lab
* Requires Python: >=3.6
* License: MIT
* Keywords: cytometry, single-cell bioinformatics, clinical prediction, compression, computational biology

# SketchKH
Distribution-Informed Sketching with Kernel Herding 

## Overview
We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Each sketch selects a small, representative set of cells from a profiled sample so that all major immune cell types and their relative frequencies are well represented.
* Please see our paper for more information (ACM-BCB 2022): https://arxiv.org/abs/2207.00584
* Updated: June 27, 2023

![Sketching via KH Overview](https://github.com/CompCy-lab/SketchKH/blob/main/sketch_overview.png?raw=True)
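To make the idea concrete, here is a minimal sketch of generic kernel herding on random Fourier features, written with NumPy only. It is an illustration under stated assumptions, not the package's implementation: the function names, the cosine/sine feature map, the number of features, and the choice to select without replacement are all assumptions made for this example.

```python
import numpy as np

def random_fourier_features(X, num_features=200, gamma=1.0, seed=0):
    """Map rows of X to features whose inner products approximate a Gaussian kernel."""
    rng = np.random.default_rng(seed)
    # Assumption: frequencies are drawn from a normal distribution with std gamma.
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))
    projection = X @ W
    return np.hstack([np.cos(projection), np.sin(projection)]) / np.sqrt(num_features)

def kernel_herding(phi, num_subsamples):
    """Greedily pick rows of phi whose running mean tracks the full-sample mean."""
    mean_embedding = phi.mean(axis=0)
    weights = mean_embedding.copy()
    available = np.ones(phi.shape[0], dtype=bool)
    chosen = []
    for _ in range(num_subsamples):
        scores = phi @ weights
        scores[~available] = -np.inf  # assumption: select without replacement
        idx = int(np.argmax(scores))
        chosen.append(idx)
        available[idx] = False
        # Herding update: move toward the mean, away from the point just chosen.
        weights += mean_embedding - phi[idx]
    return np.array(chosen)

# Toy sample-set: two synthetic "cell types" at 80% / 20% frequency, 2 features.
rng = np.random.default_rng(0)
cells = np.vstack([rng.normal(0.0, 0.3, size=(800, 2)),
                   rng.normal(3.0, 0.3, size=(200, 2))])
phi = random_fourier_features(cells, gamma=1.0, seed=0)
sketch_indices = kernel_herding(phi, num_subsamples=50)

# Rows 800 and above belong to the minority type.
minority_fraction = np.mean(sketch_indices >= 800)
print(f"minority fraction in sketch: {minority_fraction:.2f} (full sample: 0.20)")
```

Because each step favors cells that pull the sketch mean toward the full-sample mean in feature space, the rare population should stay represented at roughly its original frequency, which uniform random subsampling does not guarantee for small sketches.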

## Installation
Dependencies
* Python >= 3.6, anndata >= 0.7.6, numpy >= 1.22.4, scipy >= 1.7.1, tqdm
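If you want to confirm that an environment meets these minimums before installing, a small check along the following lines may help. This is a sketch, not part of the package: `version_tuple` and `check_dependencies` are hypothetical helpers, and the version parsing is deliberately simplistic. It relies on `importlib.metadata`, which needs Python 3.8 or newer (the same floor that numpy 1.22 itself requires).

```python
import re
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions copied from the dependency list; None means "any version".
MINIMUMS = {"anndata": "0.7.6", "numpy": "1.22.4", "scipy": "1.7.1", "tqdm": None}

def version_tuple(version):
    """Turn '1.22.4' into (1, 22, 4), ignoring suffixes such as 'rc1'."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def check_dependencies(minimums=MINIMUMS, get_version=metadata.version):
    """Return a {package: status} report for the given minimum versions."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and version_tuple(installed) < version_tuple(minimum):
            report[name] = f"too old ({installed} < {minimum})"
        else:
            report[name] = f"ok ({installed})"
    return report

for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```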

Install the package with `pip`:
```
pip install sketchKH
```

Alternatively, clone the git repository:
```
git clone https://github.com/CompCy-lab/SketchKH.git
```

## Example usage
To perform sketching, first read a preprocessed `.h5ad` file into an AnnData object (`adata`). This dataset contains multiple profiled single-cell samples, so sketching selects a limited set of cells from each one. We refer to each profiled sample as a *sample-set*.

```python
import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
```

Then sketch your data, keeping 500 cells per sample-set:
```python

# Inputs
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: string referring to the key within adata.obs that contains the sample-sets to subsample
# sample_set_inds: (alternative to specifying sample_set_key) list of arrays containing the indices of the sample-sets to subsample 
# gamma: scale parameter for the normal distribution standard deviation in random Fourier frequency feature computation
# frequency_seed: random state
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of tasks
# ----------------------------

# Returns:
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples*sample-sets x features)

# ----------------------------
from sketchKH import sketch
kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
```
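The `sample_set_inds` input is described above as a list of index arrays, one per sample-set. As a rough illustration of that structure, the following sketch builds such a list from a column of per-cell sample labels. The labels and file names are made up, and treating the indices as positional row indices is an assumption.

```python
import numpy as np

# Hypothetical per-cell sample labels, standing in for adata.obs['FCS_File'].
sample_labels = np.array(["a.fcs", "b.fcs", "a.fcs", "c.fcs", "b.fcs", "a.fcs"])

# One array of positional row indices per sample-set, in sorted label order.
sample_names = np.unique(sample_labels)
sample_set_inds = [np.flatnonzero(sample_labels == name) for name in sample_names]

for name, inds in zip(sample_names, sample_set_inds):
    print(name, inds)
```

A list like this could be useful when the grouping you want is not stored as a single column in `adata.obs`.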

            

        