# SketchKH
Distribution-Informed Sketching with Kernel Herding
## Overview
We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Each sketch selects a small, representative set of cells from a profiled sample so that all major immune cell types and their relative frequencies are well represented.
* Please see our paper for more information (ACM-BCB 2022): https://arxiv.org/abs/2207.00584
* Updated : June 27, 2023
![Sketching via KH Overview](https://github.com/CompCy-lab/SketchKH/blob/main/sketch_overview.png?raw=True)
## Installation
Dependencies
* Python >= 3.6, anndata >= 0.7.6, numpy >= 1.22.4, scipy >= 1.7.1, tqdm
You can install the package with `pip`:
```
pip install sketchKH
```
Alternatively, you can clone the Git repository:
```
git clone https://github.com/CompCy-lab/SketchKH.git
```
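After installing, one way to confirm that the environment meets the minimum versions listed under Dependencies is a small check like the one below. This is only a sketch: `MINIMUMS`, `as_tuple`, and `check_dependencies` are hypothetical helper names and not part of sketchKH, the version comparison is deliberately crude (it ignores pre-release suffixes), and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Minimum versions copied from the dependency list above.
MINIMUMS = {"anndata": "0.7.6", "numpy": "1.22.4", "scipy": "1.7.1"}


def as_tuple(v):
    """Turn '1.22.4' into (1, 22, 4); stop at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependencies(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, as_tuple(installed) >= as_tuple(minimum))
    return report


for name, (installed, ok) in check_dependencies().items():
    status = "ok" if ok else "needs attention"
    print(f"{name}: {installed or 'not installed'} ({status})")
```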
## Example usage
To perform sketching, first read a preprocessed `.h5ad` file into an AnnData object (`adata`). This dataset contains multiple profiled single-cell samples, so the sketch selects a limited set of cells from each one. We refer to each profiled sample as a *sample-set*.
```python
import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
```
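To make the notion of a sample-set concrete, the snippet below groups cells by a per-cell sample label into one index array per sample-set, which is the kind of structure the `sample_set_inds` argument describes. It is a minimal NumPy illustration with made-up labels: `sample_labels` and `sample_set_indices` are hypothetical names, and in practice the labels would come from a column of `adata.obs` such as `adata.obs['FCS_File']`.

```python
import numpy as np

# Hypothetical per-cell sample labels (one entry per cell).
sample_labels = np.array(["s1", "s2", "s1", "s3", "s2", "s1"])


def sample_set_indices(labels):
    """Return (names, list of index arrays), one array per distinct label."""
    names, inverse = np.unique(labels, return_inverse=True)
    return names, [np.flatnonzero(inverse == i) for i in range(len(names))]


names, sample_set_inds = sample_set_indices(sample_labels)
for name, inds in zip(names, sample_set_inds):
    print(name, inds)
```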
Then sketch your data, selecting 500 cells per sample-set:
```python
# Inputs:
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: key in adata.obs whose values identify the sample-set of each cell
# sample_set_inds: (alternative to sample_set_key) list of arrays, each containing the cell indices of one sample-set
# gamma: scale parameter for the standard deviation of the normal distribution used to compute random Fourier frequency features
# frequency_seed: random state used when drawing the random Fourier frequencies
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel tasks
# ----------------------------
# Returns:
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples*sample-sets x features)
# ----------------------------
from sketchKH import sketch
kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
```
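For intuition about what `gamma`, `frequency_seed`, and `num_subsamples` control, here is a minimal NumPy sketch of kernel herding on random Fourier features for a single sample-set. It is not the package's implementation. The function names, the `num_features` parameter, and the exact way `gamma` scales the random frequencies are assumptions made for illustration. The idea is to embed every cell with random features, compute the mean embedding of the sample-set, and then greedily pick cells whose running average stays close to that mean.

```python
import numpy as np


def random_fourier_features(X, num_features=256, gamma=1.0, seed=0):
    """Map cells (rows of X) to random Fourier features of a Gaussian kernel."""
    rng = np.random.default_rng(seed)
    # Assumption: frequencies are drawn from a normal distribution whose
    # standard deviation is gamma.
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def kernel_herding(X, num_subsamples, num_features=256, gamma=1.0, seed=0):
    """Greedily select row indices of X whose features track the mean embedding."""
    phi = random_fourier_features(X, num_features, gamma, seed)
    mean_embedding = phi.mean(axis=0)
    weights = mean_embedding.copy()
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for _ in range(min(num_subsamples, len(X))):
        scores = phi @ weights
        scores[~available] = -np.inf  # select without replacement
        pick = int(np.argmax(scores))
        chosen.append(pick)
        available[pick] = False
        # Standard herding update: move toward the mean, away from the pick.
        weights += mean_embedding - phi[pick]
    return np.array(chosen)


# Toy sample-set: two populations with an 80/20 split (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(800, 2)),
               rng.normal(5.0, 1.0, size=(200, 2))])
sketch_idx = kernel_herding(X, num_subsamples=50, gamma=1.0, seed=0)
print("fraction of larger population in full data:", 0.8)
print("fraction of larger population in sketch:", np.mean(sketch_idx < 800))
```

Comparing the two printed fractions gives a quick sense of how well a sketch preserves population frequencies. In the package, this kind of selection is applied to each sample-set separately.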
## Package metadata
* Name: sketchKH (version 0.1.1)
* Summary: Distribution-based sketching of single-cell samples
* License: MIT
* Author: CompCy Lab (compcylab@gmail.com)
* Requires Python: >= 3.6
* Homepage: https://github.com/CompCy-lab/SketchKH
* Keywords: cytometry, single-cell bioinformatics, clinical prediction, compression, computational biology
* Files: `sketchKH-0.1.1-py3-none-any.whl` (wheel), `sketchKH-0.1.1.tar.gz` (source distribution), uploaded 2024-02-14