# SketchKH
Distribution-Informed Sketching with Kernel Herding
## Overview
We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Our sketches select a small, representative set of cells from each profiled sample so that all major immune cell-types and their relative frequencies are well-represented.
* Please see our paper (ACM-BCB 2022) for more information: https://arxiv.org/abs/2207.00584
* Updated: December 27, 2024
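
To give a feel for the idea, the following is a minimal NumPy sketch of kernel herding on random Fourier features. It is an illustration only and is not taken from the package: the function names `random_fourier_features` and `kernel_herding`, the number of features, and the exact role of `gamma` here are assumptions. Each step greedily picks the cell whose features best move the running subsample mean toward the mean of the full sample.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Map rows of X to random Fourier features (approximates a Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    # Assumption: frequencies are drawn from a normal distribution
    # whose standard deviation is scaled by gamma.
    W = rng.normal(scale=gamma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def kernel_herding(X, num_subsamples, gamma=1.0, seed=0):
    """Greedily select row indices whose feature mean tracks the full-sample mean."""
    phi = random_fourier_features(X, gamma=gamma, seed=seed)
    mu = phi.mean(axis=0)                 # mean embedding of the full sample
    w = mu.copy()                         # herding weight vector
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for _ in range(min(num_subsamples, len(X))):
        scores = phi @ w
        scores[~available] = -np.inf      # select without replacement
        i = int(np.argmax(scores))
        chosen.append(i)
        available[i] = False
        w += mu - phi[i]                  # standard herding update
    return np.array(chosen)

# Toy data: two clusters of unequal size (hypothetical stand-in for one sample-set)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(400, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])
idx = kernel_herding(X, num_subsamples=50)
print(idx.shape)  # (50,)
```

In this sketch the seed fixes the random frequencies, so repeated calls with the same inputs return the same indices.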

## Installation
Dependencies
* Python >= 3.6, < 3.10
* anndata >= 0.7.6
* numpy >= 1.22.4
* scipy >= 1.7.1
* numba
* joblib
* tqdm_joblib

You can install the package with `pip`:
```
pip install sketchKH
```
Alternatively, you can clone the Git repository:
```
git clone https://github.com/CompCy-lab/SketchKH.git
```
## Example usage
To perform sketching, first read a preprocessed `.h5ad` file into an AnnData object (`adata`). The dataset contains multiple profiled single-cell samples, and sketching selects a limited set of cells from each one. We refer to each profiled sample as a *sample-set*.
```python
import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
```
Then sketch your data, selecting 500 cells per sample-set:
```python
# Inputs:
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: string referring to the key within adata.obs that contains the sample-sets to subsample
# sample_set_inds: (alternative to specifying sample_set_key) list of arrays containing the indices of the sample-sets to subsample
# gamma: scale parameter for the normal distribution standard deviation in random Fourier frequency feature computation
# frequency_seed: random state
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel jobs
# ----------------------------
# Returns:
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = (num_subsamples * number of sample-sets) x features)
# ----------------------------
from sketchKH import sketch
kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
```
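
If you prefer to pass `sample_set_inds` instead of `sample_set_key`, the index arrays can be built from per-cell sample labels. The snippet below is a rough sketch using a hypothetical label array in place of a column such as `adata.obs['FCS_File']`. The exact format `sketch` expects is assumed from the parameter description above (a list with one array of cell indices per sample-set).

```python
import numpy as np

# Hypothetical per-cell sample labels, standing in for adata.obs['FCS_File'].to_numpy()
labels = np.array(['s1', 's2', 's1', 's3', 's2', 's1'])

# One array of cell indices per sample-set, ordered by sorted unique label
sample_set_inds = [np.flatnonzero(labels == s) for s in np.unique(labels)]

for s, inds in zip(np.unique(labels), sample_set_inds):
    print(s, inds)

# Assumed usage, not verified against the package:
# kh_indices, adata_subsample = sketch(adata, sample_set_inds=sample_set_inds,
#                                      gamma=1, num_subsamples=500,
#                                      frequency_seed=0, n_jobs=-1)
```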