harmonypy
=========
[![Latest PyPI Version][pb]][pypi] [![PyPI Downloads][db]][pypi] [DOI](https://zenodo.org/badge/latestdoi/229105533)
[pb]: https://img.shields.io/pypi/v/harmonypy.svg
[pypi]: https://pypi.org/project/harmonypy/
[db]: https://img.shields.io/pypi/dm/harmonypy?label=pypi%20downloads

Harmony is an algorithm for integrating multiple high-dimensional datasets.

harmonypy is a port of the [harmony] R package by [Ilya Korsunsky].

Example
-------
<p align="center">
<img src="https://i.imgur.com/lqReopf.gif">
</p>

This animation shows the Harmony alignment of three single-cell RNA-seq datasets from different donors.

[→ How to make this animation.](https://slowkow.com/notes/harmony-animation/)

Installation
------------

harmonypy requires Python 3.6 or later and has been tested with Python 3.7.

Use [pip] to install:

```bash
pip install harmonypy
```
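
After installing, it can be useful to confirm which version is present. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is hypothetical and is not part of harmonypy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of harmonypy.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("harmonypy")
    print("harmonypy:", found if found is not None else "not installed")
```

If the script reports "not installed", check that `pip` installed into the same Python environment you are running.
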
Usage
-----
Here is a brief example using the data that comes with the R package:
```python
# Load data
import numpy as np  # needed for np.array() below
import pandas as pd
meta_data = pd.read_csv("data/meta.tsv.gz", sep = "\t")
vars_use = ['dataset']
# meta_data
#
#                  cell_id dataset  nGene  percent_mito cell_type
# 0    half_TGAAATTGGTCTAG    half   3664      0.017722    jurkat
# 1    half_GCGATATGCTGATG    half   3858      0.029228      t293
# 2    half_ATTTCTCTCACTAG    half   4049      0.015966    jurkat
# 3    half_CGTAACGACGAGAG    half   3443      0.020379    jurkat
# 4    half_ACGCCTTGTTTACC    half   2813      0.024774      t293
# ..                   ...     ...    ...           ...       ...
# 295  t293_TTACGTACGACACT    t293   4152      0.033997      t293
# 296  t293_TAGAATTGTTGGTG    t293   3097      0.021769      t293
# 297  t293_CGGATAACACCACA    t293   3157      0.020411      t293
# 298  t293_GGTACTGAGTCGAT    t293   2685      0.027846      t293
# 299  t293_ACGCTGCTTCTTAC    t293   3513      0.021240      t293
data_mat = pd.read_csv("data/pcs.tsv.gz", sep = "\t")
data_mat = np.array(data_mat)
# data_mat[:5,:5]
#
# array([[ 0.0071695 , -0.00552724, -0.0036281 , -0.00798025,  0.00028931],
#        [-0.011333  ,  0.00022233, -0.00073589, -0.00192452,  0.0032624 ],
#        [ 0.0091214 , -0.00940727, -0.00106816, -0.0042749 , -0.00029096],
#        [ 0.00866286, -0.00514987, -0.0008989 , -0.00821785, -0.00126997],
#        [-0.00953977,  0.00222714, -0.00374373, -0.00028554,  0.00063737]])
# meta_data.shape # 300 cells, 5 variables
# (300, 5)
#
# data_mat.shape # 300 cells, 20 PCs
# (300, 20)
# Run Harmony
import harmonypy as hm
ho = hm.run_harmony(data_mat, meta_data, vars_use)
# Write the adjusted PCs to a new file.
res = pd.DataFrame(ho.Z_corr)
res.columns = ['X{}'.format(i + 1) for i in range(res.shape[1])]
res.to_csv("data/adj.tsv.gz", sep = "\t", index = False)
```
[harmony]: https://github.com/immunogenomics/harmony
[Ilya Korsunsky]: https://github.com/ilyakorsunsky
[pip]: https://pip.readthedocs.io/
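
Before writing the adjusted PCs to a file, it is worth checking the orientation of the result. The usage example above assumes one row per cell, but the adjusted matrix may be stored as PCs × cells in some releases. This is an assumption, so compare `ho.Z_corr.shape` with `data_mat.shape` in your own installation. The sketch below uses a synthetic array in place of real Harmony output; `as_cells_by_pcs` and `pc_column_names` are hypothetical helpers and are not part of harmonypy.

```python
import numpy as np


def as_cells_by_pcs(adjusted, n_cells):
    """Return `adjusted` as a 2-D array with one row per cell.

    Hypothetical helper: if the matrix has `n_cells` columns instead of
    `n_cells` rows, it is transposed. A square matrix is returned unchanged,
    because its orientation cannot be inferred from the shape alone.
    """
    adjusted = np.asarray(adjusted)
    if adjusted.ndim != 2:
        raise ValueError("expected a 2-D matrix")
    if adjusted.shape[0] == n_cells:
        return adjusted
    if adjusted.shape[1] == n_cells:
        return adjusted.T
    raise ValueError(
        "neither dimension of shape {} matches n_cells={}".format(adjusted.shape, n_cells)
    )


def pc_column_names(n_pcs):
    """Build column names X1..Xn, matching the naming in the usage example."""
    return ["X{}".format(i + 1) for i in range(n_pcs)]


if __name__ == "__main__":
    # Synthetic stand-in for an adjusted matrix stored as PCs x cells.
    rng = np.random.default_rng(0)
    fake_adjusted = rng.normal(size=(20, 300))

    oriented = as_cells_by_pcs(fake_adjusted, n_cells=300)
    names = pc_column_names(oriented.shape[1])
    print(oriented.shape, names[0], names[-1])
```

With real output, the oriented array and the generated names can be passed to `pd.DataFrame(oriented, columns=names)` before calling `to_csv`.
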
Raw data
{
"_id": null,
"home_page": "https://github.com/slowkow/harmonypy",
"name": "harmonypy",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Kamil Slowikowski",
"author_email": "kslowikowski@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/16/42/14fbc85018c24b2a8a8ba776f8cbf6efe8a7df54b56d45c5d9c5b9ab2987/harmonypy-0.0.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A data integration algorithm.",
"version": "0.0.9",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "f3bb2a0069527bc8212c990e57653fe1",
"sha256": "bc75fc47a8a426e93d4fae820dae7e464e4ef2a9f11f524157fe299a07565878"
},
"downloads": -1,
"filename": "harmonypy-0.0.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f3bb2a0069527bc8212c990e57653fe1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 20822,
"upload_time": "2022-11-23T14:56:39",
"upload_time_iso_8601": "2022-11-23T14:56:39.802802Z",
"url": "https://files.pythonhosted.org/packages/1f/8e/9c592ad543206d5d5eccc4d8c46749344320961dea77a4eab48142ec795a/harmonypy-0.0.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "db1b0001cba6f4d9bf818a59ec84c118",
"sha256": "85bfdd4e6ec6e0fa8816a276639358d3598a40d60ba9f7a5d9dada8706be8c4d"
},
"downloads": -1,
"filename": "harmonypy-0.0.9.tar.gz",
"has_sig": false,
"md5_digest": "db1b0001cba6f4d9bf818a59ec84c118",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 21382,
"upload_time": "2022-11-23T14:56:41",
"upload_time_iso_8601": "2022-11-23T14:56:41.657939Z",
"url": "https://files.pythonhosted.org/packages/16/42/14fbc85018c24b2a8a8ba776f8cbf6efe8a7df54b56d45c5d9c5b9ab2987/harmonypy-0.0.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-11-23 14:56:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "slowkow",
"github_project": "harmonypy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.13"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.2"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"0.21"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"0.25"
]
]
}
],
"lcname": "harmonypy"
}