Name: harmonypy
Version: 0.0.9
Home page: https://github.com/slowkow/harmonypy
Summary: A data integration algorithm.
Upload time: 2022-11-23 14:56:41
Author: Kamil Slowikowski
Requires Python: >=3.6
Requirements: numpy, scipy, scikit-learn, pandas

harmonypy
=========

[![Latest PyPI Version][pb]][pypi] [![PyPI Downloads][db]][pypi] [![DOI](https://zenodo.org/badge/229105533.svg)](https://zenodo.org/badge/latestdoi/229105533)

[pb]: https://img.shields.io/pypi/v/harmonypy.svg
[pypi]: https://pypi.org/project/harmonypy/

[db]: https://img.shields.io/pypi/dm/harmonypy?label=pypi%20downloads

Harmony is an algorithm for integrating multiple high-dimensional datasets.

harmonypy is a port of the [harmony] R package by [Ilya Korsunsky].

Example
-------

<p align="center">
  <img src="https://i.imgur.com/lqReopf.gif">
</p>

This animation shows the Harmony alignment of three single-cell RNA-seq datasets from different donors.

[→ How to make this animation.](https://slowkow.com/notes/harmony-animation/)

Installation
------------

This package has been tested with Python 3.7.

Use [pip] to install:

```bash
pip install harmonypy
```
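
After installing, one quick way to confirm that the package and its listed requirements can be found by Python is a check like the one below. This is a minimal sketch and not part of harmonypy: `MODULES` and `missing_modules` are hypothetical names, and the list assumes the usual import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for harmonypy and its listed requirements.
# Assumption: scikit-learn is imported under the name "sklearn".
MODULES = ["numpy", "scipy", "sklearn", "pandas", "harmonypy"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python {}.{}".format(*sys.version_info[:2]))
print("Missing modules:", missing_modules(MODULES))
```

An empty list means every module was found; any name that is printed still needs to be installed.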

Usage
-----

Here is a brief example using the data that comes with the R package:

```python
# Load data
import numpy as np
import pandas as pd

meta_data = pd.read_csv("data/meta.tsv.gz", sep = "\t")
vars_use = ['dataset']

# meta_data
#
#                  cell_id dataset  nGene  percent_mito cell_type
# 0    half_TGAAATTGGTCTAG    half   3664      0.017722    jurkat
# 1    half_GCGATATGCTGATG    half   3858      0.029228      t293
# 2    half_ATTTCTCTCACTAG    half   4049      0.015966    jurkat
# 3    half_CGTAACGACGAGAG    half   3443      0.020379    jurkat
# 4    half_ACGCCTTGTTTACC    half   2813      0.024774      t293
# ..                   ...     ...    ...           ...       ...
# 295  t293_TTACGTACGACACT    t293   4152      0.033997      t293
# 296  t293_TAGAATTGTTGGTG    t293   3097      0.021769      t293
# 297  t293_CGGATAACACCACA    t293   3157      0.020411      t293
# 298  t293_GGTACTGAGTCGAT    t293   2685      0.027846      t293
# 299  t293_ACGCTGCTTCTTAC    t293   3513      0.021240      t293

data_mat = pd.read_csv("data/pcs.tsv.gz", sep = "\t")
data_mat = np.array(data_mat)

# data_mat[:5,:5]
#
# array([[ 0.0071695 , -0.00552724, -0.0036281 , -0.00798025,  0.00028931],
#        [-0.011333  ,  0.00022233, -0.00073589, -0.00192452,  0.0032624 ],
#        [ 0.0091214 , -0.00940727, -0.00106816, -0.0042749 , -0.00029096],
#        [ 0.00866286, -0.00514987, -0.0008989 , -0.00821785, -0.00126997],
#        [-0.00953977,  0.00222714, -0.00374373, -0.00028554,  0.00063737]])

# meta_data.shape # 300 cells, 5 variables
# (300, 5)
#
# data_mat.shape  # 300 cells, 20 PCs
# (300, 20)

# Run Harmony
import harmonypy as hm
ho = hm.run_harmony(data_mat, meta_data, vars_use)

# Write the adjusted PCs to a new file.
res = pd.DataFrame(ho.Z_corr)
res.columns = ['X{}'.format(i + 1) for i in range(res.shape[1])]
res.to_csv("data/adj.tsv.gz", sep = "\t", index = False)
```
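
Before calling `run_harmony`, it can help to confirm that the inputs line up the way the example expects: one row per cell in both `data_mat` and `meta_data`, and every name in `vars_use` present as a column of `meta_data`. The helper below is a hypothetical pre-flight check written for illustration. It is not part of harmonypy and does not describe the validation the library itself performs. It works on plain shapes and column names, so it needs no third-party imports.

```python
def check_harmony_inputs(data_shape, meta_shape, meta_columns, vars_use):
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical helper, assuming data_mat is laid out as cells x PCs
    and meta_data as cells x variables, as in the example above.
    """
    problems = []
    if len(data_shape) != 2:
        problems.append(
            "data_mat should be 2-dimensional, got shape {}".format(data_shape)
        )
    elif data_shape[0] != meta_shape[0]:
        problems.append(
            "data_mat has {} rows but meta_data has {} rows".format(
                data_shape[0], meta_shape[0]
            )
        )
    # Every batch variable must be a column of the metadata table.
    missing = [v for v in vars_use if v not in meta_columns]
    if missing:
        problems.append("vars_use not found in meta_data: {}".format(missing))
    return problems


# Shapes and column names taken from the example above.
columns = ["cell_id", "dataset", "nGene", "percent_mito", "cell_type"]
print(check_harmony_inputs((300, 20), (300, 5), columns, ["dataset"]))  # → []
```

With the real objects, the call would be `check_harmony_inputs(data_mat.shape, meta_data.shape, list(meta_data.columns), vars_use)`.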

[harmony]: https://github.com/immunogenomics/harmony
[Ilya Korsunsky]: https://github.com/ilyakorsunsky
[pip]: https://pip.readthedocs.io/




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/slowkow/harmonypy",
    "name": "harmonypy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Kamil Slowikowski",
    "author_email": "kslowikowski@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/16/42/14fbc85018c24b2a8a8ba776f8cbf6efe8a7df54b56d45c5d9c5b9ab2987/harmonypy-0.0.9.tar.gz",
    "platform": null,
    "description": "harmonypy\n=========\n\n[![Latest PyPI Version][pb]][pypi] [![PyPI Downloads][db]][pypi] [![DOI](https://zenodo.org/badge/229105533.svg)](https://zenodo.org/badge/latestdoi/229105533)\n\n[pb]: https://img.shields.io/pypi/v/harmonypy.svg\n[pypi]: https://pypi.org/project/harmonypy/\n\n[db]: https://img.shields.io/pypi/dm/harmonypy?label=pypi%20downloads\n\nHarmony is an algorithm for integrating multiple high-dimensional datasets.\n\nharmonypy is a port of the [harmony] R package by [Ilya Korsunsky].\n\nExample\n-------\n\n<p align=\"center\">\n  <img src=\"https://i.imgur.com/lqReopf.gif\">\n</p>\n\nThis animation shows the Harmony alignment of three single-cell RNA-seq datasets from different donors.\n\n[\u2192 How to make this animation.](https://slowkow.com/notes/harmony-animation/)\n\nInstallation\n------------\n\nThis package has been tested with Python 3.7.\n\nUse [pip] to install:\n\n```bash\npip install harmonypy\n```\n\nUsage\n-----\n\nHere is a brief example using the data that comes with the R package:\n\n```python\n# Load data\nimport pandas as pd\n\nmeta_data = pd.read_csv(\"data/meta.tsv.gz\", sep = \"\\t\")\nvars_use = ['dataset']\n\n# meta_data\n#\n#                  cell_id dataset  nGene  percent_mito cell_type\n# 0    half_TGAAATTGGTCTAG    half   3664      0.017722    jurkat\n# 1    half_GCGATATGCTGATG    half   3858      0.029228      t293\n# 2    half_ATTTCTCTCACTAG    half   4049      0.015966    jurkat\n# 3    half_CGTAACGACGAGAG    half   3443      0.020379    jurkat\n# 4    half_ACGCCTTGTTTACC    half   2813      0.024774      t293\n# ..                   ...     ...    ...           ...       
...\n# 295  t293_TTACGTACGACACT    t293   4152      0.033997      t293\n# 296  t293_TAGAATTGTTGGTG    t293   3097      0.021769      t293\n# 297  t293_CGGATAACACCACA    t293   3157      0.020411      t293\n# 298  t293_GGTACTGAGTCGAT    t293   2685      0.027846      t293\n# 299  t293_ACGCTGCTTCTTAC    t293   3513      0.021240      t293\n\ndata_mat = pd.read_csv(\"data/pcs.tsv.gz\", sep = \"\\t\")\ndata_mat = np.array(data_mat)\n\n# data_mat[:5,:5]\n#\n# array([[ 0.0071695 , -0.00552724, -0.0036281 , -0.00798025,  0.00028931],\n#        [-0.011333  ,  0.00022233, -0.00073589, -0.00192452,  0.0032624 ],\n#        [ 0.0091214 , -0.00940727, -0.00106816, -0.0042749 , -0.00029096],\n#        [ 0.00866286, -0.00514987, -0.0008989 , -0.00821785, -0.00126997],\n#        [-0.00953977,  0.00222714, -0.00374373, -0.00028554,  0.00063737]])\n\n# meta_data.shape # 300 cells, 5 variables\n# (300, 5)\n#\n# data_mat.shape  # 300 cells, 20 PCs\n# (300, 20)\n\n# Run Harmony\nimport harmonypy as hm\nho = hm.run_harmony(data_mat, meta_data, vars_use)\n\n# Write the adjusted PCs to a new file.\nres = pd.DataFrame(ho.Z_corr)\nres.columns = ['X{}'.format(i + 1) for i in range(res.shape[1])]\nres.to_csv(\"data/adj.tsv.gz\", sep = \"\\t\", index = False)\n```\n\n[harmony]: https://github.com/immunogenomics/harmony\n[Ilya Korsunsky]: https://github.com/ilyakorsunsky\n[pip]: https://pip.readthedocs.io/\n\n\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "A data integration algorithm.",
    "version": "0.0.9",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "f3bb2a0069527bc8212c990e57653fe1",
                "sha256": "bc75fc47a8a426e93d4fae820dae7e464e4ef2a9f11f524157fe299a07565878"
            },
            "downloads": -1,
            "filename": "harmonypy-0.0.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f3bb2a0069527bc8212c990e57653fe1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 20822,
            "upload_time": "2022-11-23T14:56:39",
            "upload_time_iso_8601": "2022-11-23T14:56:39.802802Z",
            "url": "https://files.pythonhosted.org/packages/1f/8e/9c592ad543206d5d5eccc4d8c46749344320961dea77a4eab48142ec795a/harmonypy-0.0.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "db1b0001cba6f4d9bf818a59ec84c118",
                "sha256": "85bfdd4e6ec6e0fa8816a276639358d3598a40d60ba9f7a5d9dada8706be8c4d"
            },
            "downloads": -1,
            "filename": "harmonypy-0.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "db1b0001cba6f4d9bf818a59ec84c118",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 21382,
            "upload_time": "2022-11-23T14:56:41",
            "upload_time_iso_8601": "2022-11-23T14:56:41.657939Z",
            "url": "https://files.pythonhosted.org/packages/16/42/14fbc85018c24b2a8a8ba776f8cbf6efe8a7df54b56d45c5d9c5b9ab2987/harmonypy-0.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-11-23 14:56:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "slowkow",
    "github_project": "harmonypy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.13"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.2"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "0.21"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "0.25"
                ]
            ]
        }
    ],
    "lcname": "harmonypy"
}
        