scalex

* Name: scalex
* Version: 1.0.4
* Summary: Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
* Upload time: 2024-06-02 21:25:09
* Author: Lei Xiong
* Requires Python: >=3.7
* License: MIT

[![Stars](https://img.shields.io/github/stars/jsxlei/scalex?logo=GitHub&color=yellow)](https://github.com/jsxlei/scalex/stargazers)
[![PyPI](https://img.shields.io/pypi/v/scalex.svg)](https://pypi.org/project/scalex)
[![Documentation Status](https://readthedocs.org/projects/scalex/badge/?version=latest)](https://scalex.readthedocs.io/en/latest/?badge=stable)
[![Downloads](https://pepy.tech/badge/scalex)](https://pepy.tech/project/scalex)
[![DOI](https://zenodo.org/badge/345941713.svg)](https://zenodo.org/badge/latestdoi/345941713)
# [Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space](https://www.nature.com/articles/s41467-022-33758-z)

![](docs/source/_static/img/scalex.jpg)


## News
#### [2022-10-17] SCALEX is online at [Nature Communications](https://www.nature.com/articles/s41467-022-33758-z)

## [Documentation](https://scalex.readthedocs.io/en/latest/index.html) 
## [Tutorial](https://scalex.readthedocs.io/en/latest/tutorial/index.html) 
## Installation
#### Install from PyPI

    pip install scalex
    
#### Install from GitHub
Install the latest development version:

    pip install git+https://github.com/jsxlei/scalex.git

Or clone the repository and install:

    git clone https://github.com/jsxlei/scalex.git
    cd scalex
    python setup.py install
    
SCALEX is implemented with the [PyTorch](https://pytorch.org/) framework.  
SCALEX runs on CPU devices, but running it on a GPU, if one is available, is recommended.   

## Getting started

SCALEX can be used both from the command line and as an API function in a Jupyter notebook.   
Please refer to the [Documentation](https://scalex.readthedocs.io/en/latest/index.html) and [Tutorial](https://scalex.readthedocs.io/en/latest/tutorial/index.html).


### 1. API function

    from scalex import SCALEX
    adata = SCALEX(data_list, batch_categories)
    
The parameters work like the corresponding command-line options.  
The output is an AnnData object for further analysis with Scanpy.  
`data_list` can be 
* a data path; supported file formats include txt, csv, h5ad, h5mu/rna, h5mu/atac, or a directory containing mtx files
* a list of data paths
* an [AnnData](https://anndata.readthedocs.io/en/stable/anndata.AnnData.html#anndata.AnnData) object
* a list of [AnnData](https://anndata.readthedocs.io/en/stable/anndata.AnnData.html#anndata.AnnData) objects
* a mix of the above

`batch_categories` is optional and gives the name of each batch; if it is not provided, the batches are numbered from 0 to N-1.
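
The fallback naming described above can be sketched in plain Python. This is only an illustration of the documented behavior, not SCALEX's own code: the helper name `resolve_batch_categories` is hypothetical, and returning the numbers as strings is an assumption.

```python
from typing import List, Optional, Sequence


def resolve_batch_categories(
    data_list: Sequence[object],
    batch_categories: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return one label per batch (hypothetical helper, not part of scalex)."""
    n_batches = len(data_list)
    if batch_categories is None:
        # Documented fallback: number the batches 0 .. N-1.
        # Using strings rather than integers is an assumption of this sketch.
        return [str(i) for i in range(n_batches)]
    if len(batch_categories) != n_batches:
        raise ValueError(
            f"expected {n_batches} batch names, got {len(batch_categories)}"
        )
    return list(batch_categories)


print(resolve_batch_categories(["pbmc_a.h5ad", "pbmc_b.h5ad", "pbmc_c.h5ad"]))
print(resolve_batch_categories(["pbmc_a.h5ad", "pbmc_b.h5ad"], ["donor1", "donor2"]))
```

The file names and donor labels are placeholders. The length check reflects the one-name-per-batch pairing implied by the text.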

### 2. Command line
#### Standard usage


    SCALEX --data_list data1 data2 dataN --batch_categories batch_name1 batch_name2 batch_nameN 
    
    
`--data_list`: path to the data of each batch of the single-cell dataset; use `-d` for short

`--batch_categories`: name of each batch; the batches are numbered from 0 to N-1 if not specified
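
When the command is launched from a script, it can help to assemble the argument list programmatically. The sketch below uses only the two options shown above; the helper name `build_scalex_command` and the file and batch names are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_scalex_command(
    data_paths: Sequence[str],
    batch_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argv list for the SCALEX command (illustrative helper)."""
    argv = ["SCALEX", "--data_list", *data_paths]
    if batch_names is not None:
        if len(batch_names) != len(data_paths):
            raise ValueError("need exactly one batch name per data path")
        argv += ["--batch_categories", *batch_names]
    return argv


cmd = build_scalex_command(["data1.h5ad", "data2.h5ad"], ["donor_a", "donor_b"])
# shlex.join quotes arguments where needed, so the line is safe to paste into a shell.
print(shlex.join(cmd))
# prints: SCALEX --data_list data1.h5ad data2.h5ad --batch_categories donor_a donor_b
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.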

    
#### Output
Output is saved in the output folder and includes:
* **checkpoint**:  saved model; use it with the option `--checkpoint` or `-c` to reproduce results
* **[adata.h5ad](https://anndata.readthedocs.io/en/stable/anndata.AnnData.html#anndata.AnnData)**:  preprocessed data and results, including latent representations, clustering and imputation
* **umap.png**:  UMAP visualization of the latent representations of cells 
* **log.txt**:  log file of the training process
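
After a run, a quick check that the listed outputs exist can catch an interrupted job early. This is a minimal sketch that assumes the four names above sit directly inside the output folder; the exact layout may differ between versions, and `umap.png` may be absent if UMAP was skipped. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Names taken from the output list above; their exact location is an assumption.
EXPECTED_OUTPUTS = ("checkpoint", "adata.h5ad", "umap.png", "log.txt")


def check_outputs(outdir: str) -> Dict[str, bool]:
    """Map each expected output name to whether it exists in outdir."""
    base = Path(outdir)
    return {name: (base / name).exists() for name in EXPECTED_OUTPUTS}


def missing_outputs(outdir: str) -> List[str]:
    """List the expected outputs that were not found, in the order above."""
    return [name for name, found in check_outputs(outdir).items() if not found]
```

For example, `missing_outputs("output/")` returns an empty list when all four entries are present.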

### Other Common Usage
#### Use h5ad files storing `anndata` as input, either one file or multiple separate files

    SCALEX --data_list <filename.h5ad>

#### Specify the batch annotation in `anndata.obs` using `--batch_name` if only one concatenated h5ad file is provided; batch_name can be e.g. conditions, samples, assays or patients, default is `batch`

    SCALEX --data_list <filename.h5ad> --batch_name <specific_batch_name>
    
    
#### Integrate heterogeneous scATAC-seq datasets by adding the option `--profile` ATAC
        
    SCALEX --data_list <filename.h5ad> --profile ATAC
    
#### Run imputation together with integration by adding the option `--impute`; results are stored in anndata.layers['impute']

    SCALEX --data_list <atac_filename.h5ad> --profile ATAC --impute True
    
    
#### Use custom features by passing `--n_top_features` the name of a file that lists the features in a single column

    SCALEX --data_list <filename.h5ad> --n_top_features features.txt
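
A single-column feature file like the `features.txt` used above can be written with a few lines of Python. This is a sketch under assumptions: one feature name per line with no header is inferred from the description, and dropping blanks and duplicates is a choice of this example rather than a documented requirement. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def write_feature_file(features: Iterable[str], path: str) -> int:
    """Write one feature name per line and return how many were written."""
    # Strip whitespace, skip empty entries, and keep the first occurrence of each name.
    cleaned = (name.strip() for name in features)
    unique = list(dict.fromkeys(name for name in cleaned if name))
    Path(path).write_text("".join(f"{name}\n" for name in unique))
    return len(unique)
```

For instance, `write_feature_file(["CD3D", "MS4A1", "LYZ"], "features.txt")` writes three lines; the gene names are placeholders for whatever feature set is wanted.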
    
#### Use preprocessed data with `--processed`

    SCALEX --data_list <filename.h5ad> --processed

#### Options

* --**data_list**  
        A list of matrix files (each as a `batch`) or a single batch/batch-merged file.
* --**batch_categories**  
        Categories for the batch annotation. By default, use increasing numbers if not given.
* --**batch_name**  
        Use this annotation in anndata.obs as batches for training model. Default: 'batch'.
* --**profile**  
        Specify the single-cell profile, RNA or ATAC. Default: RNA.
* --**min_features**  
        Filter out cells with fewer than min_features detected features. Default: 600 for RNA, 100 for ATAC.
* --**min_cells**  
        Filter out genes that are detected in fewer than min_cells cells. Default: 3.
* --**n_top_features**  
        Number of highly-variable genes to keep. Default: 2000 for RNA, 30000 for ATAC.
* --**outdir**  
        Output directory. Default: 'output/'.
* --**projection**  
        Use for new dataset projection. Input the folder containing the pre-trained model. Default: None. 
* --**impute**  
        If True, calculate the imputed gene expression and store it at adata.layers['impute']. Default: False.
* --**chunk_size**  
        Number of samples from the same batch to transform. Default: 20000.
* --**ignore_umap**  
        If True, do not perform UMAP for visualization and leiden for clustering. Default: False.
* --**join**  
        Use intersection ('inner') or union ('outer') of variables of different batches. 
* --**batch_key**  
        Add the batch annotation to obs using this key. By default, batch_key='batch'.
* --**batch_size**  
        Number of samples per batch to load. Default: 64.
* --**lr**  
        Learning rate. Default: 2e-4.
* --**max_iteration**  
        Max iterations for training. Training one batch_size samples is one iteration. Default: 30000.
* --**seed**  
        Random seed for torch and numpy. Default: 124.
* --**gpu**  
        Index of GPU to use if GPU is available. Default: 0.
* --**verbose**  
        Verbosity, True or False. Default: False.
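
To get a feel for how `--max_iteration` and `--batch_size` interact, the arithmetic can be worked through directly. The sketch takes the statement above at face value (each iteration processes one mini-batch of `batch_size` cells) and uses the listed defaults; the dataset size of 100,000 cells is a made-up example, and the function names are hypothetical.

```python
def samples_seen(max_iteration: int = 30000, batch_size: int = 64) -> int:
    """Total number of cells processed over training, counting repeats."""
    return max_iteration * batch_size


def approx_passes(n_cells: int, max_iteration: int = 30000, batch_size: int = 64) -> float:
    """Rough number of full passes over a dataset of n_cells cells."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return samples_seen(max_iteration, batch_size) / n_cells


print(samples_seen())           # 30000 * 64 = 1920000
print(approx_passes(100_000))   # about 19.2 passes over a hypothetical 100,000-cell dataset
```

With the defaults, smaller datasets are therefore revisited many more times than large ones, which is worth keeping in mind when adjusting either option.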
    

	
    
#### Help
For more usage options of SCALEX, see the help message:

	SCALEX.py --help 
    
    
## Release notes

See the [changelog](https://github.com/jsxlei/SCALEX/CHANGELOG.md).  


## Citation

Xiong, L., Tian, K., Li, Y., Ning, W., Gao, X., & Zhang, Q. C. (2022). Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nature Communications, 13(1), 6118. https://doi.org/10.1038/s41467-022-33758-z

    
            
