eeisp

Name: eeisp
Version: 0.5.0
Home page: https://github.com/nakatolab/EEISP
Summary: Identify gene pairs that are codependent and mutually exclusive from single-cell RNA-seq data.
Upload time: 2023-04-12 06:10:21
Author: Natsu Nakajima, Ryuichiro Nakato
Requires Python: >=3.6
License: GPL3.0
Keywords: eeisp, scrna-seq

# EEISP

EEISP identifies gene pairs that are codependent and mutually exclusive from single-cell RNA-seq data. 
       
## Installation

    pip3 install eeisp

### (Optional) Dependencies for GPU
EEISP requires [cupy](https://cupy.dev/) only for GPU computation (`--gpu`). Install the cupy package that matches your CUDA version with pip (see [the manual](https://docs.cupy.dev/en/stable/install.html) for details):

    # For CUDA 9.2
    pip3 install cupy-cuda92
    # For CUDA 10.1
    pip3 install cupy-cuda101

If you do not use `--gpu`, you do not need to install cupy.
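
Before passing `--gpu`, it can be handy to check whether cupy is importable in the current environment. The following is a minimal sketch using only the Python standard library; the function name `cupy_available` is hypothetical and is not part of EEISP.

```python
import importlib.util


def cupy_available() -> bool:
    """Return True if the cupy package can be found in this environment."""
    return importlib.util.find_spec("cupy") is not None


if cupy_available():
    print("cupy found: --gpu can be used")
else:
    print("cupy not found: run eeisp without --gpu")
```

This only checks that the package is installed; it does not verify that a working CUDA device is present.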

## Usage
EEISP takes a read count matrix as an input, in which rows and columns represent genes and cells, respectively. A gzipped file (.gz) is also acceptable.
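
As an illustration of the expected orientation (genes in rows, cells in columns), the sketch below writes a tiny comma-separated count matrix, gzip-compressed when the file name ends in `.gz`. The helper name `write_count_matrix`, the gene and cell labels, and the header layout (an empty corner field followed by the cell names) are assumptions made for this example; the exact header layout that `eeisp` expects is not specified in this README.

```python
import csv
import gzip


def write_count_matrix(path, genes, cells, counts):
    """Write a genes x cells count matrix as comma-separated text.

    A path ending in .gz is written gzip-compressed.
    """
    if len(counts) != len(genes) or any(len(row) != len(cells) for row in counts):
        raise ValueError("counts must have one row per gene and one column per cell")
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed header: empty corner field, then one label per cell.
        writer.writerow([""] + list(cells))
        for gene, row in zip(genes, counts):
            writer.writerow([gene] + list(row))


# Hypothetical toy data: 3 genes x 4 cells.
write_count_matrix(
    "toy_matrix.txt.gz",
    genes=["geneA", "geneB", "geneC"],
    cells=["cell1", "cell2", "cell3", "cell4"],
    counts=[[0, 3, 0, 1], [2, 0, 0, 5], [0, 0, 7, 0]],
)
```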

   0. (Optional) Convert CellRanger output to an input matrix (requires R and the [Seurat](https://satijalab.org/seurat/) library)
       ```
         datadir="outs/filtered_feature_bc_matrix/"
         matrix="matrix.txt"
         R -e "library(Seurat); so <- Read10X('$datadir'); write.table(as.matrix(so), '$matrix', quote=F, sep=',', col.names=T)"
       ```

   1. `eeisp` calculates the CDI and EEI scores for all gene pairs. The output consists of lists of gene pairs whose CDI or EEI values exceed the specified thresholds, and tables of the degree distributions.
       ```
         usage: eeisp [-h] [--threCDI THRECDI] [--threEEI THREEEI] [--tsv] [--gpu] [-p THREADS] [-v] matrix output

         positional arguments:
           matrix                Input matrix
           output                Output prefix

         optional arguments:
           -h, --help            show this help message and exit
           --threCDI THRECDI     Threshold for CDI (default: 20.0)
           --threEEI THREEEI     Threshold for EEI (default: 10.0)
           --tsv                 Specify when the input file is tab-delimited (.tsv)
           --gpu                 GPU mode
           -p THREADS, --threads THREADS  number of threads (default: 2)
           -v, --version         show program's version number and exit
       ```  
   2. `eeisp_add_genename_from_geneid` adds gene names (symbols) to the output files of `eeisp`.
        ```
         usage: eeisp_add_genename_from_geneid [-h] [--i_id I_ID] [--i_name I_NAME] input output genelist

         positional arguments:
           input            Input matrix
           output           Output prefix
           genelist         Gene list

         optional arguments:
           -h, --help       show this help message and exit
           --i_id I_ID      column number of gene id (default: 0)
           --i_name I_NAME  column number of gene name (default: 1)
       ```
## Tutorial
Sample data are included in the `sample` directory:
   * `data.txt`: the input matrix of scRNA-seq data.
   * `geneidlist.txt`: the gene list for `eeisp_add_genename_from_geneid`.


    eeisp data.txt Sample --threCDI 0.5 --threEEI 0.5 -p 8

This command outputs the lists of gene pairs with CDI > 0.5 or EEI > 0.5. `-p 8` runs the computation with 8 threads.
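
When `eeisp` is called from a Python pipeline, the command line can be assembled as an argument list. The sketch below uses only the options shown in the usage text above; the helper name `build_eeisp_command` and its keyword arguments are hypothetical and not part of the package.

```python
import shlex


def build_eeisp_command(matrix, prefix, thre_cdi=None, thre_eei=None,
                        threads=None, tsv=False, gpu=False):
    """Assemble an eeisp argument list from the options in the usage text."""
    cmd = ["eeisp", str(matrix), str(prefix)]
    if thre_cdi is not None:
        cmd += ["--threCDI", str(thre_cdi)]
    if thre_eei is not None:
        cmd += ["--threEEI", str(thre_eei)]
    if threads is not None:
        cmd += ["-p", str(threads)]
    if tsv:
        cmd.append("--tsv")  # input file is tab-delimited
    if gpu:
        cmd.append("--gpu")  # requires cupy
    return cmd


cmd = build_eeisp_command("data.txt", "Sample", thre_cdi=0.5, thre_eei=0.5, threads=8)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: eeisp data.txt Sample --threCDI 0.5 --threEEI 0.5 -p 8
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `eeisp` is installed and on the `PATH`.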

Add the `--gpu` option to enable GPU computation (requires [cupy](https://www.preferred.jp/en/projects/cupy/)):

    eeisp data.txt Sample --threCDI 0.5 --threEEI 0.5 -p 8 --gpu
    
(Note: GPU computation covers only part of `eeisp`, so using multiple threads is still recommended in `--gpu` mode for faster computation.)

Output files are:
```
   Sample_CDI_score_data_thre0.5.txt            # A list of gene pairs with CDI scores.
   Sample_CDI_degree_distribution_thre0.5.csv   # A table of CDI degrees and the number of genes.
   Sample_EEI_score_data_thre0.5.txt            # A list of gene pairs with EEI scores.
   Sample_EEI_degree_distribution_thre0.5.csv   # A table of EEI degrees and the number of genes.
```
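
In a scripted workflow it may be useful to derive these file names from the output prefix and thresholds, for example to check that a run has finished. The sketch below simply follows the naming pattern listed above. It assumes that each file name carries its own threshold exactly as typed on the command line, which is inferred from this example rather than documented; `expected_outputs` is a hypothetical helper.

```python
def expected_outputs(prefix, thre_cdi, thre_eei):
    """List the output file names, following the pattern shown in the tutorial."""
    names = []
    for kind, thre in (("CDI", thre_cdi), ("EEI", thre_eei)):
        names.append(f"{prefix}_{kind}_score_data_thre{thre}.txt")
        names.append(f"{prefix}_{kind}_degree_distribution_thre{thre}.csv")
    return names


for name in expected_outputs("Sample", 0.5, 0.5):
    print(name)
```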
The output files may contain gene IDs only, without gene names:

```
   $ head Sample_CDI_score_data_thre0.5.txt
   2       7       ESG000003       ESG000008       0.96384320244841
   0       1       ESG000001       ESG000002       0.6852891560232545
   0       6       ESG000001       ESG000007       0.6852891560232545
   7       8       ESG000008       ESG000009       0.6852891560232545
   3       9       ESG000004       ESG000010       0.6469554204484568
   4       6       ESG100005       ESG000007       0.5258703930217091
```
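
For downstream analysis, a score file like the one above can be read with a few lines of Python. The sketch below assumes five whitespace-separated columns per line, interpreted as two gene indices, two gene IDs, and the score; this column meaning is inferred from the sample output. The degree count (number of partners per gene, then number of genes per degree) is a generic definition and may not match the layout of the degree distribution tables written by `eeisp`. All names are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple


class ScoredPair(NamedTuple):
    index_a: int
    index_b: int
    gene_a: str
    gene_b: str
    score: float


def parse_score_lines(lines) -> List[ScoredPair]:
    """Parse lines with five whitespace-separated fields into ScoredPair records."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        pairs.append(ScoredPair(int(fields[0]), int(fields[1]),
                                fields[2], fields[3], float(fields[4])))
    return pairs


def degree_distribution(pairs):
    """Return {degree: number of genes that appear in that many pairs}."""
    degree = Counter()
    for pair in pairs:
        degree[pair.gene_a] += 1
        degree[pair.gene_b] += 1
    return dict(sorted(Counter(degree.values()).items()))


sample = """\
2       7       ESG000003       ESG000008       0.96384320244841
0       1       ESG000001       ESG000002       0.6852891560232545
0       6       ESG000001       ESG000007       0.6852891560232545
7       8       ESG000008       ESG000009       0.6852891560232545
3       9       ESG000004       ESG000010       0.6469554204484568
4       6       ESG100005       ESG000007       0.5258703930217091
""".splitlines()

pairs = parse_score_lines(sample)
print(degree_distribution(pairs))  # prints {1: 6, 2: 3}
```

To read a real file, pass an open file handle instead of `sample`, for example `parse_score_lines(open("Sample_CDI_score_data_thre0.5.txt"))`.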

To add gene names (symbols), run `eeisp_add_genename_from_geneid` with `geneidlist.txt`, which lists gene IDs paired with their gene names.

```
 eeisp_add_genename_from_geneid \
     Sample_CDI_score_data_thre0.5.txt \
     Sample_CDI_score_data_thre0.5.addgenename.txt \
     geneidlist.txt
 eeisp_add_genename_from_geneid \
     Sample_EEI_score_data_thre0.5.txt \
     Sample_EEI_score_data_thre0.5.addgenename.txt \
     geneidlist.txt
```

The resulting files include gene names:

```
   $ head Sample_CDI_score_data_thre0.5.addgenename.txt
   2       7       ESG000003       ESG000008       OR4F5   FO538757.3      0.96384320244841
   0       1       ESG000001       ESG000002       RP11-34P13.3    FAM138A 0.6852891560232545
   0       6       ESG000001       ESG000007       RP11-34P13.3    RP11-34P13.9    0.6852891560232545
   7       8       ESG000008       ESG000009       FO538757.3      FO538757.2      0.6852891560232545
   3       9       ESG000004       ESG000010       RP11-34P13.7    AP006222.2      0.6469554204484568
   4       6       ESG100005       ESG000007       RP11-34P13.8    RP11-34P13.9    0.5258703930217091
```
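
Conceptually, this step is a lookup from gene ID to gene name, with the two names inserted before the score column. The sketch below illustrates that idea only and is not the implementation of `eeisp_add_genename_from_geneid`. It assumes a whitespace-separated gene list, mirrors the `--i_id` and `--i_name` defaults as function parameters, and uses a hypothetical `"NA"` placeholder for IDs missing from the list.

```python
def load_gene_names(lines, i_id=0, i_name=1):
    """Build a {gene_id: gene_name} mapping from whitespace-separated lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) > max(i_id, i_name):
            mapping[fields[i_id]] = fields[i_name]
    return mapping


def add_gene_names(score_lines, mapping, missing="NA"):
    """Insert the two gene names before the score column of each pair line."""
    out = []
    for line in score_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        idx_a, idx_b, gene_a, gene_b, score = fields
        out.append("\t".join([idx_a, idx_b, gene_a, gene_b,
                              mapping.get(gene_a, missing),
                              mapping.get(gene_b, missing),
                              score]))
    return out


# Toy inputs taken from the sample output above.
gene_list = ["ESG000003 OR4F5", "ESG000008 FO538757.3"]
scores = ["2       7       ESG000003       ESG000008       0.96384320244841"]

for row in add_gene_names(scores, load_gene_names(gene_list)):
    print(row)
```

The score is kept as text, so its original precision is preserved in the annotated output.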

## Reference
Nakajima N., Hayashi T., Fujiki K., Shirahige K., Akiyama T., Akutsu T. and Nakato R., [Codependency and mutual exclusivity for gene community detection from sparse single-cell transcriptome data](https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab601/6324613), *Nucleic Acids Research*, 2021.

            
