IPFMC


- Name: IPFMC
- Version: 1.1.1
- Home page: https://github.com/BioLemon/IPFMC
- Summary: A tool for interpretable multi-omics integrated clustering.
- Upload time: 2024-09-10 07:05:15
- Author: Haoyang Zhang
- Requires Python: >=3.6.0
- License: MIT
            
# IPFMC

## Brief description of each module

This section briefly describes each module. For a detailed description of the parameters of each method, see the function descriptions.

### ‘direct’ module

This module provides methods to directly perform integrated cancer multi-omics clustering using IPFMC.

(1) **direct.ipfmc_discretize()**: Implementation of strategy 1 for IPFMC.

(2) **direct.ipfmc_average()**: Implementation of strategy 2 for IPFMC.

(3) **direct.spec_cluster()**: Generates spectral clustering results as cluster labels with sample indexes.

(4) **direct.suggest_k()**: Gives a suggested number of clusters according to the silhouette coefficient.

### ‘separate’ module

This module offers roughly the same functionality as the direct module, but its two IPFMC strategies accept a single omics dataset and return the single-omics representation and pathway ranking. Users can then fuse the single-omics representations with similarity network fusion (SNF) to obtain the multi-omics representation.

(1) **separate.ipfmc_discretize()**: Implementation of strategy 1 for IPFMC.

(2) **separate.ipfmc_average()**: Implementation of strategy 2 for IPFMC.

(3) **separate.spec_cluster()**: Generates spectral clustering results as cluster labels with sample indexes.

(4) **separate.suggest_k()**: Gives a suggested number of clusters according to the silhouette coefficient.
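
The `suggest_k()` functions pick a cluster number according to the silhouette coefficient. The sketch below is not IPFMC's implementation; it only illustrates the idea with NumPy on made-up one-dimensional data. The function `silhouette_mean` and the candidate labelings are hypothetical names introduced for this illustration.

```python
import numpy as np

def silhouette_mean(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix.

    Assumes at least two distinct clusters in `labels`.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))

# Made-up data: two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(x[:, None] - x[None, :])

# Hypothetical candidate labelings for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
best_k = max(candidates, key=lambda k: silhouette_mean(dist, candidates[k]))
print(best_k)
```

The labeling with the highest mean silhouette is the one whose clusters are compact and well separated, which is why the score can be used to suggest a cluster number.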

### ‘analysis’ module

This module provides some functions for pathway data processing and downstream analysis.

## Simple Test Case

This section provides sample code for multi-omics integrated clustering and biological interpretation using the package. You can change some of the variables to apply it to your own dataset.

### Import necessary packages

```python
import pandas as pd
import numpy as np
from snf import snf
from IPFMC import direct
from IPFMC import separate
```

### Input datasets

1. Omics data

   Each omics dataset should be a CSV file with one feature per row and one sample per column. The first row should contain the sample names and the first column should contain the gene names. (For omics data other than miRNA and mRNA expression data, such as methylation or copy number variation, the features should be mapped to genes and converted to gene names before being used as IPFMC input.)

2. Pathway data

   In addition to the omics data, you need to provide pathway data describing the genes contained in each pathway. If your omics data includes miRNA data, you also need to provide the relationship data between miRNAs and pathways.

Code is as follows:

```python
# Directory of the omics data; 'LUAD' is the folder that contains the LUAD omics files
Omic_dir = './Omics/LUAD'
# File path of the pathway index
BP_dir = './Pathways/Pathway_Index.csv'
# File path of the miRNA pathway index
mirBP_dir = './Pathways/miRNA_Pathway_Index.csv'
datatypes = ['mRNA', 'Methy', 'CNV', 'miRNA']  # Omics data types used in the experiment
omic_list = []  # Stores one DataFrame per omics data type
BP_data = pd.read_csv(BP_dir, index_col=0)  # Read the pathway data with pandas
mirBP_data = pd.read_csv(mirBP_dir, index_col=0)  # Read the miRNA-pathway relationship data
for datatype in datatypes:
    # The omics files are named <cancer name>_<data type>.csv, for example LUAD_mRNA.csv.
    # Change the pattern to match your own file names.
    omicdata = pd.read_csv(f'{Omic_dir}/LUAD_{datatype}.csv', index_col=0)
    omic_list.append(omicdata)
```
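
The loading loop above assumes that every `<cancer name>_<data type>.csv` file exists. As a minimal sketch using only the standard library, you could list the missing files before reading anything. The helper `missing_omics_files` is a hypothetical name and not part of IPFMC.

```python
from pathlib import Path

def missing_omics_files(omic_dir, cancer, datatypes):
    """Return the expected '<cancer>_<datatype>.csv' paths that do not exist."""
    expected = [Path(omic_dir) / f"{cancer}_{datatype}.csv" for datatype in datatypes]
    return [path for path in expected if not path.is_file()]

missing = missing_omics_files("./Omics/LUAD", "LUAD", ["mRNA", "Methy", "CNV", "miRNA"])
if missing:
    print("Missing omics files:", ", ".join(str(path) for path in missing))
```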

The file structure used in the sample code is as follows:

```bash
.
├── Omics
│   └── LUAD
│       ├── LUAD_mRNA.csv
│       ├── LUAD_miRNA.csv
│       ├── LUAD_Methy.csv
│       └── LUAD_CNV.csv
├── Pathways
│   ├── Pathway_Index.csv
│   └── miRNA_Pathway_Index.csv
└── script.py
```

Here, script.py is the Python script being run. You can store the data elsewhere by changing the path of each file. The key points are to read the files with the `read_csv` function provided by pandas and to make sure that the row index of the omics data holds the feature names, the column index holds the sample names, and the row index of the pathway data holds the pathway names.
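
The orientation requirement can be checked on a small example. The sketch below uses two made-up tables (not real LUAD data) to show what `index_col=0` produces and how to find the samples shared by several omics tables. Whether IPFMC aligns samples internally is not stated here, so treat the alignment step as an optional precaution.

```python
import io
import pandas as pd

# Two tiny, made-up omics tables in the expected layout:
# first column = gene names, first row = sample names.
mrna_csv = "gene,S1,S2,S3\nTP53,1.2,0.4,2.2\nEGFR,0.3,1.9,0.8\n"
cnv_csv = "gene,S2,S3,S4\nTP53,0,1,-1\nKRAS,1,0,0\n"

mrna = pd.read_csv(io.StringIO(mrna_csv), index_col=0)
cnv = pd.read_csv(io.StringIO(cnv_csv), index_col=0)

# Row index = feature (gene) names, column index = sample names.
print(list(mrna.index), list(mrna.columns))

# Keep only the samples present in both tables, in the order of the first table.
cnv_samples = set(cnv.columns)
shared = [sample for sample in mrna.columns if sample in cnv_samples]
mrna_aligned, cnv_aligned = mrna[shared], cnv[shared]
print(shared)
```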

### Acquisition of single/multi-omics data representation

After loading all the necessary data, you can pass it to IPFMC for multi-omics data integration. This produces the integrated multi-omics representation and, for each omics, the ranking of the pathways retained after filtering. IPFMC offers two approaches for this step (direct integration and separate computation), each with two strategies. We use strategy 1 of IPFMC as an example and show both approaches to obtaining the multi-omics representation.

#### Directly input the multi-omics data list and obtain the multi-omics representation

You can choose to use the direct multi-omics integration strategy, which requires importing the direct module. Here is the code (it uses the ‘omic_list’, ‘BP_data’ and ‘mirBP_data’ variables obtained earlier):

```python
represent, pathways = direct.ipfmc_discretize(omic_list, BP_data, mirna=True, mirtarinfo=mirBP_data)
# represent: integrated representation of the multi-omics data calculated by IPFMC
# pathways: pathway ranking of each omics calculated by IPFMC (one ranking per omics),
#           in the same order as the omics in the input omic_list
```

The parameters of ‘direct.ipfmc_discretize()’ are listed below:

```python
"""
    :param datasets: List of your multi-omics datasets; each element of the list should be a pandas DataFrame.
    :param pathwayinfo: Pathways and the genes they contain.
    :param k: The number of initial points for k-means clustering
    :param fusetime: Number of rounds of pathway screening and fusion
    :param proportion: The proportion of pathways retained and fused at each iteration
    :param snfk: Number of SNF neighbors used when multiple datasets are fused
    :param seed: Random number seed; set to None if no seed is needed
    :param mirtarinfo: miRNA target gene information; used only if miRNA data is included in the datasets
    :param mirna: Set to True if your datasets contain miRNA data, and False otherwise
    :return: Final representation of the input datasets; a list of pathway rankings, one per dataset.
"""
```

**If your datasets contain miRNA expression data, make sure that the ‘mirna’ parameter is set to ‘True’, that the miRNA expression data is the last element of the ‘omic_list’ variable, and that ‘mirtarinfo’ is set to the variable that contains the miRNA-pathway relationship data.**
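
One way to satisfy the ordering requirement is to reorder the data type names before loading the files, so that the miRNA table ends up last in ‘omic_list’. This is a minimal sketch; `order_mirna_last` is a hypothetical helper, not an IPFMC function, and it assumes the miRNA data type is named 'miRNA' as in the sample code.

```python
def order_mirna_last(datatypes, mirna_name="miRNA"):
    """Return the data type names with the miRNA entry (if any) moved to the end."""
    others = [name for name in datatypes if name != mirna_name]
    mirna = [name for name in datatypes if name == mirna_name]
    return others + mirna

datatypes = order_mirna_last(["miRNA", "mRNA", "Methy", "CNV"])
has_mirna = "miRNA" in datatypes  # could be passed as the 'mirna' argument
print(datatypes, has_mirna)
```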

#### Compute the representation of each single omics separately

You can also obtain a single-omics representation for each omics and then integrate them with SNF.

```python
represents = []
pathways_list = []
# Only the first three datasets are processed here; the last dataset is miRNA and needs to be processed separately
for i in range(3):  
    represent, pathways = separate.ipfmc_discretize(omic_list[i], BP_data)
    represents.append(np.array(represent))
    print(represent)
    pathways_list.append(pathways)

represent, pathways = separate.ipfmc_discretize(omic_list[3], mirBP_data)  # Process the miRNA dataset with the miRNA pathway index
represents.append(np.array(represent))
pathways_list.append(pathways)
represent_final = snf(represents, K=15)  # 'represent_final' is the final multi-omics representation
```

We recommend this approach because computing the representation of each single omics separately is more flexible for downstream tasks and has fewer parameters to consider.
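
Before fusing, it can be worth checking that the single-omics representations are compatible. The sketch below assumes each representation is a square sample-by-sample matrix with the samples in the same order; that is an assumption about the inputs to `snf`, not something stated in this README. `check_representations` is a hypothetical helper.

```python
import numpy as np

def check_representations(represents):
    """Return the sample count if all matrices are square and share one shape."""
    shapes = [np.asarray(r).shape for r in represents]
    if not shapes:
        raise ValueError("no representations given")
    for shape in shapes:
        if len(shape) != 2 or shape[0] != shape[1]:
            raise ValueError(f"expected a square matrix, got shape {shape}")
    if len(set(shapes)) > 1:
        raise ValueError(f"representations differ in shape: {shapes}")
    return shapes[0][0]

# Made-up 4-sample matrices standing in for real single-omics representations.
n_samples = check_representations([np.eye(4), np.ones((4, 4))])
print(n_samples)
```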

### Clustering using multi-omics representation

You can choose the number of clusters yourself and use the code below to obtain the cluster labels:

```python
labels = separate.spec_cluster(omic_list[0],fusion_matrix=represent_final,k=4)  # Here we set number of clusters to 4
# 'labels' is the cluster labels of input multi-omics datasets.
```

(The first parameter can be any element of ‘omic_list’. It is used to retrieve the sample names.)

Alternatively, you can use the function we provide to get a suggested number of clusters.

```python
K = separate.suggest_k(represent_final)  # Pass in the final representation; the function returns a suggested number of clusters
labels = separate.spec_cluster(omic_list[0],fusion_matrix=represent_final,k=K)
```

You can then use the obtained cluster labels for downstream analysis.
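
As a simple first step, you might inspect the cluster sizes and members. The exact structure returned by `spec_cluster()` is not described here, so the sketch below uses a made-up mapping from sample name to cluster label as a stand-in; adapt it to the actual output.

```python
from collections import Counter

# Hypothetical stand-in for the clustering output: sample name -> cluster label.
labels_by_sample = {"S1": 0, "S2": 1, "S3": 0, "S4": 2, "S5": 1, "S6": 0}

# Number of samples per cluster.
cluster_sizes = Counter(labels_by_sample.values())

# Sample names grouped by cluster label.
members = {}
for sample, label in labels_by_sample.items():
    members.setdefault(label, []).append(sample)

for label in sorted(cluster_sizes):
    print(f"cluster {label}: {cluster_sizes[label]} samples -> {members[label]}")
```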



            
