# ProteinNetworks
The library contains convenient tools for rapid analysis of gene ontology, enrichment and protein-protein interaction data. Based on the [`stringdb`](https://pypi.org/project/stringdb/) library. Some features require you to install [R](https://www.r-project.org/) to work (see [`EnrichmentAnalysis.prioretizingGO()`](#prioretizingGO))
### The module will contain 4 sets of tools:
* **Enrichment Analysis**
* **Protein networks Analysis**
* **Group comparison tools**
* **Visualization tools**
## Get Started
`pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3`
## Contents:
* [Enrichment Analysis](#EnrichmentAnalysis)
  * module: [`ProteinNetworks.STRING_enrichment`](#STRING_enrichment)
  * class: [`EnrichmentAnalysis`](#classEnrichmentAnalysis)
  * methods:
    * [`EnrichmentAnalysis.create_subframe_by_names()`](#create)
    * [`EnrichmentAnalysis.drop_duplicated_genes()`](#drop_duplicated_genes)
    * [`EnrichmentAnalysis.get_category_terms()`](#get_category_terms)
    * [`EnrichmentAnalysis.get_enrichment()`](#get_enrichment)
    * [`EnrichmentAnalysis.get_genes_by_localization()`](#get_genes_by_localization)
    * [`EnrichmentAnalysis.get_genes_of_term()`](#get_genes_of_term)
    * [`EnrichmentAnalysis.get_mapped()`](#get_mapped)
    * [`EnrichmentAnalysis.prioretizingGO()`](#prioretizingGO)
    * [`EnrichmentAnalysis.proteins_participation_in_the_category()`](#proteins_participation_in_the_category)
    * [`EnrichmentAnalysis.save_table()`](#save_table)
    * [`EnrichmentAnalysis.show_category_terms()`](#show_category_terms)
    * [`EnrichmentAnalysis.show_enrichest_terms_in_category()`](#show_enrichest_terms_in_category)
    * [`EnrichmentAnalysis.show_enrichment_categories()`](#show_enrichment_categories)
_________________________
# <a name='EnrichmentAnalysis'></a> Enrichment Analysis
Contains a set of functions based on the stringdb library for gene ontology (GO) analysis and enrichment analysis.
See the [Colab Notebook](https://drive.google.com/file/d/1JlcrtDNwOVLuKmwDy4apfIpt7Mheu4cF/view?usp=sharing) for examples.
## <a name='STRING_enrichment'></a> ProteinNetworks.STRING_enrichment module
### <a name="classEnrichmentAnalysis"></a> *class* ProteinNetworks.STRING_enrichment.EnrichmentAnalysis *(data, enrichment=None, protein_id_type='UniProtID')*
Bases: `object`
EnrichmentAnalysis class.
* **Parameters:**
  * **data:** DataFrame containing the protein IDs to analyze. It must contain either a “Gene” or a “UniProtID” column.
  * **enrichment:** DataFrame containing the results of a previous enrichment analysis
  * **protein_id_type:** type of protein ID. Valid Types
#### <a name="create"></a>*static* create_subframe_by_names(df, column: str, names: [<class 'list'>, <class 'tuple'>, <class 'set'>], add: str = 'first')
Finds the rows of `df` whose value in `column` is one of `names` and returns them as a sub-DataFrame.
* **Parameters:**
  * **df** – target DataFrame
  * **column** – column in which the names are searched
  * **names** – list of target names whose records should be found in the table
  * **add** – [‘first’, ‘last’, ‘all’] which of the found rows to add:
    * ‘first’ - add only the first entry
    * ‘last’ - add only the last entry
    * ‘all’ - add all entries
* **Returns:**
  sub-DataFrame containing the rows whose `column` value is one of `names`
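The selection rule behind `add` can be sketched in plain Python. This is a hypothetical helper (`subframe_by_names`) that works on a list of row dicts instead of a pandas DataFrame, and it assumes that ‘first’ and ‘last’ are applied per name; it is an illustration, not the library's implementation.

```python
from __future__ import annotations

from typing import Iterable


def subframe_by_names(rows: list[dict], column: str, names: Iterable[str],
                      add: str = "first") -> list[dict]:
    """Hypothetical sketch: keep rows whose `column` value is in `names`."""
    if add not in ("first", "last", "all"):
        raise ValueError("add must be 'first', 'last' or 'all'")
    wanted = set(names)
    selected: list[dict] = []
    position: dict[str, int] = {}  # name -> index of its kept row in `selected`
    for row in rows:
        name = row[column]
        if name not in wanted:
            continue
        if add == "all":
            selected.append(row)            # keep every match
        elif name not in position:
            position[name] = len(selected)
            selected.append(row)            # first match for this name
        elif add == "last":
            selected[position[name]] = row  # a later match replaces the earlier one
    return selected


# Placeholder gene names, for illustration only
example_rows = [
    {"Gene": "GENE_A", "value": 1},
    {"Gene": "GENE_B", "value": 2},
    {"Gene": "GENE_A", "value": 3},
]
print(subframe_by_names(example_rows, "Gene", ["GENE_A"], add="last"))
```

With pandas the same idea is usually expressed with `df[df[column].isin(names)]` followed by `drop_duplicates(subset=column, keep=...)`.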
#### <a name="drop_duplicated_genes"></a> drop_duplicated_genes(silent=False)
Drops duplicated genes.
* **Parameters:**
  * **subset:** (list) only consider certain columns for identifying duplicates; by default, all columns are used.
* **Returns:** DataFrame of the dropped genes
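Duplicate detection restricted to a column subset can be sketched as follows. The helper name `split_duplicates`, the list-of-dicts input and the keep-the-first-occurrence rule are assumptions made for illustration; pandas users would typically reach for `DataFrame.duplicated(subset=...)` instead.

```python
from __future__ import annotations


def split_duplicates(rows: list[dict], subset: list[str] | None = None):
    """Hypothetical sketch: return (kept_rows, dropped_rows), keeping first occurrences."""
    kept: list[dict] = []
    dropped: list[dict] = []
    seen: set = set()
    for row in rows:
        # By default every column takes part in the duplicate check
        columns = subset if subset is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key in seen:
            dropped.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, dropped


example_rows = [
    {"Gene": "GENE_A", "UniProtID": "ID_1"},
    {"Gene": "GENE_A", "UniProtID": "ID_1"},
    {"Gene": "GENE_A", "UniProtID": "ID_2"},
]
kept_rows, dropped_rows = split_duplicates(example_rows, subset=["Gene"])
print(len(kept_rows), len(dropped_rows))
```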
#### <a name="get_category_terms"></a> get_category_terms(category: str, term_type: str = 'id')
Returns the set of all terms in the chosen category.
* **Parameters:**
  * **category:** name of the category
  * **term_type:** ‘id’ or ‘description’.
    > id - returns the term IDs of the category (for example, GO terms)
    >
    > description - returns the descriptions of the category's term IDs
* **Returns:**
  set of terms
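A rough equivalent of the `term_type` switch, written against a list of enrichment rows. The field names `category`, `term` and `description` follow the output of the STRING enrichment API; that the library keeps its enrichment table in this shape is an assumption, and the rows below are toy data.

```python
from __future__ import annotations

# Toy enrichment rows in the shape of STRING's enrichment output (illustrative only)
ENRICHMENT = [
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process"},
    {"category": "Process", "term": "GO:0008283", "description": "cell population proliferation"},
    {"category": "Component", "term": "GO:0005634", "description": "nucleus"},
]


def category_terms(enrichment: list[dict], category: str, term_type: str = "id") -> set[str]:
    """Hypothetical sketch: collect term IDs or descriptions for one category."""
    fields = {"id": "term", "description": "description"}
    if term_type not in fields:
        raise ValueError("term_type must be 'id' or 'description'")
    field = fields[term_type]
    return {row[field] for row in enrichment if row["category"] == category}


print(category_terms(ENRICHMENT, "Process"))
print(category_terms(ENRICHMENT, "Component", term_type="description"))
```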
#### <a name="get_enrichment"></a> get_enrichment()
Performs enrichment analysis and stores the results in `self.enrichment`.
* **Returns:** None
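The enrichment itself is computed by the STRING service via `stringdb`. For orientation only, this sketch builds the kind of STRING REST API request involved (the `/api/json/enrichment` endpoint, with identifiers separated by a carriage return, `%0d`). Whether `get_enrichment()` issues exactly this request is an assumption, and the sketch does not send anything over the network.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def enrichment_url(identifiers, species=9606, output_format="json"):
    """Hypothetical sketch: build a STRING enrichment request URL."""
    params = {
        "identifiers": "\r".join(identifiers),  # STRING expects "%0d"-separated IDs
        "species": species,                     # NCBI taxonomy ID (9606 = human)
    }
    return f"{STRING_API}/{output_format}/enrichment?{urlencode(params)}"


print(enrichment_url(["TP53", "EGFR"]))
```

Passing the URL to `urllib.request.urlopen` would perform the actual request; a wrapper library such as `stringdb` saves you from handling this by hand.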
#### <a name="get_genes_by_localization"></a> get_genes_by_localization(compartments: list, set_operation: str, save=False)
Returns the proteins localized in the target compartments. You can also apply common set operations
to the gene sets of these compartments.
> Example: *get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘union’)* - returns proteins localized in the nucleus or the cytosol
* **Parameters:**
  * **compartments:** list of compartments. **Note:**
    1. Capitalization matters. Get the available compartment names by calling *get_components_list()*.
    2. The order of compartments matters if you want a set difference.
  * **set_operation:** operation between sets. The operation is applied sequentially to all
    sets from `compartments`: *[**A**, **B**, **C**], 'intersection' **->** **A** and **B** and **C***
    > For example:
    >
    > *get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘difference’)* - returns nucleus proteins that are not localized in the cytosol,
    >
    > *get_genes_by_localization([‘Cytosol’, ‘Nucleus’], ‘union’)* - returns cytosol and nucleus proteins,
    >
    > *get_genes_by_localization([‘all’, ‘Nucleus’], ‘difference’)* - returns all proteins except nucleus proteins.
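The sequential rule for `set_operation` maps naturally onto `functools.reduce`. In this sketch the compartment-to-genes dictionary, the helper name `combine_compartments` and the treatment of ‘all’ as the union of every compartment are assumptions; the gene names are placeholders.

```python
from functools import reduce

SET_OPERATIONS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}


def combine_compartments(genes_by_compartment, compartments, set_operation):
    """Hypothetical sketch: apply one set operation left to right over compartments."""
    if set_operation not in SET_OPERATIONS:
        raise ValueError(f"unsupported set_operation: {set_operation!r}")
    all_genes = set().union(*genes_by_compartment.values())
    sets = [all_genes if name == "all" else set(genes_by_compartment[name])
            for name in compartments]
    # e.g. [A, B, C] with 'difference' -> (A - B) - C
    return reduce(SET_OPERATIONS[set_operation], sets)


localization = {
    "Nucleus": {"GENE_A", "GENE_B"},
    "Cytosol": {"GENE_B", "GENE_C"},
    "Membrane": {"GENE_D"},
}
print(combine_compartments(localization, ["Nucleus", "Cytosol"], "difference"))
```

Because the operation is folded left to right, swapping the compartments changes the result of ‘difference’ but not of ‘union’ or ‘intersection’.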
#### <a name="get_genes_of_term"></a> get_genes_of_term(term: str)
Returns the genes from the enrichment table that are associated with the target term.
* **Parameters:**
  * **term**: target GO term from the ‘term’ column of the enrichment table
* **Returns:** list of genes associated with the target term
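A minimal lookup sketch, assuming each enrichment row stores its genes under `inputGenes`, either as a list (as in STRING's JSON output) or as a comma-separated string (as in its TSV output). The helper name and the return-an-empty-list behavior for unknown terms are assumptions.

```python
from __future__ import annotations

# Toy rows; gene names are placeholders
ENRICHMENT = [
    {"term": "GO:0006915", "inputGenes": ["GENE_A", "GENE_B"]},
    {"term": "GO:0005634", "inputGenes": ["GENE_B"]},
]


def genes_of_term(enrichment: list[dict], term: str) -> list[str]:
    """Hypothetical sketch: genes listed for `term`, or [] if the term is absent."""
    for row in enrichment:
        if row["term"] == term:
            genes = row["inputGenes"]
            if isinstance(genes, str):  # TSV-style comma-separated string
                return genes.split(",")
            return list(genes)
    return []


print(genes_of_term(ENRICHMENT, "GO:0006915"))
```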
#### <a name="get_mapped"></a> get_mapped(species=9606)
Maps protein IDs to STRING identifiers. This mapping is needed for the subsequent analysis.
* **Parameters:**
  * **species:** NCBI taxonomy ID of the organism, e.g. `species=9606` for human
* **Returns:** None
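According to the STRING API documentation, each hit returned by the `get_string_ids` endpoint carries a `queryIndex` (position of the identifier in the input list) and a `stringId`. This sketch turns such hits into a lookup table; the helper name and the keep-the-first-hit rule (assuming hits arrive best-first) are assumptions, and the STRING IDs below are placeholders.

```python
def map_to_string_ids(query, hits):
    """Hypothetical sketch: map each input identifier to its first STRING hit."""
    mapping = {}
    for hit in hits:
        item = query[hit["queryIndex"]]  # recover the original input identifier
        mapping.setdefault(item, hit["stringId"])  # keep only the first hit
    return mapping


query_ids = ["P04637", "P00533", "UNKNOWN"]
example_hits = [
    {"queryIndex": 0, "stringId": "9606.ENSP_PLACEHOLDER_1"},
    {"queryIndex": 1, "stringId": "9606.ENSP_PLACEHOLDER_2"},
    {"queryIndex": 1, "stringId": "9606.ENSP_PLACEHOLDER_3"},
]
print(map_to_string_ids(query_ids, example_hits))
```

Identifiers without any hit (here "UNKNOWN") simply do not appear in the mapping, which makes unmapped proteins easy to spot.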
#### <a name="prioretizingGO"></a> prioretizingGO(terms: [<class 'list'>, <class 'set'>], organism='Human', domain='BP')
Prioritizes GO terms with an R script based on the [GOxploreR](https://cran.r-universe.dev/GOxploreR/doc/manual.html) package ([doi:10.1038/s41598-020-73326-3](https://www.nature.com/articles/s41598-020-73326-3)).
See the R script ‘Prioretizing_GO.R’.
Works with R 4.3.x. The `Rscript` executable must be on your `PATH`.
If you use this function in Google Colab, the R packages have to be installed at the first launch.
This may take a long time (up to 20 minutes).
* **Parameters:**
  * **terms** – list of GO terms
  * **organism** – name of the target organism
  * **domain** – GO domain (sub-ontology) to use. Available inputs:
    * ‘BP’ - Biological Process
    * ‘CC’ - Cellular Component
    * ‘MF’ - Molecular Function
* **Returns:**
  list of prioritized GO terms
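Since this function relies on an external R installation, a quick pre-flight check can save a confusing failure. This sketch uses only the Python standard library and does not call the library's R script; the helper names are hypothetical.

```python
import shutil
import subprocess


def rscript_available() -> bool:
    """True if an `Rscript` executable can be found on PATH."""
    return shutil.which("Rscript") is not None


def rscript_version():
    """Return the Rscript version banner, or None when Rscript is not on PATH."""
    if not rscript_available():
        return None
    result = subprocess.run(["Rscript", "--version"], capture_output=True, text=True)
    # Depending on the R version, the banner goes to stdout or to stderr
    return (result.stdout or result.stderr).strip()


print(rscript_version() or "Rscript not found: install R 4.3.x and add it to PATH")
```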
#### <a name="proteins_participation_in_the_category"></a> proteins_participation_in_the_category(df, category, term_type='id', term_sep='\\n')
Checks which terms of the category each protein participates in and builds a statistics table.
* **Parameters:**
  * **df:** target DataFrame
  * **category:** name of the category
  * **term_type:** ‘id’ or ‘description’.
    > id - returns the term IDs of the category (for example, GO terms)
    >
    > description - returns the descriptions of the category's term IDs
  * **term_sep:** separator between terms; all terms associated with a protein are saved in one cell
* **Returns:** None
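The per-protein statistics can be sketched by inverting the enrichment table: for every gene, collect the terms it appears in and join them with `term_sep`. The row layout (STRING-style `category`, `term`, `description`, `inputGenes` fields), the helper name and the output column names `terms_count` and `terms` are all assumptions for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Toy rows; gene names are placeholders
ENRICHMENT = [
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process",
     "inputGenes": ["GENE_A", "GENE_B"]},
    {"category": "Process", "term": "GO:0008283", "description": "cell population proliferation",
     "inputGenes": ["GENE_A"]},
    {"category": "Component", "term": "GO:0005634", "description": "nucleus",
     "inputGenes": ["GENE_B"]},
]


def participation_table(enrichment, category, term_type="id", term_sep="\n"):
    """Hypothetical sketch: per-gene term count and joined term list for one category."""
    fields = {"id": "term", "description": "description"}
    if term_type not in fields:
        raise ValueError("term_type must be 'id' or 'description'")
    field = fields[term_type]
    per_gene: dict[str, list[str]] = defaultdict(list)
    for row in enrichment:
        if row["category"] != category:
            continue
        for gene in row["inputGenes"]:
            per_gene[gene].append(row[field])
    # One "cell" per gene: all its terms joined by term_sep
    return {gene: {"terms_count": len(terms), "terms": term_sep.join(terms)}
            for gene, terms in per_gene.items()}


print(participation_table(ENRICHMENT, "Process", term_sep="; "))
```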
#### <a name="save_table"></a> *static* save_table(table, name, saveformat='xlsx', index: bool = True)
Saves a DataFrame table to a file.
* **Parameters:**
  * **table**: DataFrame
  * **name**: file name
  * **saveformat**: file format: ‘xlsx’ or ‘csv’
  * **index**: whether to write the row index to the saved table
* **Returns:** None
#### <a name="show_category_terms"></a> show_category_terms(category: str, show: [<class 'int'>, <class 'str'>] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')
Displays all terms of a category together with the number of associated genes.
* **Parameters:**
  * **category:** name of the category. Available categories can be listed with the ‘show_enrichment_categories’ method
  * **show:** “all” or an integer; number of rows to display
  * **sort_by:** [“genes”, “term”] - sort by number of genes (descending) or by term name (ascending)
  * **save:** set to True to save the table; by default it is saved in .xlsx format
  * **savename:** file name; used only when save=True
  * **saveformat:** file format: ‘xlsx’ or ‘csv’
* **Returns:** None
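The `show` and `sort_by` options can be sketched as a filter, a sort and a slice. The row fields follow STRING's enrichment output (`category`, `term`, `number_of_genes`); the helper name and the tie-breaking by term name are assumptions, and the gene counts are toy values.

```python
# Toy rows with made-up gene counts
ENRICHMENT = [
    {"category": "Process", "term": "GO:0008283", "number_of_genes": 3},
    {"category": "Process", "term": "GO:0006915", "number_of_genes": 5},
    {"category": "Process", "term": "GO:0007049", "number_of_genes": 3},
    {"category": "Component", "term": "GO:0005634", "number_of_genes": 7},
]


def category_terms_table(enrichment, category, show=10, sort_by="genes"):
    """Hypothetical sketch: (term, number_of_genes) pairs for one category."""
    rows = [(r["term"], r["number_of_genes"]) for r in enrichment if r["category"] == category]
    if sort_by == "genes":
        rows.sort(key=lambda r: (-r[1], r[0]))  # most genes first, ties by term name
    elif sort_by == "term":
        rows.sort(key=lambda r: r[0])           # alphabetical by term
    else:
        raise ValueError("sort_by must be 'genes' or 'term'")
    return rows if show == "all" else rows[: int(show)]


for term, n_genes in category_terms_table(ENRICHMENT, "Process", show=2):
    print(term, n_genes)
```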
#### <a name="show_enrichest_terms_in_category"></a> show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')
Shows the top `count` most enriched terms in `category`.
* **Parameters:**
  * **category:** name of the category. Available categories can be listed with the ‘show_enrichment_categories’ method
  * **count:** number of terms to show
  * **sort_by:** sort the list by one of ‘fdr’, ‘p_value’, ‘number_of_genes’
  * **save:** set to True to save the table; by default it is saved in .xlsx format
  * **savename:** file name; used only when save=True
  * **saveformat:** file format: ‘xlsx’ or ‘csv’
* **Returns:** None
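A sketch of the top-`count` selection, assuming that "most enriched" means smallest first for ‘fdr’ and ‘p_value’ and largest first for ‘number_of_genes’. The helper name, the toy term names and all values are illustrative only.

```python
# reverse flag for sorting: False = ascending (smaller is better)
SORT_DESCENDING = {"fdr": False, "p_value": False, "number_of_genes": True}

ENRICHMENT = [
    {"category": "Process", "term": "TERM_1", "p_value": 1e-3, "fdr": 2e-2, "number_of_genes": 4},
    {"category": "Process", "term": "TERM_2", "p_value": 1e-6, "fdr": 1e-4, "number_of_genes": 2},
    {"category": "Process", "term": "TERM_3", "p_value": 1e-4, "fdr": 5e-3, "number_of_genes": 9},
    {"category": "KEGG", "term": "TERM_4", "p_value": 1e-9, "fdr": 1e-7, "number_of_genes": 12},
]


def enrichest_terms(enrichment, category, count=10, sort_by="fdr"):
    """Hypothetical sketch: the `count` best-ranked rows of one category."""
    if sort_by not in SORT_DESCENDING:
        raise ValueError("sort_by must be 'fdr', 'p_value' or 'number_of_genes'")
    rows = [r for r in enrichment if r["category"] == category]
    rows.sort(key=lambda r: r[sort_by], reverse=SORT_DESCENDING[sort_by])
    return rows[:count]


print([r["term"] for r in enrichest_terms(ENRICHMENT, "Process", count=2)])
```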
#### <a name="show_enrichment_categories"></a> show_enrichment_categories()
Shows the enrichment categories available for the current dataset.
* **Returns:** None
Raw data
{
"_id": null,
"home_page": "https://github.com/skewer33/ProteinNetworks.git",
"name": "ProteinNetworks",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "proteins interactions PPI networks enrichment STRINGdb Bioilogical-Processes Molecular-Functions Cellular-Components Gene-Ontology",
"author": "Mokin Yakov",
"author_email": "mokinyakov@mail.ru",
"download_url": "https://files.pythonhosted.org/packages/29/b7/fa600e296171597abcf1743b63f3284460278c34091dae6056fe4d3c5919/proteinnetworks-0.1.4.tar.gz",
"platform": null,
"description": "# ProteinNetworks\r\n\r\nThe library contains convenient tools for rapid analysis of gene ontology, enrichment and protein-protein interaction data. Based on the [`stringdb`](https://pypi.org/project/stringdb/) library. Some features require you to install [R](https://www.r-project.org/) to work (see [`EnrichmentAnalysis.prioretizingGO()`](#prioretizingGO))\r\n\r\n### The module will contain 4 sets of tools:\r\n * **Enrichment Analysis** \r\n * **Protein networks Analysis**\r\n * **Group comparing tools**\r\n * **Visualization tools**\r\n\r\n## Get Started\r\n\r\n`pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3`\r\n\r\n## Contents:\r\n\r\n* [Enrichment Analysis](#EnrichmentAnalysis)\r\n\r\n * module: [`ProteinNetworks.STRING_enrichment`](#STRING_enrichment)\r\n * class: [`EnrichmentAnalysis`](#classEnrichmentAnalysis)\r\n \r\n methods:\r\n * [`EnrichmentAnalysis.create_subframe_by_names()`](#create)\r\n * [`EnrichmentAnalysis.drop_duplicated_genes()`](#drop_duplicated_genes)\r\n * [`EnrichmentAnalysis.get_category_terms()`](#get_category_terms)\r\n * [`EnrichmentAnalysis.get_enrichment()`](#get_enrichment)\r\n * [`EnrichmentAnalysis.get_genes_by_localization()`](#get_genes_by_localization)\r\n * [`EnrichmentAnalysis.get_genes_of_term()`](#get_genes_of_term)\r\n * [`EnrichmentAnalysis.get_mapped()`](#get_mapped)\r\n * [`EnrichmentAnalysis.prioretizingGO()`](#prioretizingGO)\r\n * [`EnrichmentAnalysis.proteins_participation_in_the_category()`](#proteins_participation_in_the_category)\r\n * [`EnrichmentAnalysis.save_table()`](#save_table)\r\n * [`EnrichmentAnalysis.show_category_terms()`](#show_category_terms)\r\n * [`EnrichmentAnalysis.show_enrichest_terms_in_category()`](#show_enrichest_terms_in_category)\r\n * [`EnrichmentAnalysis.show_enrichment_categories()`](#show_enrichment_categories)\r\n\r\n\r\n_________________________\r\n\r\n\r\n# <a name='EnrichmentAnalysis'></a> Enrichment Analysis\r\nContains a set of functions based 
on the stringdb library for gene ontology analysis and enrichment analysis\r\nLook examples in [Colab Notebook](https://drive.google.com/file/d/1JlcrtDNwOVLuKmwDy4apfIpt7Mheu4cF/view?usp=sharing)\r\n\r\n\r\n## <a name='STRING_enrichment'></a> ProteinNetworks.STRING_enrichment module\r\n\r\n\r\n### <a name=\"classEnrichmentAnalysis\"></a> *class* ProteinNetworks.STRING_enrichment.EnrichmentAnalysis *(data, enrichment=None, protein_id_type='UniProtID')*\r\n\r\nBases: `object`\r\n\r\nEnrichmentAnalysis class.\r\n* **Parameters:**\r\n * **data:** Dataframe containing the protein ID for analysis. It must contain either a \u201cGene\u201d or \u201cUniProtID\u201d column\u2019\r\n * **enrichment:** Dataframe containing the results of previous enrichment analysis\r\n * **protein_id_type:** type of protein ID. Valid Types\r\n\r\n#### <a name=\"create\"></a>*static* create_subframe_by_names(df, column: str, names: [<class 'list'>, <class 'tuple'>, <class 'set'>], add: str = 'first')\r\n\r\nfunction finds rows in original dataset and returns sub-dataframe including input names in selected column\r\n\r\n* **Parameters:**\r\n * **df** \u2013 target DataFrame\r\n * **column** \u2013 the selected column in which names will be searched\r\n * **names** \u2013 list of target names whose records need to be found in the table\r\n * **add** \u2013 [\u2018first\u2019, \u2018last\u2019, \u2018all\u2019] parameter of adding found rows.\r\n \u2018first\u2019 - add only the first entry\r\n \u2018last\u2019 - add only the last entry\r\n \u2018all\u2019 - add all entries\r\n* **Returns:**\r\n sub-dataframe including input names in selected column\r\n\r\n#### <a name=\"drop_duplicated_genes\"></a> drop_duplicated_genes(silent=False)\r\n\r\nfunction for droppig dublicated genes\r\n* **Parameters:**\r\n * **subset:** (list) Only consider certain columns for identifying duplicates, by default use all columns.\r\nreturn: df of dropped genes\r\n\r\n#### <a name=\"get_category_terms\"></a> 
get_category_terms(category: str, term_type: str = 'id')\r\n\r\nfunction returns set of all terms in chosen category\r\n* **Parameters:**\r\n * **category:** Name of category\r\n * **term_type:** \u2018id\u2019 or \u2018description\u2019.\r\n\r\n > id - returns terms IDs of category (for example, GO terms) \r\n > \r\n > description - returns Description of IDs of category\r\n* **Returns:**\r\n set of terms\r\n\r\n#### <a name=\"get_enrichment\"></a> get_enrichment()\r\n\r\nfunction performs enrichment analysis. Results store in self.enrichment\r\n* **Returns:** None\r\n\r\n#### <a name=\"get_genes_by_localization\"></a> get_genes_by_localization(compartments: list, set_operation: str, save=False)\r\n\r\nfunction for getting proteins localized in target compartments. You also can do common set operations\r\nunder compartments genes\r\n> Example: *get_genes_by_localization([Nucleus, Cytosol], \u2018union\u2019)* - return proteins localized in Nucleus or Cytosol\r\n\r\n* **Parameters:**\r\n * **compartments:** list of compartments. **Will be attention**:\r\n 1. Capitalization of letters matters. Get available compartment names by calling *get_components_list()*.\r\n\r\n 2. Order of compartments matter if you want to get sets difference.\r\n * **set_operation:** operation between sets. This means that the operations will be applied sequentially to all\r\n sets from the compartments. 
*[**A**, **B**, **C**], 'intersection' **->** **A** and **B** and **C***\r\n\r\n > For example:\r\n > \r\n > *get_genes_by_localization([\u2018Nucleus\u2019, \u2018Cytosol\u2019], \u2018difference\u2019)* - return just nucleus proteins,\r\n *get_genes_by_localization([\u2018Cytosol\u2019, \u2018Nucleus\u2019], \u2018union\u2019)* - return cytosol and nucleus proteins.\r\n *get_genes_by_localization([\u2018all\u2019, \u2018Nucleus\u2019], \u2018difference\u2019)* - return all proteins except nucleus proteins.\r\n\r\n#### <a name=\"get_genes_of_term\"></a> get_genes_of_term(term: str)\r\n\r\nfunction get genes from enrichment table by target term\r\n* **Parameters:**\r\n * **term**: target GO term from column \u2018term\u2019 in enrichment table\r\n* **Returns:** list of genes associated with target term\r\n\r\n#### <a name=\"get_mapped\"></a> get_mapped(species=9606)\r\nfunction makes gene mapping, it finds STRINGids by protein ids. It`s important for future analysis\r\n* **Parameters:**\r\n * **species:** ID of organism. For example, Human species=9606\r\n* **Returns:** None\r\n\r\n#### <a name=\"prioretizingGO\"></a> prioretizingGO(terms: [<class 'list'>, <class 'set'>], organism='Human', domain='BP')\r\n\r\nfunction for prioretizing GO-terms using R script with [GOxploreR](https://cran.r-universe.dev/GOxploreR/doc/manual.html) package ([doi:10.1038/s41598-020-73326-3](https://www.nature.com/articles/s41598-020-73326-3))\r\nSee \u2018RScript Prioretizing_GO.R\u2019\r\nwork with R.4-3.x. Yoy need to add RScript in PATH\r\n\r\nIf you use this function in google-collab, you will have to install R-packages at the first launch.\r\nThis may take a long time (up to 20 minutes)\r\n\r\n* **Parameters:**\r\n * **terms** \u2013 list of GO-terms\r\n * **organism** \u2013 name of target organism\r\n * **domain** \u2013 name of domain in GO-graph. 
Available inputs: \u2018BP\u2019 - Biological Process\r\n \u2018CC\u2019 - Cellular Component\r\n \u201cMF\u201d - Molecular Functions\r\n* **Returns:**\r\n list of Prioretized GO terms\r\n\r\n#### <a name=\"proteins_participation_in_the_category\"></a> proteins_participation_in_the_category(df, category, term_type='id', term_sep='\\\\n')\r\n\r\nfunction check terms that proteins participated and make statistics table\r\n* **Parameters:**\r\n * **df:** target DataFrame\r\n * **category:** Name of category\r\n * **term_type:** \u2018id\u2019 or \u2018description\u2019.\r\n\r\n > id - returns terms IDs of category (for example, GO terms) \r\n > \r\n > description - returns Description of IDs of category\r\n * **term_sep:** terms connected with each protein will save in one cell. Choose separator beetwen terms\r\n* **Returns:** None\r\n\r\n#### <a name=\"save_table\"></a> *static* save_table(table, name, saveformat='xlsx', index: bool = True)\r\n\r\nfunction for saving DataFrame tables\r\n* **Parameters:**\r\n * **table**: DataFrame\r\n * **name**: name of file\r\n * **saveformat**: format of saving file: \u2018xlsx\u2019 or \u2018csv\u2019\r\n * **index**: show indexes in saved table?\r\n* **Returns:** None\r\n\r\n#### <a name=\"show_category_terms\"></a> show_category_terms(category: str, show: [<class 'int'>, <class 'str'>] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')\r\n\r\nfunction displays all terms and number of associated genes in category\r\n* **Parameters:**\r\n * **category:** Name of category. You can check available category by calling \u2018show_enrichment_categories\u2019 method\r\n * **show:** \u201call\u201d or integer number. Number of strings to display\r\n * **sort_by:** [\u201cgenes\u201d, \u201cterm\u201d] - sort by number of genes (by descending) or term names (by ascending)\r\n * **save:** Need to save? Choose True. 
By default, save in .xlsx format\r\n * **savename:** work with save=True, name of file\r\n * **saveformat:** format of saving file: \u2018xlsx\u2019 or \u2018csv\u2019\r\n* **Returns:** None\r\n\r\n#### <a name=\"show_enrichest_terms_in_category\"></a> show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')\r\n\r\nfunction shows top-%count of most enriched terms in %category\r\n* **Parameters:**\r\n * **category:** Name of category. You can check available category by calling \u2018show_enrichment_categories\u2019 method\r\n * **count:** count of terms you need to show\r\n * **sort_by:** you can sort target list by one of \u2018fdr\u2019, \u2018p_value\u2019, \u2018number_of_genes\u2019 parameters\r\n * **save:** Need to save? Choose True. By default, save in .xlsx format\r\n * **savename:** work with save=True, name of file\r\n * **saveformat:** format of saving file: \u2018xlsx\u2019 or \u2018csv\u2019\r\n* **Returns:** None\r\n\r\n#### <a name=\"show_enrichment_categories\"></a> show_enrichment_categories()\r\n\r\nfunction shown available enrichment categories for current dataset\r\n* **Returns:** None\r\n\r\n\r\n",
"bugtrack_url": null,
"license": null,
"summary": "Module for working with protein networks (gene ontology, enrichment, protein-protein interactions, etc.)",
"version": "0.1.4",
"project_urls": {
"Documentation": "https://github.com/skewer33/ProteinNetworks/blob/main/README.md",
"Homepage": "https://github.com/skewer33/ProteinNetworks.git"
},
"split_keywords": [
"proteins",
"interactions",
"ppi",
"networks",
"enrichment",
"stringdb",
"bioilogical-processes",
"molecular-functions",
"cellular-components",
"gene-ontology"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "63cefb9c05099631d16ea85c47274cf1867174f988313c7863266d742d9d17b1",
"md5": "96d0412d006c3667da67f2d362313742",
"sha256": "e0e8991a771a75714bc1fe985a2f4a946952e612048a0e0267fcc2ad970008b7"
},
"downloads": -1,
"filename": "ProteinNetworks-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "96d0412d006c3667da67f2d362313742",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 37619,
"upload_time": "2024-10-24T01:53:44",
"upload_time_iso_8601": "2024-10-24T01:53:44.377944Z",
"url": "https://files.pythonhosted.org/packages/63/ce/fb9c05099631d16ea85c47274cf1867174f988313c7863266d742d9d17b1/ProteinNetworks-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "29b7fa600e296171597abcf1743b63f3284460278c34091dae6056fe4d3c5919",
"md5": "a3ca2a00fa992a53e7754b08fc2e4ce0",
"sha256": "c71b172e12b6d66cbb33cd9a3bd32f6394edd79c5bf6f99499249b0b85851726"
},
"downloads": -1,
"filename": "proteinnetworks-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "a3ca2a00fa992a53e7754b08fc2e4ce0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 28648,
"upload_time": "2024-10-24T01:53:46",
"upload_time_iso_8601": "2024-10-24T01:53:46.481997Z",
"url": "https://files.pythonhosted.org/packages/29/b7/fa600e296171597abcf1743b63f3284460278c34091dae6056fe4d3c5919/proteinnetworks-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-24 01:53:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "skewer33",
"github_project": "ProteinNetworks",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "proteinnetworks"
}