# KATLAS
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
<img alt="Katlas logo" width="600" caption="Katlas logo" src="https://github.com/sky1ove/katlas/raw/main/dataset/images/logo.png" id="logo"/>
<p><a target="_blank" href="https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/index.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<a href="https://pypi.org/project/python-katlas/"><img src="https://img.shields.io/pypi/v/python-katlas?link=https%3A%2F%2Fpypi.org%2Fproject%2Fpython-katlas%2F" alt="PyPI"></a></p>
KATLAS is a repository of Python tools for predicting kinases from a
substrate sequence. It also contains datasets of kinase substrate
specificities and human phosphoproteomics.
***References***: Please cite the appropriate papers if KATLAS is
helpful to your research.
- KATLAS was described in the paper \[Computational Decoding of Human
Kinome Substrate Specificities and Functions\]
- The positional scanning peptide array (PSPA) data come from the papers
[An atlas of substrate specificities for the human serine/threonine
kinome](https://www.nature.com/articles/s41586-022-05575-3) and
[The intrinsic substrate specificity of the human tyrosine
kinome](https://www.nature.com/articles/s41586-024-07407-y)
- The kinase substrate datasets used for generating PSSMs are derived
from
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/)
and the paper [Large-scale Discovery of Substrates of the Human
Kinome](https://www.nature.com/articles/s41598-019-46385-4)
- Phosphorylation sites are taken from
[PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/),
the paper [The functional landscape of the human
phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
[LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804)
## Reproduce datasets & figures
Follow the instructions in katlas_raw:
https://github.com/sky1ove/katlas_raw

This requires the package to be installed with the `dev` extras:
`pip install 'python-katlas[dev]' -U`
## Web applications
Users can now run the analysis directly on the web without needing to
code.
Check out our latest web platform:
[kinase-atlas.com](https://kinase-atlas.com/)
## Tutorials on Colab
- 1. [Substrate scoring on a single substrate
sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
- 2. [High throughput substrate scoring on phosphoproteomics
dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
- 3. [Kinase enrichment analysis for AKT
inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)
## Install
    pip install python-katlas -U

To use modules other than the core, install the `dev` extras:
`pip install 'python-katlas[dev]' -U`
## Import
``` python
from katlas.core import *
```
# Quick start
KATLAS provides two methods for scoring a substrate sequence:
- Computational Data-Driven Method (CDDM)
- Positional Scanning Peptide Array (PSPA)

Input can be given in two formats:
- a single string (one phosphorylation site)
- a csv/dataframe that contains a column of phosphorylation sites

Input sequences can be written in two ways:
- all capital letters
- with lowercase letters indicating phosphorylation status
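The two sequence conventions can be told apart with plain string operations. The sketch below is only an illustration and is not part of KATLAS; the helper name `describe_site` is hypothetical, and it assumes that lowercase `s`, `t`, and `y` are the residues used to mark phosphorylation status.

```python
def describe_site(seq: str) -> dict:
    """Summarize the case convention of a phosphorylation-site string.

    Hypothetical helper for illustration; not a KATLAS function.
    """
    # Assumption: lowercase s/t/y mark phosphorylated residues.
    phospho_index = [i for i, aa in enumerate(seq) if aa in "sty"]
    return {
        "all_capital": seq.isupper(),
        "upper": seq.upper(),            # all-capital version of the same site
        "phospho_index": phospho_index,  # 0-based string indices
    }


print(describe_site("AAAAAAAsGGAGsDN"))
```

A check like this can help decide whether an all-capital parameter set such as `param_CDDM_upper` or a case-aware one such as `param_CDDM` fits the data at hand.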
## Single sequence as input
### CDDM, all capital
``` python
predict_kinase('AAAAAAASGGAGSDN',**param_CDDM_upper)
```
    considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0S', '1G', '2G', '3A', '4G', '5S', '6D', '7N']

    kinase
    PAK6     2.032
    ULK3     2.032
    PRKX     2.012
    ATR      1.991
    PRKD1    1.988
             ...
    DDR2     0.928
    EPHA4    0.928
    TEK      0.921
    KIT      0.915
    FGFR3    0.910
    Length: 289, dtype: float64
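The `considering string` line in the output pairs each residue with its position relative to the phosphoacceptor at position 0. A minimal sketch of that labeling follows. The function name `label_positions` is hypothetical, and the rule that the center sits at index `len(seq) // 2` is an assumption inferred from the outputs shown in this README, not taken from the KATLAS source.

```python
def label_positions(seq: str) -> list[str]:
    """Label each residue with its offset from the assumed center position."""
    # Assumption: the phosphoacceptor is at index len(seq) // 2,
    # which gives -7..+7 for length 15 and -5..+4 for length 10.
    center = len(seq) // 2
    return [f"{i - center}{aa}" for i, aa in enumerate(seq)]


print(label_positions("AAAAAAASGGAGSDN"))
```

Under this assumption, an 11-residue input such as `AEEKEyHsEGG` is labeled from `-5A` to `5G`.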
### CDDM, with lower case indicating phosphorylation status
``` python
predict_kinase('AAAAAAAsGGAGsDN',**param_CDDM)
```
    considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0s', '1G', '2G', '3A', '4G', '5s', '6D', '7N']

    kinase
    ULK3     1.987
    PAK6     1.981
    PRKD1    1.946
    PIM3     1.944
    PRKX     1.939
             ...
    EPHA4    0.905
    EGFR     0.900
    TEK      0.898
    FGFR3    0.894
    KIT      0.882
    Length: 289, dtype: float64
### PSPA, with lower case indicating phosphorylation status
``` python
predict_kinase('AEEKEyHsEGG',**param_PSPA).head()
```
    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2s', '3E', '4G', '5G']

    kinase
    EGFR     4.013
    FGFR4    3.568
    ZAP70    3.412
    CSK      3.241
    SYK      3.209
    dtype: float64
### To replicate the results from The Kinase Library (PSPA)
Open this query on [The Kinase
Library](https://kinase-library.phosphosite.org/site?s=AEEKEy*HsEGG&pp=false&scp=true)
and rank by log2(score). The ranking matches the output below, apart
from slight differences due to rounding.
``` python
predict_kinase('AEEKEyHSEGG',**param_PSPA).head(10)
```
    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2S', '3E', '4G', '5G']

    kinase
    EGFR         3.181
    FGFR4        2.390
    CSK          2.308
    ZAP70        2.068
    SYK          1.998
    PDHK1_TYR    1.922
    RET          1.732
    MATK         1.688
    FLT1         1.627
    BMPR2_TYR    1.456
    dtype: float64
- At present, [The Kinase Library](https://kinase-library.phosphosite.org)
treats all ***tyr sequences*** as all capital, regardless of whether
they contain lowercase letters. This is a small bug and should be fixed
soon.
- A kinase name ending in “\_TYR” indicates a dual-specificity kinase
tested in the PSPA tyrosine setting. These kinases are not yet included
in kinase-library.

We can also calculate a percentile score using a reference score
sheet.
``` python
# Percentile reference sheet
y_pct = Data.get_pspa_tyr_pct()
get_pct('AEEKEyHSEGG',**param_PSPA_y, pct_ref = y_pct)
```
    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0Y', '1H', '2S', '3E', '4G', '5G']
| | log2(score) | percentile |
|-------|-------------|------------|
| EGFR | 3.181 | 96.787423 |
| FGFR4 | 2.390 | 94.012303 |
| CSK | 2.308 | 95.201640 |
| ZAP70 | 2.068 | 88.380041 |
| SYK | 1.998 | 85.522898 |
| ... | ... | ... |
| EPHA1 | -3.501 | 12.139440 |
| FES | -3.699 | 21.216678 |
| TNK1 | -4.269 | 5.481887 |
| TNK2 | -4.577 | 2.050581 |
| DDR2 | -4.920 | 10.403281 |
<p>93 rows × 2 columns</p>
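As a rough illustration of what a percentile against a reference sheet means, the sketch below ranks one score within a list of reference scores. It is not the KATLAS implementation: the function `percentile_of` and the reference values are made up for the example, and the exact definition used by `get_pct` may differ. The percentiles in the table do not decrease strictly with log2(score), which suggests that each kinase is compared against its own reference distribution.

```python
from bisect import bisect_right


def percentile_of(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores that are less than or equal to `score`."""
    ref = sorted(reference_scores)
    return 100.0 * bisect_right(ref, score) / len(ref)


# Made-up reference scores for a single hypothetical kinase.
reference = [-3.2, -1.5, -0.4, 0.0, 0.7, 1.1, 2.4, 3.0]
print(percentile_of(1.0, reference))
```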
## High-throughput substrate scoring on a dataframe
### Load your csv
``` python
import pandas as pd
# df = pd.read_csv('your_file.csv')
```
### Load a demo df
``` python
# Load a demo df with phosphorylation sites
df = Data.get_ochoa_site().head()
df.iloc[:,-2:]
```
| | site_seq | gene_site |
|-----|-----------------|----------------|
| 0 | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
| 1 | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
| 2 | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
| 3 | KSRFTEYSMTSSVMR | A0A075B6Q4_S68 |
| 4 | FTEYSMTSSVMRRNE | A0A075B6Q4_S71 |
### Set the column name and param to calculate
Here we choose `param_CDDM_upper` because the sequences in the demo df
are all capital letters. Other params can be used as well.
``` python
results = predict_kinase_df(df,'site_seq',**param_CDDM_upper)
results
```
    input dataframe has a length 5
    Preprocessing
    Finish preprocessing
    Calculating position: [-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]

    100%|██████████| 289/289 [00:05<00:00, 56.64it/s]
| kinase | SRC | EPHA3 | FES | NTRK3 | ALK | EPHA8 | ABL1 | FLT3 | EPHB2 | FYN | ... | MEK5 | PKN2 | MAP2K7 | MRCKB | HIPK3 | CDK8 | BUB1 | MEKK3 | MAP2K3 | GRK1 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | 0.991760 | 1.093712 | 1.051750 | 1.067134 | 1.013682 | 1.097519 | 0.966379 | 0.982464 | 1.054986 | 1.055910 | ... | 1.314859 | 1.635470 | 1.652251 | 1.622672 | 1.362973 | 1.797155 | 1.305198 | 1.423618 | 1.504941 | 1.872020 |
| 1 | 0.910262 | 0.953743 | 0.942327 | 0.950601 | 0.872694 | 0.932586 | 0.846899 | 0.826662 | 0.915020 | 0.942713 | ... | 1.175454 | 1.402006 | 1.430392 | 1.215826 | 1.569373 | 1.716455 | 1.270999 | 1.195081 | 1.223082 | 1.793290 |
| 2 | 0.849866 | 0.899910 | 0.848895 | 0.879652 | 0.874959 | 0.899414 | 0.839200 | 0.836523 | 0.858040 | 0.867269 | ... | 1.408003 | 1.813739 | 1.454786 | 1.084522 | 1.352556 | 1.524663 | 1.377839 | 1.173830 | 1.305691 | 1.811849 |
| 3 | 0.803826 | 0.836527 | 0.800759 | 0.894570 | 0.839905 | 0.781001 | 0.847847 | 0.807040 | 0.805877 | 0.801402 | ... | 1.110307 | 1.703637 | 1.795092 | 1.469653 | 1.549936 | 1.491344 | 1.446922 | 1.055452 | 1.534895 | 1.741090 |
| 4 | 0.822793 | 0.796532 | 0.792343 | 0.839882 | 0.810122 | 0.781420 | 0.805251 | 0.795022 | 0.790380 | 0.864538 | ... | 1.062617 | 1.357689 | 1.485945 | 1.249266 | 1.456078 | 1.422782 | 1.376471 | 1.089629 | 1.121309 | 1.697524 |
<p>5 rows × 289 columns</p>
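Each row of the score table holds one site and each column one kinase, so a common next step is to pick the highest-scoring kinases per site. The sketch below does this with plain dictionaries to stay dependency-free. The helper `top_kinases` is hypothetical, and the row contains only a few of the 289 columns, copied from row 0 of the table above.

```python
def top_kinases(row: dict[str, float], n: int = 3) -> list[str]:
    """Return the names of the n highest-scoring kinases in one row."""
    return sorted(row, key=row.get, reverse=True)[:n]


# A few columns from row 0 of the table above (not the full 289).
row0 = {
    "SRC": 0.991760,
    "PKN2": 1.635470,
    "MAP2K7": 1.652251,
    "CDK8": 1.797155,
    "GRK1": 1.872020,
}
print(top_kinases(row0))
```

On the actual dataframe, `results.iloc[0].nlargest(3)` is the pandas way to get the same ranking for the first row.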
## Phosphorylation sites
Besides calculating sequence scores, KATLAS also provides several
datasets of phosphorylation sites.
### CPTAC pan-cancer phosphoproteomics
``` python
df = Data.get_cptac_ensembl_site()
df.head(3)
```
| | gene | site | site_seq | protein | gene_name | gene_site | protein_site |
|----|----|----|----|----|----|----|----|
| 0 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000000412.3 | M6PR | M6PR_S267 | ENSP00000000412_S267 |
| 1 | ENSG00000003056.8 | S267 | DDQLGEESEERDDHL | ENSP00000440488.2 | M6PR | M6PR_S267 | ENSP00000440488_S267 |
| 2 | ENSG00000048028.11 | S1053 | PPTIRPNSPYDLCSR | ENSP00000003302.4 | USP28 | USP28_S1053 | ENSP00000003302_S1053 |
### [Ochoa et al. human phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3)
``` python
df = Data.get_ochoa_site()
df.head(3)
```
| | uniprot | position | residue | is_disopred | disopred_score | log10_hotspot_pval_min | isHotspot | uniprot_position | functional_score | current_uniprot | name | gene | Sequence | is_valid | site_seq | gene_site |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | A0A075B6Q4 | 24 | S | True | 0.91 | 6.839384 | True | A0A075B6Q4_24 | 0.149257 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
| 1 | A0A075B6Q4 | 35 | S | True | 0.87 | 9.192622 | False | A0A075B6Q4_35 | 0.136966 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
| 2 | A0A075B6Q4 | 57 | S | False | 0.28 | 0.818834 | False | A0A075B6Q4_57 | 0.125364 | A0A075B6Q4 | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
### PhosphoSitePlus human phosphorylation sites
``` python
df = Data.get_psp_human_site()
df.head(3)
```
| | gene | protein | uniprot | site | gene_site | SITE_GRP_ID | species | site_seq | LT_LIT | MS_LIT | MS_CST | CST_CAT# | Ambiguous_Site |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | YWHAB | 14-3-3 beta | P31946 | T2 | YWHAB_T2 | 15718712 | human | \_\_\_\_\_\_MtMDksELV | NaN | 3.0 | 1.0 | None | 0 |
| 1 | YWHAB | 14-3-3 beta | P31946 | S6 | YWHAB_S6 | 15718709 | human | \_\_MtMDksELVQkAk | NaN | 8.0 | NaN | None | 0 |
| 2 | YWHAB | 14-3-3 beta | P31946 | Y21 | YWHAB_Y21 | 3426383 | human | LAEQAERyDDMAAAM | NaN | NaN | 4.0 | None | 0 |
### Unique sites of combined Ochoa & PhosphoSitePlus
``` python
df = Data.get_combine_site_psp_ochoa()
df.head(3)
```
| | site_seq | gene_site | gene | source | num_site | acceptor | -7 | -6 | -5 | -4 | ... | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | AAAAAAASGGAGSDN | PBX1_S136 | PBX1 | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | G | A | G | S | D | N |
| 1 | AAAAAAASGGGVSPD | PBX2_S146 | PBX2 | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | G | G | V | S | P | D |
| 2 | AAAAAAASGVTTGKP | CLASR_S349 | CLASR | ochoa | 1 | S | A | A | A | A | ... | A | A | S | G | V | T | T | G | K | P |
<p>3 rows × 21 columns</p>
## Phosphorylation site sequence example
***All capital - length 15 (-7 to +7)***
- QSEEEKLSPSPTTED
- TLQHVPDYRQNVYIP
- TMGLSARYGPQFTLQ

***All capital - length 10 (-5 to +4)***
- SRDPHYQDPH
- LDNPDYQQDF
- AAAAASGGAG

***With lowercase - (-7 to +7)***
- QsEEEKLsPsPTTED
- TLQHVPDyRQNVYIP
- TMGLsARyGPQFTLQ
***With lowercase - (-5 to +4)***
- sRDPHyQDPH
- LDNPDyQQDF
- AAAAAsGGAG
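Site sequences like these are windows cut from a protein sequence around the phosphoacceptor. The sketch below shows one way to extract such a window, padding with `_` where the window runs past the protein ends, as seen in the PhosphoSitePlus `site_seq` column. The function `site_window` is hypothetical and not part of KATLAS. The protein string is only the visible prefix of the truncated `Sequence` value in the Ochoa table, so it is usable for sites near the N-terminus only.

```python
def site_window(protein_seq: str, position: int,
                left: int = 7, right: int = 7, pad: str = "_") -> str:
    """Cut a (left + 1 + right)-residue window around a 1-based position."""
    padded = pad * left + protein_seq + pad * right
    # The residue at 1-based `position` sits at index position - 1 + left
    # in `padded`, so the window starts at index position - 1.
    start = position - 1
    return padded[start:start + left + right + 1]


# Visible prefix of the A0A075B6Q4 sequence (the source truncates the rest).
prefix = "MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT"
print(site_window(prefix, 24))
print(site_window("MTMDKSELVQKAK", 2))
```

With `left=5, right=4` the same function yields the 10-residue (-5 to +4) format.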
Raw data
{
"_id": null,
"home_page": "https://github.com/sky1ove/katlas",
"name": "python-katlas",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "nbdev jupyter notebook python",
"author": "lily",
"author_email": "lcai888666@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ab/7d/a88c44c7bcf6f42f7b0c367722ee25cd428a4bd271a4ef9516004ef13cd4/python_katlas-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "tools for predicting kinome specificities",
"version": "0.1.4",
"project_urls": {
"Homepage": "https://github.com/sky1ove/katlas"
},
"split_keywords": [
"nbdev",
"jupyter",
"notebook",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8276ed0f520a8d185ce457b179784494000812f6cd06a2699d42888b15f4b12f",
"md5": "ae6e71e0e784ecddb9ac2a4a6ebae781",
"sha256": "2462b7d78c4344fb683d146bbfe3167dd7524a8f556c7cf5edf405b2d8aee1a5"
},
"downloads": -1,
"filename": "python_katlas-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ae6e71e0e784ecddb9ac2a4a6ebae781",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 39215,
"upload_time": "2024-11-01T18:21:27",
"upload_time_iso_8601": "2024-11-01T18:21:27.504897Z",
"url": "https://files.pythonhosted.org/packages/82/76/ed0f520a8d185ce457b179784494000812f6cd06a2699d42888b15f4b12f/python_katlas-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ab7da88c44c7bcf6f42f7b0c367722ee25cd428a4bd271a4ef9516004ef13cd4",
"md5": "d03be869287d6af46d20b014230e03d6",
"sha256": "ba51b1967fe935937932fbb260bb039f6bb29fe874f25cabd5ff5a500a5ba496"
},
"downloads": -1,
"filename": "python_katlas-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "d03be869287d6af46d20b014230e03d6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 43279,
"upload_time": "2024-11-01T18:21:28",
"upload_time_iso_8601": "2024-11-01T18:21:28.789901Z",
"url": "https://files.pythonhosted.org/packages/ab/7d/a88c44c7bcf6f42f7b0c367722ee25cd428a4bd271a4ef9516004ef13cd4/python_katlas-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-01 18:21:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sky1ove",
"github_project": "katlas",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "python-katlas"
}