python-katlas


- Name: python-katlas
- Version: 0.1.2
- Home page: https://github.com/sky1ove/python-katlas
- Summary: tools for predicting kinome specificities
- Upload time: 2024-09-27 04:30:15
- Author: lily
- Requires Python: >=3.7
- License: Apache Software License 2.0
- Keywords: nbdev, jupyter, notebook, python

# KATLAS


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

<a target="_blank" href="https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/index.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<img alt="Katlas logo" width="700" caption="Katlas logo" src="https://github.com/sky1ove/katlas/raw/main/dataset/images/logo.png" id="logo"/>

KATLAS is a repository containing Python tools to predict kinases given
a substrate sequence. It also contains datasets of kinase substrate
specificities and human phosphoproteomics.

***References***: Please cite the appropriate papers if KATLAS is
helpful to your research.

- KATLAS was described in the paper \[Decoding Human Kinome
  Specificities through a Computational Data-Driven Approach
  (manuscript)\]

- The positional scanning peptide array (PSPA) data come from the papers
  [An atlas of substrate specificities for the human serine/threonine
  kinome](https://www.nature.com/articles/s41586-022-05575-3) and
  [The intrinsic substrate specificity of the human tyrosine
  kinome](https://www.nature.com/articles/s41586-024-07407-y).

- The kinase substrate datasets used for generating PSSMs are derived
  from
  [PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/)
  and the paper [Large-scale Discovery of Substrates of the Human
  Kinome](https://www.nature.com/articles/s41598-019-46385-4).

- Phosphorylation sites are acquired from
  [PhosphoSitePlus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245126/),
  the paper [The functional landscape of the human
  phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3),
  and [CPTAC](https://pdc.cancer.gov/pdc/cptac-pancancer) /
  [LinkedOmics](https://academic.oup.com/nar/article/46/D1/D956/4607804).
  
  
## Web applications

Users can now run the analysis directly on the web without writing code.

Check out our web app: [kinase-atlas.com](https://kinase-atlas.com/)

## Tutorials on Colab

1. [Substrate scoring on a single substrate
   sequence](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_01_sinlge_input.ipynb)
2. [High throughput substrate scoring on phosphoproteomics
   dataset](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_02_high_throughput.ipynb)
3. [Kinase enrichment analysis for AKT
   inhibitor](https://colab.research.google.com/github/sky1ove/katlas/blob/main/nbs/tutorial_03a_enrichment_AKTi.ipynb)


## Install

Install the latest version with pip:

``` sh
pip install python-katlas -Uq
```
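
To confirm the install from Python, one option is to query the installed distribution's version with the standard library. This is a minimal sketch and not part of KATLAS: the helper name `installed_version` is made up for illustration, and `importlib.metadata` requires Python 3.8 or newer.

``` python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="python-katlas"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string such as the one you installed, or None if pip has not run yet
print(installed_version())
```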

## Import

``` python
from katlas.core import *
```

# Quick start

We provide two methods to score a substrate sequence:

- Computational Data-Driven Method (CDDM)
- Positional Scanning Peptide Array (PSPA)

Input can be given in two formats:

- a single string (one phosphorylation site sequence)
- a csv/dataframe that contains a column of phosphorylation site sequences

Input sequences can be written in two ways:

- all uppercase
- with lowercase letters marking phosphorylated residues

## Single sequence as input

### CDDM, all capital

``` python
predict_kinase('AAAAAAASGGAGSDN',**param_CDDM_upper)
```

    considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0S', '1G', '2G', '3A', '4G', '5S', '6D', '7N']

    kinase
    PAK6     2.032
    ULK3     2.032
    PRKX     2.012
    ATR      1.991
    PRKD1    1.988
             ...  
    DDR2     0.928
    EPHA4    0.928
    TEK      0.921
    KIT      0.915
    FGFR3    0.910
    Length: 289, dtype: float64
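
The `considering string` line printed above labels each residue with its position relative to the central phosphorylation site. A minimal sketch of that labeling, assuming the phospho-acceptor sits at index `len(seq) // 2` (the helper `position_labels` is illustrative and is not a KATLAS function):

``` python
def position_labels(seq):
    """Label each residue with its offset from the central residue."""
    center = len(seq) // 2  # assumption: the phospho-acceptor is the middle residue
    return [f"{i - center}{aa}" for i, aa in enumerate(seq)]


labels = position_labels("AAAAAAASGGAGSDN")
print(labels[0], labels[7], labels[-1])  # -7A 0S 7N
```

Under the same assumption, an 11-residue input is labeled from -5 to 5 and a 10-residue input from -5 to 4.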

### CDDM, with lower case indicating phosphorylation status

``` python
predict_kinase('AAAAAAAsGGAGsDN',**param_CDDM)
```

    considering string: ['-7A', '-6A', '-5A', '-4A', '-3A', '-2A', '-1A', '0s', '1G', '2G', '3A', '4G', '5s', '6D', '7N']

    kinase
    ULK3     1.987
    PAK6     1.981
    PRKD1    1.946
    PIM3     1.944
    PRKX     1.939
             ...  
    EPHA4    0.905
    EGFR     0.900
    TEK      0.898
    FGFR3    0.894
    KIT      0.882
    Length: 289, dtype: float64

### PSPA, with lower case indicating phosphorylation status

``` python
predict_kinase('AEEKEyHsEGG',**param_PSPA).head()
```

    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2s', '3E', '4G', '5G']

    kinase
    EGFR     4.013
    FGFR4    3.568
    ZAP70    3.412
    CSK      3.241
    SYK      3.209
    dtype: float64

### To replicate the results from The Kinase Library (PSPA)

Open this query on [The Kinase
Library](https://kinase-library.phosphosite.org/site?s=AEEKEy*HsEGG&pp=false&scp=true)
and rank the kinases by log2(score). The ranking matches the output
below, with slight differences due to rounding.

``` python
predict_kinase('AEEKEyHSEGG',**param_PSPA).head(10)
```

    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0y', '1H', '2S', '3E', '4G', '5G']

    kinase
    EGFR         3.181
    FGFR4        2.390
    CSK          2.308
    ZAP70        2.068
    SYK          1.998
    PDHK1_TYR    1.922
    RET          1.732
    MATK         1.688
    FLT1         1.627
    BMPR2_TYR    1.456
    dtype: float64
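
If another tool reports raw scores instead of log2 values, they can be converted before comparing rankings. A small sketch with made-up numbers (the kinase names and scores below are hypothetical, not output from KATLAS or The Kinase Library):

``` python
import math

# Hypothetical raw scores, for illustration only
raw_scores = {"KINASE_A": 8.0, "KINASE_B": 0.5, "KINASE_C": 2.0}

# log2 is monotonic, so it changes the scale but not the ranking
log2_scores = {name: math.log2(score) for name, score in raw_scores.items()}
ranked = sorted(log2_scores.items(), key=lambda item: item[1], reverse=True)

print(ranked)  # [('KINASE_A', 3.0), ('KINASE_C', 1.0), ('KINASE_B', -1.0)]
```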

- So far, [The Kinase Library](https://kinase-library.phosphosite.org)
  treats all ***tyr sequences*** as uppercase, regardless of whether
  they contain lowercase letters. This is a small bug that should be
  fixed soon.
- A kinase name ending in “\_TYR” indicates a dual-specificity kinase
  tested in the PSPA tyrosine setting. These kinases are not yet
  included in The Kinase Library.

We can also calculate the percentile score using a reference score
sheet.

``` python
# Percentile reference sheet
y_pct = Data.get_pspa_tyr_pct()

get_pct('AEEKEyHSEGG',**param_PSPA_y, pct_ref = y_pct)
```

    considering string: ['-5A', '-4E', '-3E', '-2K', '-1E', '0Y', '1H', '2S', '3E', '4G', '5G']



|       | log2(score) | percentile |
|-------|-------------|------------|
| EGFR  | 3.181       | 96.787423  |
| FGFR4 | 2.390       | 94.012303  |
| CSK   | 2.308       | 95.201640  |
| ZAP70 | 2.068       | 88.380041  |
| SYK   | 1.998       | 85.522898  |
| ...   | ...         | ...        |
| EPHA1 | -3.501      | 12.139440  |
| FES   | -3.699      | 21.216678  |
| TNK1  | -4.269      | 5.481887   |
| TNK2  | -4.577      | 2.050581   |
| DDR2  | -4.920      | 10.403281  |
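
For intuition, a percentile places a score within a reference distribution of scores for the same kinase. The sketch below uses the generic "percent of reference scores at or below the query" definition; how `get_pct` computes its values internally is not shown in this README, so treat this as an assumption. The helper name and the reference numbers are made up.

``` python
from bisect import bisect_right


def percentile_of(score, reference_scores):
    """Percent of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical reference scores for one kinase
reference = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_of(3.181, reference))  # 75.0
```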



## High-throughput substrate scoring on a dataframe

### Load your csv

``` python
# import pandas as pd
# df = pd.read_csv('your_file.csv')
```

### Load a demo df

``` python
# Load a demo df with phosphorylation sites
df = Data.get_ochoa_site().head()
df.iloc[:,-2:]
```


|     | site_seq        | gene_site      |
|-----|-----------------|----------------|
| 0   | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
| 1   | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
| 2   | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
| 3   | KSRFTEYSMTSSVMR | A0A075B6Q4_S68 |
| 4   | FTEYSMTSSVMRRNE | A0A075B6Q4_S71 |
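
Before picking a parameter set, it can help to check whether the sequences contain lowercase letters. A minimal sketch in plain Python (the helper `is_all_uppercase` is illustrative and not a KATLAS function); in this README, `param_CDDM_upper` is used with all-capital input and `param_CDDM` with lowercase-marked input:

``` python
def is_all_uppercase(seqs):
    """True if no sequence contains a lowercase letter."""
    return all(s == s.upper() for s in seqs)


demo = ["VDDEKGDSNDDYDSA", "YDSAGLLSDEDCMSV"]
marked = ["QsEEEKLsPsPTTED", "TLQHVPDyRQNVYIP"]

print(is_all_uppercase(demo))    # True
print(is_all_uppercase(marked))  # False
```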



### Set the column name and param to calculate

Here we choose `param_CDDM_upper` because the sequences in the demo df
are all uppercase. You can also choose other params.

``` python
results = predict_kinase_df(df,'site_seq',**param_CDDM_upper)
results
```

    input dataframe has a length 5
    Preprocessing
    Finish preprocessing
    Calculating position: [-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]

    100%|██████████| 289/289 [00:05<00:00, 56.64it/s]



| kinase | SRC      | EPHA3    | FES      | NTRK3    | ALK      | EPHA8    | ABL1     | FLT3     | EPHB2    | FYN      | ... | MEK5     | PKN2     | MAP2K7   | MRCKB    | HIPK3    | CDK8     | BUB1     | MEKK3    | MAP2K3   | GRK1     |
|--------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|-----|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| 0      | 0.991760 | 1.093712 | 1.051750 | 1.067134 | 1.013682 | 1.097519 | 0.966379 | 0.982464 | 1.054986 | 1.055910 | ... | 1.314859 | 1.635470 | 1.652251 | 1.622672 | 1.362973 | 1.797155 | 1.305198 | 1.423618 | 1.504941 | 1.872020 |
| 1      | 0.910262 | 0.953743 | 0.942327 | 0.950601 | 0.872694 | 0.932586 | 0.846899 | 0.826662 | 0.915020 | 0.942713 | ... | 1.175454 | 1.402006 | 1.430392 | 1.215826 | 1.569373 | 1.716455 | 1.270999 | 1.195081 | 1.223082 | 1.793290 |
| 2      | 0.849866 | 0.899910 | 0.848895 | 0.879652 | 0.874959 | 0.899414 | 0.839200 | 0.836523 | 0.858040 | 0.867269 | ... | 1.408003 | 1.813739 | 1.454786 | 1.084522 | 1.352556 | 1.524663 | 1.377839 | 1.173830 | 1.305691 | 1.811849 |
| 3      | 0.803826 | 0.836527 | 0.800759 | 0.894570 | 0.839905 | 0.781001 | 0.847847 | 0.807040 | 0.805877 | 0.801402 | ... | 1.110307 | 1.703637 | 1.795092 | 1.469653 | 1.549936 | 1.491344 | 1.446922 | 1.055452 | 1.534895 | 1.741090 |
| 4      | 0.822793 | 0.796532 | 0.792343 | 0.839882 | 0.810122 | 0.781420 | 0.805251 | 0.795022 | 0.790380 | 0.864538 | ... | 1.062617 | 1.357689 | 1.485945 | 1.249266 | 1.456078 | 1.422782 | 1.376471 | 1.089629 | 1.121309 | 1.697524 |
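
A common next step is to pull out the highest-scoring kinases for each site. The sketch below works on any score table shaped like the one above (rows are sites, columns are kinases); the small dataframe and the helper `top_kinases` are made up for illustration and are not part of KATLAS.

``` python
import pandas as pd

# Toy stand-in for a score table; the values are invented
scores = pd.DataFrame(
    {"KIN_A": [1.2, 0.4], "KIN_B": [0.9, 1.7], "KIN_C": [1.5, 1.1]},
    index=["site_0", "site_1"],
)


def top_kinases(score_df, n=2):
    """Return the n highest-scoring column names for every row."""
    return score_df.apply(lambda row: row.nlargest(n).index.tolist(), axis=1)


top = top_kinases(scores, n=2)
print(top["site_0"])  # ['KIN_C', 'KIN_A']
print(top["site_1"])  # ['KIN_B', 'KIN_C']
```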



## Phosphorylation sites

Besides calculating sequence scores, we also provide multiple datasets
of phosphorylation sites.

### CPTAC pan-cancer phosphoproteomics

``` python
df = Data.get_cptac_ensembl_site()
df.head(3)
```



|     | gene               | site  | site_seq        | protein           | gene_name | gene_site   | protein_site          |
|-----|--------------------|-------|-----------------|-------------------|-----------|-------------|-----------------------|
| 0   | ENSG00000003056.8  | S267  | DDQLGEESEERDDHL | ENSP00000000412.3 | M6PR      | M6PR_S267   | ENSP00000000412_S267  |
| 1   | ENSG00000003056.8  | S267  | DDQLGEESEERDDHL | ENSP00000440488.2 | M6PR      | M6PR_S267   | ENSP00000440488_S267  |
| 2   | ENSG00000048028.11 | S1053 | PPTIRPNSPYDLCSR | ENSP00000003302.4 | USP28     | USP28_S1053 | ENSP00000003302_S1053 |
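
In the three rows shown, `gene_site` looks like `gene_name` joined to `site`, and `protein_site` looks like the protein ID without its version suffix joined to `site`. The sketch below reproduces that pattern; it is inferred from these rows only, and the helper `make_site_ids` is hypothetical.

``` python
def make_site_ids(gene_name, site, protein):
    """Build gene-level and protein-level site identifiers."""
    protein_base = protein.split(".")[0]  # drop the version suffix, e.g. ".3"
    return f"{gene_name}_{site}", f"{protein_base}_{site}"


print(make_site_ids("M6PR", "S267", "ENSP00000000412.3"))
# ('M6PR_S267', 'ENSP00000000412_S267')
```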



### [Ochoa et al. human phosphoproteome](https://www.nature.com/articles/s41587-019-0344-3)

``` python
df = Data.get_ochoa_site()
df.head(3)
```


|     | uniprot    | position | residue | is_disopred | disopred_score | log10_hotspot_pval_min | isHotspot | uniprot_position | functional_score | current_uniprot | name             | gene | Sequence                                          | is_valid | site_seq        | gene_site      |
|-----|------------|----------|---------|-------------|----------------|------------------------|-----------|------------------|------------------|-----------------|------------------|------|---------------------------------------------------|----------|-----------------|----------------|
| 0   | A0A075B6Q4 | 24       | S       | True        | 0.91           | 6.839384               | True      | A0A075B6Q4_24    | 0.149257         | A0A075B6Q4      | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True     | VDDEKGDSNDDYDSA | A0A075B6Q4_S24 |
| 1   | A0A075B6Q4 | 35       | S       | True        | 0.87           | 9.192622               | False     | A0A075B6Q4_35    | 0.136966         | A0A075B6Q4      | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True     | YDSAGLLSDEDCMSV | A0A075B6Q4_S35 |
| 2   | A0A075B6Q4 | 57       | S       | False       | 0.28           | 0.818834               | False     | A0A075B6Q4_57    | 0.125364         | A0A075B6Q4      | A0A075B6Q4_HUMAN | None | MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... | True     | IADHLFWSEETKSRF | A0A075B6Q4_S57 |
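
Judging from these rows, `site_seq` is the window from -7 to +7 around `position` (1-based) in `Sequence`. A minimal sketch of that extraction, assuming `_` padding at the protein ends as seen in the PhosphoSitePlus table; the helper `site_window` is illustrative, and the sequence used is only the visible prefix of the truncated `Sequence` value above.

``` python
def site_window(protein, position, flank=7, pad="_"):
    """Return the residues from position-flank to position+flank (1-based position)."""
    center = position - 1  # convert to a 0-based index
    chars = []
    for i in range(center - flank, center + flank + 1):
        # Positions that fall outside the protein are filled with the pad character
        chars.append(protein[i] if 0 <= i < len(protein) else pad)
    return "".join(chars)


# Visible prefix of the truncated sequence shown in the table
prefix = "MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT"
print(site_window(prefix, 24))  # VDDEKGDSNDDYDSA
print(site_window(prefix, 35))  # YDSAGLLSDEDCMSV
```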



### PhosphoSitePlus human phosphorylation site

``` python
df = Data.get_psp_human_site()
df.head(3)
```


|     | gene  | protein     | uniprot | site | gene_site | SITE_GRP_ID | species | site_seq              | LT_LIT | MS_LIT | MS_CST | CST_CAT# | Ambiguous_Site |
|-----|-------|-------------|---------|------|-----------|-------------|---------|-----------------------|--------|--------|--------|----------|----------------|
| 0   | YWHAB | 14-3-3 beta | P31946  | T2   | YWHAB_T2  | 15718712    | human   | \_\_\_\_\_\_MtMDksELV | NaN    | 3.0    | 1.0    | None     | 0              |
| 1   | YWHAB | 14-3-3 beta | P31946  | S6   | YWHAB_S6  | 15718709    | human   | \_\_MtMDksELVQkAk     | NaN    | 8.0    | NaN    | None     | 0              |
| 2   | YWHAB | 14-3-3 beta | P31946  | Y21  | YWHAB_Y21 | 3426383     | human   | LAEQAERyDDMAAAM       | NaN    | NaN    | 4.0    | None     | 0              |



### Unique sites of combined Ochoa & PhosphoSitePlus

``` python
df = Data.get_combine_site_psp_ochoa()
df.head(3)
```


|     | site_seq        | gene_site  | gene  | source | num_site | acceptor | -7  | -6  | -5  | -4  | ... | -2  | -1  | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   |
|-----|-----------------|------------|-------|--------|----------|----------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0   | AAAAAAASGGAGSDN | PBX1_S136  | PBX1  | ochoa  | 1        | S        | A   | A   | A   | A   | ... | A   | A   | S   | G   | G   | A   | G   | S   | D   | N   |
| 1   | AAAAAAASGGGVSPD | PBX2_S146  | PBX2  | ochoa  | 1        | S        | A   | A   | A   | A   | ... | A   | A   | S   | G   | G   | G   | V   | S   | P   | D   |
| 2   | AAAAAAASGVTTGKP | CLASR_S349 | CLASR | ochoa  | 1        | S        | A   | A   | A   | A   | ... | A   | A   | S   | G   | V   | T   | T   | G   | K   | P   |



## Phosphorylation site sequence example

***All capital - length 15 (-7 to +7)***

- QSEEEKLSPSPTTED
- TLQHVPDYRQNVYIP
- TMGLSARYGPQFTLQ

***All capital - length 10 (-5 to +4)***

- SRDPHYQDPH
- LDNPDYQQDF
- AAAAASGGAG

***With lowercase - (-7 to +7)***

- QsEEEKLsPsPTTED
- TLQHVPDyRQNVYIP
- TMGLsARyGPQFTLQ

***With lowercase - (-5 to +4)***

- sRDPHyQDPH
- LDNPDyQQDF
- AAAAAsGGAG
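
To see which residues a lowercase-marked sequence flags as phosphorylated, the lowercase letters can be listed with their offsets from the center. A minimal sketch, assuming the central residue sits at index `len(seq) // 2` (the helper `lowercase_positions` is illustrative and not a KATLAS function):

``` python
def lowercase_positions(seq):
    """List (offset, residue) pairs for every lowercase residue in the sequence."""
    center = len(seq) // 2  # assumption: the central site is the middle residue
    return [(i - center, aa) for i, aa in enumerate(seq) if aa.islower()]


print(lowercase_positions("QsEEEKLsPsPTTED"))  # [(-6, 's'), (0, 's'), (2, 's')]
print(lowercase_positions("sRDPHyQDPH"))       # [(-5, 's'), (0, 'y')]
```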

            
