phylokrr


Version 0.4.2, uploaded 2024-02-25 07:50:03, requires Python >=3.5.

# Non-linear Phylogenetic regression using regularized kernels


# Installation

```
pip install phylokrr
```

# Quick overview

## Data simulation
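A trait is simulated under Brownian motion from a covariance matrix derived from a phylogenetic tree, and a non-linear (sine) response is then computed from that trait.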


```python
import random
import numpy as np

# seed for reproducibility
seed = 12038 
np.random.seed(seed)
random.seed(seed)


# cov. matrix obtained from the phylogenetic tree
vcv = np.loadtxt("./data/test_cov2.csv", delimiter=',') 

# Trait simulation under Brownian motion
n = vcv.shape[0]
mean = np.zeros(n)
X = np.random.multivariate_normal(cov=vcv, mean=mean).reshape(-1,1)
# Non-linear response variable (sine curve)
y = np.sin(X*2.1).ravel() + 5 

# Add noise to every 10th response value
# (the noise length is taken from the slice, so this works for any n)
y[::10] += 4 * (0.5 - np.random.rand(y[::10].size))
```
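
If `./data/test_cov2.csv` is not at hand, a small covariance matrix can be written out by hand to try the steps above. The sketch below assumes a hypothetical four-species ultrametric tree `((A,B),(C,D))`: under Brownian motion each off-diagonal entry is the branch length two species share from the root, and each diagonal entry is the root-to-tip distance. The tree, its branch lengths, and the names `toy_vcv` and `toy_X` are illustrative and not part of phylokrr.

```python
import numpy as np

# Hypothetical ultrametric tree ((A,B),(C,D)) with total depth 1.0.
# A and B share 0.6 of their history; C and D share 0.3.
toy_vcv = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
])

rng = np.random.default_rng(12038)
# One trait value per species, drawn under Brownian motion
toy_X = rng.multivariate_normal(mean=np.zeros(4), cov=toy_vcv).reshape(-1, 1)
```

A matrix this small is only useful for checking that the code runs; the regression examples need many more species.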
We then split the data into training and testing sets, together with the matching covariance matrices:

```python
from phylokrr.utils import split_data_vcv

# split data into training and testing sets 
num_test = round(0.5*n)

(X_train  , X_test,  
 y_train  , y_test,  
 vcv_train, vcv_test) = split_data_vcv(X, y, vcv, num_test, seed = seed) # seed defined above
```
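
Splitting phylogenetic data differs from an ordinary train/test split because the covariance matrix has to be subset on both rows and columns with the same species indices. The following is a minimal sketch of that idea in plain NumPy. `split_with_vcv` is a hypothetical helper written for illustration, and it is not claimed to match the internals of `split_data_vcv`.

```python
import numpy as np

def split_with_vcv(X, y, vcv, num_test, seed=None):
    """Randomly split X, y and the matching blocks of vcv (illustrative only)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    test_idx, train_idx = idx[:num_test], idx[num_test:]
    # np.ix_ selects the same species on rows and columns
    vcv_train = vcv[np.ix_(train_idx, train_idx)]
    vcv_test = vcv[np.ix_(test_idx, test_idx)]
    return (X[train_idx], X[test_idx],
            y[train_idx], y[test_idx],
            vcv_train, vcv_test)

# Tiny demonstration with 6 species and an identity covariance
X_demo = np.arange(6, dtype=float).reshape(-1, 1)
y_demo = X_demo.ravel() * 2
parts = split_with_vcv(X_demo, y_demo, np.eye(6), num_test=2, seed=0)
```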

## Simple model fitting without Cross-Validation (CV)

```python
from phylokrr.kernels import KRR

# set model
model = KRR(kernel='rbf', fit_intercept= True)

# arbitrarily proposed hyperparameters
params = {'lambda': 2, 'gamma': 2}

# set hyperparameters
model.set_params(**params)

# fit model with phylogenetic covariance matrix
model.fit(X_train, y_train, vcv = vcv_train)
y_pred1 = model.predict(X_test)
```
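
For readers unfamiliar with kernel ridge regression, the sketch below shows the standard, non-phylogenetic version with an RBF kernel in plain NumPy: the dual weights solve `(K + lam * I) alpha = y`, and predictions are `K_new @ alpha`. It assumes that `lambda` plays the role of the ridge penalty and `gamma` the RBF width, as is conventional. It deliberately ignores the phylogenetic covariance and the intercept, the function names are hypothetical, and it is not phylokrr's implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual weights alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), y)

def krr_predict(X_new, X_fit, alpha, gamma):
    """Predict with the kernel between new points and the fitted points."""
    return rbf_kernel(X_new, X_fit, gamma) @ alpha

# Noise-free sine curve, similar in shape to the simulated response
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(40, 1))
y_demo = np.sin(2.1 * X_demo).ravel()
alpha = krr_fit(X_demo, y_demo, lam=1e-3, gamma=2.0)
y_hat = krr_predict(X_demo, X_demo, alpha, gamma=2.0)
```

A larger `lam` shrinks the weights and smooths the fit, while a larger `gamma` makes each training point influence a narrower neighbourhood.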

Let's compare it with standard phylogenetic regression (i.e., PGLS):

```python
import matplotlib.pyplot as plt

from phylokrr.utils import PGLS

# fit standard phylogenetic regression
b_wls = PGLS(X_train, y_train, vcv_train)
y_pred3 = np.hstack((np.ones((X_test.shape[0],1)), X_test)) @ b_wls

plt.scatter(X_test, y_test , color = 'blue' , alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred1, color = 'green', alpha=0.5, label = 'phyloKRR predictions w/o CV')
plt.scatter(X_test, y_pred3, color = 'red', alpha=0.5, label = 'PGLS predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
```
<p align="center">
<img src="https://github.com/Ulises-Rosas/phylokrr/blob/main/data/imgs/phyloKRR_vs_PGLS.png" alt="drawing" width="600px"/>
</p>
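
PGLS is generalized least squares with the phylogenetic covariance matrix `V` as the error covariance, so the coefficients have the closed form `b = (X' V^-1 X)^-1 X' V^-1 y`, where `X` includes an intercept column. The sketch below implements that formula in NumPy. `gls_coefficients` is a hypothetical name, and the code illustrates the estimator rather than reproducing how `phylokrr.utils.PGLS` is written.

```python
import numpy as np

def gls_coefficients(X, y, vcv):
    """Return [intercept, slope(s)] of a GLS fit with error covariance vcv."""
    design = np.hstack((np.ones((X.shape[0], 1)), X))
    # Solve linear systems instead of forming the explicit inverse of vcv
    Vinv_design = np.linalg.solve(vcv, design)
    Vinv_y = np.linalg.solve(vcv, y)
    return np.linalg.solve(design.T @ Vinv_design, design.T @ Vinv_y)

# With an identity covariance, GLS reduces to ordinary least squares
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 1.0 + 2.0 * X_demo.ravel()
b_demo = gls_coefficients(X_demo, y_demo, np.eye(4))
```

Because the model is linear in `x`, it cannot follow the sine-shaped response, which is the contrast the plot above is meant to show.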

## Hyperparameter tuning with CV

```python
from phylokrr.utils import k_fold_cv_random

params = {
    'lambda' : np.logspace(-5, 5, 200, base=2),
    'gamma' : np.logspace(-5, 5, 200,  base=2),
}

best_params = k_fold_cv_random(X_train, y_train, vcv_train,
                                model, 
                                params,
                                folds = 2, 
                                sample = 50)

model.set_params(**best_params)
model.fit(X_train, y_train, vcv = vcv_train)
y_pred_cv = model.predict(X_test)

plt.scatter(X_test, y_test, color = 'blue' , alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred_cv, color = 'green', alpha=0.5, label = 'phyloKRR predictions w/ CV')
plt.scatter(X_test, y_pred3, color = 'red', alpha=0.5, label = 'PGLS predictions') # y_pred3 defined above
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
```

<p align="center">
<img src="https://github.com/Ulises-Rosas/phylokrr/blob/main/data/imgs/phyloKRR_vs_PGLS_cv.png" alt="drawing" width="600px"/>
</p>
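
Random-search cross-validation samples a fixed number of hyperparameter combinations from the grid and keeps the one with the lowest average validation error across folds. The sketch below shows that loop for a generic pair of `fit`/`predict` callables. The function `random_search_cv`, its arguments, and the toy ridge model are assumptions made for illustration; the real `k_fold_cv_random` may differ, for example in how it handles the covariance matrix. The penalty is named `lam` here because `lambda` is a reserved word in Python function signatures.

```python
import numpy as np

def random_search_cv(X, y, fit, predict, grid, folds=2, sample=50, seed=0):
    """Return the sampled hyperparameters with the lowest mean validation MSE."""
    rng = np.random.default_rng(seed)
    fold_ids = np.array_split(rng.permutation(X.shape[0]), folds)
    best_params, best_err = None, np.inf
    for _ in range(sample):
        # Draw one value per hyperparameter, independently
        params = {name: rng.choice(values) for name, values in grid.items()}
        errs = []
        for k in range(folds):
            val = fold_ids[k]
            trn = np.concatenate([fold_ids[j] for j in range(folds) if j != k])
            state = fit(X[trn], y[trn], **params)
            errs.append(np.mean((predict(state, X[val]) - y[val]) ** 2))
        if np.mean(errs) < best_err:
            best_params, best_err = params, np.mean(errs)
    return best_params

# Toy model: ridge regression without intercept, penalty named `lam`
def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def ridge_predict(coef, X):
    return X @ coef

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
grid = {'lam': np.logspace(-5, 5, 200, base=2)}
best = random_search_cv(X_demo, y_demo, ridge_fit, ridge_predict, grid,
                        folds=2, sample=20)
```

Sampling 50 of the 200 x 200 grid points, as in the example above, trades exhaustiveness for speed; increasing `sample` or `folds` makes the search more thorough at a higher cost.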


# Reference

Rosas-Puchuri, U., Santaquiteria, A., Khanmohammadi, S., Solis-Lemus, C., & Betancur-R, R. (2023). [Non-linear phylogenetic regression using regularized kernels](https://www.biorxiv.org/content/10.1101/2023.10.04.560983v1.abstract). bioRxiv, 2023-10.



            
