# Lazy QSAR
A library to quickly build QSAR models
## Installation
```bash
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
```
## Usage
### TLDR
1. Choose one of the available descriptors of small molecules.
2. Fit a model using FLAML AutoML. FLAML searches over several estimators, which can cause memory issues; restrict the estimator list when needed.
3. Evaluate the model on held-out data.
### Example for Binary Classification
#### Get the data
You can find example data in the fantastic [Therapeutic Data Commons](https://tdcommons.ai) portal.
```python
from tdc.single_pred import Tox
data = Tox(name = 'hERG')
split = data.get_split()
```
Here we are selecting the hERG blockade toxicity dataset. Let's refactor data for convenience.
```python
# refactor fetched data in a convenient format
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
```
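Before fitting, it can help to confirm that the SMILES and label lists line up and to check the class balance. The helper below is a minimal sketch under that assumption; `summarize_split` is a hypothetical name, not part of lazyqsar or TDC, and the toy data is made up for illustration.

```python
from collections import Counter


def summarize_split(smiles, labels):
    """Return basic sanity statistics for one data split.

    Hypothetical helper for illustration; not part of lazyqsar or TDC.
    """
    if len(smiles) != len(labels):
        raise ValueError("smiles and labels must have the same length")
    counts = Counter(labels)
    n = len(labels)
    return {
        "n_molecules": n,
        "n_unique_smiles": len(set(smiles)),
        "class_counts": dict(counts),
        # Fraction of molecules labelled 1 (the positive class)
        "positive_fraction": counts.get(1, 0) / n if n else 0.0,
    }


# Toy example with made-up labels (not real hERG data)
toy_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCO"]
toy_labels = [0, 1, 0, 0]
summary = summarize_split(toy_smiles, toy_labels)
print(summary)
```

The same function could be called as `summarize_split(smiles_train, y_train)` on the lists built above.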
#### Build a model
Now we can train a model based on Morgan fingerprints.
```python
import lazyqsar as lq
model = lq.MorganBinaryClassifier()
# time_budget (in seconds) and estimator_list can be passed as parameters of the classifier. Defaults to 20s and all the available estimators in FLAML.
model.fit(smiles_train, y_train)
```
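To limit memory use, the search can be restricted to a few estimators and given an explicit time budget. The sketch below only assembles the options. The keyword names `time_budget` and `estimator_list` are taken from the comment in the example above and are an assumption about the constructor signature, which may differ between lazyqsar versions. `build_classifier_kwargs` is a hypothetical helper.

```python
def build_classifier_kwargs(time_budget=20, estimator_list=None):
    """Collect constructor options for a lazyqsar classifier.

    Hypothetical helper. The keyword names follow the README comment and
    are an assumption; check the installed lazyqsar version before use.
    """
    if time_budget <= 0:
        raise ValueError("time_budget must be a positive number of seconds")
    kwargs = {"time_budget": time_budget}
    if estimator_list is not None:
        if not estimator_list:
            raise ValueError("estimator_list must not be empty")
        kwargs["estimator_list"] = list(estimator_list)
    return kwargs


# "rf" and "lgbm" are FLAML's short names for random forest and LightGBM.
kwargs = build_classifier_kwargs(time_budget=60, estimator_list=["rf", "lgbm"])
print(kwargs)

# With lazyqsar installed, the options would be passed like this
# (assumed signature, not verified here):
# import lazyqsar as lq
# model = lq.MorganBinaryClassifier(**kwargs)
# model.fit(smiles_train, y_train)
```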
#### Validate its performance
```python
from sklearn.metrics import roc_curve, auc
y_hat = model.predict_proba(smiles_valid)[:,1]
fpr, tpr, _ = roc_curve(y_valid, y_hat)
print("AUROC", auc(fpr, tpr))
```
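For intuition about what the AUROC measures, it can also be computed directly as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The following is a plain-Python sketch of that definition on made-up scores, not a replacement for the scikit-learn call above; the function name `auroc` is illustrative.

```python
def auroc(y_true, y_score):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as one half. Runs in O(n_pos * n_neg), which is fine for
    small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_y, toy_scores)
print("AUROC", toy_auroc)  # 3 of the 4 positive/negative pairs are ranked correctly
```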
### Example for Regression
_Currently, only Morgan Descriptors and Ersilia Embeddings are available for regression models_
#### Get the data
You can find example data in the fantastic [Therapeutic Data Commons](https://tdcommons.ai) portal.
```python
from tdc.single_pred import Tox
data = Tox(name = 'LD50_Zhu')
split = data.get_split()
```
Here we are selecting the Acute Toxicity dataset. Let's refactor data for convenience.
```python
# refactor fetched data in a convenient format
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
```
#### Build a model
Now we can train a model based on Morgan fingerprints.
```python
import lazyqsar as lq
# Regression task: use the Morgan regressor rather than the binary classifier
model = lq.MorganRegressor()
# time_budget (in seconds) and estimator_list can be passed as parameters of the regressor. Defaults to 20s and all the available estimators in FLAML.
model.fit(smiles_train, y_train)
```
#### Validate its performance
```python
from sklearn.metrics import mean_absolute_error, r2_score
y_hat = model.predict(smiles_valid)
mae = mean_absolute_error(y_valid, y_hat)
r2 = r2_score(y_valid, y_hat)
print("MAE", mae, "R2", r2)
```
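The two regression metrics are simple enough to write out by hand, which can help when interpreting them. The sketch below uses plain Python and made-up values; the function names are illustrative and are not the scikit-learn implementations.

```python
def mean_absolute_error_py(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_py(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted values for illustration
toy_obs = [3.0, -0.5, 2.0, 7.0]
toy_pred = [2.5, 0.0, 2.0, 8.0]
toy_mae = mean_absolute_error_py(toy_obs, toy_pred)
toy_r2 = r2_score_py(toy_obs, toy_pred)
print("MAE", toy_mae, "R2", round(toy_r2, 3))
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the observed values.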
## Benchmark
The pipeline has been validated using the Therapeutic Data Commons ADMET datasets. More information about its results can be found in the /benchmark folder.
## Disclaimer
This library is intended only for quick-and-dirty QSAR modeling.
For more complete automated QSAR modeling, please refer to [Zaira Chem](https://github.com/ersilia-os/zaira-chem).
## About us
Learn about the [Ersilia Open Source Initiative](https://ersilia.io)!
Raw data
{
"_id": null,
"home_page": "https://github.com/ersilia-os/lazy-qsar",
"name": "lazyqsar",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "qsar machine-learning chemistry computer-aided-drug-design",
"author": "Miquel Duran-Frigola",
"author_email": "miquel@ersilia.io",
"download_url": "https://files.pythonhosted.org/packages/23/db/e763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2/lazyqsar-0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "A library to quickly build QSAR models",
"version": "0.3",
"project_urls": {
"Homepage": "https://github.com/ersilia-os/lazy-qsar",
"Source Code": "https://github.com/ersilia-os/lazy-qsar"
},
"split_keywords": [
"qsar",
"machine-learning",
"chemistry",
"computer-aided-drug-design"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "23dbe763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2",
"md5": "94dca54e5c0252b2e87780483fed0c84",
"sha256": "407d60dfd92adcf1d91a2d9053ff2fe159fcb3c39cfc51546351e7fc770f92f0"
},
"downloads": -1,
"filename": "lazyqsar-0.3.tar.gz",
"has_sig": false,
"md5_digest": "94dca54e5c0252b2e87780483fed0c84",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 22572,
"upload_time": "2023-09-25T12:02:22",
"upload_time_iso_8601": "2023-09-25T12:02:22.277393Z",
"url": "https://files.pythonhosted.org/packages/23/db/e763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2/lazyqsar-0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-25 12:02:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ersilia-os",
"github_project": "lazy-qsar",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "requests",
"specs": []
},
{
"name": "flaml",
"specs": []
},
{
"name": "rdkit-pypi",
"specs": []
},
{
"name": "mordred",
"specs": []
},
{
"name": "PyTDC",
"specs": []
},
{
"name": "shap",
"specs": []
},
{
"name": "eosce",
"specs": []
}
],
"lcname": "lazyqsar"
}