lazyqsar


| Field | Value |
| --- | --- |
| Name | lazyqsar |
| Version | 0.3 |
| Home page | https://github.com/ersilia-os/lazy-qsar |
| Summary | A library to quickly build QSAR models |
| Upload time | 2023-09-25 12:02:22 |
| Docs URL | None |
| Author | Miquel Duran-Frigola |
| Requires Python | >=3.7 |
| License | GPLv3 |
| Keywords | qsar, machine-learning, chemistry, computer-aided-drug-design |
| Requirements | requests, flaml, rdkit-pypi, mordred, PyTDC, shap, eosce |
| Travis CI | No |
| Coveralls test coverage | No |

# Lazy QSAR

A library to quickly build QSAR models.

## Installation

```bash
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
```

## Usage

### TLDR
1. Choose one of the available small-molecule descriptors.
2. Fit a model using FLAML AutoML. By default, FLAML searches several estimators, which can lead to memory issues; restrict the estimator list on a case-by-case basis.
3. Validate the model on held-out data (the examples below use the validation split).
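
As a rough sketch of step 2, the snippet below collects the two settings mentioned in the classifier example further down, `time_budget` (in seconds) and `estimator_list`. The helper `automl_settings` is hypothetical and not part of lazyqsar. The short names `"rf"` and `"lgbm"` are FLAML learner names, and it is an assumption that lazyqsar passes them through to FLAML unchanged.

```python
def automl_settings(time_budget=20, estimator_list=None):
    """Collect keyword arguments for a lazyqsar model (hypothetical helper).

    time_budget is in seconds. Leaving estimator_list as None omits the key,
    so the model keeps its default search over all available estimators.
    """
    if time_budget <= 0:
        raise ValueError("time_budget must be a positive number of seconds")
    settings = {"time_budget": time_budget}
    if estimator_list is not None:
        settings["estimator_list"] = list(estimator_list)
    return settings


# A longer budget, restricted to two learners to keep memory use down.
settings = automl_settings(time_budget=60, estimator_list=["rf", "lgbm"])
print(settings)

# With lazyqsar installed, the settings would be passed on like this:
# import lazyqsar as lq
# model = lq.MorganBinaryClassifier(**settings)
```

Keeping the settings in a plain dictionary makes it easy to reuse the same budget and estimator list across descriptors when comparing models.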

### Example for Binary Classification

#### Get the data

You can find example data in the fantastic [Therapeutic Data Commons](https://tdcommons.ai) portal.

```python
from tdc.single_pred import Tox
data = Tox(name = 'hERG')
split = data.get_split()
```
Here we are selecting the hERG blockade toxicity dataset. Let's reshape the data into plain lists for convenience.

```python
# convert the fetched splits into plain lists of SMILES strings and labels
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
```

#### Build a model

Now we can train a model based on Morgan fingerprints.

```python
import lazyqsar as lq


model = lq.MorganBinaryClassifier() 
# time_budget (in seconds) and estimator_list can be passed as parameters of the classifier. Defaults to 20s and all the available estimators in FLAML.
model.fit(smiles_train, y_train)
```

#### Validate its performance

```python
from sklearn.metrics import roc_curve, auc
y_hat = model.predict_proba(smiles_valid)[:,1]
fpr, tpr, _ = roc_curve(y_valid, y_hat)
print("AUROC", auc(fpr, tpr))
```
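
To make the metric above more concrete, here is a minimal pure-Python sketch of what AUROC measures: the fraction of (positive, negative) pairs in which the positive molecule receives the higher score. The function `auroc` and the toy labels and scores are illustrative assumptions, not part of lazyqsar or scikit-learn, and the pairwise loop is only practical for small inputs.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The loop is O(n_pos * n_neg), so this
    is meant for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration (not hERG data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print("AUROC", auroc(toy_labels, toy_scores))  # 3 of 4 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.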

### Example for Regression
_Currently, only Morgan descriptors and Ersilia embeddings are available for regression models._

#### Get the data
You can find example data in the fantastic [Therapeutic Data Commons](https://tdcommons.ai) portal.

```python
from tdc.single_pred import Tox
data = Tox(name = 'LD50_Zhu')
split = data.get_split()
```
Here we are selecting the acute toxicity (LD50) dataset. Let's reshape the data into plain lists for convenience.

```python
# convert the fetched splits into plain lists of SMILES strings and target values
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
```

#### Build a model

Now we can train a model based on Morgan fingerprints.

```python
import lazyqsar as lq

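# MorganRegressor is assumed to be the regression counterpart of MorganBinaryClassifier;
# check the lazyqsar package for the exact class name.
model = lq.MorganRegressor()
# time_budget (in seconds) and estimator_list can be passed as parameters of the regressor. Defaults to 20s and all the available estimators in FLAML.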
model.fit(smiles_train, y_train)
```

#### Validate its performance

```python
from sklearn.metrics import mean_absolute_error, r2_score
y_hat = model.predict(smiles_valid)
mae = mean_absolute_error(y_valid, y_hat)
r2 = r2_score(y_valid, y_hat)
print("MAE", mae, "R2", r2)
```
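
For reference, the two metrics printed above can be written out in a few lines of pure Python. This is a sketch with hypothetical function names and made-up toy values (not LD50 data); in practice the scikit-learn functions are the ones to use.

```python
def mean_absolute_error_py(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_py(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Toy values, made up for illustration.
toy_true = [3.0, -0.5, 2.0, 7.0]
toy_pred = [2.5, 0.0, 2.0, 8.0]
toy_mae = mean_absolute_error_py(toy_true, toy_pred)
toy_r2 = r2_score_py(toy_true, toy_pred)
print("MAE", toy_mae, "R2", round(toy_r2, 4))
```

MAE is in the units of the target, while R2 compares the model against always predicting the mean of the observed values, so it can be negative for a poor model.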

## Benchmark
The pipeline has been validated using the Therapeutic Data Commons ADMET datasets. The results can be found in the `/benchmark` folder of the repository.

## Disclaimer

This library is intended only for quick-and-dirty QSAR modeling.
For more complete automated QSAR modeling, please refer to [Zaira Chem](https://github.com/ersilia-os/zaira-chem).

## About us

Learn about the [Ersilia Open Source Initiative](https://ersilia.io)!

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ersilia-os/lazy-qsar",
    "name": "lazyqsar",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "qsar machine-learning chemistry computer-aided-drug-design",
    "author": "Miquel Duran-Frigola",
    "author_email": "miquel@ersilia.io",
    "download_url": "https://files.pythonhosted.org/packages/23/db/e763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2/lazyqsar-0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "A library to quickly build QSAR models",
    "version": "0.3",
    "project_urls": {
        "Homepage": "https://github.com/ersilia-os/lazy-qsar",
        "Source Code": "https://github.com/ersilia-os/lazy-qsar"
    },
    "split_keywords": [
        "qsar",
        "machine-learning",
        "chemistry",
        "computer-aided-drug-design"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "23dbe763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2",
                "md5": "94dca54e5c0252b2e87780483fed0c84",
                "sha256": "407d60dfd92adcf1d91a2d9053ff2fe159fcb3c39cfc51546351e7fc770f92f0"
            },
            "downloads": -1,
            "filename": "lazyqsar-0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "94dca54e5c0252b2e87780483fed0c84",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 22572,
            "upload_time": "2023-09-25T12:02:22",
            "upload_time_iso_8601": "2023-09-25T12:02:22.277393Z",
            "url": "https://files.pythonhosted.org/packages/23/db/e763177b71e1112acc7fc6c52b9e36173a09c2065e460ffd61e9bc56d8f2/lazyqsar-0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-25 12:02:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ersilia-os",
    "github_project": "lazy-qsar",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "flaml",
            "specs": []
        },
        {
            "name": "rdkit-pypi",
            "specs": []
        },
        {
            "name": "mordred",
            "specs": []
        },
        {
            "name": "PyTDC",
            "specs": []
        },
        {
            "name": "shap",
            "specs": []
        },
        {
            "name": "eosce",
            "specs": []
        }
    ],
    "lcname": "lazyqsar"
}
        