mlchemad


- Name: mlchemad
- Version: 1.5.2
- Home page: https://github.com/OlivierBeq/mlchemad
- Summary: Applicability domains for cheminformatics.
- Upload time: 2024-04-17 11:46:59
- Maintainer: Olivier J.M. Béquignon
- Author: Olivier J.M. Béquignon
- License: MIT
- Keywords: applicability domain, cheminformatics, outlier molecule detection, out-of-distribution detection, machine learning

# MLChemAD
Applicability domain definitions for cheminformatics modelling.

# Getting Started

## Install
```
pip install mlchemad
```

## Example Usage

- With molecular fingerprints, prefer the use of the `KNNApplicabilityDomain` with `k=1`, `scaling=None`, `hard_threshold=0.3`, and `dist='jaccard'`.
- Otherwise, the use of the `TopKatApplicabilityDomain` is recommended.

```python
from mlchemad import TopKatApplicabilityDomain, KNNApplicabilityDomain, data

# Create the applicability domain using TopKat's definition
app_domain = TopKatApplicabilityDomain()
# Fit it to the training set
app_domain.fit(data.mekenyan1993.training)

# Determine outliers from multiple samples (rows) ...
print(app_domain.contains(data.mekenyan1993.test))

# ... or a single sample
sample = data.mekenyan1993.test.iloc[5]  # Obtain the sixth row (index 5) as a pandas.Series object
print(app_domain.contains(sample))

# Now with Morgan fingerprints
app_domain = KNNApplicabilityDomain(k=1, scaling=None, hard_threshold=0.3, dist='jaccard')
app_domain.fit(data.broccatelli2011.training.drop(columns='Activity'))
print(app_domain.contains(data.broccatelli2011.test.drop(columns='Activity')))
```
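
To make the fingerprint recommendation more concrete, here is a minimal pure-Python sketch of what a 1-nearest-neighbor check with a hard Jaccard-distance threshold amounts to. It is an illustration only, not MLChemAD's implementation: the helper names `jaccard_distance` and `in_domain` are hypothetical, fingerprints are represented as sets of on-bit indices, and the rule "inside if the nearest distance is at most the threshold" is an assumption about how such a threshold is typically applied.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def in_domain(sample, training, hard_threshold=0.3):
    """Assumed rule: inside if the nearest training fingerprint is close enough."""
    nearest = min(jaccard_distance(sample, t) for t in training)
    return nearest <= hard_threshold


# Toy fingerprints (hypothetical data, not taken from the package datasets)
training = [{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {20, 21, 22, 23}]
close = {1, 2, 3, 4, 5, 6, 7, 8, 9}  # shares 9 of 10 bits with the first fingerprint
far = {1, 2, 30, 31}                 # shares few bits with any training fingerprint

print(in_domain(close, training))
print(in_domain(far, training))
```

With a threshold of 0.3, a sample must share at least 70% of the combined on-bits with some training fingerprint to count as inside the domain under this sketch.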

Depending on the definition of the applicability domain, some samples of the training set might be outliers themselves.
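
The following sketch illustrates how a training sample can end up outside a domain fitted on that same training set. It uses a distance-to-centroid rule with a quantile-based cutoff, which is an assumption chosen for illustration and not necessarily the rule MLChemAD applies; the function names and the toy data are hypothetical.

```python
import math


def centroid(rows):
    """Per-feature mean of the training rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_threshold(rows, quantile=0.75):
    """Return the centroid and the nearest-rank quantile of training distances."""
    c = centroid(rows)
    dists = sorted(math.dist(r, c) for r in rows)
    rank = max(1, math.ceil(quantile * len(dists)))
    return c, dists[rank - 1]


# Four clustered points and one extreme point (hypothetical data)
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
center, threshold = fit_threshold(training)

# Apply the fitted domain back to the training set itself
inside = [math.dist(r, center) <= threshold for r in training]
print(inside)
```

Because the cutoff is derived from a quantile of the training distances rather than their maximum, the extreme point at `(10.0, 10.0)` falls outside the domain even though it was used to fit it.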

# Applicability domains
MLChemAD defines the following applicability domains:
- Bounding Box
- PCA Bounding Box
- Convex Hull<br/>
  ***(does not scale well)***
- TOPKAT's Optimum Prediction Space<br/>
  ***(recommended with molecular descriptors)***
- Leverage
- Hotelling T²
- Distance to Centroids
- k-Nearest Neighbors<br/>
  ***(recommended with molecular fingerprints with the use of `dist='rogerstanimoto'`, `scaling=None` and `hard_threshold=0.75` for ECFP fingerprints)***
- Isolation Forests
- Non-parametric Kernel Densities
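
As an illustration of the simplest entry in this list, the sketch below shows the idea behind a bounding box domain: a sample is inside if every feature lies within the range seen during training. This is a conceptual pure-Python sketch, not the package's code; `fit_bounding_box` and `contains` are hypothetical helper names and the descriptor values are made up.

```python
def fit_bounding_box(rows):
    """Return per-feature (min, max) pairs observed in the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def contains(box, sample):
    """True if every feature of the sample lies within the training range."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, sample))


# Two descriptors per molecule (hypothetical values)
training = [(1.0, 200.0), (2.5, 350.0), (4.0, 120.0)]
box = fit_bounding_box(training)

inside = contains(box, (3.0, 300.0))   # both features within the training ranges
outside = contains(box, (3.0, 400.0))  # second feature exceeds the training maximum
print(inside, outside)
```

A box fitted this way always contains every training sample, which is one reason the other definitions in the list, with their tighter or statistically derived boundaries, can behave differently on the training set.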
