pka-predictor-moitessier


- Name: pka-predictor-moitessier
- Version: 0.1.12
- Home page: https://github.com/MoitessierLab/pKa-predictor
- Summary: Graph-based pKa prediction for small molecules
- Upload time: 2025-07-29 17:46:53
- Author: Moitessier Lab
- Requires Python: >=3.8
- License: GPL-3.0
- Keywords: pka, prediction, gnn, chemistry, rdkit
- Requirements: No requirements were recorded.

# pKa-predictor

Leveraging our Teaching Experience to Improve Machine Learning: Application to pKa Prediction

Jérôme Genzling, Ziling Luo, Benjamin Weiser, Nicolas Moitessier
nicolas.moitessier@mcgill.ca
2023-12-07 – revised 2025-05-16

![Graphical Abstract](Graphical-abstract300.png)

# 🔍 What is this?

A Graph Neural Network (GNN) model for:

- Predicting pKa values of ionizable centers
- Identifying protonation sites
- Estimating dominant protonation states at a given pH
- Supporting iterative protonation/deprotonation of polyprotic molecules
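
For a single ionizable site, the dominant state at a given pH follows from the Henderson-Hasselbalch relation once a pKa is known. The sketch below is a generic illustration of that relation, not the package's own code; the function names `protonated_fraction` and `dominant_state` are hypothetical.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single ionizable site that is protonated at the given pH.

    Henderson-Hasselbalch: [deprotonated] / [protonated] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(pka: float, ph: float) -> str:
    """Label the majority form of the site at the given pH."""
    return "protonated" if protonated_fraction(pka, ph) >= 0.5 else "deprotonated"


if __name__ == "__main__":
    # Illustrative textbook-style pKa values, not predictions from the model.
    for label, pka in [("carboxylic acid", 4.76), ("ammonium", 10.6)]:
        print(label, dominant_state(pka, ph=7.4))
```

When pH equals pKa the two forms are equally populated, which is why the pKa acts as the switching point between protonation states.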

# 🧪 Core Functionalities

- Input: CSV with SMILES and (optionally) ionizable atom indices
- Output: pKa value(s), and major protonated species at given pH
- Iterative inference for molecules with multiple ionizable centers
- Easily extendable to new datasets or re-trainable on custom data
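
For a molecule with several ionizable centers, the population of each protonation state at a given pH can be estimated from the successive pKa values. The following is a minimal sketch using the standard equilibrium expressions and assuming macroscopic, stepwise pKa values; it is not the iterative procedure implemented in this repository, and `species_fractions` is a hypothetical helper.

```python
from typing import List, Sequence


def species_fractions(pkas: Sequence[float], ph: float) -> List[float]:
    """Populations of the n+1 protonation states of a polyprotic molecule.

    Index 0 is the fully protonated form, index n the fully deprotonated one.
    The relative weight of the state that has lost j protons is
    10 ** sum_{i <= j} (pH - pKa_i), with weight 1 for j = 0.
    """
    weights = [1.0]
    exponent = 0.0
    for pka in sorted(pkas):  # successive deprotonations, lowest pKa first
        exponent += ph - pka
        weights.append(10.0 ** exponent)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative pKa values for a triprotic acid, not model output.
    fractions = species_fractions([2.15, 7.20, 12.35], ph=7.4)
    major = max(range(len(fractions)), key=fractions.__getitem__)
    print(f"major species has lost {major} proton(s)")
```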

# 📦 Required Libraries

Install with pip:

pip install torch torch_geometric pandas numpy rdkit seaborn hyperopt

You can also recreate our virtual environment using environment.yml.
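
Before running the model, it can help to confirm that the packages listed above are importable. This is a small sketch using only the standard library; the module list mirrors the pip command above, and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec
from typing import List, Sequence

# Import names corresponding to the packages in the pip command above.
REQUIRED_MODULES = [
    "torch", "torch_geometric", "pandas", "numpy",
    "rdkit", "seaborn", "hyperopt",
]


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```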

# 📁 Repository Structure

- Datasets/ : All cleaned, split, and raw datasets
- Baseline_Models/Descriptors/ : Code to generate traditional descriptors
- Baseline_Models/RF, /XGB : Traditional model training scripts (Random Forest/XGB)
- GNN/ : All code related to GNN/GAT models
- MolGpKa_retrained/ : Code and data for retraining MolGpKa

# 🚀 Getting Started with the GNN

## 1. See available options

python main.py --mode usage

All possible arguments and their default values will be printed.

## 2. Predict pKa on a sample set

Your CSV needs at least two columns: 'Name' and 'Smiles'.
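
A minimal input file with the two required columns can be generated with the standard library. This is a sketch only: the molecule names and SMILES are illustrative, `build_input_csv` is a hypothetical helper, and any optional columns (such as ionizable atom indices) are omitted because their exact names are not given here.

```python
import csv
import io
from typing import Iterable, Tuple


def build_input_csv(molecules: Iterable[Tuple[str, str]]) -> str:
    """Return CSV text with a 'Name,Smiles' header and one row per molecule."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Smiles"])
    for name, smiles in molecules:
        writer.writerow([name, smiles])
    return buffer.getvalue()


if __name__ == "__main__":
    examples = [
        ("acetic_acid", "CC(=O)O"),
        ("pyridine", "c1ccncc1"),
        ("phenol", "Oc1ccccc1"),
    ]
    with open("your_input.csv", "w", newline="") as handle:
        handle.write(build_input_csv(examples))
```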

On Windows:

python main.py --mode infer --input your_input.csv > infer_your_input.out

On Linux (paths use forward slashes):

python main.py --mode infer --data_path ../Datasets/ --input your_input.csv --infer_pickled ../Datasets/pickled_data/infer_pickled.pkl --model_dir ../Model/ > infer_your_input.out

## 3. Predict from a CSV in Python

You can also use the predict() function directly:

from predict import predict

predicted_pkas, protonated_smiles = predict("your_dataset.csv", pH=7.4)
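
To compare protonation states across several pH values, the call above can be wrapped in a loop. The sketch below assumes only the call shape shown above (`predict(path, pH=...)` returning two values); the return types are not documented here, so they are stored as-is. `scan_ph` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, Iterable


def scan_ph(predict_fn: Callable[..., Any], csv_path: str,
            ph_values: Iterable[float]) -> Dict[float, Any]:
    """Call predict_fn(csv_path, pH=ph) for each pH and collect the results."""
    results: Dict[float, Any] = {}
    for ph in ph_values:
        # The result is kept opaque: (predicted_pkas, protonated_smiles).
        results[ph] = predict_fn(csv_path, pH=ph)
    return results


# Usage, assuming the package's predict module is importable:
#   from predict import predict
#   results = scan_ph(predict, "your_dataset.csv", [2.0, 7.4, 10.0])
#   predicted_pkas, protonated_smiles = results[7.4]
```

Passing the function in as an argument keeps the helper independent of how the package is installed or imported.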

## 4. Verbose Levels

Use the --verbose flag to control output detail:

--verbose 0: Silent mode; no details are printed in the output

--verbose 1: Summary of predictions plus some cleaning details

--verbose 2: Detailed view of every deprotonation step
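
The command-line options shown in this section can also be assembled from Python, for example when running several input files in a batch. This sketch uses only the flags that appear above; the helper names are hypothetical, and it assumes the script is launched from the directory that contains main.py.

```python
import subprocess
import sys
from typing import List, Optional


def build_infer_command(input_csv: str, verbose: int = 1,
                        data_path: Optional[str] = None,
                        infer_pickled: Optional[str] = None,
                        model_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for an inference run of main.py."""
    cmd = [sys.executable, "main.py", "--mode", "infer",
           "--input", input_csv, "--verbose", str(verbose)]
    optional = {
        "--data_path": data_path,
        "--infer_pickled": infer_pickled,
        "--model_dir": model_dir,
    }
    for flag, value in optional.items():
        if value is not None:  # only pass flags that were explicitly set
            cmd += [flag, value]
    return cmd


def run_infer(input_csv: str, output_file: str, **options) -> None:
    """Run inference and redirect standard output to output_file."""
    with open(output_file, "w") as out:
        subprocess.run(build_infer_command(input_csv, **options),
                       stdout=out, check=True)


# Usage: run_infer("your_input.csv", "infer_your_input.out", verbose=2)
```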

# 📖 Citation

If you use this code or model, please cite:

Genzling J, Luo Z, Weiser B, Moitessier N. Leveraging our Teacher’s Experience to Improve Machine Learning: Application to pKa Prediction. ChemRxiv. 2024; doi:10.26434/chemrxiv-2024-bpd53-v2 
This content is a preprint and has not been peer-reviewed.

# 🧠 Tips

Use the Cheminfo SMILES viewer to visualize and debug SMILES (https://www.cheminfo.org/Chemistry/Cheminformatics/Smiles/index.html).

If protonation states are off, check atom indexing or consider using neutral forms.

You can retrain on your own dataset by modifying train_pKa_predictor.py.

# 🛠 Support

Feel free to reach out via email or GitHub issues if you need help using or adapting the model.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/MoitessierLab/pKa-predictor",
    "name": "pka-predictor-moitessier",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "pKa prediction GNN chemistry rdkit",
    "author": "Moitessier Lab",
    "author_email": "nicolas.moitessier@mcgill.ca",
    "download_url": "https://files.pythonhosted.org/packages/f5/5e/92f8942d2e1224a46d3eb1c3ebcb0985e19b474d76b1573ed9ea3702f962/pka_predictor_moitessier-0.1.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "Graph-based pKa prediction for small molecules",
    "version": "0.1.12",
    "project_urls": {
        "Homepage": "https://github.com/MoitessierLab/pKa-predictor"
    },
    "split_keywords": [
        "pka",
        "prediction",
        "gnn",
        "chemistry",
        "rdkit"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f58d291f775255ee862f359058a24ee4404bfb64e91f1facb9ae257b6977f1c2",
                "md5": "d3115f8be103e67b2991d2951a93d4f4",
                "sha256": "5c478fc6e1518056dc7836594505f4141e0a86eb7a684e9542682486758f7232"
            },
            "downloads": -1,
            "filename": "pka_predictor_moitessier-0.1.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d3115f8be103e67b2991d2951a93d4f4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9109704,
            "upload_time": "2025-07-29T17:46:51",
            "upload_time_iso_8601": "2025-07-29T17:46:51.531341Z",
            "url": "https://files.pythonhosted.org/packages/f5/8d/291f775255ee862f359058a24ee4404bfb64e91f1facb9ae257b6977f1c2/pka_predictor_moitessier-0.1.12-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f55e92f8942d2e1224a46d3eb1c3ebcb0985e19b474d76b1573ed9ea3702f962",
                "md5": "5b98a6b026036997f77f9feb5b46cd68",
                "sha256": "1a9103c3b331c1747a06d03e464379c6812790799a58c3ca9b339b8d74485baf"
            },
            "downloads": -1,
            "filename": "pka_predictor_moitessier-0.1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "5b98a6b026036997f77f9feb5b46cd68",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 9104142,
            "upload_time": "2025-07-29T17:46:53",
            "upload_time_iso_8601": "2025-07-29T17:46:53.667761Z",
            "url": "https://files.pythonhosted.org/packages/f5/5e/92f8942d2e1224a46d3eb1c3ebcb0985e19b474d76b1573ed9ea3702f962/pka_predictor_moitessier-0.1.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-29 17:46:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MoitessierLab",
    "github_project": "pKa-predictor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pka-predictor-moitessier"
}
        