# DockTDeep
Preprint: **"Data-centric training enables meaningful interaction learning in protein–ligand binding affinity prediction."** [ChemRxiv](https://chemrxiv.org/engage/chemrxiv/article-details/68a52850728bf9025e40d9e4).
## 💾 Installation
> [!TIP]
> Always use a virtual environment to manage dependencies.
>
> ```bash
> python -m venv .venv
> source .venv/bin/activate
> ```
### Using pip
Quick setup for inference. Install the package directly from PyPI:
```bash
pip install docktdeep
```
## 🚀 Quick start
### Basic usage
Predict binding affinities for protein-ligand pairs _(predictions are given in kcal/mol)_.
```bash
# single protein-ligand pair
docktdeep predict --proteins protein.pdb --ligands ligand.pdb --output-csv results.csv
# multiple pairs
docktdeep predict \
--proteins protein1.pdb protein2.pdb \
--ligands ligand1.pdb ligand2.pdb \
--output-csv results.csv \
--max-batch-size 16
# options available in help
docktdeep predict --help
```
> [!TIP]
> Use shell globbing patterns to process multiple files efficiently.
> ```bash
> # using glob expansion
> docktdeep predict \
> --proteins $(ls path/to/proteins/*_protein.pdb) \
> --ligands $(ls path/to/ligands/*_ligand.pdb)
>
> # another example using find command for more complex patterns
> docktdeep predict \
> --proteins $(find /data/complexes -name "*_protein_prep.pdb" | sort) \
> --ligands $(find /data/complexes -name "*_ligand_rnum.pdb" | sort)
> ```
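When proteins and ligands are collected by two separate commands, the two lists can drift out of step if a file is missing. The following is a minimal sketch of one way to guard against that. It assumes files are named `<id>_protein.pdb` and `<id>_ligand.pdb` in the same directory, and that the CLI pairs the two lists by position, as the examples above suggest; neither assumption is confirmed here. The function `pair_files` and the list file names are hypothetical.

```bash
# Write matching protein/ligand path lists, one path per line, in sorted order.
# Assumptions (not confirmed by the docktdeep documentation): files are named
# <id>_protein.pdb and <id>_ligand.pdb side by side, and the CLI pairs the two
# lists by position. pair_files and the list file names are hypothetical.
pair_files() {
    root="$1"      # directory tree to search
    out_dir="$2"   # where proteins.txt and ligands.txt are written
    : > "$out_dir/proteins.txt"
    : > "$out_dir/ligands.txt"
    find "$root" -name '*_protein.pdb' | sort | while IFS= read -r protein; do
        # derive the expected ligand path from the protein path
        ligand="${protein%_protein.pdb}_ligand.pdb"
        if [ -f "$ligand" ]; then
            printf '%s\n' "$protein" >> "$out_dir/proteins.txt"
            printf '%s\n' "$ligand" >> "$out_dir/ligands.txt"
        else
            echo "skipping $protein: no matching ligand" >&2
        fi
    done
}

# Usage (commented out so the sketch runs without docktdeep installed):
# pair_files path/to/complexes .
# docktdeep predict \
#     --proteins $(cat proteins.txt) \
#     --ligands $(cat ligands.txt) \
#     --output-csv results.csv
```

Like the `$(ls ...)` and `$(find ...)` forms above, `$(cat ...)` relies on word splitting, so this sketch does not handle paths that contain spaces.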
## ⚙️ Development setup
For development and training custom models:
```bash
# clone the repository
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# install dependencies
python -m pip install -r requirements.txt
# run tests to verify installation
python -m pytest tests/
```
### Training models
Initialize a new Aim repository for tracking experiments:
```bash
aim init
# to start the aim server
aim server
```
To see all available training options:
```bash
python train.py --help
```
Train a model with optimized hyperparameters:
```bash
python train.py \
--model Baseline \
--experiment experiment-name \
--depthwise-convs \
--adaptive-pooling \
--optim AdamW \
--max-epochs 1500 \
--batch-size 64 \
--lr 0.00087469 \
--beta1 0.25693012 \
--eps 0.00032933 \
--dropout 0.25348994 \
--wdecay 0.0000169 \
--molecular-dropout 0.06 \
--molecular-dropout-unit complex \
--random-rotation \
--dataframe-path path/to/dataframe.csv \
--root-dir path/to/data/PDBbind2020 \
--ligand-path-pattern "{c}/{c}_ligand_rnum.pdb" \
--protein-path-pattern "{c}/{c}_protein_prep.pdb" \
--split-column random_split
```
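The `--ligand-path-pattern` and `--protein-path-pattern` values contain a `{c}` placeholder. The sketch below illustrates how such a pattern might expand and checks that the resulting files exist before a long training run. It assumes that `{c}` is replaced by the complex ID and that paths are resolved relative to `--root-dir`; both are inferences from the command above, not confirmed behavior of `train.py`. The helpers `expand_pattern` and `check_complex_files`, and the one-ID-per-line `ids_file`, are hypothetical.

```bash
# Expand a path pattern for one complex ID by replacing every "{c}".
# Assumption: train.py substitutes the complex ID for {c}; IDs are plain
# alphanumeric codes (no "/" or "&", which would confuse sed).
expand_pattern() {
    printf '%s\n' "$1" | sed "s/{c}/$2/g"
}

# Count files that are missing under root_dir for the IDs listed in ids_file
# (a hypothetical text file with one complex ID per line).
# Missing paths go to stderr; the total count is printed on stdout.
check_complex_files() {
    root_dir="$1"
    ids_file="$2"
    missing=0
    while IFS= read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        for pattern in '{c}/{c}_ligand_rnum.pdb' '{c}/{c}_protein_prep.pdb'; do
            path="$root_dir/$(expand_pattern "$pattern" "$id")"
            if [ ! -f "$path" ]; then
                echo "missing: $path" >&2
                missing=$((missing + 1))
            fi
        done
    done < "$ids_file"
    echo "$missing"
}

# Usage (commented out; paths are placeholders):
# check_complex_files path/to/data/PDBbind2020 ids.txt
```

Running the check first makes a misnamed or absent structure file visible immediately instead of partway through training.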
## 📝 Citation
If you use DockTDeep in your research, please cite:
```bibtex
@article{dasilva2025docktdeep,
title={Data-centric training enables meaningful interaction learning in protein--ligand binding affinity prediction},
author={da Silva, Matheus M. P. and Vidal, Lincon and Guedes, Isabella and de Magalh{\~a}es, Camila and Cust{\'o}dio, F{\'a}bio and Dardenne, Laurent},
year={2025}
}
```
### Related
- **DockTGrid: a Python package for generating deep learning-ready voxel grids of molecular complexes.** [GitHub](https://github.com/gmmsb-lncc/docktgrid).