# Docko
Docking for ligands: a disgusting mix of code combining the latest from Chai with old-school bro vina.
Made this for myself but others wanted to use it. Love only pls. Take it as it is <3 but if you notice bugs, please submit an
issue, be a g.
## Install
Make sure you have vina installed: https://autodock-vina.readthedocs.io/en/latest/installation.html
I have not found it to work with the pip install; it needs the executable.
Works on Mac and Linux. You need big power for Chai though, so I would recommend Linux.
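Since docko needs the vina executable rather than a pip package, it can be worth checking that your shell can actually see it before going further. This is a minimal sketch, assuming the binary is called `vina` on your machine (some downloads ship it under a longer, versioned name, in which case pass that name instead). `find_executable` is a made-up helper for illustration, not part of docko.

```python
import shutil


def find_executable(name: str = "vina"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    # "vina" is an assumed binary name; change it if yours is named differently.
    path = find_executable("vina")
    if path is None:
        print("vina not found on PATH; install the executable first")
    else:
        print(f"vina found at {path}")
```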
```
conda create --name docko python=3.10.14 -y
conda activate docko
conda install -c conda-forge pdbfixer -y
conda config --env --add channels conda-forge
pip install git+https://github.com/chaidiscovery/chai-lab.git
```
### Install docko now
```
conda activate docko
pip install docko
```
### Lucky last since vina is a b
You need to make a second environment just to prepare the ligand. I came across this issue when making all my stuff.
```
conda create --name vina python=3.9.7 -y
conda activate vina
conda install -c conda-forge numpy openbabel scipy rdkit -y
pip install meeko
```
## Quick start
#### Use case 1: you have a sequence and you want to bind it
Here your best bet is using Chai, which will automatically handle everything for you:
Example:
```
from docko import *
base_dir = 'some_folder' # A folder on your computer
run_chai('A0A0E3LLD2_METBA', # name
'MSIEKIPGYTYGKTESMSPLNLEDLKLLKDSVMFTEEDEKYLKKAGEVLEDQVEEILDTWYGFVGSHPHLLYYFTSPDGTPNEEYLAAVRKRFSKWILDTCNRNYDQAWLDYQYEIGLRHHRTKKNRTDNVESVPNINYRYLVAFIYPITATIKPFLARKGHTSEEVEKMHQAWFKATVLQVALWSYPYVKQGDF', # sequence
'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', # ligand as smiles
base_dir
)
```
The outputs will now be in `base_dir`.
Say you have a csv of these and you want to make bound structures for all of them:
```
run_chai_df(output_dir, filename, entry_column='Entry', seq_column='Sequence', ligand_column='Substrate')
```
This runs Chai on a csv that contains your sequences (`Sequence`), ligands (`Substrate`) and entry names (`Entry`).
It makes a new folder for each row using the entry name (so you ideally don't want dumb characters in there) and puts
all these new folders in `output_dir`.
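If you are building that csv by hand, something like the sketch below can help. It writes the three columns used in the call above and flags entry names that would make awkward folder names. The helper names, the "safe" character set and the shortened example sequence and ligand are all assumptions for illustration; docko itself may be more or less forgiving.

```python
import csv
import re

# Letters, digits, underscore, dash and dot only (a conservative assumption).
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def write_chai_csv(rows, filename):
    """Write rows (dicts with Entry, Sequence, Substrate) to a csv file."""
    with open(filename, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Entry", "Sequence", "Substrate"])
        writer.writeheader()
        writer.writerows(rows)


def unsafe_entries(rows):
    """Return the Entry values containing characters outside SAFE_NAME."""
    return [row["Entry"] for row in rows if not SAFE_NAME.match(row["Entry"])]


# Toy rows: the sequence and SMILES are shortened placeholders.
rows = [
    {"Entry": "A0A0E3LLD2_METBA", "Sequence": "MSIEKIPGYTYGK", "Substrate": "CCO"},
    {"Entry": "bad name/1", "Sequence": "MSIEKIPGYTYGK", "Substrate": "CCO"},
]
print(unsafe_entries(rows))
```

Once the list of unsafe entries is empty, call `write_chai_csv(rows, filename)` and hand the same `filename` to `run_chai_df`.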
#### Use case 2: you have a uniprot ID and you want to get the structure and bind a ligand with vina
Here you got told "oh wow physics informed models are the best, I don't trust ML!" (this will typically arise
from someone over the age of 40). To humour them you can also run `vina`; you'll need to have it installed.
`smiles` is your ligand as a SMILES string and `base_dir` is where you want your data to be output. Note that since we are passing
`protein_name='A0A0H2V871'`, which is a UniProt ID, it will automatically get the structure for us. If we weren't,
we would need to pre-download the PDB structure, or fold it using an online server such as AF3 or Chai (you could run
Chai and then remove the ligand as well, which is my fave option).
```
import os
from docko import *
smiles = 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC'
base_dir = 'some_folder' # A folder on your computer
dock(sequence='',
     protein_name='A0A0H2V871', # Or the name/path of a pdb or cif file on your computer
     smiles=smiles,
     ligand_name='DEHP', # Name of your chemical, no funny characters; the ligand will be made in a folder named this
     residues=[113, 114], # Residues of your active site; the ligand is positioned at the centroid of these
     protein_dir=f'{base_dir}/', # Folder to save the proteins to
     ligand_dir=f'{base_dir}/', # Folder to save the input ligand to
     output_dir=f'{base_dir}/', # Output folder with the docked ligand and config file
     pH=7.4, # pH to run docking at
     method='vina', # Method can be vina, ad4, or diffdock
     size_x=5.0, # How far in x is allowed; think of this as a cloud around your residues or residue centroid
     size_y=5.0,
     size_z=5.0,
     num_modes=9, # Not sure, check the vina docs; this is the default
     exhaustivenes=32) # Higher is better but slower; this is the default
# Just checks the output was logged --> this has your "energy data" about how good the docking was
os.path.isfile(f'{base_dir}/A0A0H2V871-DEHP_log.txt')
```
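To make the `residues` and `size_x`/`size_y`/`size_z` arguments a bit more concrete: the search box is centred on the centroid of the residues you pass, and the sizes say how far the ligand may wander from there. The sketch below shows one way such a centroid could be computed from the fixed-width `ATOM` records of a PDB file. It is an illustration under assumptions (all atoms weighted equally, chain IDs ignored), not docko's actual code, and `atom_line` and `residue_centroid` are made-up helpers.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-width PDB ATOM record (only the columns used here)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


def residue_centroid(pdb_lines, residues):
    """Average the coordinates of every ATOM record in the given residues."""
    wanted = set(residues)
    coords = []
    for line in pdb_lines:
        # Residue number sits in columns 23-26, x/y/z in columns 31-54.
        if line.startswith("ATOM") and int(line[22:26]) in wanted:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no atoms found for the requested residues")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


# Toy structure: one atom far away (residue 10) and three in the active site.
pdb_lines = [
    atom_line(1, "CA", "ALA", "A", 10, 100.0, 100.0, 100.0),
    atom_line(2, "N", "SER", "A", 113, 0.0, 0.0, 0.0),
    atom_line(3, "CA", "SER", "A", 113, 2.0, 0.0, 0.0),
    atom_line(4, "CA", "HIS", "A", 114, 4.0, 6.0, 0.0),
]
center = residue_centroid(pdb_lines, [113, 114])
print(center)
```

For a real structure you would pass the lines of your PDB file instead of the toy list. The resulting point corresponds to the box centre, and the `size_*` values to the box edges around it.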
If you want to run it on your own file, just change `protein_name` to the path of your downloaded PDB file, e.g.:
```
protein_name=f'{base_dir}/data/test_existing.pdb',
```
This will name your directory `test_existing` and save the results in there. Again, just
don't have funny characters in your filename.
As above, you can also run with the method `ad4`. It makes a bunch of other random files and, again, was something
that someone asked me to do. It was seriously painful and I don't wish it on anyone else, so I have made it available.
Basically it uses the AutoDock4 forcefield, which in some cases makes vina dock better. Who knows. LMK if you have an opinion.
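If you want to form that opinion, one option is to pull the affinities out of the log files from a `vina` run and an `ad4` run and compare them. The sketch below parses the results table that vina prints (mode, affinity, two RMSD columns). It assumes the docko log contains that table unchanged, which I have not verified here, and the numbers in `example_log` are invented purely so the snippet runs.

```python
import re

# One row of the results table: mode, affinity, rmsd l.b., rmsd u.b.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*$"
)


def parse_affinities(log_text):
    """Return [(mode, affinity in kcal/mol), ...] from a vina results table."""
    results = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


# Invented example values, only here to exercise the parser.
example_log = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -7.512          0          0
   2       -7.104      1.873      2.951
   3       -6.880      2.402      5.118
"""
scores = parse_affinities(example_log)
best = min(scores, key=lambda s: s[1])  # most negative affinity = best pose
print(best)
```

More negative is better, so `best` is the mode to look at first. Keep in mind that scores from different scoring functions are not strictly on the same scale, so treat a cross-method comparison as a rough guide only.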
#### Use case 3: you want to use diffdock and use up all the space on your computer
```
dock(sequence='',
     protein_name='A0A0H2V871', # Or the name/path of a pdb or cif file on your computer
     smiles=smiles,
     ligand_name='DEHP', # Name of your chemical, no funny characters; the ligand will be made in a folder named this
     residues=[113, 114], # Residues of your active site; the ligand is positioned at the centroid of these
     protein_dir=f'{base_dir}/', # Folder to save the proteins to
     ligand_dir=f'{base_dir}/', # Folder to save the input ligand to
     output_dir=f'{base_dir}/', # Output folder with the docked ligand and config file
     method='diffdock', # As above, just change to diffdock
     )
```
Basically exactly as above, except you need to specify the method is `diffdock`.
Note you need to have TRILL installed for this to work:
```
micromamba create -n TRILL python=3.10 ; micromamba activate TRILL
micromamba install -c pytorch -c nvidia pytorch=2.1.2 pytorch-cuda=12.1 torchdata
micromamba install -c conda-forge openbabel pdbfixer swig openmm smina fpocket vina openff-toolkit openmmforcefields setuptools=69.5.1
micromamba install -c bioconda foldseek pyrsistent
micromamba install -c "dglteam/label/cu121" dgl
micromamba install -c pyg pyg pytorch-cluster pytorch-sparse pytorch-scatter
pip install git+https://github.com/martinez-zacharya/lightdock.git@03a8bc4888c0ff8c98b7f0df4b3c671e3dbf3b1f git+https://github.com/martinez-zacharya/ECPICK.git setuptools==69.5.1
pip install trill-proteins
```
--------------------------------------------------------------------------------------------------------------
## Other info
## PDB or structure
You need to select your structure from the PDB or, in lieu of that, use the AlphaFold3 server (https://alphafoldserver.com/).
Alternatively, if your IDs are PDB IDs or UniProt IDs you can just pass those and it will get the structures for you.
If you use the AlphaFold server you'll get `cif` files, and this works with those too!
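If you are not sure which of these cases a given `protein_name` falls into, a rough check like the one below can help before you kick off a long run. It uses UniProt's published accession pattern and the classic four-character PDB ID format. `classify` is a made-up helper, and this is only an assumption about how one might tell the cases apart, not how docko decides internally.

```python
import re

# UniProt's published accession pattern (6 or 10 characters).
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Classic PDB IDs: a digit followed by three letters or digits.
PDB = re.compile(r"[0-9][A-Za-z0-9]{3}")


def classify(protein_name: str) -> str:
    """Guess whether a name is a local file, a UniProt ID or a PDB ID."""
    if protein_name.lower().endswith((".pdb", ".cif")):
        return "file"
    if UNIPROT.fullmatch(protein_name):
        return "uniprot"
    if PDB.fullmatch(protein_name):
        return "pdb"
    return "unknown"


for name in ["A0A0H2V871", "1ABC", "some_folder/data/test_existing.pdb", "DEHP!"]:
    print(name, "->", classify(name))
```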
## Working with heme based files
Unfortunately, since AlphaFold is some new stuff and we're working with AutoDock Vina, we will need to change the files a bit.
First, if we use the AF3 docked heme, the heme will be automatically "cleaned" out of the structure before the pdbqt file is made when we run the pipeline on the AF3 structure.
So we need to re-add it afterwards, after also converting it manually.
To convert it manually, we copy and paste (I know lol) the heme from the original pdb file (if you don't have this, go into a program like ChimeraX and convert the .cif file to a .pdb file).
Then you open up the AlphaFold pdb in a text editor and copy out the heme atoms, omitting the last one, the `Fe`, as vina doesn't like this one.
Then we convert this manually using obabel: `obabel heme.pdb -o pdbqt > heme.pdbqt`
Once this has been converted, we realise that vina doesn't like many of the tags, so we need to remove all of them.
These include (but are probably not limited to):
```
ENDBRANCH
ROOT
BRANCH
ENDROOT
```
Then you can run the program as per usual :D I do this automatically within the scripts but thought I would mention it in case
things fail for you (whoever you are).
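If you ever do need to do the tag removal by hand, it boils down to dropping every line whose first word is one of the tags listed above. Here is a minimal sketch of that step. `strip_torsion_tags` is a made-up name and the example records are simplified placeholders rather than a real heme, so treat it as an illustration, not the code docko runs.

```python
TAGS = ("ROOT", "ENDROOT", "BRANCH", "ENDBRANCH")


def strip_torsion_tags(pdbqt_text: str) -> str:
    """Drop every line whose first field is one of the torsion-tree tags."""
    kept = []
    for line in pdbqt_text.splitlines():
        fields = line.split()
        if fields and fields[0] in TAGS:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Simplified placeholder records, not a real heme.
example = """\
ROOT
HETATM    1  CHA HEM A 201       1.000   2.000   3.000
ENDROOT
BRANCH   1   2
HETATM    2  CHB HEM A 201       4.000   5.000   6.000
ENDBRANCH   1   2
"""
cleaned = strip_torsion_tags(example)
print(cleaned)
```

To use it on a real file, read `heme.pdbqt` into a string, pass it through the function and write the result back out.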
### References
(1) Martinez, Z. A.; Murray, R. M.; Thomson, M. W. TRILL: Orchestrating Modular Deep-Learning Workflows for Democratized, Scalable Protein Analysis and Engineering. bioRxiv October 27, 2023, p 2023.10.24.563881. https://doi.org/10.1101/2023.10.24.563881.
(2) Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61 (8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203.
(3) Chai Discovery. https://www.chaidiscovery.com/blog/introducing-chai-1 (accessed 2024-09-15).
### THANKX
Lastly if you liked this, give it a star ****
Raw data
{
"_id": null,
"home_page": "https://github.com/arianemora/docko/",
"name": "docko",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "docking, protein-engineering",
"author": "Ariane Mora",
"author_email": "ariane.n.mora@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/fb/52/e8a4a98995b1cfdcc90fc3d334dd3af3ffd2aed07412f832e24ebcaaa084/docko-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": null,
"version": "0.1.1",
"project_urls": {
"Bug Tracker": "https://github.com/arianemora/docko/",
"Documentation": "https://github.com/arianemora/docko/",
"Homepage": "https://github.com/arianemora/docko/",
"Source Code": "https://github.com/arianemora/docko/"
},
"split_keywords": [
"docking",
" protein-engineering"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0c255a80bf4478e4cd4bc0d8299d694a7a9439c1bcc2e6f2581632a6d19f7eac",
"md5": "2ced8af049beae89d6366d46f413a7c8",
"sha256": "7a01e716d2cf4af23e6a4bf048bd134c9a8415bb76acf52bb08e3a88af613948"
},
"downloads": -1,
"filename": "docko-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2ced8af049beae89d6366d46f413a7c8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 21784,
"upload_time": "2024-09-16T15:06:09",
"upload_time_iso_8601": "2024-09-16T15:06:09.406658Z",
"url": "https://files.pythonhosted.org/packages/0c/25/5a80bf4478e4cd4bc0d8299d694a7a9439c1bcc2e6f2581632a6d19f7eac/docko-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fb52e8a4a98995b1cfdcc90fc3d334dd3af3ffd2aed07412f832e24ebcaaa084",
"md5": "2da45ecb9eab6cde3bf1ac5def34ebc8",
"sha256": "8819bfc52d3db95fac0523227fa23056f10168a4181ee6234c204784f861b805"
},
"downloads": -1,
"filename": "docko-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "2da45ecb9eab6cde3bf1ac5def34ebc8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 21597,
"upload_time": "2024-09-16T15:06:11",
"upload_time_iso_8601": "2024-09-16T15:06:11.145272Z",
"url": "https://files.pythonhosted.org/packages/fb/52/e8a4a98995b1cfdcc90fc3d334dd3af3ffd2aed07412f832e24ebcaaa084/docko-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-16 15:06:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "arianemora",
"github_project": "docko",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "docko"
}