[DOI](https://zenodo.org/doi/10.5281/zenodo.11161563)
# PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models
[Paper](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00861-w) |
[Tutorial](https://github.com/Acellera/acegen-open/blob/main/tutorials/using_promptsmiles.md) |
[ACEGEN](https://pubs.acs.org/doi/10.1021/acs.jcim.4c00895)
This library contains code to manipulate SMILES strings and facilitate iterative prompting, intended to be coupled with a trained chemical language model (CLM) that uses SMILES notation.
# Installation
The library can be installed via pip:
```
pip install promptsmiles
```
Or by cloning this repository and installing from source. Note that PromptSMILES requires RDKit.
```
git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./
```
# Use
PromptSMILES is designed as a wrapper around CLM sampling that can accept a prompt (i.e., an initial string from which to begin autoregressive token generation). It therefore requires two callable functions, described later. PromptSMILES has three main classes: `DeNovo` (a dummy wrapper for code consistency), `ScaffoldDecorator`, and `FragmentLinker`.
## Scaffold Decoration
```python
from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",  # Or a list of SMILES
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,  # Whether CLM.sampler accepts a list of prompts
    optimize_prompts=True,
    shuffle=True,  # Whether to randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True)  # Parameters can be overridden here if desired
```

## Superstructure generation
```python
from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C",  # Or a list of SMILES
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,  # Whether CLM.sampler accepts a list of prompts
    optimize_prompts=False,
    shuffle=False,  # Whether to randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True)  # Parameters can be overridden here if desired
```

## Fragment linking / scaffold hopping
```python
FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False,  # Optional when combining 2 fragments, otherwise set to True
    return_all=False,
)
smiles = FL.sample(batch_size=3)
```

## Required chemical language model functions
Notice the two required callables, `CLM.sampler` and `CLM.evaluater`. The first samples from the CLM given a prompt.
```python
from typing import List, Union

def CLM_sampler(prompt: Union[str, List[str]], batch_size: int):
    """
    Input: Must have a prompt and a batch_size argument.
    Output: SMILES [list]
    """
    # Encode the prompt and sample as per your model implementation
    return smiles
```
**Note**: For a more efficient implementation, `prompt` should accept a list of prompts of length `batch_size`, and `batch_prompts` should be set to `True` in the PromptSMILES class used.
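As an illustration, a sampler supporting `batch_prompts=True` might normalize a single string into a list and generate one completion per prompt. Here `model_generate` is a hypothetical placeholder for your CLM's decoding step, not part of the library:

```python
from typing import List, Union

def model_generate(prompt: str) -> str:
    # Hypothetical placeholder for the CLM's autoregressive decoding;
    # here it simply returns the prompt unchanged.
    return prompt

def CLM_sampler(prompt: Union[str, List[str]], batch_size: int) -> List[str]:
    # With batch_prompts=True, one prompt may be supplied per sample;
    # a single string is broadcast across the whole batch.
    prompts = [prompt] * batch_size if isinstance(prompt, str) else prompt
    return [model_generate(p) for p in prompts]
```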
The second is a function that evaluates the negative log-likelihood (NLL) of a list of SMILES:
```python
from typing import List

def CLM_evaluater(smiles: List[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor] (on CPU, without gradients)
    """
    return nlls
```
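For a concrete (toy) sketch, the stub below returns one NLL per SMILES; a real implementation would tokenize each string, run a forward pass through the CLM, and sum per-token negative log-likelihoods. The `vocab_size` constant is a hypothetical stand-in, not a library parameter:

```python
import math
from typing import List

def CLM_evaluater(smiles: List[str]) -> List[float]:
    # Toy stand-in: assigns each SMILES a length-proportional NLL,
    # as a uniform model over a hypothetical vocabulary would.
    # A real evaluater would return model NLLs on CPU, detached
    # from any autograd graph.
    vocab_size = 32  # hypothetical tokenizer vocabulary size
    return [len(s) * math.log(vocab_size) for s in smiles]
```

Stubs shaped like these two callables can be passed as `sample_fn` and `evaluate_fn` to `ScaffoldDecorator` or `FragmentLinker` while wiring up a real model.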