promptsmiles


- Name: promptsmiles
- Version: 1.7
- Summary: A convenient package to manipulate SMILES strings for iterative prompting with chemical language models.
- Upload time: 2025-03-20 13:35:55
- Home page, maintainer, docs URL, author, license: not recorded
- Requires Python: >=3.7
- Keywords: smiles, chemical language models, de novo, constrained de novo, chemistry, drug design
- Requirements: none recorded

[![DOI](https://zenodo.org/badge/757912118.svg)](https://zenodo.org/doi/10.5281/zenodo.11161563)


# PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models

[Paper](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00861-w) |
[Tutorial](https://github.com/Acellera/acegen-open/blob/main/tutorials/using_promptsmiles.md) |
[ACEGEN](https://pubs.acs.org/doi/10.1021/acs.jcim.4c00895)


This library provides utilities for manipulating SMILES strings to enable iterative prompting of a trained chemical language model (CLM) that uses SMILES notation.

# Installation
The library can be installed via pip:
```
pip install promptsmiles
```
Alternatively, clone this repository and install it locally. PromptSMILES requires RDKit.
```
git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./
```
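Because the package page records no requirements, RDKit may not be installed automatically alongside promptsmiles. The snippet below is a minimal sketch (the helper name `rdkit_available` is hypothetical, not part of promptsmiles) that uses only the standard library to check whether RDKit is importable in the current environment.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the top-level `rdkit` package can be found for import."""
    # find_spec returns None when a top-level package is not installed
    return importlib.util.find_spec("rdkit") is not None


if __name__ == "__main__":
    if not rdkit_available():
        print("RDKit not found; install it first, e.g. `pip install rdkit`.")
```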

# Use
PromptSMILES is designed as a wrapper around sampling from a CLM that can accept a prompt (i.e., an initial string from which autoregressive token generation begins). It therefore requires two callable functions, described under Required chemical language model functions. PromptSMILES has three main classes: DeNovo (a dummy wrapper that keeps code consistent), ScaffoldDecorator, and FragmentLinker.

## Scaffold Decoration
```python
from promptsmiles import ScaffoldDecorator, FragmentLinker

# CLM is your own model object exposing the two callables described under
# "Required chemical language model functions"
SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)", # Or a list of SMILES; "*" marks an attachment point
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False, # Set True if CLM.sampler accepts a list of prompts
    optimize_prompts=True,
    shuffle=True, # Randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True) # Parameters can be overridden here if desired
```
![Scaffold decoration example](https://github.com/MorganCThomas/PromptSMILES/blob/main/images/scaff_dec_example.png)
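In these prompts, `*` marks an attachment point, and `shuffle=True` randomly selects among them within a batch. The sketch below is a simplified, text-level illustration (not the library's implementation, which relies on RDKit) that locates the attachment points in a scaffold SMILES; the helper `attachment_points` is hypothetical.

```python
import re
from typing import List

# Matches a bare "*" or a bracketed dummy atom such as "[*]" or "[1*]"
_ATTACHMENT = re.compile(r"\[\d*\*\]|\*")


def attachment_points(smiles: str) -> List[int]:
    """Return the string offsets of attachment points in a SMILES string."""
    return [m.start() for m in _ATTACHMENT.finditer(smiles)]


scaffold = "N1(*)CCN(CC1)CCCCN(*)"
print(attachment_points(scaffold))  # [3, 19]
```

The scaffold in the superstructure example has no `*` at all, so this helper returns an empty list for it.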

## Superstructure generation
```python
from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C", # Or a list of SMILES
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False, # Set True if CLM.sampler accepts a list of prompts
    optimize_prompts=False,
    shuffle=False, # Randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True) # Parameters can be overridden here if desired
```
![Superstructure generation example](https://github.com/MorganCThomas/PromptSMILES/blob/main/images/scaff_super_example.png)

## Fragment linking / scaffold hopping
```python
from promptsmiles import FragmentLinker

FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False, # Optional when linking 2 fragments; set to True otherwise
    return_all=False,
)
smiles = FL.sample(batch_size=3)
```
![Fragment linking example](https://github.com/MorganCThomas/PromptSMILES/blob/main/images/frag_link_example.png)

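The inline comment on `scan` describes a simple rule: the flag is optional when linking exactly two fragments and is set to True otherwise. The hypothetical helper below restates that rule in code as a sketch; the check that every fragment carries at least one `*` is an added assumption for illustration, not documented promptsmiles behavior.

```python
from typing import List


def effective_scan(fragments: List[str], scan: bool) -> bool:
    """Hypothetical helper: the `scan` value implied by the README comment."""
    # Assumption for illustration: each fragment needs an attachment point
    if any("*" not in frag for frag in fragments):
        raise ValueError("every fragment needs at least one '*' attachment point")
    # The user's choice is respected only for two fragments
    return scan if len(fragments) == 2 else True
```
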
## Required chemical language model functions
The examples above pass two callables, CLM.sampler and CLM.evaluater, where CLM stands for your own model. The first samples SMILES from the CLM given a prompt:

```python
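from typing import List, Union

def CLM_sampler(prompt: Union[str, List[str]], batch_size: int) -> List[str]:
    """
    Input: Must accept a prompt and a batch_size argument.
    Output: A list of SMILES strings.
    """
    # Encode the prompt(s) and sample as per your model implementation
    smiles: List[str] = []  # replace with batch_size SMILES sampled from the CLM
    return smiles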
```
**Note**: For a more efficient implementation, the sampler should accept a list of prompts whose length equals `batch_size`, and `batch_prompts` should be set to `True` in whichever PromptSMILES class is used.
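To exercise a pipeline before wiring in a real model, a stand-in sampler that honours this contract can help. The sketch below is hypothetical (`toy_sampler` is not part of promptsmiles and is not a language model): it accepts either a single prompt or a list with one prompt per sample, and returns `batch_size` strings that simply extend each prompt with carbon atoms, so the output is not guaranteed to be valid SMILES.

```python
import random
from typing import List, Optional, Union


def toy_sampler(prompt: Union[str, List[str]], batch_size: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical stand-in for CLM.sampler; NOT a real model."""
    rng = rng if rng is not None else random.Random()
    # Normalize to exactly one prompt per sample
    if isinstance(prompt, str):
        prompts = [prompt] * batch_size  # single prompt (batch_prompts=False style)
    else:
        if len(prompt) != batch_size:
            raise ValueError("expected one prompt per sample")
        prompts = list(prompt)  # one prompt per sample (batch_prompts=True style)
    # A real CLM would tokenize each prompt and continue it autoregressively;
    # here we only append 1-3 carbon atoms to show the shape of the output.
    return [p + "C" * rng.randint(1, 3) for p in prompts]
```

For example, `toy_sampler("CCO", 4)` returns four strings that each start with `"CCO"`.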

The second evaluates the negative log-likelihood (NLL) of each SMILES in a list:
```python
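from typing import List

def CLM_evaluater(smiles: List[str]):
    """
    Input: A list of SMILES.
    Output: NLLs as a list, np.ndarray or torch.Tensor (on CPU, without gradient).
    """
    # Compute the NLL of each SMILES as per your model implementation
    nlls = [0.0 for _ in smiles]  # replace with the model's NLL for each SMILES
    return nlls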
```
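As a stand-in for the evaluater, the sketch below computes an exact NLL under a deliberately trivial model: a uniform distribution over a hypothetical character vocabulary of size V, plus one end-of-sequence token, so a string of length n costs (n + 1) · log V nats. The function name `uniform_nll_evaluater` and the default V = 40 are illustrative assumptions, not part of promptsmiles. With a real PyTorch model, calling `.detach().cpu()` on the NLL tensor is the usual way to meet the "CPU, without gradient" requirement.

```python
import math
from typing import List


def uniform_nll_evaluater(smiles: List[str], vocab_size: int = 40) -> List[float]:
    """Hypothetical evaluater: NLL of each SMILES under a uniform character model."""
    # Each token has probability 1 / vocab_size, i.e. costs log(vocab_size) nats
    per_token = math.log(vocab_size)
    # len(s) characters plus one end-of-sequence token
    return [(len(s) + 1) * per_token for s in smiles]
```

Longer strings receive higher NLLs under this model, which makes it handy for checking that scores flow through a pipeline in the expected order.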

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "promptsmiles",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "SMILES, Chemical language models, De novo, Constrained de novo, chemistry, drug design",
    "author": null,
    "author_email": "Morgan Thomas <morganthomas263@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/c1/09/b4bddc5a7153c4e66f4c7d68eaa70fbc18da3815b931c7ddcd1aed5d2ea0/promptsmiles-1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A conveniant package to manipulate SMILES strings for iterative prompting with chemical language models.",
    "version": "1.7",
    "project_urls": {
        "Homepage": "https://github.com/compsciencelab/PromptSMILES",
        "Issues": "https://github.com/compsciencelab/PromptSMILES/issues"
    },
    "split_keywords": [
        "smiles",
        " chemical language models",
        " de novo",
        " constrained de novo",
        " chemistry",
        " drug design"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a0cab765ae09b8fefc0436abc4abce71faca4358b590ef1d0d1dd86cf03fce68",
                "md5": "812252f84adca91d7ad432d627dc85f8",
                "sha256": "b97861c73c9d6e6d35c5145bf92d15cf73a9f1a3793ecaf1d44bf14bbfe3e968"
            },
            "downloads": -1,
            "filename": "promptsmiles-1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "812252f84adca91d7ad432d627dc85f8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 20402,
            "upload_time": "2025-03-20T13:35:54",
            "upload_time_iso_8601": "2025-03-20T13:35:54.107595Z",
            "url": "https://files.pythonhosted.org/packages/a0/ca/b765ae09b8fefc0436abc4abce71faca4358b590ef1d0d1dd86cf03fce68/promptsmiles-1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c109b4bddc5a7153c4e66f4c7d68eaa70fbc18da3815b931c7ddcd1aed5d2ea0",
                "md5": "fae71bc7a1286fa7bf0a6576e7e06a6b",
                "sha256": "fdfdf4699837612f8a73f06a5666e81aaf8c0833e5d80a6ca76ce77eab1b894a"
            },
            "downloads": -1,
            "filename": "promptsmiles-1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "fae71bc7a1286fa7bf0a6576e7e06a6b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 24458,
            "upload_time": "2025-03-20T13:35:55",
            "upload_time_iso_8601": "2025-03-20T13:35:55.239021Z",
            "url": "https://files.pythonhosted.org/packages/c1/09/b4bddc5a7153c4e66f4c7d68eaa70fbc18da3815b931c7ddcd1aed5d2ea0/promptsmiles-1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-03-20 13:35:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "compsciencelab",
    "github_project": "PromptSMILES",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "promptsmiles"
}
        