bayesianflow-for-chem


| Field | Value |
| --- | --- |
| Name | bayesianflow-for-chem |
| Version | 1.4.1 |
| Home page | https://augus1999.github.io/bayesian-flow-network-for-chemistry/ |
| Summary | Bayesian flow network framework for Chemistry |
| Upload time | 2025-07-17 13:24:41 |
| Author | Nianze A. Tao |
| Requires Python | >=3.9 |
| License | AGPL-3.0-or-later |
| Keywords | chemistry, clm, chembfn |
| Requirements | rdkit, torch, numpy, loralib, lightning, scikit-learn, typing_extensions |

# ChemBFN: Bayesian Flow Network for Chemistry

[![DOI](https://zenodo.org/badge/DOI/10.1021/acs.jcim.4c01792.svg)](https://doi.org/10.1021/acs.jcim.4c01792)
[![arxiv](https://img.shields.io/badge/arXiv-2412.11439-red)](https://arxiv.org/abs/2412.11439)

This is the repository for the PyTorch implementation of the ChemBFN model.

## Features

ChemBFN provides state-of-the-art functionality for
* SMILES or SELFIES-based *de novo* molecule generation
* Protein sequence *de novo* generation
* Classifier-free guidance conditional generation (single or multi-objective optimisation)
* Context-guided conditional generation (inpaint)
* Outstanding out-of-distribution chemical space sampling
* Fast sampling via ODE solver
* Molecular property and activity prediction finetuning
* Reaction yield prediction finetuning

in an all-in-one-model style.

## News

* [30/01/2025] The package `bayesianflow_for_chem` is available on [PyPI](https://pypi.org/project/bayesianflow-for-chem/).
* [21/01/2025] Our first paper has been accepted by [JCIM](https://pubs.acs.org/doi/10.1021/acs.jcim.4c01792).
* [17/12/2024] The second paper, on out-of-distribution generation, is available on [arxiv.org](https://arxiv.org/abs/2412.11439).
* [31/07/2024] Paper is available on [arxiv.org](https://arxiv.org/abs/2407.20294).
* [21/07/2024] Paper was submitted to arXiv.

## Install

```bash
$ pip install -U bayesianflow_for_chem
```
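
To confirm that the install succeeded, the installed version can be queried through the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of the package, and the distribution name is taken from the PyPI listing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("bayesianflow-for-chem"))
```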

## Usage

You can find example scripts in the [📁example](./example) folder.

## Pre-trained Model

You can find pretrained models on our [🤗Hugging Face model page](https://huggingface.co/suenoomozawa/ChemBFN).

## Dataset Handling

We provide a Python class [`CSVData`](./bayesianflow_for_chem/data.py) to handle data stored in CSV or a similar format with a header row that identifies each column. The following is a quickstart.

1. Download your dataset file (e.g., ESOL from [MoleculeNet](https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv)) and split the file:
```python
>>> from bayesianflow_for_chem.tool import split_data

>>> split_data("delaney-processed.csv", method="scaffold")
```

2. Load the split data:
```python
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData

>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'], 
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'], 
'Minimum Degree': ['2'], 
'Molecular Weight': ['84.14299999999999'], 
'Number of H-Bond Donors': ['0'], 
'Number of Rings': ['1'], 
'Number of Rotatable Bonds': ['0'], 
'Polar Surface Area': ['0.0'], 
'measured log solubility in mols per litre': ['-1.33'], 
'smiles': ['c1ccsc1']}
```

3. Create a mapping function to tokenise the dataset and select values:
```python
>>> import torch

>>> def encode(x):
...   smiles = x["smiles"][0]
...   value = [float(i) for i in x["measured log solubility in mols per litre"]]
...   return {"token": smiles2token(smiles), "value": torch.tensor(value)}

>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([  1, 151,  23, 151, 151, 154, 151,  23,   2]), 
'value': tensor([-1.3300])}
```

4. Wrap the dataset in `torch.utils.data.DataLoader`:
```python
>>> dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)
```
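
To make the data flow of steps 2 to 4 concrete, the following standard-library sketch mimics it without the package: rows are read into dictionaries that map each header to a list of strings, an `encode` function selects and converts fields, and variable-length token lists are right-padded into a batch. Everything here is an assumption for illustration: `read_records`, `toy_tokenise`, `pad_batch`, the character-level vocabulary and the special-token ids are hypothetical, and they are not the library's `CSVData`, `smiles2token` or `collate`. The README does not state how `collate` batches sequences; padding is simply one common approach.

```python
import csv
import io

# Toy CSV standing in for a split file. The thiophene row comes from the example
# above; the second row uses a placeholder value, not a value taken from ESOL.
CSV_TEXT = """smiles,measured log solubility in mols per litre
c1ccsc1,-1.33
CCO,0.0
"""

PAD, START, END = 0, 1, 2  # hypothetical special-token ids


def read_records(text):
    """Return one dict per row, each header mapping to a list of strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: [val] for key, val in row.items()} for row in reader]


def toy_tokenise(smiles, vocab):
    """Character-level tokeniser (NOT the library's smiles2token)."""
    ids = [vocab.setdefault(ch, len(vocab) + 3) for ch in smiles]
    return [START] + ids + [END]


def encode(record, vocab):
    """Select the SMILES string and the target value from one record."""
    smiles = record["smiles"][0]
    value = [float(v) for v in record["measured log solubility in mols per litre"]]
    return {"token": toy_tokenise(smiles, vocab), "value": value}


def pad_batch(samples):
    """Right-pad token lists to the longest sequence in the batch."""
    width = max(len(s["token"]) for s in samples)
    tokens = [s["token"] + [PAD] * (width - len(s["token"])) for s in samples]
    return {"token": tokens, "value": [s["value"] for s in samples]}


vocab = {}
records = read_records(CSV_TEXT)
batch = pad_batch([encode(r, vocab) for r in records])
print(batch["token"])
```

The token ids produced by this toy vocabulary differ from those shown in step 3, because the real tokeniser uses its own vocabulary.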

## Cite This Work

```bibtex
@article{2025chembfn,
    title={Bayesian Flow Network Framework for Chemistry Tasks},
    author={Tao, Nianze and Abe, Minori},
    journal={Journal of Chemical Information and Modeling},
    volume={65},
    number={3},
    pages={1178-1187},
    year={2025},
    doi={10.1021/acs.jcim.4c01792},
}
```
Out-of-distribution generation:
```bibtex
@misc{2024chembfn_ood,
    title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces}, 
    author={Nianze Tao},
    year={2024},
    eprint={2412.11439},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11439}, 
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
    "name": "bayesianflow-for-chem",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "Chemistry, CLM, ChemBFN",
    "author": "Nianze A. Tao",
    "author_email": "tao-nianze@hiroshima-u.ac.jp",
    "download_url": "https://files.pythonhosted.org/packages/63/fc/74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e/bayesianflow_for_chem-1.4.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPL-3.0-or-later",
    "summary": "Bayesian flow network framework for Chemistry",
    "version": "1.4.1",
    "project_urls": {
        "Homepage": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
        "Source": "https://github.com/Augus1999/bayesian-flow-network-for-chemistry"
    },
    "split_keywords": [
        "chemistry",
        " clm",
        " chembfn"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d40d26f7dd7dc0d2d10043c21a401b86582480875cb9f1832f992c72c9bfa776",
                "md5": "459b88428703052cdcb8edd3ee9fb55a",
                "sha256": "0e0a7b6642c9d0e98c5d0aba316c08549dba8922aa1847096dcdd73c3e1a70a6"
            },
            "downloads": -1,
            "filename": "bayesianflow_for_chem-1.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "459b88428703052cdcb8edd3ee9fb55a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 38714,
            "upload_time": "2025-07-17T13:24:38",
            "upload_time_iso_8601": "2025-07-17T13:24:38.614814Z",
            "url": "https://files.pythonhosted.org/packages/d4/0d/26f7dd7dc0d2d10043c21a401b86582480875cb9f1832f992c72c9bfa776/bayesianflow_for_chem-1.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "63fc74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e",
                "md5": "8f9a19b94b6da9622e0d7263e2db162d",
                "sha256": "9a3407c8d4c69bc910cfe2b5e1d3560da1067a4cce731a540706d56fa2170510"
            },
            "downloads": -1,
            "filename": "bayesianflow_for_chem-1.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "8f9a19b94b6da9622e0d7263e2db162d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 39144,
            "upload_time": "2025-07-17T13:24:41",
            "upload_time_iso_8601": "2025-07-17T13:24:41.122798Z",
            "url": "https://files.pythonhosted.org/packages/63/fc/74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e/bayesianflow_for_chem-1.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 13:24:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Augus1999",
    "github_project": "bayesian-flow-network-for-chemistry",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "rdkit",
            "specs": [
                [
                    ">=",
                    "2023.9.6"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.3.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "loralib",
            "specs": [
                [
                    ">=",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "lightning",
            "specs": [
                [
                    ">=",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    ">=",
                    "4.8.0"
                ]
            ]
        }
    ],
    "lcname": "bayesianflow-for-chem"
}
        