bayesianflow-for-chem

| Field | Value |
| --- | --- |
| Name | bayesianflow-for-chem |
| Version | 2.2.3 |
| Home page | https://augus1999.github.io/bayesian-flow-network-for-chemistry/ |
| Summary | Bayesian flow network framework for Chemistry |
| Upload time | 2025-10-20 10:33:18 |
| Maintainer | None |
| Docs URL | None |
| Author | Nianze A. Tao |
| Requires Python | >=3.11 |
| License | AGPL-3.0-or-later |
| Keywords | chemistry, clm, chembfn |
| Requirements | rdkit, torch, torchao, colorama, numpy, scipy, loralib, lightning, scikit-learn |

# ChemBFN: Bayesian Flow Network for Chemistry

[![DOI](https://zenodo.org/badge/DOI/10.1021/acs.jcim.4c01792.svg)](https://doi.org/10.1021/acs.jcim.4c01792)
[![arxiv](https://img.shields.io/badge/arXiv-2412.11439-red)](https://arxiv.org/abs/2412.11439)

This repository contains the PyTorch implementation of the ChemBFN model.

### Build State

[![PyPI](https://img.shields.io/pypi/v/bayesianflow-for-chem?color=ff69b4)](https://pypi.org/project/bayesianflow-for-chem/)
![pytest](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/actions/workflows/pytest.yml/badge.svg)
[![document](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/actions/workflows/pages/pages-build-deployment/badge.svg)](https://augus1999.github.io/bayesian-flow-network-for-chemistry/)

## Features

ChemBFN provides state-of-the-art functionality in a single all-in-one model:
* SMILES- or SELFIES-based *de novo* molecule generation
* *De novo* protein sequence generation
* Template optimisation (mol2mol)
* Classifier-free guidance for conditional generation (single- or multi-objective optimisation)
* Context-guided conditional generation (inpainting)
* Outstanding out-of-distribution chemical space sampling
* Fast sampling via an ODE solver
* Fine-tuning for molecular property and activity prediction
* Fine-tuning for reaction yield prediction

## News

* [09/10/2025] A web app [`chembfn_webui`](https://github.com/Augus1999/ChemBFN-WebUI) for hosting ChemBFN models is available on [PyPI](https://pypi.org/project/chembfn-webui/).
* [30/01/2025] The package `bayesianflow_for_chem` is available on [PyPI](https://pypi.org/project/bayesianflow-for-chem/).
* [21/01/2025] Our first paper has been accepted by [JCIM](https://pubs.acs.org/doi/10.1021/acs.jcim.4c01792).
* [17/12/2024] The second paper, on out-of-distribution generation, is available on [arxiv.org](https://arxiv.org/abs/2412.11439).
* [31/07/2024] Paper is available on [arxiv.org](https://arxiv.org/abs/2407.20294).
* [21/07/2024] Paper was submitted to arXiv.

## Install

```bash
$ pip install -U bayesianflow_for_chem
```
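
After installing, it can be useful to confirm that the interpreter meets the package's Python requirement (`>=3.11`) and to see which version of the distribution was picked up. The following is a minimal sketch using only the standard library; the helper name `check_install` is hypothetical and not part of `bayesianflow_for_chem`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_install(dist_name: str = "bayesianflow-for-chem") -> dict:
    """Report whether Python is new enough and which version is installed, if any."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None  # the distribution is not installed in this environment
    return {
        "python_ok": sys.version_info >= (3, 11),  # package metadata requires >=3.11
        "installed_version": installed,
    }


if __name__ == "__main__":
    print(check_install())
```

If `installed_version` is `None`, the `pip install` step above has not been run in the active environment.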

## Usage

You can find example scripts in the [📁example](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/tree/main/example) folder.

## Pre-trained Model

You can find pretrained models on our [🤗Hugging Face model page](https://huggingface.co/suenoomozawa/ChemBFN).

## Dataset Handling

We provide a Python class, [`CSVData`](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/blob/main/bayesianflow_for_chem/data.py#L153), to handle data stored in CSV or a similar format with a header row that identifies the entries. The following is a quickstart.

1. Download your dataset file (e.g., ESOL from [MoleculeNet](https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv)) and split the file:
```python
>>> from bayesianflow_for_chem.tool import split_data

>>> split_data("delaney-processed.csv", method="scaffold")
```

2. Load the split data:
```python
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData

>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'], 
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'], 
'Minimum Degree': ['2'], 
'Molecular Weight': ['84.14299999999999'], 
'Number of H-Bond Donors': ['0'], 
'Number of Rings': ['1'], 
'Number of Rotatable Bonds': ['0'], 
'Polar Surface Area': ['0.0'], 
'measured log solubility in mols per litre': ['-1.33'], 
'smiles': ['c1ccsc1']}
```

3. Create a mapping function to tokenise the dataset and select values:
```python
>>> import torch

>>> def encode(x):
...   smiles = x["smiles"][0]
...   value = [float(i) for i in x["measured log solubility in mols per litre"]]
...   return {"token": smiles2token(smiles), "value": torch.tensor(value)}

>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([  1, 151,  23, 151, 151, 154, 151,  23,   2]), 
'value': tensor([-1.3300])}
```

4. Wrap the dataset in `torch.utils.data.DataLoader`:
```python
>>> dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, collate_fn=collate)
```
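
To make the data flow of steps 2 to 4 concrete, here is a rough standard-library sketch of the same idea: each CSV row becomes a dictionary of string lists, a mapping function converts it to integer tokens and float targets, and a batch is padded to a common length. It is only an illustration under stated assumptions, not how `CSVData`, `smiles2token`, or `collate` are implemented. The helpers `read_rows`, `encode`, and `pad_batch`, the character-level vocabulary, and the padding id `0` are all hypothetical, so the token ids differ from the ones shown above.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text with a header row into {column: [raw string]} records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: [value] for key, value in row.items()} for row in reader]


def encode(row, vocab):
    """Turn one record into integer tokens plus float targets (toy tokeniser)."""
    smiles = row["smiles"][0]
    # Assumption: one token per character, wrapped in start/end markers.
    tokens = [vocab["<start>"]] + [vocab[ch] for ch in smiles] + [vocab["<end>"]]
    # CSV values are strings, so they must be converted before training.
    value = [float(v) for v in row["measured log solubility in mols per litre"]]
    return {"token": tokens, "value": value}


def pad_batch(samples, pad_id=0):
    """Right-pad token lists to the length of the longest sequence in the batch."""
    max_len = max(len(s["token"]) for s in samples)
    return [s["token"] + [pad_id] * (max_len - len(s["token"])) for s in samples]


# The first row mirrors the thiophene record above; the second row is made-up toy data.
sample_csv = (
    "smiles,measured log solubility in mols per litre\n"
    "c1ccsc1,-1.33\n"
    "CCO,0.5\n"
)
rows = read_rows(sample_csv)

# Toy vocabulary: ids 0-2 are reserved, characters are numbered from 3.
chars = sorted({ch for row in rows for ch in row["smiles"][0]})
vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
vocab.update({ch: i + 3 for i, ch in enumerate(chars)})

samples = [encode(row, vocab) for row in rows]
batch = pad_batch(samples)
print(batch)
```

Padding is the reason a custom `collate_fn` is passed to the `DataLoader`: sequences of different lengths cannot be stacked into one tensor until they share a length.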

## Cite This Work

```bibtex
@article{2025chembfn,
    title={Bayesian Flow Network Framework for Chemistry Tasks},
    author={Tao, Nianze and Abe, Minori},
    journal={Journal of Chemical Information and Modeling},
    volume={65},
    number={3},
    pages={1178-1187},
    year={2025},
    doi={10.1021/acs.jcim.4c01792},
}
```
Out-of-distribution generation:
```bibtex
@misc{2024chembfn_ood,
    title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces}, 
    author={Nianze Tao},
    year={2024},
    eprint={2412.11439},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11439}, 
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
    "name": "bayesianflow-for-chem",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "Chemistry, CLM, ChemBFN",
    "author": "Nianze A. Tao",
    "author_email": "tao-nianze@hiroshima-u.ac.jp",
    "download_url": "https://files.pythonhosted.org/packages/f4/3b/8c7f024661c34f6fc24b6a6d34047e1fcfc3f403cd2d66c0c5913cfce094/bayesianflow_for_chem-2.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPL-3.0-or-later",
    "summary": "Bayesian flow network framework for Chemistry",
    "version": "2.2.3",
    "project_urls": {
        "Homepage": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
        "Source": "https://github.com/Augus1999/bayesian-flow-network-for-chemistry"
    },
    "split_keywords": [
        "chemistry",
        " clm",
        " chembfn"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "aa2ffb2ef2f033a73534e37af1c6572f137362e862d74b67698604392bac21c2",
                "md5": "2db4edf9fe664287bde18ba28a3a0c99",
                "sha256": "e09b676b87c0227abe1a01723b82f41b170408f57e1c8dd0e65960f672b71fd6"
            },
            "downloads": -1,
            "filename": "bayesianflow_for_chem-2.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2db4edf9fe664287bde18ba28a3a0c99",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 47866,
            "upload_time": "2025-10-20T10:33:16",
            "upload_time_iso_8601": "2025-10-20T10:33:16.237269Z",
            "url": "https://files.pythonhosted.org/packages/aa/2f/fb2ef2f033a73534e37af1c6572f137362e862d74b67698604392bac21c2/bayesianflow_for_chem-2.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f43b8c7f024661c34f6fc24b6a6d34047e1fcfc3f403cd2d66c0c5913cfce094",
                "md5": "14a8d43efbe7bcf78e07c37e738564bd",
                "sha256": "cfab2d206b5e4cb74d26aef8e919d242414874c07dbcd5c0d3f838a820708375"
            },
            "downloads": -1,
            "filename": "bayesianflow_for_chem-2.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "14a8d43efbe7bcf78e07c37e738564bd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 48918,
            "upload_time": "2025-10-20T10:33:18",
            "upload_time_iso_8601": "2025-10-20T10:33:18.231041Z",
            "url": "https://files.pythonhosted.org/packages/f4/3b/8c7f024661c34f6fc24b6a6d34047e1fcfc3f403cd2d66c0c5913cfce094/bayesianflow_for_chem-2.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-20 10:33:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Augus1999",
    "github_project": "bayesian-flow-network-for-chemistry",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "rdkit",
            "specs": [
                [
                    ">=",
                    "2025.3.5"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.8.0"
                ]
            ]
        },
        {
            "name": "torchao",
            "specs": [
                [
                    ">=",
                    "0.12"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    ">=",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "2.3.2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.16.1"
                ]
            ]
        },
        {
            "name": "loralib",
            "specs": [
                [
                    ">=",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "lightning",
            "specs": [
                [
                    ">=",
                    "2.5.3"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.7.1"
                ]
            ]
        }
    ],
    "lcname": "bayesianflow-for-chem"
}
        