# ChemBFN: Bayesian Flow Network for Chemistry
[DOI: 10.1021/acs.jcim.4c01792](https://doi.org/10.1021/acs.jcim.4c01792)
[arXiv: 2412.11439](https://arxiv.org/abs/2412.11439)
This repository contains the PyTorch implementation of the ChemBFN model.
## Features
ChemBFN provides state-of-the-art functionality for
* SMILES- or SELFIES-based *de novo* molecule generation
* Protein sequence *de novo* generation
* Classifier-free guidance conditional generation (single or multi-objective optimisation)
* Context-guided conditional generation (inpainting)
* Outstanding out-of-distribution chemical space sampling
* Fast sampling via ODE solver
* Molecular property and activity prediction fine-tuning
* Reaction yield prediction fine-tuning
in an all-in-one-model style.
## News
* [30/01/2025] The package `bayesianflow_for_chem` is available on [PyPI](https://pypi.org/project/bayesianflow-for-chem/).
* [21/01/2025] Our first paper has been accepted by [JCIM](https://pubs.acs.org/doi/10.1021/acs.jcim.4c01792).
* [17/12/2024] Our second paper, on out-of-distribution generation, is available on [arxiv.org](https://arxiv.org/abs/2412.11439).
* [31/07/2024] Our first paper is available on [arxiv.org](https://arxiv.org/abs/2407.20294).
* [21/07/2024] Our first paper was submitted to arXiv.
## Install
```bash
$ pip install -U bayesianflow_for_chem
```
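To confirm that the install succeeded, you can query the installed distribution from Python. This is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper written for illustration and is not part of the package.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The distribution name matches the PyPI project name.
version = installed_version("bayesianflow-for-chem")
print(version if version is not None else "bayesianflow-for-chem is not installed")
```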
## Usage
You can find example scripts in the [📁example](./example) folder.
## Pre-trained Model
You can find pretrained models on our [🤗Hugging Face model page](https://huggingface.co/suenoomozawa/ChemBFN).
## Dataset Handling
We provide a Python class [`CSVData`](./bayesianflow_for_chem/data.py) to handle data stored in CSV or a similar format with a header row that identifies each column. The following is a quickstart.
1. Download your dataset file (e.g., ESOL from [MoleculeNet](https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv)) and split the file:
```python
>>> from bayesianflow_for_chem.tool import split_data
>>> split_data("delaney-processed.csv", method="scaffold")
```
2. Load the split data:
```python
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData
>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'],
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'],
'Minimum Degree': ['2'],
'Molecular Weight': ['84.14299999999999'],
'Number of H-Bond Donors': ['0'],
'Number of Rings': ['1'],
'Number of Rotatable Bonds': ['0'],
'Polar Surface Area': ['0.0'],
'measured log solubility in mols per litre': ['-1.33'],
'smiles': ['c1ccsc1']}
```
3. Create a mapping function to tokenise the dataset and select values:
```python
>>> import torch
>>> def encode(x):
...     smiles = x["smiles"][0]
...     value = [float(i) for i in x["measured log solubility in mols per litre"]]
...     return {"token": smiles2token(smiles), "value": torch.tensor(value)}
...
>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([ 1, 151, 23, 151, 151, 154, 151, 23, 2]),
'value': tensor([-1.3300])}
```
4. Wrap the dataset in `torch.utils.data.DataLoader`:
```python
>>> dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)
```
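To make the record layout in steps 2 and 3 concrete, here is a standard-library sketch of the same idea: each row becomes a dictionary keyed by header, with every cell wrapped in a list, and a mapping function then selects and converts the fields it needs. This is only an illustration of the data shape shown above, not how `CSVData` is implemented; `read_records` and this `encode` are hypothetical helpers, and the tokenisation step is left out so the example runs without the package or PyTorch.

```python
import csv
import io

# One row copied from the ESOL record shown above; inline so the sketch runs offline.
CSV_TEXT = (
    "Compound ID,measured log solubility in mols per litre,smiles\n"
    "Thiophene,-1.33,c1ccsc1\n"
)


def read_records(handle):
    """Yield one dict per row, wrapping each cell in a list as in the output above."""
    for row in csv.DictReader(handle):
        yield {header: [cell] for header, cell in row.items()}


def encode(record):
    """Hypothetical mapping: keep the SMILES string and convert the target to float."""
    smiles = record["smiles"][0]
    value = [float(v) for v in record["measured log solubility in mols per litre"]]
    return {"smiles": smiles, "value": value}


records = list(read_records(io.StringIO(CSV_TEXT)))
encoded = [encode(r) for r in records]
print(records[0]["smiles"])  # ['c1ccsc1']
print(encoded[0])  # {'smiles': 'c1ccsc1', 'value': [-1.33]}
```

In the real workflow, `smiles2token` replaces the plain string with a token tensor, and `collate` assembles the mapped items into batches.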
## Cite This Work
```bibtex
@article{2025chembfn,
title={Bayesian Flow Network Framework for Chemistry Tasks},
author={Tao, Nianze and Abe, Minori},
journal={Journal of Chemical Information and Modeling},
volume={65},
number={3},
pages={1178-1187},
year={2025},
doi={10.1021/acs.jcim.4c01792},
}
```
Out-of-distribution generation:
```bibtex
@misc{2024chembfn_ood,
title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces},
author={Tao, Nianze},
year={2024},
eprint={2412.11439},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.11439},
}
```
Raw data
{
"_id": null,
"home_page": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
"name": "bayesianflow-for-chem",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "Chemistry, CLM, ChemBFN",
"author": "Nianze A. Tao",
"author_email": "tao-nianze@hiroshima-u.ac.jp",
"download_url": "https://files.pythonhosted.org/packages/63/fc/74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e/bayesianflow_for_chem-1.4.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "AGPL-3.0-or-later",
"summary": "Bayesian flow network framework for Chemistry",
"version": "1.4.1",
"project_urls": {
"Homepage": "https://augus1999.github.io/bayesian-flow-network-for-chemistry/",
"Source": "https://github.com/Augus1999/bayesian-flow-network-for-chemistry"
},
"split_keywords": [
"chemistry",
" clm",
" chembfn"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "d40d26f7dd7dc0d2d10043c21a401b86582480875cb9f1832f992c72c9bfa776",
"md5": "459b88428703052cdcb8edd3ee9fb55a",
"sha256": "0e0a7b6642c9d0e98c5d0aba316c08549dba8922aa1847096dcdd73c3e1a70a6"
},
"downloads": -1,
"filename": "bayesianflow_for_chem-1.4.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "459b88428703052cdcb8edd3ee9fb55a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 38714,
"upload_time": "2025-07-17T13:24:38",
"upload_time_iso_8601": "2025-07-17T13:24:38.614814Z",
"url": "https://files.pythonhosted.org/packages/d4/0d/26f7dd7dc0d2d10043c21a401b86582480875cb9f1832f992c72c9bfa776/bayesianflow_for_chem-1.4.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "63fc74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e",
"md5": "8f9a19b94b6da9622e0d7263e2db162d",
"sha256": "9a3407c8d4c69bc910cfe2b5e1d3560da1067a4cce731a540706d56fa2170510"
},
"downloads": -1,
"filename": "bayesianflow_for_chem-1.4.1.tar.gz",
"has_sig": false,
"md5_digest": "8f9a19b94b6da9622e0d7263e2db162d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 39144,
"upload_time": "2025-07-17T13:24:41",
"upload_time_iso_8601": "2025-07-17T13:24:41.122798Z",
"url": "https://files.pythonhosted.org/packages/63/fc/74e32b3ece389bdac107fff9dc89734a12a8e786de2f0d872d5d5af2180e/bayesianflow_for_chem-1.4.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-17 13:24:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Augus1999",
"github_project": "bayesian-flow-network-for-chemistry",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "rdkit",
"specs": [
[
">=",
"2023.9.6"
]
]
},
{
"name": "torch",
"specs": [
[
">=",
"2.3.1"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.26.4"
]
]
},
{
"name": "loralib",
"specs": [
[
">=",
"0.1.2"
]
]
},
{
"name": "lightning",
"specs": [
[
">=",
"2.2.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.5.0"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
">=",
"4.8.0"
]
]
}
],
"lcname": "bayesianflow-for-chem"
}