# FARScore: Molecular Synthetic Accessibility Predictor
> Fragment Assembly autoRegressive based synthetic accessibility scorer to accelerate drug discovery
## 🎯 What Makes FARScore Different
FARScore predicts synthetic accessibility using **Fragment Assembly autoRegressive pretraining**. Unlike traditional approaches that learn synthesis patterns directly, FARScore first learns how molecules are assembled from fragments, then applies this knowledge to predict synthetic accessibility.
### Two-Stage Learning:
* **Stage 1**: Pretrain on 9.2M unlabeled molecules to learn molecular assembly patterns
* **Stage 2**: Finetune on 800K labeled molecules for synthetic accessibility prediction
This mirrors human chemical intuition: experienced chemists understand molecular construction before assessing synthetic difficulty.
## ✨ Key Features
* Easy Integration - Simple CSV input/output format
* Batch Prediction - One-click synthetic accessibility scoring
* High Accuracy - Achieves state-of-the-art accuracy, AUROC, and specificity on multiple test sets
## 🌐 Online Service
**Instant molecular synthesis prediction in the cloud.** Simply upload your CSV file with SMILES and receive AI-powered synthetic accessibility scores in seconds.
## 🚀 Quick Start
### 1. Installation
```bash
# Clone repository
git clone https://github.com/simmzx/FARScore.git
cd FARScore
# Create environment and install dependencies
conda create -n FARScore python=3.8
conda activate FARScore
pip install -r requirements.txt
```
### 2. Prepare Data
Create a CSV file with a `smiles` column:
| molecule_id | smiles |
| :---------: | :--------: |
| Palbociclib | CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C |
| (+)-Eburnamonine | [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H] |
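The input file can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name `write_input_csv` is hypothetical and not part of FARScore, and the sketch assumes that a header row with `molecule_id` and `smiles` columns, as in the table above, is what the tool expects.

```python
import csv

# Rows copied from the example table above.
rows = [
    {"molecule_id": "Palbociclib",
     "smiles": "CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C"},
    {"molecule_id": "(+)-Eburnamonine",
     "smiles": "[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H]"},
]


def write_input_csv(path, rows):
    """Write rows to a CSV file whose header includes the 'smiles' field.

    Hypothetical helper for illustration; not part of the FARScore package.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["molecule_id", "smiles"])
        writer.writeheader()
        writer.writerows(rows)


write_input_csv("example.csv", rows)
```

Using `csv.DictWriter` rather than manual string joining means any value that happens to contain a comma or quote is escaped correctly.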
### 3. Run Prediction
CSV File Mode
```bash
python farscore.py --input_file example.csv
```
Direct SMILES Mode
```bash
# Single molecule
python farscore.py --smiles "CCO"
# Multiple molecules
python farscore.py --smiles "CCO" "CC(=O)O" "c1ccccc1"
```
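When calling Direct SMILES Mode from another Python program, passing the arguments as a list avoids shell-quoting problems with SMILES characters such as parentheses and `=`. The sketch below assumes `farscore.py` is in the current working directory and uses only the `--smiles` flag shown above; the helper names `build_farscore_command` and `run_farscore` are hypothetical, and no assumption is made about the format of the script's output.

```python
import subprocess
import sys


def build_farscore_command(smiles_list, script="farscore.py"):
    """Return the argument list for a Direct SMILES Mode call.

    Hypothetical helper; only the --smiles flag from the README is used.
    """
    if not smiles_list:
        raise ValueError("at least one SMILES string is required")
    return [sys.executable, script, "--smiles", *smiles_list]


def run_farscore(smiles_list):
    """Run the CLI and return whatever it prints to stdout, unparsed."""
    result = subprocess.run(
        build_farscore_command(smiles_list),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout


# Build (but do not run) the command for three molecules.
command = build_farscore_command(["CCO", "CC(=O)O", "c1ccccc1"])
print(command[1:])
```

Calling `run_farscore(["CCO"])` would then execute the script inside the activated FARScore environment.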
### 4. View Results
The output file contains the predicted score for each molecule in a `farscore` column:
| molecule_id | smiles | farscore |
| :------------: |:---------------:|:-----:|
| Palbociclib | CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C | 0.9453 |
| (+)-Eburnamonine | [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H] | 0.0286 |
**FARScore Interpretation:**
* Close to 1: Easy to synthesize
* Close to 0: Hard to synthesize
* Threshold 0.5: Binary classification cutoff
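The interpretation above can be applied to an output file with a few lines of Python. This is a minimal sketch, assuming the output is a CSV with the `molecule_id` and `farscore` columns shown in the results table; the function name `label_scores` is hypothetical, and treating a score of exactly 0.5 as "easy" is an assumption, since the README does not say which side the cutoff falls on.

```python
import csv
import io

THRESHOLD = 0.5  # binary classification cutoff from the README


def label_scores(csv_text, threshold=THRESHOLD):
    """Map each molecule_id to 'easy' or 'hard' based on its farscore.

    Hypothetical helper; scores >= threshold are labeled 'easy' (assumption).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["molecule_id"]: "easy" if float(row["farscore"]) >= threshold else "hard"
        for row in reader
    }


# Values copied from the results table above.
example_output = """molecule_id,smiles,farscore
Palbociclib,CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C,0.9453
(+)-Eburnamonine,[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H],0.0286
"""

labels = label_scores(example_output)
print(labels)
```

For a real output file, the same function can be fed the file contents read with `open(path, encoding="utf-8").read()`.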
## 📖 Advanced Usage
Run custom pretraining and finetuning on your own data.
### Pretrain Model
```bash
python farscore_pretrain.py \
--dataset smiles.txt \
--vocab fragment.txt
```
Note: `smiles.txt` contains unlabeled molecules, and `fragment.txt` is the fragment vocabulary generated from `smiles.txt` by `./scripts/utils/mol/cls.py` for fragment assembly autoregressive pretraining.
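A file of unlabeled molecules can be assembled from any collection of SMILES strings. The sketch below assumes that `smiles.txt` holds one SMILES string per line, which the README does not state explicitly; the helper name `write_smiles_file` is hypothetical. Duplicates are removed by exact string match only, so two different spellings of the same molecule are both kept.

```python
def write_smiles_file(smiles_iterable, path="smiles.txt"):
    """Write unique, non-empty SMILES strings to path, one per line.

    Hypothetical helper; the one-per-line layout is an assumption.
    Returns the list of strings written, in first-seen order.
    """
    seen = set()
    kept = []
    for raw in smiles_iterable:
        smiles = raw.strip()
        if smiles and smiles not in seen:
            seen.add(smiles)
            kept.append(smiles)
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(smiles + "\n" for smiles in kept)
    return kept


# Whitespace is trimmed, blanks are skipped, exact duplicates are dropped.
written = write_smiles_file(["CCO", " CCO ", "", "c1ccccc1"], path="smiles_demo.txt")
print(written)
```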
### Finetune Model
```bash
python farscore_finetune.py \
--input_model_file gnn_pretrained.pth \
--dataset dataset.csv
```
Note: `gnn_pretrained.pth` is the model saved during the pretraining stage, and `dataset.csv` contains labeled molecules for finetuning on a specific downstream task.
## 🔧 Requirements
* Python 3.8-3.10
* CUDA-enabled GPU (recommended)
* Key dependencies: PyTorch, RDKit, DGL, DeepChem
## 📄 Citation
If this program is useful to you, please cite our paper:
## 📧 Contact
For questions, please contact: Xiang Zhang (Email: zhangxiang@simm.ac.cn)
______________________________________________________________________________________________________
🌟 **Like this project? Give us a Star**
Raw data
{
"_id": null,
"home_page": "https://github.com/simmzx/FARScore",
"name": "farscore",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "chemistry, molecular, synthesizability, deep learning, graph neural networks, cheminformatics, drug discovery, SMILES",
"author": "Xiang Zhang",
"author_email": "776206454@qq.com",
"download_url": "https://files.pythonhosted.org/packages/d9/b8/fc8ebabb0519ec1f3941e6fb34322cf07facc583d0b6ebdd6235d000cd9a/farscore-1.0.0.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "MIT",
"summary": "FARScore: A Synthetic Accessibility Predictor based on Fragment Assembly autoRegressive pretraining",
"version": "1.0.0",
"project_urls": {
"Bug Reports": "https://github.com/simmzx/FARScore/issues",
"Documentation": "https://github.com/simmzx/FARScore/docs",
"Homepage": "https://github.com/simmzx/FARScore",
"Source": "https://github.com/simmzx/FARScore"
},
"split_keywords": [
"chemistry",
" molecular",
" synthesizability",
" deep learning",
" graph neural networks",
" cheminformatics",
" drug discovery",
" smiles"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "28f99cb3905b1c142b7dd692b2297e461629bc183aa3800e97b2e8873b1f4cb5",
"md5": "ba98cb6cf15fdee811a7dde13b6f1cce",
"sha256": "456d7a1e3e40c4bb52ed1dfd950be786ccf48e1a3240ae2220208f2a68ae64fc"
},
"downloads": -1,
"filename": "farscore-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ba98cb6cf15fdee811a7dde13b6f1cce",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 14521882,
"upload_time": "2025-07-24T09:40:34",
"upload_time_iso_8601": "2025-07-24T09:40:34.230398Z",
"url": "https://files.pythonhosted.org/packages/28/f9/9cb3905b1c142b7dd692b2297e461629bc183aa3800e97b2e8873b1f4cb5/farscore-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d9b8fc8ebabb0519ec1f3941e6fb34322cf07facc583d0b6ebdd6235d000cd9a",
"md5": "45d47f9b83a78783622c589d5714349c",
"sha256": "3ab6b9cda7a1014ac6d339fda29bd51ade9efa0d859322881c1680191b02ed68"
},
"downloads": -1,
"filename": "farscore-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "45d47f9b83a78783622c589d5714349c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 14498134,
"upload_time": "2025-07-24T09:40:37",
"upload_time_iso_8601": "2025-07-24T09:40:37.251016Z",
"url": "https://files.pythonhosted.org/packages/d9/b8/fc8ebabb0519ec1f3941e6fb34322cf07facc583d0b6ebdd6235d000cd9a/farscore-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-24 09:40:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simmzx",
"github_project": "FARScore",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "torch",
"specs": [
[
">=",
"1.12.0"
]
]
},
{
"name": "torch-cluster",
"specs": [
[
">=",
"1.6.0"
]
]
},
{
"name": "torch-geometric",
"specs": [
[
">=",
"2.3.0"
]
]
},
{
"name": "torch-scatter",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "torch-sparse",
"specs": [
[
">=",
"0.6.15"
]
]
},
{
"name": "torch-spline-conv",
"specs": [
[
">=",
"1.2.1"
]
]
},
{
"name": "torchmetrics",
"specs": [
[
">=",
"1.1.0"
]
]
},
{
"name": "dgl",
"specs": [
[
">=",
"0.6.1"
]
]
},
{
"name": "dgllife",
"specs": [
[
">=",
"0.2.9"
]
]
},
{
"name": "rdkit",
"specs": [
[
">=",
"2022.3.0"
]
]
},
{
"name": "deepchem",
"specs": [
[
">=",
"2.6.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "pandas",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"1.3.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.7.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.12.0"
]
]
},
{
"name": "pillow",
"specs": [
[
">=",
"9.0.0"
]
]
},
{
"name": "openpyxl",
"specs": [
[
">=",
"3.1.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.60.0"
]
]
},
{
"name": "pyyaml",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "numba",
"specs": [
[
">=",
"0.57.0"
]
]
}
],
"lcname": "farscore"
}