<p align="center">
<img src="https://github.com/OlivierBeq/scaffound/raw/master/graphics/logo/scaffound-logo-dark-50percent.png" alt="scaffound logo" width="300px"/>
</p>
--------------------------------------------------------------------------------
A Python library for extracting multiple types of molecular scaffolds, frameworks, and wireframes.
`scaffound` provides a hierarchical approach to molecular decomposition derived from **[[1]](https://doi.org/10.1186/s13321-021-00526-y)**, allowing for a detailed analysis of chemical structures beyond the traditional Bemis-Murcko scaffold.
`scaffound` is an extended implementation of Dompé's *Molecular Anatomy*.
<p align="right">
<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"/>
</a>
</p>
# Core concepts ⚛️
The library extracts three main types of scaffolds:
- **Basic Scaffold**: The core ring systems and their linkers.
- **Decorated Scaffold**: The basic scaffold plus all heteroatoms directly attached to it by unsaturated bonds.
- **Augmented Scaffold**: The decorated scaffold plus all atoms belonging to the longest carbon chain (including substituents and side chains).
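The basic scaffold can be pictured as what is left once terminal atoms have been pruned repeatedly, so that only rings and the linkers between them remain. The sketch below illustrates that idea on a toy adjacency mapping. It is a minimal illustration and not `scaffound`'s implementation: the `prune_terminal_atoms` helper, the integer atom ids and the toy graph are assumptions made for this example.

```python
# Toy illustration only: NOT scaffound's implementation.
# Atoms are integer ids; bonds are stored as an undirected adjacency mapping.

def prune_terminal_atoms(adjacency):
    """Iteratively remove atoms that have fewer than two neighbours."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:  # already removed in an earlier step
            continue
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:  # the neighbour became terminal
                leaves.append(nbr)
    return graph

# Two three-membered rings (0-1-2 and 4-5-6) joined by linker atom 3,
# with one substituent on the linker (7) and one on the first ring (8).
toy = {
    0: {1, 2, 8}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 7},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
    7: {3}, 8: {0},
}
scaffold = prune_terminal_atoms(toy)
print(sorted(scaffold))  # [0, 1, 2, 3, 4, 5, 6]
```

Both substituents are removed while the linker atom is kept because it connects two rings. A purely acyclic graph is pruned down to nothing.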
These scaffolds can be further abstracted into:
- **Frameworks**: Scaffolds made generic by replacing all heteroatoms with carbons.
- **Wireframes**: Scaffolds made both saturated (all bonds replaced with single bonds) and generic.
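The two abstractions can be sketched on a toy heavy-atom graph. This is only an illustration of the definitions above, not `scaffound`'s code: the `make_generic`, `make_saturated` and `make_wireframe` helpers and the dictionary-based molecule representation are assumptions made for this example.

```python
# Toy illustration only: NOT scaffound's implementation.
# Heavy atoms only: {atom id: element}; bonds: {(i, j): bond order}.

def make_generic(atoms, bonds):
    """Framework-style abstraction: every atom becomes carbon."""
    return {idx: "C" for idx in atoms}, dict(bonds)

def make_saturated(atoms, bonds):
    """Every bond becomes a single bond; elements are kept."""
    return dict(atoms), {pair: 1 for pair in bonds}

def make_wireframe(atoms, bonds):
    """Wireframe-style abstraction: generic and saturated."""
    return make_saturated(*make_generic(atoms, bonds))

# Pyridine drawn with alternating double and single bonds (Kekulé form)
atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C"}
bonds = {(0, 1): 2, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 2, (0, 5): 1}

framework = make_generic(atoms, bonds)    # all carbons, bond orders kept
wireframe = make_wireframe(atoms, bonds)  # all carbons, all single bonds
```

In this toy case the framework is a benzene-like ring and the wireframe a cyclohexane-like ring; the input dictionaries are left unchanged.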
<p align="center">
<img src="https://github.com/OlivierBeq/scaffound/raw/master/graphics/hierarchy/scaffolds.svg" alt="scaffound hierarchy" width="800px"/>
</p>
## ⚙️ A Note on Augmented Scaffolds
The seminal algorithm for determining the augmented scaffold relies on identifying the longest path within the molecular graph.
<br/>However, the original description does not define a unique path when several paths share the same maximum length: it simply retains the first one identified.
> "[...] the longest atom chain, considering also substitutions, is retained but all
> terminal non-carbon atoms, belonging to side chains, are iteratively pruned (Augmented Scaffold)."
>
> "[...] three paths can be identified two of them are the longest with the same length and the first identified is retained"
This means that multiple valid paths could be chosen for the same molecule, each resulting in a different augmented scaffold.
Consequently, while `scaffound` strictly adheres to the published logic, its implementation may select longest paths that differ from those chosen in the original work while being equally valid.
<br/>This can lead to variations in the resulting augmented scaffolds compared to the examples in the source publication.
To address this ambiguity, `scaffound` also **implements its own deterministic canonical longest path algorithm**, which ensures a single outcome for a given molecule.
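The ambiguity can be reproduced on a small acyclic toy graph. The sketch below enumerates the longest paths, shows that "the first identified" depends on atom numbering, and then applies a tie-break that does not. The tie-break used here (smallest sequence of element symbols, read in either direction) is a hypothetical rule chosen for illustration; it is not `scaffound`'s canonical algorithm, and all helper names are assumptions.

```python
# Toy illustration only: NOT scaffound's canonical longest path algorithm.
from itertools import combinations

def simple_paths(adjacency, start, goal, path=None):
    """Yield every simple path from start to goal (fine for tiny graphs)."""
    path = [start] if path is None else path
    if start == goal:
        yield list(path)
        return
    for nbr in sorted(adjacency[start]):
        if nbr not in path:
            path.append(nbr)
            yield from simple_paths(adjacency, nbr, goal, path)
            path.pop()

def longest_paths(adjacency):
    """Return all simple paths with the maximum number of atoms."""
    best, best_len = [], 0
    for a, b in combinations(sorted(adjacency), 2):
        for p in simple_paths(adjacency, a, b):
            if len(p) > best_len:
                best, best_len = [p], len(p)
            elif len(p) == best_len:
                best.append(p)
    return best

def symbols(path, elements):
    return "".join(elements[i] for i in path)

def pick_canonical(paths, elements):
    """Hypothetical tie-break: smallest element sequence in either direction."""
    def key(p):
        s = symbols(p, elements)
        return min(s, s[::-1])
    return min(paths, key=key)

# Chain 0-1-2 with two one-atom branches (3 and 4) on atom 2
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
# The same molecule numbered in two ways: the oxygen is atom 3 or atom 4
elements_a = {0: "C", 1: "C", 2: "C", 3: "O", 4: "C"}
elements_b = {0: "C", 1: "C", 2: "C", 3: "C", 4: "O"}

paths = longest_paths(adjacency)  # two tied paths of four atoms each
first_a = symbols(paths[0], elements_a)  # "CCCO"
first_b = symbols(paths[0], elements_b)  # "CCCC"
canon_a = symbols(pick_canonical(paths, elements_a), elements_a)  # "CCCC"
canon_b = symbols(pick_canonical(paths, elements_b), elements_b)  # "CCCC"
```

Taking the first path found gives a different chain for the two numberings, whereas the numbering-independent key selects the same chain both times. A real implementation also has to resolve ties between paths with identical element sequences, which this toy rule does not attempt.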
# Installation 🪄
```bash
pip install scaffound
```
# Getting started 🚀
```python
# A simple usage example
from rdkit import Chem
from scaffound import MolecularAnatomy
# Create an RDKit molecule object
mol = Chem.MolFromSmiles('O=C(c1ccccc1)N1CCN(c2ccccc2)CC1')
# Analyze the molecule
anatomy = MolecularAnatomy(mol)
# Access different scaffold/framework/wireframe types
basic_scaffold = anatomy.basic_scaffold
decorated_framework = anatomy.decorated_framework
augmented_wireframe = anatomy.augmented_wireframe
# You can now work with these new molecule objects
print(Chem.MolToSmiles(basic_scaffold))
# Output: c1ccc(C2CCN(c3ccccc3)CC2)cc1
```
The `MolecularAnatomy` object also decomposes the molecule's generic and saturated graphs into all the scaffolds, frameworks and wireframes mentioned above.<br/>
<p align="center">
<img src="https://github.com/OlivierBeq/scaffound/raw/master/graphics/hierarchy/other_graphs.svg" alt="scaffound hierarchy" width="900px"/>
</p>
An entire decomposition can be accessed using its `to_dict()` or `to_pandas()` methods.<br/>
Note that some decompositions of the original, generic, and saturated graphs are identical (see [decomposition_equivalence.ipynb](https://github.com/OlivierBeq/Scaffound/blob/master/docs/decomposition_equivalence.ipynb)).<br/>
For instance:
- the basic framework of the original graph is the same as the basic scaffold of the generic graph,
- the basic wireframe of the original graph is the same as the basic wireframe of the generic graph,
- the basic framework of the saturated graph is the same as the decorated framework of the saturated graph.
# Advanced usage 💪
`MolecularAnatomy` decomposes a molecule ahead of time into all the possible scaffolds, frameworks and wireframes. When performance matters, the functions below compute only the type that is needed:
```python
from scaffound import (
    get_generic_graph,    # All heteroatoms replaced by carbons
    get_saturated_graph,  # All bonds replaced by single bonds
    # Scaffold types
    get_basic_scaffold, get_decorated_scaffold, get_augmented_scaffold,
    # Framework types
    get_basic_framework, get_decorated_framework, get_augmented_framework,
    # Wireframe types
    get_basic_wireframe, get_decorated_wireframe, get_augmented_wireframe,
)
```
Furthermore, one can deactivate `scaffound`'s deterministic longest path algorithm and revert to the original with the following:
```python
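from scaffound import (MolecularAnatomy, MinMaxShortestPathOptions,
                       get_augmented_scaffold, get_augmented_framework,
                       get_augmented_wireframe)
# `mol` is an RDKit molecule, e.g. the one created in the first example
opts = MinMaxShortestPathOptions(original_algorithm=True)
# Pass the same options to the class or to any augmented-type function
MolecularAnatomy(mol, opts=opts)
get_augmented_scaffold(mol, opts=opts)
get_augmented_framework(mol, opts=opts)
get_augmented_wireframe(mol, opts=opts)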
```
# Validation ✅
This library has been tested against the example file from the original article that introduced these concepts **[[1]](https://doi.org/10.1186/s13321-021-00526-y)**.
<br/>The reference data has been corrected within this repository so that it fully agrees with the paper's detailed algorithm and figures (see [tests/MODIFICATIONS.txt](https://github.com/OlivierBeq/Scaffound/blob/master/tests/MODIFICATIONS.txt)).
<br/>An adapted version of this reference data is also provided ([tests/cox2_816_inhibitors_adapted_lsp.txt](https://github.com/OlivierBeq/Scaffound/blob/master/tests/cox2_816_inhibitors_adapted_lsp.txt)) to reflect the results of `scaffound`'s deterministic longest path algorithm, which is also described in [tests/MODIFICATIONS.txt](https://github.com/OlivierBeq/Scaffound/blob/master/tests/MODIFICATIONS.txt).
# References 📜
- [1] Manelfi, C., Gemei, M., Talarico, C. et al.<br/>
“Molecular Anatomy”: a new multi-dimensional hierarchical scaffold analysis tool.<br/>
J Cheminform 13, 54 (2021).
https://doi.org/10.1186/s13321-021-00526-y