ScaffoldGraphDG


NameScaffoldGraphDG JSON
Version 1.1.8 PyPI version JSON
download
home_pagehttps://github.com/UCLCheminformatics/scaffoldgraph
SummaryScaffoldGraph is an open-source cheminformatics library, built using RDKit and NetworkX for generating scaffold networks and scaffold trees.
upload_time2023-01-20 07:33:49
maintainer
docs_urlNone
authorOliver Scott
requires_python>=3.6
licenseMIT
keywords rdkit networkx cheminformatics scaffolds scaffold tree scaffold network
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI
coveralls test coverage
            [![Conda](https://anaconda.org/uclcheminformatics/scaffoldgraph/badges/installer/conda.svg)](https://anaconda.org/UCLCheminformatics/scaffoldgraph)
[![Anaconda](https://anaconda.org/uclcheminformatics/scaffoldgraph/badges/version.svg)](https://anaconda.org/UCLCheminformatics/scaffoldgraph)
[![Release](https://img.shields.io/pypi/v/scaffoldgraph.svg?style=flat-square)](https://github.com/UCLCheminformatics/ScaffoldGraph/releases)
[![Build Status](https://travis-ci.org/UCLCheminformatics/ScaffoldGraph.svg?branch=master)](https://travis-ci.org/UCLCheminformatics/ScaffoldGraph)
[![Contributing](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/uclcheminformatics/scaffoldgraph#contributing)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/UCLCheminformatics/ScaffoldGraph/blob/master/LICENSE)
[![DOI](https://img.shields.io/badge/DOI-10.1093%2Fbioinformatics%2Fbtaa219-blue)](https://doi.org/10.1093/bioinformatics/btaa219)

# ⌬ ScaffoldGraph  ⌬

**ScaffoldGraph** is an open-source cheminformatics library, built using [RDKit](https://www.rdkit.org/) and
[NetworkX](https://networkx.github.io/), for the generation and analysis of scaffold networks and scaffold trees.

<p align="center">
    <img width="80%", src="https://raw.githubusercontent.com/UCLCheminformatics/ScaffoldGraph/main/img/scaffoldgraph.jpg" />
</p>

[Features](https://github.com/UCLCheminformatics/ScaffoldGraph#features) | 
[Installation](https://github.com/UCLCheminformatics/ScaffoldGraph#installation) |
[Quick-start](https://github.com/UCLCheminformatics/ScaffoldGraph#quick-start) |
[Examples](https://github.com/UCLCheminformatics/ScaffoldGraph/tree/master/examples) |
[Contributing](https://github.com/UCLCheminformatics/ScaffoldGraph#contributing) |
[References](https://github.com/UCLCheminformatics/ScaffoldGraph#references) |
[Citation](https://github.com/UCLCheminformatics/ScaffoldGraph#citation)

## Features

* **Scaffold Network generation** (Varin, 2011)
    * Explore scaffold-space through the iterative removal of available rings, generating all possible sub-scaffolds
      for a set of input molecules. The output is a directed acyclic graph of molecular scaffolds
* **HierS Network Generation** (Wilkens, 2005)
    * Explore scaffold-space through the iterative removal of available rings, generating all possible sub-scaffolds 
      without dissecting fused ring-systems
* **Scaffold Tree generation** (Schuffenhauer, 2007)
    * Explore scaffold-space through the iterative removal of the least-characteristic ring from a molecular scaffold.
      The output is a tree of molecular scaffolds
* **Murcko Fragment generation** (Bemis, 1996)
    * Generate a set of murcko fragments for a molecule through the iterative removal of available rings.
* **Compound Set Enrichment** (Varin, 2010, 2011)
    * Identify active chemical series from primary screening data

### Comparison to existing software

* Scaffold Network Generator (SNG) (Matlock 2013)
* Scaffold Hunter (SH) (Wetzel, 2009)
* Scaffold Tree Generator (STG) (SH CLI predecessor)

|                                      | SG          | SNG         | SH            | STG         |
|--------------------------------------|-------------|-------------|---------------|-------------|
| Computes Scaffold Networks           | X           | X           | -             | -           |
| Computes HierS Networks              | X           | -           | -             | -           |
| Computes Scaffold Trees              | X           | X           | X             | X           |
| Command Line Interface               | X           | X           | -             | X           |
| Graphical Interface                  | - `*`       | -           | X             | -           |
| Accessible Library                   | X           | -           | -             | -           |
| Results can be computed in parallel  | X           | X           | -             | -           |
| Benchmark for 150,000 molecules `**` | 15m 25s     | 27m 6s      | -             | -           |
| Limit on input molecules             | N/A `***`   | 10,000,000  | 200,000 `****`| 10,000,000  |

`*` While ScaffoldGraph has no explicit GUI, it contains functions for interactive scaffoldgraph visualization.

`**` Tests performed on an Intel Core i7-6700 @ 3.4 GHz with 32GB of RAM, without parallel processing. I could not find 
the code for STG and do not intend to search for it, SNG report that both itself and SH are both faster in the
benchmark test.

`***` Limited by available memory

`****` Graphical interface has an upper limit of 2,000 scaffolds

--------------------------------------------------------------------------------

## Installation

- ScaffoldGraph currently supports Python 3.6 and above.

### Install with conda (recommended)
```
conda config --add channels conda-forge
conda install -c uclcheminformatics scaffoldgraph
```
### Install with pip
```
# Basic installation.
pip install scaffoldgraph

# Install with ipycytoscape.
pip install scaffoldgraph[vis]

# Install with rdkit-pypi (Linux, MacOS).
pip install scaffoldgraph[rdkit]

# Install with all optional packages. 
pip install scaffoldgraph[rdkit, vis]
```
__Warning__: rdkit cannot be installed with pip, so must be installed through [other means]('https://www.rdkit.org/docs/Install.html')

__Update (17/06/21)__: rdkit can now be installed through the [rdkit-pypi](https://pypi.org/project/rdkit-pypi/) wheels for
Linux and MacOS, and can be installed alongside ScaffoldGraph optionally (see above instructions).  

__Update (16/11/21)__: Jupyter lab users may also need to follow the extra installation instructions 
[here](https://github.com/cytoscape/ipycytoscape#for-jupyterlab-1x-or-2x) / [here](https://ipycytoscape.readthedocs.io/en/latest/installing.html) 
when using the ipycytoscape visualisation utility.


--------------------------------------------------------------------------------

## Quick Start

### CLI usage

The ScaffoldGraph CLI is almost analogous to SNG consisting of a two step process (Generate --> Aggregate).

ScaffoldGraph can be invoked from the command-line using the following command:

```console
$ scaffoldgraph <command> <input-file> <options>
```
Where "command" is one of: tree, network, hiers, aggregate or select. 

- #### Generating Scaffold Networks/Trees
    
    The first step of the process is to generate an intermediate scaffold graph. The generation commands
    are: network, hiers and tree
    
    For example, if a user would like to generate a network from two files:
    
    ```console
    $ ls
    file_1.sdf  file_2.sdf
    ```
    
    They would first use the commands:
    
    ```console
    $ scaffoldgraph network file_1.sdf file_1.tmp
    $ scaffoldgraph network file_2.sdf file_2.tmp
    ```
    
    Further options:
    
    ```
    --max-rings, -m : ignore molecules with # rings > N (default: 10)
    --flatten-isotopes -i : remove specific isotopes
    --keep-largest-fragment -f : only process the largest disconnected fragment
    --discharge-and-deradicalize -d : remove charges and radicals from scaffolds 
    ```
    
- #### Aggregating Scaffold Graphs

    The second step of the process is aggregating the temporary files into a combined graph representation.
    
    ```console
    $ scaffoldgraph aggregate file_1.tmp file_2.tmp file.tsv
    ```
    
    The final network is now available in 'file.tsv'. Output formats are explained below.
    
    Further options:
    
    ```
    --map-mols, -m  <file>   : generate a file mapping molecule IDs to scaffold IDs 
    --map-annotations <file> : generate a file mapping scaffold IDs to annotations
    --sdf                    : write the output as an SDF file
    ```
    

- #### Selecting Subsets

    ScaffoldGraph allows a user to select a subset of a scaffold network or tree using a molecule-based query,
    i.e. selecting only scaffolds for molecules of interest.
     
    This command can only be performed on an aggregated graph (Not SDF).
    
    ```console
    $ scaffoldgraph select <graph input-file> <input molecules> <output-file> <options>
    ```
    
    Options:
    
    ```
    <graph input-file>   : A TSV graph constructed using the aggregate command
    <input molecules>    : Input query file (SDF, SMILES)
    <output-file>        : Write results to specified file
    --sdf                : Write the output as an SDF file
    ```

- #### Input Formats

    ScaffoldGraphs CLI utility supports input files in the SMILES and SDF formats. Other file formats can be converted
    using [OpenBabel](http://openbabel.org/wiki/Main_Page).

    - ##### Smiles Format:
    
    ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule
    identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.
        
    Example SMILES file:
        
    ```csv
    CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N   CHEMBL4116520
    CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3   CHEMBL3990718
    CN(C\C=C\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23  CHEMBL4116665
    N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4  CHEMBL4116261
    ...
    ```
    
    - ##### SDF Format:
    
    ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule
    identifier is specified in the title line. If the title line is blank, then a hash of the molecule
    will be used as an identifier.
       
    Note: selecting subsets of a graph will not be possible if a name is not supplied 
        
- #### Output Formats

    - ##### TSV Format (default)
    
    The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:
        
    1) Number of rings (hierarchy)
    2) Scaffold SMILES
    3) Sub-scaffold SMILES
    4) Molecule ID(s) (top-level scaffolds (Murcko))

    The aggregate command produces a tsv containing 4 columns
        
    1) Scaffold ID
    2) Number of rings (hierarchy)
    3) Scaffold SMILES
    4) Sub-scaffold IDs
    
    - ##### SDF Format
    
    An SDF file can be produced by the aggregate and select commands. This SDF is 
    formatted according to the SDF specification with added property fields:
        
    1) TITLE field = scaffold ID
    2) SUBSCAFFOLDS field = list of sub-scaffold IDs
    3) HIERARCHY field = number of rings
    4) SMILES field = scaffold canonical SMILES   
  
  
--------------------------------------------------------------------------------

### Library usage

ScaffoldGraph makes it simple to construct a graph using the library API.
The resultant graphs follow the same API as a NetworkX DiGraph.

Some [example](https://github.com/UCLCheminformatics/ScaffoldGraph/tree/master/examples) 
notebooks can be found in the 'examples' directory.

```python
import scaffoldgraph as sg

# construct a scaffold network from an SDF file
network = sg.ScaffoldNetwork.from_sdf('my_sdf_file.sdf')

# construct a scaffold tree from a SMILES file
tree = sg.ScaffoldTree.from_smiles('my_smiles_file.smi')

# construct a scaffold tree from a pandas dataframe
import pandas as pd
df = pd.read_csv('activity_data.csv')
network = sg.ScaffoldTree.from_dataframe(
    df, smiles_column='Smiles', name_column='MolID',
    data_columns=['pIC50', 'MolWt'], progress=True,
)
```


--------------------------------------------------------------------------------


## Advanced Usage

- **Multi-processing**
    
    It is simple to construct a graph from multiple input source in parallel,
    using the concurrent.futures module and the sg.utils.aggregate function.
    
  ```python
  from concurrent.futures import ProcessPoolExecutor
  from functools import partial
  import scaffoldgraph as sg
  import os
      
  directory = './data'
  sdf_files = [f for f in os.listdir(directory) if f.endswith('.sdf')]
      
  func = partial(sg.ScaffoldNetwork.from_sdf, ring_cutoff=10)
        
  graphs = []
  with ProcessPoolExecutor(max_workers=4) as executor:
      futures = executor.map(func, sdf_files)
      for future in futures:
          graphs.append(future)
        
  network = sg.utils.aggregate(graphs)
  ```
    
- **Creating custom scaffold prioritisation rules**

    If required a user can define their own rules for prioritizing scaffolds during scaffold tree construction.
    Rules can be defined by subclassing one of four rule classes:
    
    BaseScaffoldFilterRule, ScaffoldFilterRule, ScaffoldMinFilterRule or ScaffoldMaxFilterRule
    
    When subclassing a name property must be defined and either a condition, get_property or filter function.
    Examples are shown below:
    
  ```python
  import scaffoldgraph as sg
  from scaffoldgraph.prioritization import *
    
  """
  Scaffold filter rule (must implement name and condition)
  The filter will retain all scaffolds which return a True condition
  """
  
  class CustomRule01(ScaffoldFilterRule):
      """Do not remove rings with >= 12 atoms if there are smaller rings to remove"""
  
      def condition(self, child, parent):
          removed_ring = child.rings[parent.removed_ring_idx]
          return removed_ring.size < 12
            
      @property
      def name(self):
          return 'custom rule 01'
          
  """
  Scaffold min/max filter rule (must implement name and get_property)
  The filter will retain all scaffolds with the min/max property value
  """
    
  class CustomRule02(ScaffoldMinFilterRule):
      """Smaller rings are removed first"""
    
      def get_property(self, child, parent):
          return child.rings[parent.removed_ring_idx].size
            
      @property
      def name(self):
          return 'custom rule 02'
        
      
  """
  Scaffold base filter rule (must implement name and filter)
  The filter method must return a list of filtered parent scaffolds
  This rule is used when a more complex rule is required, this example
  defines a tiebreaker rule. Only one scaffold must be left at the end
  of all filter rules in a rule set
  """
    
  class CustomRule03(BaseScaffoldFilterRule):
      """Tie-breaker rule (alphabetical)"""
    
      def filter(self, child, parents):
          return [sorted(parents, key=lambda p: p.smiles)[0]]
    
      @property
      def name(self):
          return 'custom rule 03'  
  ```
    
   Custom rules can subsequently be added to a rule set and supplied to the scaffold tree constructor:
    
   ```python
  ruleset = ScaffoldRuleSet(name='custom rules')
  ruleset.add_rule(CustomRule01())
  ruleset.add_rule(CustomRule02())
  ruleset.add_rule(CustomRule03())
    
  graph = sg.ScaffoldTree.from_sdf('my_sdf_file.sdf', prioritization_rules=ruleset)
  ```

--------------------------------------------------------------------------------

## Contributing

Contributions to ScaffoldGraph will most likely fall into the following categories:

1. Implementing a new Feature:
    * New Features that fit into the scope of this package will be accepted. If you are unsure about the 
      idea/design/implementation, feel free to post an issue.
2. Fixing a Bug:
    * Bug fixes are welcomed, please send a Pull Request each time a bug is encountered. When sending a Pull
      Request please provide a clear description of the encountered bug. If unsure feel free to post an issue

Please send Pull Requests to: 
http://github.com/UCLCheminformatics/ScaffoldGraph

### Testing

ScaffoldGraphs testing is located under `test/`. Run all tests using:

```
$ python setup.py test
```

or run an individual test: `pytest --no-cov tests/core`

When contributing new features please include appropriate test files

### Continuous Integration

ScaffoldGraph uses Travis CI for continuous integration

--------------------------------------------------------------------------------

## References

* Bemis, G. W. and Murcko, M. A. (1996). The properties of known drugs. 1. molecular frameworks. Journal of Medicinal Chemistry, 39(15), 2887–2893.
* Matlock, M., Zaretzki, J., Swamidass, J. S. (2013). Scaffold network generator: a tool for mining molecular structures. Bioinformatics, 29(20), 2655-2656
* Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., and Waldmann, H. (2007). The scaffold tree visualization of the scaffold universe by hierarchical scaffold classification. Journal of Chemical Information and Modeling, 47(1), 47–58. PMID: 17238248.
* Varin, T., Schuffenhauer, A., Ertl, P., and Renner, S. (2011). Mining for bioactive scaffolds with scaffold networks: Improved compound set enrichment from primary screening data. Journal of Chemical Information and Modeling, 51(7), 1528–1538.
* Varin, T., Gubler, H., Parker, C., Zhang, J., Raman, P., Ertl, P. and Schuffenhauer, A. (2010) Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data. Journal of Chemical Information and Modeling, 50(12), 2067-2078.
* Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581–583.
* Wilkens, J., Janes, J. and Su, A. (2005). HierS:  Hierarchical Scaffold Clustering Using Topological Chemical Graphs. Journal of Medicinal Chemistry, 48(9), 3182-3193.

---------------------------------------------------------------------------------

## Citation

If you use this software in your own work please cite our [paper](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa219/5814205),
and the respective papers of the methods used.

```
@article{10.1093/bioinformatics/btaa219,
    author = {Scott, Oliver B and Chan, A W Edith},
    title = "{ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees}",
    journal = {Bioinformatics},
    year = {2020},
    month = {03},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa219},
    url = {https://doi.org/10.1093/bioinformatics/btaa219},
    note = {btaa219}
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa219/32984904/btaa219.pdf},
}
```



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/UCLCheminformatics/scaffoldgraph",
    "name": "ScaffoldGraphDG",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "rdkit,networkx,cheminformatics,scaffolds,scaffold tree,scaffold network",
    "author": "Oliver Scott",
    "author_email": "oliver.scott.17@ucl.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/bf/53/27273d700dd6e151b6d1aa6e5ba5369905c78fa52cd5668ea5b5f83f1fbb/ScaffoldGraphDG-1.1.8.tar.gz",
    "platform": null,
    "description": "[![Conda](https://anaconda.org/uclcheminformatics/scaffoldgraph/badges/installer/conda.svg)](https://anaconda.org/UCLCheminformatics/scaffoldgraph)\r\n[![Anaconda](https://anaconda.org/uclcheminformatics/scaffoldgraph/badges/version.svg)](https://anaconda.org/UCLCheminformatics/scaffoldgraph)\r\n[![Release](https://img.shields.io/pypi/v/scaffoldgraph.svg?style=flat-square)](https://github.com/UCLCheminformatics/ScaffoldGraph/releases)\r\n[![Build Status](https://travis-ci.org/UCLCheminformatics/ScaffoldGraph.svg?branch=master)](https://travis-ci.org/UCLCheminformatics/ScaffoldGraph)\r\n[![Contributing](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/uclcheminformatics/scaffoldgraph#contributing)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/UCLCheminformatics/ScaffoldGraph/blob/master/LICENSE)\r\n[![DOI](https://img.shields.io/badge/DOI-10.1093%2Fbioinformatics%2Fbtaa219-blue)](https://doi.org/10.1093/bioinformatics/btaa219)\r\n\r\n# &#9004; ScaffoldGraph  &#9004;\r\n\r\n**ScaffoldGraph** is an open-source cheminformatics library, built using [RDKit](https://www.rdkit.org/) and\r\n[NetworkX](https://networkx.github.io/), for the generation and analysis of scaffold networks and scaffold trees.\r\n\r\n<p align=\"center\">\r\n    <img width=\"80%\", src=\"https://raw.githubusercontent.com/UCLCheminformatics/ScaffoldGraph/main/img/scaffoldgraph.jpg\" />\r\n</p>\r\n\r\n[Features](https://github.com/UCLCheminformatics/ScaffoldGraph#features) | \r\n[Installation](https://github.com/UCLCheminformatics/ScaffoldGraph#installation) |\r\n[Quick-start](https://github.com/UCLCheminformatics/ScaffoldGraph#quick-start) |\r\n[Examples](https://github.com/UCLCheminformatics/ScaffoldGraph/tree/master/examples) |\r\n[Contributing](https://github.com/UCLCheminformatics/ScaffoldGraph#contributing) |\r\n[References](https://github.com/UCLCheminformatics/ScaffoldGraph#references) |\r\n[Citation](https://github.com/UCLCheminformatics/ScaffoldGraph#citation)\r\n\r\n## Features\r\n\r\n* **Scaffold Network generation** (Varin, 2011)\r\n    * Explore scaffold-space through the iterative removal of available rings, generating all possible sub-scaffolds\r\n      for a set of input molecules. The output is a directed acyclic graph of molecular scaffolds\r\n* **HierS Network Generation** (Wilkens, 2005)\r\n    * Explore scaffold-space through the iterative removal of available rings, generating all possible sub-scaffolds \r\n      without dissecting fused ring-systems\r\n* **Scaffold Tree generation** (Schuffenhauer, 2007)\r\n    * Explore scaffold-space through the iterative removal of the least-characteristic ring from a molecular scaffold.\r\n      The output is a tree of molecular scaffolds\r\n* **Murcko Fragment generation** (Bemis, 1996)\r\n    * Generate a set of murcko fragments for a molecule through the iterative removal of available rings.\r\n* **Compound Set Enrichment** (Varin, 2010, 2011)\r\n    * Identify active chemical series from primary screening data\r\n\r\n### Comparison to existing software\r\n\r\n* Scaffold Network Generator (SNG) (Matlock 2013)\r\n* Scaffold Hunter (SH) (Wetzel, 2009)\r\n* Scaffold Tree Generator (STG) (SH CLI predecessor)\r\n\r\n|                                      | SG          | SNG         | SH            | STG         |\r\n|--------------------------------------|-------------|-------------|---------------|-------------|\r\n| Computes Scaffold Networks           | X           | X           | -             | -           |\r\n| Computes HierS Networks              | X           | -           | -             | -           |\r\n| Computes Scaffold Trees              | X           | X           | X             | X           |\r\n| Command Line Interface               | X           | X           | -             | X           |\r\n| Graphical Interface                  | - `*`       | -           | X             | -           |\r\n| Accessible Library                   | X           | -           | -             | -           |\r\n| Results can be computed in parallel  | X           | X           | -             | -           |\r\n| Benchmark for 150,000 molecules `**` | 15m 25s     | 27m 6s      | -             | -           |\r\n| Limit on input molecules             | N/A `***`   | 10,000,000  | 200,000 `****`| 10,000,000  |\r\n\r\n`*` While ScaffoldGraph has no explicit GUI, it contains functions for interactive scaffoldgraph visualization.\r\n\r\n`**` Tests performed on an Intel Core i7-6700 @ 3.4 GHz with 32GB of RAM, without parallel processing. I could not find \r\nthe code for STG and do not intend to search for it, SNG report that both itself and SH are both faster in the\r\nbenchmark test.\r\n\r\n`***` Limited by available memory\r\n\r\n`****` Graphical interface has an upper limit of 2,000 scaffolds\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n## Installation\r\n\r\n- ScaffoldGraph currently supports Python 3.6 and above.\r\n\r\n### Install with conda (recommended)\r\n```\r\nconda config --add channels conda-forge\r\nconda install -c uclcheminformatics scaffoldgraph\r\n```\r\n### Install with pip\r\n```\r\n# Basic installation.\r\npip install scaffoldgraph\r\n\r\n# Install with ipycytoscape.\r\npip install scaffoldgraph[vis]\r\n\r\n# Install with rdkit-pypi (Linux, MacOS).\r\npip install scaffoldgraph[rdkit]\r\n\r\n# Install with all optional packages. \r\npip install scaffoldgraph[rdkit, vis]\r\n```\r\n__Warning__: rdkit cannot be installed with pip, so must be installed through [other means]('https://www.rdkit.org/docs/Install.html')\r\n\r\n__Update (17/06/21)__: rdkit can now be installed through the [rdkit-pypi](https://pypi.org/project/rdkit-pypi/) wheels for\r\nLinux and MacOS, and can be installed alongside ScaffoldGraph optionally (see above instructions).  \r\n\r\n__Update (16/11/21)__: Jupyter lab users may also need to follow the extra installation instructions \r\n[here](https://github.com/cytoscape/ipycytoscape#for-jupyterlab-1x-or-2x) / [here](https://ipycytoscape.readthedocs.io/en/latest/installing.html) \r\nwhen using the ipycytoscape visualisation utility.\r\n\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n## Quick Start\r\n\r\n### CLI usage\r\n\r\nThe ScaffoldGraph CLI is almost analogous to SNG consisting of a two step process (Generate --> Aggregate).\r\n\r\nScaffoldGraph can be invoked from the command-line using the following command:\r\n\r\n```console\r\n$ scaffoldgraph <command> <input-file> <options>\r\n```\r\nWhere \"command\" is one of: tree, network, hiers, aggregate or select. \r\n\r\n- #### Generating Scaffold Networks/Trees\r\n    \r\n    The first step of the process is to generate an intermediate scaffold graph. The generation commands\r\n    are: network, hiers and tree\r\n    \r\n    For example, if a user would like to generate a network from two files:\r\n    \r\n    ```console\r\n    $ ls\r\n    file_1.sdf  file_2.sdf\r\n    ```\r\n    \r\n    They would first use the commands:\r\n    \r\n    ```console\r\n    $ scaffoldgraph network file_1.sdf file_1.tmp\r\n    $ scaffoldgraph network file_2.sdf file_2.tmp\r\n    ```\r\n    \r\n    Further options:\r\n    \r\n    ```\r\n    --max-rings, -m : ignore molecules with # rings > N (default: 10)\r\n    --flatten-isotopes -i : remove specific isotopes\r\n    --keep-largest-fragment -f : only process the largest disconnected fragment\r\n    --discharge-and-deradicalize -d : remove charges and radicals from scaffolds \r\n    ```\r\n    \r\n- #### Aggregating Scaffold Graphs\r\n\r\n    The second step of the process is aggregating the temporary files into a combined graph representation.\r\n    \r\n    ```console\r\n    $ scaffoldgraph aggregate file_1.tmp file_2.tmp file.tsv\r\n    ```\r\n    \r\n    The final network is now available in 'file.tsv'. Output formats are explained below.\r\n    \r\n    Further options:\r\n    \r\n    ```\r\n    --map-mols, -m  <file>   : generate a file mapping molecule IDs to scaffold IDs \r\n    --map-annotations <file> : generate a file mapping scaffold IDs to annotations\r\n    --sdf                    : write the output as an SDF file\r\n    ```\r\n    \r\n\r\n- #### Selecting Subsets\r\n\r\n    ScaffoldGraph allows a user to select a subset of a scaffold network or tree using a molecule-based query,\r\n    i.e. selecting only scaffolds for molecules of interest.\r\n     \r\n    This command can only be performed on an aggregated graph (Not SDF).\r\n    \r\n    ```console\r\n    $ scaffoldgraph select <graph input-file> <input molecules> <output-file> <options>\r\n    ```\r\n    \r\n    Options:\r\n    \r\n    ```\r\n    <graph input-file>   : A TSV graph constructed using the aggregate command\r\n    <input molecules>    : Input query file (SDF, SMILES)\r\n    <output-file>        : Write results to specified file\r\n    --sdf                : Write the output as an SDF file\r\n    ```\r\n\r\n- #### Input Formats\r\n\r\n    ScaffoldGraphs CLI utility supports input files in the SMILES and SDF formats. Other file formats can be converted\r\n    using [OpenBabel](http://openbabel.org/wiki/Main_Page).\r\n\r\n    - ##### Smiles Format:\r\n    \r\n    ScaffoldGraph expects a delimited file where the first column defines a SMILES string, followed by a molecule\r\n    identifier. If an identifier is not specified the program will use a hash of the molecule as an identifier.\r\n        \r\n    Example SMILES file:\r\n        \r\n    ```csv\r\n    CCN1CCc2c(C1)sc(NC(=O)Nc3ccc(Cl)cc3)c2C#N   CHEMBL4116520\r\n    CC(N1CC(C1)Oc2ccc(Cl)cc2)C3=Nc4c(cnn4C5CCOCC5)C(=O)N3   CHEMBL3990718\r\n    CN(C\\C=C\\c1ccc(cc1)C(F)(F)F)Cc2coc3ccccc23  CHEMBL4116665\r\n    N=C1N(C(=Nc2ccccc12)c3ccccc3)c4ccc5OCOc5c4  CHEMBL4116261\r\n    ...\r\n    ```\r\n    \r\n    - ##### SDF Format:\r\n    \r\n    ScaffoldGraph expects an [SDF](https://en.wikipedia.org/wiki/Chemical_table_file) file, where the molecule\r\n    identifier is specified in the title line. If the title line is blank, then a hash of the molecule\r\n    will be used as an identifier.\r\n       \r\n    Note: selecting subsets of a graph will not be possible if a name is not supplied \r\n        \r\n- #### Output Formats\r\n\r\n    - ##### TSV Format (default)\r\n    \r\n    The generate commands (network, hiers, tree) produce an intermediate tsv containing 4 columns:\r\n        \r\n    1) Number of rings (hierarchy)\r\n    2) Scaffold SMILES\r\n    3) Sub-scaffold SMILES\r\n    4) Molecule ID(s) (top-level scaffolds (Murcko))\r\n\r\n    The aggregate command produces a tsv containing 4 columns\r\n        \r\n    1) Scaffold ID\r\n    2) Number of rings (hierarchy)\r\n    3) Scaffold SMILES\r\n    4) Sub-scaffold IDs\r\n    \r\n    - ##### SDF Format\r\n    \r\n    An SDF file can be produced by the aggregate and select commands. This SDF is \r\n    formatted according to the SDF specification with added property fields:\r\n        \r\n    1) TITLE field = scaffold ID\r\n    2) SUBSCAFFOLDS field = list of sub-scaffold IDs\r\n    3) HIERARCHY field = number of rings\r\n    4) SMILES field = scaffold canonical SMILES   \r\n  \r\n  \r\n--------------------------------------------------------------------------------\r\n\r\n### Library usage\r\n\r\nScaffoldGraph makes it simple to construct a graph using the library API.\r\nThe resultant graphs follow the same API as a NetworkX DiGraph.\r\n\r\nSome [example](https://github.com/UCLCheminformatics/ScaffoldGraph/tree/master/examples) \r\nnotebooks can be found in the 'examples' directory.\r\n\r\n```python\r\nimport scaffoldgraph as sg\r\n\r\n# construct a scaffold network from an SDF file\r\nnetwork = sg.ScaffoldNetwork.from_sdf('my_sdf_file.sdf')\r\n\r\n# construct a scaffold tree from a SMILES file\r\ntree = sg.ScaffoldTree.from_smiles('my_smiles_file.smi')\r\n\r\n# construct a scaffold tree from a pandas dataframe\r\nimport pandas as pd\r\ndf = pd.read_csv('activity_data.csv')\r\nnetwork = sg.ScaffoldTree.from_dataframe(\r\n    df, smiles_column='Smiles', name_column='MolID',\r\n    data_columns=['pIC50', 'MolWt'], progress=True,\r\n)\r\n```\r\n\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n\r\n## Advanced Usage\r\n\r\n- **Multi-processing**\r\n    \r\n    It is simple to construct a graph from multiple input source in parallel,\r\n    using the concurrent.futures module and the sg.utils.aggregate function.\r\n    \r\n  ```python\r\n  from concurrent.futures import ProcessPoolExecutor\r\n  from functools import partial\r\n  import scaffoldgraph as sg\r\n  import os\r\n      \r\n  directory = './data'\r\n  sdf_files = [f for f in os.listdir(directory) if f.endswith('.sdf')]\r\n      \r\n  func = partial(sg.ScaffoldNetwork.from_sdf, ring_cutoff=10)\r\n        \r\n  graphs = []\r\n  with ProcessPoolExecutor(max_workers=4) as executor:\r\n      futures = executor.map(func, sdf_files)\r\n      for future in futures:\r\n          graphs.append(future)\r\n        \r\n  network = sg.utils.aggregate(graphs)\r\n  ```\r\n    \r\n- **Creating custom scaffold prioritisation rules**\r\n\r\n    If required a user can define their own rules for prioritizing scaffolds during scaffold tree construction.\r\n    Rules can be defined by subclassing one of four rule classes:\r\n    \r\n    BaseScaffoldFilterRule, ScaffoldFilterRule, ScaffoldMinFilterRule or ScaffoldMaxFilterRule\r\n    \r\n    When subclassing a name property must be defined and either a condition, get_property or filter function.\r\n    Examples are shown below:\r\n    \r\n  ```python\r\n  import scaffoldgraph as sg\r\n  from scaffoldgraph.prioritization import *\r\n    \r\n  \"\"\"\r\n  Scaffold filter rule (must implement name and condition)\r\n  The filter will retain all scaffolds which return a True condition\r\n  \"\"\"\r\n  \r\n  class CustomRule01(ScaffoldFilterRule):\r\n      \"\"\"Do not remove rings with >= 12 atoms if there are smaller rings to remove\"\"\"\r\n  \r\n      def condition(self, child, parent):\r\n          removed_ring = child.rings[parent.removed_ring_idx]\r\n          return removed_ring.size < 12\r\n            \r\n      @property\r\n      def name(self):\r\n          return 'custom rule 01'\r\n          \r\n  \"\"\"\r\n  Scaffold min/max filter rule (must implement name and get_property)\r\n  The filter will retain all scaffolds with the min/max property value\r\n  \"\"\"\r\n    \r\n  class CustomRule02(ScaffoldMinFilterRule):\r\n      \"\"\"Smaller rings are removed first\"\"\"\r\n    \r\n      def get_property(self, child, parent):\r\n          return child.rings[parent.removed_ring_idx].size\r\n            \r\n      @property\r\n      def name(self):\r\n          return 'custom rule 02'\r\n        \r\n      \r\n  \"\"\"\r\n  Scaffold base filter rule (must implement name and filter)\r\n  The filter method must return a list of filtered parent scaffolds\r\n  This rule is used when a more complex rule is required, this example\r\n  defines a tiebreaker rule. Only one scaffold must be left at the end\r\n  of all filter rules in a rule set\r\n  \"\"\"\r\n    \r\n  class CustomRule03(BaseScaffoldFilterRule):\r\n      \"\"\"Tie-breaker rule (alphabetical)\"\"\"\r\n    \r\n      def filter(self, child, parents):\r\n          return [sorted(parents, key=lambda p: p.smiles)[0]]\r\n    \r\n      @property\r\n      def name(self):\r\n          return 'custom rule 03'  \r\n  ```\r\n    \r\n   Custom rules can subsequently be added to a rule set and supplied to the scaffold tree constructor:\r\n    \r\n   ```python\r\n  ruleset = ScaffoldRuleSet(name='custom rules')\r\n  ruleset.add_rule(CustomRule01())\r\n  ruleset.add_rule(CustomRule02())\r\n  ruleset.add_rule(CustomRule03())\r\n    \r\n  graph = sg.ScaffoldTree.from_sdf('my_sdf_file.sdf', prioritization_rules=ruleset)\r\n  ```\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n## Contributing\r\n\r\nContributions to ScaffoldGraph will most likely fall into the following categories:\r\n\r\n1. Implementing a new Feature:\r\n    * New Features that fit into the scope of this package will be accepted. If you are unsure about the \r\n      idea/design/implementation, feel free to post an issue.\r\n2. Fixing a Bug:\r\n    * Bug fixes are welcomed, please send a Pull Request each time a bug is encountered. When sending a Pull\r\n      Request please provide a clear description of the encountered bug. If unsure feel free to post an issue\r\n\r\nPlease send Pull Requests to: \r\nhttp://github.com/UCLCheminformatics/ScaffoldGraph\r\n\r\n### Testing\r\n\r\nScaffoldGraphs testing is located under `test/`. Run all tests using:\r\n\r\n```\r\n$ python setup.py test\r\n```\r\n\r\nor run an individual test: `pytest --no-cov tests/core`\r\n\r\nWhen contributing new features please include appropriate test files\r\n\r\n### Continuous Integration\r\n\r\nScaffoldGraph uses Travis CI for continuous integration\r\n\r\n--------------------------------------------------------------------------------\r\n\r\n## References\r\n\r\n* Bemis, G. W. and Murcko, M. A. (1996). The properties of known drugs. 1. molecular frameworks. Journal of Medicinal Chemistry, 39(15), 2887\u20132893.\r\n* Matlock, M., Zaretzki, J., Swamidass, J. S. (2013). Scaffold network generator: a tool for mining molecular structures. Bioinformatics, 29(20), 2655-2656\r\n* Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., and Waldmann, H. (2007). The scaffold tree visualization of the scaffold universe by hierarchical scaffold classification. Journal of Chemical Information and Modeling, 47(1), 47\u201358. PMID: 17238248.\r\n* Varin, T., Schuffenhauer, A., Ertl, P., and Renner, S. (2011). Mining for bioactive scaffolds with scaffold networks: Improved compound set enrichment from primary screening data. Journal of Chemical Information and Modeling, 51(7), 1528\u20131538.\r\n* Varin, T., Gubler, H., Parker, C., Zhang, J., Raman, P., Ertl, P. and Schuffenhauer, A. (2010) Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data. Journal of Chemical Information and Modeling, 50(12), 2067-2078.\r\n* Wetzel, S., Klein, K., Renner, S., Rennerauh, D., Oprea, T. I., Mutzel, P., and Waldmann, H. (2009). Interactive exploration of chemical space with scaffold hunter. Nat Chem Biol, 1875(8), 581\u2013583.\r\n* Wilkens, J., Janes, J. and Su, A. (2005). HierS:\u2009 Hierarchical Scaffold Clustering Using Topological Chemical Graphs. Journal of Medicinal Chemistry, 48(9), 3182-3193.\r\n\r\n---------------------------------------------------------------------------------\r\n\r\n## Citation\r\n\r\nIf you use this software in your own work please cite our [paper](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa219/5814205),\r\nand the respective papers of the methods used.\r\n\r\n```\r\n@article{10.1093/bioinformatics/btaa219,\r\n    author = {Scott, Oliver B and Chan, A W Edith},\r\n    title = \"{ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees}\",\r\n    journal = {Bioinformatics},\r\n    year = {2020},\r\n    month = {03},\r\n    issn = {1367-4803},\r\n    doi = {10.1093/bioinformatics/btaa219},\r\n    url = {https://doi.org/10.1093/bioinformatics/btaa219},\r\n    note = {btaa219}\r\n    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa219/32984904/btaa219.pdf},\r\n}\r\n```\r\n\r\n\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "ScaffoldGraph is an open-source cheminformatics library, built using RDKit and NetworkX for generating scaffold networks and scaffold trees.",
    "version": "1.1.8",
    "split_keywords": [
        "rdkit",
        "networkx",
        "cheminformatics",
        "scaffolds",
        "scaffold tree",
        "scaffold network"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bf5327273d700dd6e151b6d1aa6e5ba5369905c78fa52cd5668ea5b5f83f1fbb",
                "md5": "1d19f528b89f9a9b2a28fb73c1902e75",
                "sha256": "ae77fa35f6eb538052bab1342b2851aec2ce99800b6a1856d7f783198a1d5d89"
            },
            "downloads": -1,
            "filename": "ScaffoldGraphDG-1.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "1d19f528b89f9a9b2a28fb73c1902e75",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 84353,
            "upload_time": "2023-01-20T07:33:49",
            "upload_time_iso_8601": "2023-01-20T07:33:49.773693Z",
            "url": "https://files.pythonhosted.org/packages/bf/53/27273d700dd6e151b6d1aa6e5ba5369905c78fa52cd5668ea5b5f83f1fbb/ScaffoldGraphDG-1.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-20 07:33:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "UCLCheminformatics",
    "github_project": "scaffoldgraph",
    "travis_ci": true,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "scaffoldgraphdg"
}
        
Elapsed time: 0.03134s