kgcnn

Name: kgcnn
Version: 4.0.1
Home page: https://github.com/aimat-lab/gcnn_keras
Summary: General Base Layers for Graph Convolutions with Keras
Upload time: 2024-02-27 12:33:31
Author: Patrick Reiser
Keywords: materials, science, machine, learning, deep, graph, networks, neural

![GitHub release (latest by date)](https://img.shields.io/github/v/release/aimat-lab/gcnn_keras)
[![Documentation Status](https://readthedocs.org/projects/kgcnn/badge/?version=latest)](https://kgcnn.readthedocs.io/en/latest/?badge=latest)
[![PyPI version](https://badge.fury.io/py/kgcnn.svg)](https://badge.fury.io/py/kgcnn)
![PyPI - Downloads](https://img.shields.io/pypi/dm/kgcnn)
[![kgcnn_unit_tests](https://github.com/aimat-lab/gcnn_keras/actions/workflows/unittests.yml/badge.svg)](https://github.com/aimat-lab/gcnn_keras/actions/workflows/unittests.yml)
[![DOI](https://img.shields.io/badge/DOI-10.1016%2Fj.simpa.2021.100095%20-blue)](https://doi.org/10.1016/j.simpa.2021.100095)
![GitHub](https://img.shields.io/github/license/aimat-lab/gcnn_keras)
![GitHub issues](https://img.shields.io/github/issues/aimat-lab/gcnn_keras)
![Maintenance](https://img.shields.io/maintenance/yes/2024)

# Keras Graph Convolution Neural Networks
<p align="left">
  <img src="https://github.com/aimat-lab/gcnn_keras/blob/master/docs/source/_static/icon.svg" height="80"/>
</p>


[General](#general) | [Requirements](#requirements) | [Installation](#installation) | [Documentation](#documentation) | [Implementation details](#implementation-details)
 | [Literature](#literature) | [Data](#data)  | [Datasets](#datasets) | [Training](#training) | [Issues](#issues) | [Citing](#citing) | [References](#references)
 

<a name="general"></a>
# General

The package in [kgcnn](kgcnn) contains several layer classes to build graph convolution models in
Keras with TensorFlow, PyTorch or Jax as backend.
Some models from the literature are provided as examples.
[Documentation](https://kgcnn.readthedocs.io/en/latest/index.html) is generated in [docs](docs).
The focus of [kgcnn](kgcnn) is (batched) graph learning for molecules ([kgcnn.molecule](kgcnn/molecule)) and materials ([kgcnn.crystal](kgcnn/crystal)).
If you want to get in contact, feel free to [discuss](https://github.com/aimat-lab/gcnn_keras/discussions).

Note that kgcnn>=4.0.0 requires keras>=3.0.0. Previous versions of kgcnn focused on TensorFlow's ragged tensors;
hyperparameters for those models should transfer to kgcnn 4.0 by adding `input_tensor_type: "ragged"` and checking the order and *dtype* of the inputs.
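For example, a hedged sketch of such a ported hyperparameter entry (all keys except `input_tensor_type` are hypothetical placeholders modelled on the keras `Input` configs used later in this README):

```python
# Hypothetical model-input section of a kgcnn 3.x hyperparameter dict,
# ported to kgcnn 4.0 by adding `input_tensor_type: "ragged"`.
# Check that the input order and dtype match the model's `Input` layers.
model_config = {
    "inputs": [
        {"shape": (None, 64), "name": "node_attributes", "dtype": "float32"},
        {"shape": (None, 2), "name": "edge_indices", "dtype": "int64"},
    ],
    "input_tensor_type": "ragged",
}
```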

<a name="requirements"></a>
# Requirements

Standard Python package requirements are installed automatically.
However, you must install GPU/TPU acceleration for the backend of your choice yourself.
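
Since kgcnn 4.x builds on Keras 3, the backend itself is selected the standard Keras way, for example via the `KERAS_BACKEND` environment variable:

```bash
# Select the Keras 3 backend before importing keras or kgcnn.
export KERAS_BACKEND="tensorflow"  # or "torch" or "jax"
```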

<a name="installation"></a>
# Installation

Clone the [repository](https://github.com/aimat-lab/gcnn_keras) or the latest [release](https://github.com/aimat-lab/gcnn_keras/releases) and install it in editable mode, or install the latest release via the [Python Package Index](https://pypi.org/project/kgcnn/):
```bash
pip install kgcnn
```
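
For development, an editable install from a clone is the usual route (plain git and pip, nothing kgcnn-specific):

```bash
git clone https://github.com/aimat-lab/gcnn_keras.git
cd gcnn_keras
pip install -e .
```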
<a name="documentation"></a>
# Documentation

Auto-generated documentation is available at https://kgcnn.readthedocs.io/en/latest/index.html.

<a name="implementation-details"></a>
# Implementation details

### Representation

A graph of `N` nodes and `M` edges is commonly represented by a list of node attributes `node_attr` and a list of edge attributes `edge_attr`,
plus a list of index pairs `(i, j)`, each representing a directed edge in the graph: `edge_index`.
The feature dimension of the attributes is denoted by `F`.
Alternatively, an adjacency matrix `A_ij` of shape `(N, N)` can be used, which has entries of one
where there is an edge between nodes `i` and `j` and zeros elsewhere. Consequently, the sum of `A_ij` gives the number of edges `M`.
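
To make the relation concrete, a minimal numpy sketch (not part of kgcnn) converting between the two representations:

```python
import numpy as np

# Adjacency matrix of a directed two-node graph with edges 0->1 and 1->0.
A = np.array([[0, 1],
              [1, 0]])
# Edge list: one (i, j) row per directed edge; M equals the sum of A.
edge_index = np.argwhere(A > 0)         # shape (M, 2)
assert len(edge_index) == A.sum() == 2  # M = 2 edges
```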

<a name="implementation-details-input"></a>
### Input

For learning on batches or single graphs, one of the following tensor representations can be chosen:

###### Batched Graphs

* `node_attr`: Node attributes of shape `(batch, N, F)` and dtype *float*
* `edge_attr`: Edge attributes of shape `(batch, M, F)` and dtype *float*
* `edge_index`: Indices of shape `(batch, M, 2)` and dtype *int*
* `graph_attr`: Graph attributes of shape `(batch, F)` and dtype *float*

Graphs are stacked along the batch dimension `batch`. Note that for flexibly sized graphs the tensors have to be padded up to a maximum `N`/`M`,
or ragged tensors with a ragged rank of one are used.
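
For illustration, a small numpy sketch (assuming two graphs with 2 and 3 nodes, padded to the maximum `N`):

```python
import numpy as np

# Two graphs with 2 and 3 nodes and F=4 features, padded to max N=3.
g1 = np.random.rand(2, 4)
g2 = np.random.rand(3, 4)
node_attr = np.zeros((2, 3, 4))  # (batch, N, F)
node_attr[0, :2] = g1
node_attr[1, :3] = g2
total_nodes = np.array([2, 3])   # true sizes, needed to undo the padding
```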

###### Disjoint Graphs

* `node_attr`: Node attributes of shape `([N], F)` and dtype *float*
* `edge_attr`: Edge attributes of shape `([M], F)` and dtype *float*
* `edge_index`: Indices of shape `(2, [M])` and dtype *int*
* `batch_ID`: Graph ID of shape `([N], )` and dtype *int*

Here, the tensors essentially represent a single graph that consists of the disjoint sub-graphs of the batch,
a representation introduced by PyTorch Geometric (PyG).
For pooling, the graph assignment is stored in `batch_ID`.
Note that Jax cannot handle dynamic shapes, so a padded disjoint representation is used instead,
which assigns all padded nodes to a discarded graph with index zero.
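
A minimal numpy sketch of the same two graphs in disjoint form (node indices of the second graph are shifted by the node count of the first):

```python
import numpy as np

# Two graphs (2 and 3 nodes, F=4) concatenated into one disjoint graph.
node_attr = np.concatenate([np.random.rand(2, 4), np.random.rand(3, 4)])  # ([N], F), N=5
batch_id = np.array([0, 0, 1, 1, 1])  # graph assignment per node
# Edges of graph 2 are offset by the 2 nodes of graph 1.
edge_index = np.array([[0, 1, 2 + 0, 2 + 1],
                       [1, 0, 2 + 1, 2 + 0]])  # (2, [M])
```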

### Model

The Keras layers in [kgcnn.layers](kgcnn/layers) can be used with a PyG-compatible tensor representation,
or even by simply wrapping a PyG model with `TorchModuleWrapper`. Efficient model loading can be achieved
in multiple ways (see [kgcnn.io](kgcnn/io)).
For the most simple Keras-like behaviour, the model can be fed with batched padded or ragged tensors,
which are converted to and from the disjoint representation around the PyG-equivalent model.
Here is an example of a minimal message-passing GNN:

```python
import keras as ks
from kgcnn.layers.casting import CastBatchedIndicesToDisjoint
from kgcnn.layers.gather import GatherNodes
from kgcnn.layers.pooling import PoolingNodes
from kgcnn.layers.aggr import AggregateLocalEdges

# Example for padded input.
ns = ks.layers.Input(shape=(None, 64), dtype="float32", name="node_attributes")
e_idx = ks.layers.Input(shape=(None, 2), dtype="int64", name="edge_indices")
total_n = ks.layers.Input(shape=(), dtype="int64", name="total_nodes")  # Or mask
total_e = ks.layers.Input(shape=(), dtype="int64", name="total_edges")  # Or mask

# Cast the padded batch to disjoint representation (nodes, indices, graph IDs).
n, idx, batch_id, _, _, _, _, _ = CastBatchedIndicesToDisjoint(uses_mask=False)([ns, e_idx, total_n, total_e])
n_in_out = GatherNodes()([n, idx])  # gather node features for each edge
node_messages = ks.layers.Dense(64, activation='relu')(n_in_out)
node_updates = AggregateLocalEdges()([n, node_messages, idx])  # aggregate messages per receiving node
n_node_updates = ks.layers.Concatenate()([n, node_updates])
n_embedding = ks.layers.Dense(1)(n_node_updates)
g_embedding = PoolingNodes()([total_n, n_embedding, batch_id])  # pool nodes per graph

message_passing = ks.models.Model(inputs=[ns, e_idx, total_n, total_e], outputs=g_embedding)
```

The actual message-passing model can be further structured, e.g. by subclassing the message-passing base layer:

```python
import keras as ks
from kgcnn.layers.message import MessagePassingBase

class MyMessageNN(MessagePassingBase):

    def __init__(self, units, **kwargs):
        super(MyMessageNN, self).__init__(**kwargs)
        self.dense = ks.layers.Dense(units)
        self.add = ks.layers.Add()

    def message_function(self, inputs, **kwargs):
        n_in, n_out, edges = inputs  # edges are unused in this simple example
        return self.dense(n_out, **kwargs)

    def update_nodes(self, inputs, **kwargs):
        nodes, nodes_update = inputs
        return self.add([nodes, nodes_update], **kwargs)
```
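
A hedged usage sketch of the layer above, assuming the base layer's `call` expects `[nodes, edges, edge_index]` in disjoint representation (check the `MessagePassingBase` documentation for the exact signature):

```python
import numpy as np

# Disjoint toy graph: 4 nodes with 16 features, 4 directed edges.
nodes = np.random.rand(4, 16).astype("float32")
edges = np.random.rand(4, 8).astype("float32")
edge_index = np.array([[0, 1, 2, 3],
                       [1, 0, 3, 2]], dtype="int64")  # (2, [M])

layer = MyMessageNN(units=16)  # units match F so the final Add() works
out = layer([nodes, edges, edge_index])  # updated node embeddings, shape (4, 16)
```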

<a name="literature"></a>
# Literature
The following models proposed in the literature have a module in [literature](kgcnn/literature). Each module usually exposes a `make_model` function
to create a ``keras.models.Model``. The models may, but need not, be built completely from `kgcnn.layers` and can, for example, include
original implementations (with proper licensing).

* **[AttentiveFP](kgcnn/literature/AttentiveFP)**: [Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism](https://pubs.acs.org/doi/10.1021/acs.jmedchem.9b00959) by Xiong et al. (2019)
* **[CGCNN](kgcnn/literature/CGCNN)**: [Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301) by Xie et al. (2018)
* **[CMPNN](kgcnn/literature/CMPNN)**: [Communicative Representation Learning on Attributed Molecular Graphs](https://www.ijcai.org/proceedings/2020/0392.pdf) by Song et al. (2020)
* **[DGIN](kgcnn/literature/DGIN)**: [Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks](https://pubmed.ncbi.nlm.nih.gov/34684766/) by Wieder et al. (2021)
* **[DimeNetPP](kgcnn/literature/DimeNetPP)**: [Fast and Uncertainty-Aware Directional Message Passing for Non-Equilibrium Molecules](https://arxiv.org/abs/2011.14115) by Klicpera et al. (2020)
* **[DMPNN](kgcnn/literature/DMPNN)**: [Analyzing Learned Molecular Representations for Property Prediction](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00237) by Yang et al. (2019)
* **[EGNN](kgcnn/literature/EGNN)**: [E(n) Equivariant Graph Neural Networks](https://arxiv.org/abs/2102.09844) by Satorras et al. (2021)
* **[GAT](kgcnn/literature/GAT)**: [Graph Attention Networks](https://arxiv.org/abs/1710.10903) by Veličković et al. (2018)

<details>
<summary> ... and many more <b>(click to expand)</b>.</summary>

* **[GATv2](kgcnn/literature/GATv2)**: [How Attentive are Graph Attention Networks?](https://arxiv.org/abs/2105.14491) by Brody et al. (2021)
* **[GCN](kgcnn/literature/GCN)**: [Semi-Supervised Classification with Graph Convolutional Networks](https://arxiv.org/abs/1609.02907) by Kipf et al. (2016)
* **[GIN](kgcnn/literature/GIN)**: [How Powerful are Graph Neural Networks?](https://arxiv.org/abs/1810.00826) by Xu et al. (2019)
* **[GNNExplainer](kgcnn/literature/GNNExplain)**: [GNNExplainer: Generating Explanations for Graph Neural Networks](https://arxiv.org/abs/1903.03894) by Ying et al. (2019)
* **[GNNFilm](kgcnn/literature/GNNFilm)**: [GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation](https://arxiv.org/abs/1906.12192) by Marc Brockschmidt (2020)
* **[GraphSAGE](kgcnn/literature/GraphSAGE)**: [Inductive Representation Learning on Large Graphs](http://arxiv.org/abs/1706.02216) by Hamilton et al. (2017)
* **[HamNet](kgcnn/literature/HamNet)**: [HamNet: Conformation-Guided Molecular Representation with Hamiltonian Neural Networks](https://arxiv.org/abs/2105.03688) by Li et al. (2021)
* **[HDNNP2nd](kgcnn/literature/HDNNP2nd)**: [Atom-centered symmetry functions for constructing high-dimensional neural network potentials](https://aip.scitation.org/doi/abs/10.1063/1.3553717) by Jörg Behler (2011)
* **[INorp](kgcnn/literature/INorp)**: [Interaction Networks for Learning about Objects, Relations and Physics](https://arxiv.org/abs/1612.00222) by Battaglia et al. (2016)
* **[MAT](kgcnn/literature/MAT)**: [Molecule Attention Transformer](https://arxiv.org/abs/2002.08264) by Maziarka et al. (2020)
* **[MEGAN](kgcnn/literature/MEGAN)**: [MEGAN: Multi-explanation Graph Attention Network](https://link.springer.com/chapter/10.1007/978-3-031-44067-0_18) by Teufel et al. (2023)
* **[Megnet](kgcnn/literature/Megnet)**: [Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals](https://doi.org/10.1021/acs.chemmater.9b01294) by Chen et al. (2019)
* **[MoGAT](kgcnn/literature/MoGAT)**: [Multi-order graph attention network for water solubility prediction and interpretation](https://www.nature.com/articles/s41598-022-25701-5) by Lee et al. (2023)
* **[MXMNet](kgcnn/literature/MXMNet)**: [Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures](https://arxiv.org/abs/2011.07457) by Zhang et al. (2020)
* **[NMPN](kgcnn/literature/NMPN)**: [Neural Message Passing for Quantum Chemistry](http://arxiv.org/abs/1704.01212) by Gilmer et al. (2017)
* **[PAiNN](kgcnn/literature/PAiNN)**: [Equivariant message passing for the prediction of tensorial properties and molecular spectra](https://arxiv.org/pdf/2102.03150.pdf) by Schütt et al. (2020)
* **[RGCN](kgcnn/literature/RGCN)**: [Modeling Relational Data with Graph Convolutional Networks](https://arxiv.org/abs/1703.06103) by Schlichtkrull et al. (2017)
* **[rGIN](kgcnn/literature/rGIN)**: [Random Features Strengthen Graph Neural Networks](https://arxiv.org/abs/2002.03155) by Sato et al. (2020)
* **[Schnet](kgcnn/literature/Schnet)**: [SchNet – A deep learning architecture for molecules and materials](https://aip.scitation.org/doi/10.1063/1.5019779) by Schütt et al. (2017)

</details>


<a name="data"></a>
# Data

Data handling classes are provided in `kgcnn.data`, which stores graphs as a `List[Dict]`.

#### Graph dictionary

Graphs are represented by a `GraphDict` of (numpy) arrays, which behaves like a Python `dict`.
There are graph pre- and postprocessors in ``kgcnn.graph`` which take specific properties by name and apply a
processing function or transformation.

> [!IMPORTANT]  
> Processors can perform any operation, but note that `GraphDict` does not impose an actual graph structure!
> For example, when sorting edge indices, make sure that all edge-related attributes are sorted accordingly.


```python
from kgcnn.graph import GraphDict
# Single graph.
graph = GraphDict({"edge_indices": [[1, 0], [0, 1]], "node_label": [[0], [1]]})
graph.set("graph_labels", [0])  # use set(), get() to assign (tensor) properties.
graph.set("edge_attributes", [[1.0], [2.0]])
graph.to_networkx()
# Modify with e.g. preprocessor.
from kgcnn.graph.preprocessor import SortEdgeIndices
SortEdgeIndices(edge_indices="edge_indices", edge_attributes="^edge_(?!indices$).*", in_place=True)(graph)
```

#### List of graph dictionaries

A `MemoryGraphList` behaves like a Python list but may only contain `GraphDict` items.

```python
from kgcnn.data import MemoryGraphList
# List of graph dicts.
graph_list = MemoryGraphList([{"edge_indices": [[0, 1], [1, 0]]}, {"edge_indices": [[0, 0]]}, {}])
graph_list.clean(["edge_indices"])  # Remove graphs without property
graph_list.get("edge_indices")  # opposite is set()
# Easily cast to tensor; makes copy.
tensor = graph_list.tensor([{"name": "edge_indices"}])  # config of keras `Input` layer
# Or directly modify list.
for i, x in enumerate(graph_list):
    x.set("graph_number", [i])
print(len(graph_list), graph_list[:2])  # Also supports indexing lists.
```


<a name="datasets"></a>
# Datasets

The `MemoryGraphDataset` inherits from `MemoryGraphList` but must be initialized with file information that points to a `data_directory` for the dataset on disk.
The `data_directory` can have a subdirectory for files and/or a single file, such as a CSV file:

```bash
├── data_directory
    ├── file_directory
    │   ├── *.*
    │   └── ... 
    ├── file_name
    └── dataset_name.kgcnn.pickle
```
A base dataset class is created with path and name information:

```python
from kgcnn.data import MemoryGraphDataset
dataset = MemoryGraphDataset(data_directory="ExampleDir/", 
                             dataset_name="Example",
                             file_name=None, file_directory=None)
dataset.save()  # opposite is load(). 
```

The subclasses `QMDataset`, `ForceDataset`, `MoleculeNetDataset`, `CrystalDataset` and `GraphTUDataset` additionally provide the functions required for the specific dataset type to convert and process files such as '.txt', '.sdf' or '.xyz'.
Most subclasses implement `prepare_data()` and `read_in_memory()` with dataset-dependent arguments.
An example for `MoleculeNetDataset` is shown below.
For more details, see the tutorials in [notebooks](notebooks).

```python
from kgcnn.data.moleculenet import MoleculeNetDataset
# File directory and files must exist. 
# Here 'ExampleDir' and 'ExampleDir/data.csv' with columns "smiles" and "label".
dataset = MoleculeNetDataset(dataset_name="Example",
                             data_directory="ExampleDir/",
                             file_name="data.csv")
dataset.prepare_data(overwrite=True, smiles_column_name="smiles", add_hydrogen=True,
                     make_conformers=True, optimize_conformer=True, num_workers=None)
dataset.read_in_memory(label_column_name="label", add_hydrogen=False, 
                       has_conformers=True)
```

In [data.datasets](kgcnn/data/datasets) there are graph learning benchmark datasets as subclasses which are *downloaded* from popular graph archives such as [TUDatasets](https://chrsmrrs.github.io/datasets/), [MatBench](https://matbench.materialsproject.org/) or [MoleculeNet](https://moleculenet.org/).
The subclasses `GraphTUDataset2020`, `MatBenchDataset2020` and `MoleculeNetDataset2018` download and read the available datasets by name.
There are also dataset-specific subclasses that handle additional processing or download from individual sources:

```python
from kgcnn.data.datasets.MUTAGDataset import MUTAGDataset
dataset = MUTAGDataset()  # inherits from GraphTUDataset2020
```

Downloaded datasets are stored in `~/.kgcnn/datasets` on your computer. Please remove them manually if they are no longer required.
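For example:

```bash
# Remove all locally cached kgcnn datasets (path as stated above).
rm -r ~/.kgcnn/datasets
```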

<a name="training"></a>
# Training

A set of example training scripts can be found in [training](training). Training scripts are configurable with a hyperparameter config file and command-line arguments regarding model and dataset.

You can find a [table](training/results/README.md) of common benchmark datasets in [results](training/results).

# Issues

Some known issues to be aware of when using or building new models or layers with `kgcnn`:
* Loading jagged or nested tensors into models is not working for the PyTorch backend.
* The BatchNormalization layer does not support padding yet.
* The Keras AUC metric does not seem to work for torch with CUDA.

<a name="citing"></a>
# Citing

If you want to cite this repo, please refer to our [paper](https://doi.org/10.1016/j.simpa.2021.100095):

```bibtex
@article{REISER2021100095,
  title = {Graph neural networks in TensorFlow-Keras with RaggedTensor representation (kgcnn)},
  journal = {Software Impacts},
  pages = {100095},
  year = {2021},
  issn = {2665-9638},
  doi = {10.1016/j.simpa.2021.100095},
  url = {https://www.sciencedirect.com/science/article/pii/S266596382100035X},
  author = {Patrick Reiser and Andre Eberhard and Pascal Friederich}
}
```

<a name="references"></a>
# References

- https://www.tensorflow.org/api_docs/python/tf/RaggedTensor

            
