catbench


Name: catbench
Version: 0.1.18
Home page: https://github.com/JinukMoon/catbench
Summary: CatBench: Benchmark of Graph Neural Networks for Adsorption Energy Predictions in Heterogeneous Catalysis
Upload time: 2024-12-05 05:33:03
Author: JinukMoon (jumoon@snu.ac.kr)
Requires Python: >=3.8
License: MIT
Keywords: GNN benchmarking for catalysis
Requirements: ase>=3.22.1, xlsxwriter>=3.2.0, numpy==1.26

# CatBench
CatBench: Benchmark Framework for Graph Neural Networks in Adsorption Energy Predictions

## Installation

```bash
pip install catbench
```

## Overview
![CatBench Schematic](assets/CatBench_Schematic.png)

CatBench is a benchmarking framework for evaluating Graph Neural Networks (GNNs) on adsorption energy prediction. It provides tools for data processing, model evaluation, and result analysis.

## Usage Workflow

### 1. Data Processing
CatBench supports two types of data sources:

#### A. Direct from Catalysis-Hub

```python
# Import the catbench package
import catbench

# Process data from Catalysis-Hub
catbench.cathub_preprocess("Catalysis-Hub_Dataset_tag")
```

**Example:**
```python
# Process specific dataset from Catalysis-Hub
# Using AraComputational2022 as an example
catbench.cathub_preprocess("AraComputational2022")
```

#### B. User Dataset
For custom datasets, organize your VASP outputs in the directory structure shown below. It should include:
- Gas references (`gas/`) containing VASP output files for gas phase molecules
- Surface structures (`surface1/`, `surface2/`, etc.) containing:
  - Clean slab calculations (`slab/`)
  - Adsorbate-surface systems (`H/`, `OH/`, etc.)

Note: Each directory must contain CONTCAR and OSZICAR files. Other VASP output files may also be present, but `process_output` deletes every file except CONTCAR and OSZICAR, so keep a copy of anything else you need.

```
data/
├── gas/
│   ├── H2gas/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   └── H2Ogas/
│       ├── CONTCAR
│       └── OSZICAR
├── surface1/
│   ├── slab/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   ├── H/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   └── OH/
│       ├── CONTCAR
│       └── OSZICAR
└── surface2/
    ├── slab/
    │   ├── CONTCAR
    │   └── OSZICAR
    ├── H/
    │   ├── CONTCAR
    │   └── OSZICAR
    └── OH/
        ├── CONTCAR
        └── OSZICAR
```
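
Because `process_output` deletes files, it can be worth checking the layout before running it. The following is a minimal sketch of such a check using only the standard library; `find_incomplete_dirs` is a hypothetical helper, not part of catbench, and it assumes the two-level `data/<group>/<calculation>/` layout shown above.

```python
from pathlib import Path

# Files every calculation directory must contain (per the note above).
REQUIRED_FILES = ("CONTCAR", "OSZICAR")


def find_incomplete_dirs(root):
    """List problems in a user dataset laid out as root/<group>/<calculation>/.

    <group> is `gas` or a surface folder; <calculation> is a gas molecule,
    `slab`, or an adsorbate. Returns (relative_path, missing_items) pairs.
    """
    root = Path(root)
    problems = []
    for group in sorted(p for p in root.iterdir() if p.is_dir()):
        # Every surface folder needs a clean-slab reference calculation.
        if group.name != "gas" and not (group / "slab").is_dir():
            problems.append((group.name, ["slab/"]))
        for calc in sorted(p for p in group.iterdir() if p.is_dir()):
            missing = [f for f in REQUIRED_FILES if not (calc / f).is_file()]
            if missing:
                problems.append((calc.relative_to(root).as_posix(), missing))
    return problems


if __name__ == "__main__" and Path("data").is_dir():
    # Run this before process_output, which deletes all other files.
    for path, missing in find_incomplete_dirs("data"):
        print(f"{path}: missing {', '.join(missing)}")
```

An empty result means every calculation directory has both files and every surface has a `slab/` folder.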

Then process using:

```python
import catbench

# Define coefficients for calculating adsorption energies
# For each adsorbate, specify coefficients based on the reaction equation:
# Example for H*: 
#   E_ads(H*) = E(H*) - E(slab) - 1/2 E(H2_gas)
# Example for OH*:
#   E_ads(OH*) = E(OH*) - E(slab) + 1/2 E(H2_gas) - E(H2O_gas)

coeff_setting = {
    "H": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": -1/2,   # Coefficient for H2 gas reference
    },
    "OH": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": +1/2,   # Coefficient for H2 gas reference
        "H2Ogas": -1,    # Coefficient for H2O gas reference
    },
}

# This will clean up directories and keep only CONTCAR and OSZICAR files
catbench.process_output("data", coeff_setting)
catbench.userdata_preprocess("data")
```
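
The coefficients encode a weighted sum of total energies, so a single wrong sign silently yields a wrong adsorption energy. The sketch below, which is not part of catbench, evaluates that sum and checks that each reaction conserves atoms. It assumes gas keys follow the `<formula>gas` naming used in the example above; `adsorption_energy`, `element_counts`, and `is_balanced` are hypothetical helpers, and the formula parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter

# Same coefficient convention as coeff_setting above.
coeff_setting = {
    "H": {"slab": -1, "adslab": 1, "H2gas": -1/2},
    "OH": {"slab": -1, "adslab": 1, "H2gas": +1/2, "H2Ogas": -1},
}


def adsorption_energy(coeffs, energies):
    """E_ads = sum of (coefficient * total energy) over all species in the reaction."""
    return sum(c * energies[name] for name, c in coeffs.items())


def element_counts(formula):
    """Parse a simple formula such as 'H2O' into {'H': 2, 'O': 1} (no parentheses)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def is_balanced(adsorbate, coeffs):
    """Check that the adsorbate's atoms are supplied exactly by the gas references.

    The slab terms cancel (adslab = slab + adsorbate), so only the adsorbate,
    weighted by the adslab coefficient, and the '<formula>gas' species are counted.
    """
    total = Counter()
    for symbol, n in element_counts(adsorbate).items():
        total[symbol] += coeffs["adslab"] * n
    for name, c in coeffs.items():
        if name.endswith("gas"):
            for symbol, n in element_counts(name[:-3]).items():
                total[symbol] += c * n
    return all(abs(v) < 1e-9 for v in total.values())


if __name__ == "__main__":
    for ads, coeffs in coeff_setting.items():
        print(ads, "balanced" if is_balanced(ads, coeffs) else "NOT balanced")
```

For OH*, the check confirms that +1/2 H2 and -1 H2O together supply exactly one O and one H.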

### 2. Execute Benchmark

#### A. General Benchmark
This is the standard benchmark, in which each structure is relaxed with the GNN calculator. The `range()` value sets the number of repeated runs used for reproducibility testing; set it to 1 if reproducibility testing is not needed.

```python
import catbench
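from your_calculator import Calculator  # placeholder for your GNN's ASE calculator

# Prepare the calculator list:
# range(5): 5 repeated runs for reproducibility testing
# range(1): a single run when reproducibility testing is not needed
calculators = [Calculator() for _ in range(5)]

# GNN_name and benchmark are required (see Configuration Options)
config = {
    "GNN_name": "MyGNN",                  # label used for the output subdirectory and files
    "benchmark": "AraComputational2022",  # preprocessed dataset name, e.g. the tag used above
}
catbench.execute_benchmark(calculators, **config)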
```
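
Repeated runs are only useful if they are compared afterwards. As a minimal sketch, not catbench's own analysis, the spread of each adsorption energy across runs can be summarized with the standard library. The input mapping of entry names to per-run energies and the `tolerance` cutoff are hypothetical; this is not the format of `{GNN_name}_result.json`.

```python
from statistics import mean, pstdev


def summarize_runs(energies_by_entry, tolerance=None):
    """Summarize repeated runs of the same benchmark.

    energies_by_entry maps an entry name to its adsorption energies (eV), one
    value per run. Returns {entry: (mean, population std, max - min)} and, if a
    tolerance is given, the sorted list of entries whose max - min exceeds it.
    """
    summary = {}
    for entry, values in energies_by_entry.items():
        summary[entry] = (mean(values), pstdev(values), max(values) - min(values))
    flagged = []
    if tolerance is not None:
        flagged = sorted(e for e, (_, _, spread) in summary.items() if spread > tolerance)
    return summary, flagged


if __name__ == "__main__":
    # Hypothetical energies from three runs, for illustration only.
    runs = {"surface1_H": [-0.31, -0.30, -0.31], "surface1_OH": [0.12, 0.45, 0.13]}
    summary, flagged = summarize_runs(runs, tolerance=0.1)
    print(summary)
    print("inconsistent across runs:", flagged)
```

The max-minus-min spread is used for flagging because it directly answers "how far apart did the runs land?", which is easier to interpret against a tolerance than a standard deviation from only a few runs.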

After execution, the following files and directories will be created:

1. A `result` directory is created to store all calculation outputs.
2. Inside the `result` directory, subdirectories are created for each GNN.
3. Each GNN's subdirectory contains:
   - `gases/`: Gas reference molecules for adsorption energy calculations
   - `log/`: Slab and adslab calculation logs
   - `traj/`: Slab and adslab trajectory files
   - `{GNN_name}_gases.json`: Energies of the gas reference molecules
   - `{GNN_name}_outlier.json`: Outlier detection status for each adsorption entry
   - `{GNN_name}_result.json`: Raw data (energies, calculation times, outlier detection, slab displacements, etc.)
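
The result file records slab displacements and an outlier status, and `execute_benchmark` exposes the `disp_thrs_slab` and `disp_thrs_ads` thresholds. The exact criterion catbench applies is not documented here; the sketch below assumes the thresholds are in Å and are compared with the largest single-atom displacement during relaxation. `max_displacement` and `flag_outlier` are hypothetical helpers that use plain coordinate tuples rather than ASE objects.

```python
import math


def max_displacement(initial, final):
    """Largest Euclidean distance (Å) any atom moved between two position lists.

    Positions are (x, y, z) tuples in the same atom order. Periodic wrapping is
    ignored, so atoms that cross a cell boundary must be unwrapped first.
    """
    if len(initial) != len(final):
        raise ValueError("structures must have the same number of atoms")
    return max(math.dist(a, b) for a, b in zip(initial, final))


def flag_outlier(slab_initial, slab_final, ads_initial, ads_final,
                 disp_thrs_slab=1.0, disp_thrs_ads=1.5):
    """Return the reasons an entry would be flagged (empty list if none).

    Defaults mirror the documented defaults of execute_benchmark.
    """
    reasons = []
    if max_displacement(slab_initial, slab_final) > disp_thrs_slab:
        reasons.append("slab displacement above threshold")
    if max_displacement(ads_initial, ads_final) > disp_thrs_ads:
        reasons.append("adsorbate displacement above threshold")
    return reasons
```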

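#### B. Single-point Calculation Benchmark

This variant uses a single calculator to evaluate single-point energies instead of running relaxations, so it accepts only the options listed under `execute_benchmark_single`.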

```python
import catbench
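from your_calculator import Calculator  # placeholder for your GNN's ASE calculator

calculator = Calculator()

# GNN_name and benchmark are required (see Configuration Options)
config = {
    "GNN_name": "MyGNN",
    "benchmark": "AraComputational2022",  # preprocessed dataset name, e.g. the tag used above
}
catbench.execute_benchmark_single(calculator, **config)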
```

### 3. Analysis

```python
import catbench

config = {}
catbench.analysis_GNNs(**config)
```

The analysis function processes the calculation data stored in the `result` directory and generates:

1. A `plot/` directory:
   - Parity plots for each GNN model
   - Combined parity plots for comparison
   - Performance visualization plots

2. An Excel file `{dataset_name}_Benchmarking_Analysis.xlsx`:
   - Comprehensive performance metrics for all GNN models
   - Statistical analysis of predictions
   - Model-specific details and parameters
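
The exact metrics written to the spreadsheet are not listed here. As an illustration only, the two most common parity-plot metrics, mean absolute error (MAE) and root-mean-square error (RMSE) of GNN predictions against DFT references, can be computed as follows; `parity_metrics` is a hypothetical helper, not catbench's code.

```python
import math


def parity_metrics(reference, predicted):
    """MAE and RMSE between reference (DFT) and predicted (GNN) energies, in eV."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    # Hypothetical adsorption energies, for illustration only.
    dft = [-0.30, 0.15, -1.20, 0.80]
    gnn = [-0.25, 0.10, -1.05, 0.95]
    mae, rmse = parity_metrics(dft, gnn)
    print(f"MAE = {mae:.3f} eV, RMSE = {rmse:.3f} eV")
```

RMSE weights large errors more heavily than MAE, so a large gap between the two usually points to a few outliers rather than a uniformly poor model.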

#### Single-point Calculation Analysis

```python
import catbench

config = {}
catbench.analysis_GNNs_single(**config)
```

## Outputs

### 1. Adsorption Energy Parity Plot (mono_version & multi_version)
Adsorption energy parity plots can be drawn for each GNN either with all adsorbates combined (mono_version) or broken down by adsorbate (multi_version).
<p float="left">
  <img src="assets/mono_plot.png" width="400" />
  <img src="assets/multi_plot.png" width="400" />
</p>

### 2. Comprehensive Performance Table
Compare performance metrics across all GNNs.
![Comparison Table](assets/comparison_table.png)

### 3. Outlier Analysis
See how outliers are detected for each GNN.
![Outlier Table](assets/outlier_table.png)

### 4. Analysis by Adsorbate
Compare each GNN's predictions adsorbate by adsorbate.
![Adsorbate Comparison Table](assets/adsorbate_comp_table.png)

## Configuration Options

### execute_benchmark
| Option | Description | Default |
|--------|-------------|---------|
| GNN_name | Name of your GNN | Required |
| benchmark | Name of benchmark dataset | Required |
| F_CRIT_RELAX | Force convergence criterion (eV/Å) | 0.05 |
| N_CRIT_RELAX | Maximum number of optimization steps | 999 |
| rate | Fraction of slab atoms to fix, counted from the bottom (0: keep the original constraints) | 0.5 |
| disp_thrs_slab | Displacement threshold for the slab (Å) | 1.0 |
| disp_thrs_ads | Displacement threshold for the adsorbate (Å) | 1.5 |
| again_seed | Seed variation threshold | 0.2 |
| damping | Damping factor for optimization | 1.0 |
| gas_distance | Cell size for gas molecules (Å) | 10 |
| optimizer | Optimization algorithm | "LBFGS" |
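
The `rate` option is described only briefly. One plausible reading, sketched below as an assumption rather than catbench's actual code, is to sort the slab atoms by height and fix the lowest `rate` fraction; catbench may instead use a height cutoff. `indices_to_fix` is a hypothetical helper.

```python
import math


def indices_to_fix(z_coords, rate):
    """Indices of the lowest `rate` fraction of atoms, ordered by z (Å).

    rate = 0 returns an empty list, i.e. keep the original constraints.
    The count is rounded down, so a small slab with a small rate may fix nothing.
    """
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    # The small tolerance guards against float round-off (0.29 * 100 = 28.999...).
    n_fix = math.floor(rate * len(z_coords) + 1e-9)
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    return sorted(order[:n_fix])
```

With ASE, the returned indices could be passed to `FixAtoms(indices=...)` from `ase.constraints`.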

### execute_benchmark_single
| Option | Description | Default |
|--------|-------------|---------|
| GNN_name | Name of your GNN | Required |
| benchmark | Name of benchmark dataset | Required |
| gas_distance | Cell size for gas molecules (Å) | 10 |

### analysis_GNNs
| Option | Description | Default |
|--------|-------------|---------|
| Benchmarking_name | Name for output files | Current directory name |
| calculating_path | Path to result directory | "./result" |
| GNN_list | List of GNNs to analyze | All GNNs in result directory |
| target_adsorbates | Target adsorbates to analyze | All adsorbates |
| specific_color | Color for plots | "black" |
| min | Plot y-axis minimum | Auto-calculated |
| max | Plot y-axis maximum | Auto-calculated |
| figsize | Figure size | (9, 8) |
| mark_size | Marker size | 100 |
| linewidths | Line width | 1.5 |
| dpi | Plot resolution | 300 |
| legend_off | Toggle legend | False |
| error_bar_display | Toggle error bars | False |

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation
This work will be published soon.
