| Field | Value |
|-------|-------|
| Name | asesurfacefinder |
| Version | 1.0.2 |
| Summary | Machine learned location of chemical adsorbates on high-symmetry surface sites. |
| Upload time | 2024-11-21 14:42:12 |
| Requires Python | >=3.9 |
| License | MIT License, Copyright (c) 2024 Joe Gilkes |
| Keywords | ase, surface, atoms, molecule, chemistry, adsorbate, random forest, classification |

# ASESurfaceFinder

![PyPI - Version](https://img.shields.io/pypi/v/asesurfacefinder) ![Conda Version](https://img.shields.io/conda/v/conda-forge/asesurfacefinder)


A utility for determining surface facets and adsorption points of ASE-based systems consisting of molecules on surfaces.

[ASE](https://wiki.fysik.dtu.dk/ase/) comes with an excellent selection of utilities for working with atomic surfaces, enabling the construction of many common surface facets, the definition of high-symmetry points across these surfaces, and the adsorption of arbitrary molecules to these surface sites. However, computationally determining which of these sites an adsorbed molecule is bound to without prior knowledge is a non-trivial task, due to the symmetric equivalence of many sites and high similarity of many surface facets.

ASESurfaceFinder implements automated tools for training and validating random forest classification models (implemented in [scikit-learn](https://scikit-learn.org/stable/index.html)) that can identify surface sites based on the local atomic environment of adsorbed atoms. Given unseen adsorbed systems, these models can then predict both the surface facet and the high-symmetry adsorption site, which is useful when cataloguing externally generated adsorbed systems.

## Installation

ASESurfaceFinder has been developed for Python 3.10 and newer. It can be installed from PyPI or conda-forge with their respective package managers:

```bash
# Pip (PyPI)
pip install asesurfacefinder

# Conda (conda-forge)
conda install -c conda-forge asesurfacefinder
```

## Usage

Given a workflow that produces XYZ geometries of molecules adsorbed on periodic solid surfaces in unknown locations, ASESurfaceFinder can be used to quickly categorise the adsorption sites by the surface's available high-symmetry points. For example, assume the workflow produces surface/adsorbate systems where the surface is one of the following (a 3x3x3 unit cell is shown):

| Example Surface | ASE Construction | Visualisation (3x3x3) |
|-----------------|------------------|-----------------------|
| Platinum FCC{100} | `fcc100('Pt', (3,3,3))` | ![3x3x3 unit cell of FCC{100} platinum](/examples/Pt_fcc100.svg) |
| Silver FCC{110} | `fcc110('Ag', (3,3,3))` | ![3x3x3 unit cell of FCC{110} silver](/examples/Ag_fcc110.svg) |
| Gold FCC{111} | `fcc111('Au', (3,3,3))` | ![3x3x3 unit cell of FCC{111} gold](/examples/Au_fcc111.svg) |

ASE defines the symmetrically equivalent adsorption sites for each of these surface facets (see [ASE's documentation on surface construction](https://wiki.fysik.dtu.dk/ase/ase/build/surface.html)). These are used as classification targets within ASESurfaceFinder.

To begin, the package's main class is initialised using these surfaces:

```python
from asesurfacefinder import SurfaceFinder
from ase.build import fcc100, fcc110, fcc111

surfaces = [fcc100('Pt', (3,3,3)), fcc110('Ag', (3,3,3)), fcc111('Au', (3,3,3))]
surface_labels = ['Pt_fcc100', 'Ag_fcc110', 'Au_fcc111']

sf = SurfaceFinder(surfaces, labels=surface_labels)
```

The `labels` argument is optional, and can be used for more precisely naming the surface facets; without it, they will be referred to by integers.

> [!IMPORTANT]
> While surfaces outside of those that can be generated by ASE can be trained on, they must contain an `'adsorbate_info'` dictionary internally that specifies their minimal XY unit cell and their high-symmetry adsorption sites.
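
As a rough illustration of what such a dictionary might hold, the sketch below assumes the layout used by ASE's own surface builders: a `'cell'` entry with the two in-plane vectors of the minimal cell, and a `'sites'` entry mapping site names to fractional coordinates. The square lattice, its 2.8 Å spacing and the `site_to_cartesian` helper are illustrative assumptions, not part of ASESurfaceFinder.

```python
# Hypothetical 'adsorbate_info' for a square surface lattice (values are illustrative).
adsorbate_info = {
    "cell": [(2.8, 0.0), (0.0, 2.8)],  # in-plane vectors of the minimal XY cell, in Å
    "sites": {                          # site name -> fractional (u, v) coordinates
        "ontop": (0.0, 0.0),
        "bridge": (0.5, 0.0),
        "hollow": (0.5, 0.5),
    },
}


def site_to_cartesian(info, site):
    """Convert a named site's fractional coordinates to Cartesian XY."""
    u, v = info["sites"][site]
    (ax, ay), (bx, by) = info["cell"]
    return (u * ax + v * bx, u * ay + v * by)


for name in adsorbate_info["sites"]:
    print(name, site_to_cartesian(adsorbate_info, name))
```

In recent ASE versions the surface builders store this dictionary under `atoms.info['adsorbate_info']`, so a custom surface would most likely need an equivalent entry there.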

### Model Training

To train a random forest classification model to recognise the adsorption sites on each of these surfaces, the `train` method of this class can be called:

```python
sf.train(
    samples_per_site=10000,
    ads_xy_noise=0.2,
    ads_z_bounds=(1.0, 2.75),
    n_jobs=4
)
```

For each surface facet passed into `SurfaceFinder`, this samples positions above each adsorption site `samples_per_site` times for possible perturbations and adsorption heights. The XY position is sampled from a bivariate normal distribution with its mean at the adsorption site and XY variance specified by `ads_xy_noise`. The height (Z position) is sampled from a uniform distribution between `ads_z_bounds`.
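
The sampling described above can be sketched with the standard library alone. This is not the package's implementation: the function name `sample_positions` is hypothetical, and `ads_xy_noise` is treated here as the standard deviation of an isotropic normal distribution, which is an assumption about how the parameter is interpreted.

```python
import random


def sample_positions(site_xy, n_samples, ads_xy_noise, ads_z_bounds, seed=0):
    """Draw (x, y, z) positions around one adsorption site.

    XY is drawn from an isotropic normal centred on the site; Z is drawn
    uniformly between the two height bounds.
    """
    rng = random.Random(seed)
    z_min, z_max = ads_z_bounds
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(site_xy[0], ads_xy_noise)
        y = rng.gauss(site_xy[1], ads_xy_noise)
        z = rng.uniform(z_min, z_max)
        samples.append((x, y, z))
    return samples


points = sample_positions((0.0, 0.0), 1000, ads_xy_noise=0.2, ads_z_bounds=(1.0, 2.75))
mean_x = sum(p[0] for p in points) / len(points)
print(f"{len(points)} samples, mean x = {mean_x:.3f}")
```

Because the normal distribution is unbounded, a small fraction of samples always lands far from the site centre, which is why large noise values can push samples into a neighbouring site's region.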

The local atomic environment around each sampled adsorbate position is then encoded by means of a descriptor from [DScribe](https://singroup.github.io/dscribe/latest/) - a SOAP descriptor with a cutoff of 10 Å is used by default, but this can be modified by passing a `descriptor` keyword argument when constructing the `SurfaceFinder`. Each descriptor is matched with a label representing the surface facet and adsorption site that it represents in the format `{label}_{site}`, where `{label}` is one of the labels passed into `SurfaceFinder` and `{site}` is the ASE high-symmetry site name. The `n_jobs` argument enables parallelism over the specified number of processes during descriptor creation and model training.
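
To make the label format concrete, the following sketch builds the `{label}_{site}` class names for the three example facets. The site names are those ASE assigns to these facets; the `sites_by_label` dictionary and the `split_label` helper are illustrative and not part of the package.

```python
# Site names ASE defines for each example facet.
sites_by_label = {
    "Pt_fcc100": ["ontop", "bridge", "hollow"],
    "Ag_fcc110": ["ontop", "longbridge", "shortbridge", "hollow"],
    "Au_fcc111": ["ontop", "bridge", "fcc", "hcp"],
}

# One classification target per (facet, site) pair.
class_names = [
    f"{label}_{site}"
    for label, sites in sites_by_label.items()
    for site in sites
]


def split_label(class_name):
    """Split a class name back into (surface label, site name).

    Splitting on the last underscore works here because none of the site
    names contain one, even though the surface labels do.
    """
    label, site = class_name.rsplit("_", 1)
    return label, site


print(len(class_names))              # 11
print(split_label("Au_fcc111_hcp"))  # ('Au_fcc111', 'hcp')
```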

A random forest classifier is then trained on this data, such that any future points above a surface can be classified into a surface facet and adsorption site prediction. 

### Sample Visualisation

Possible realisations of the adsorption point sampling can be visualised with ASESurfaceFinder's `SamplePlotter`:

```python
import matplotlib.pyplot as plt
from ase.build import fcc111
from asesurfacefinder import SamplePlotter

sp = SamplePlotter(fcc111('Au', (3,3,3)),
                   samples_per_site=2000,
                   ads_xy_noise=0.2,
                   ads_z_bounds=(1.0, 2.75))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,6), layout='constrained')
ax1.set_axis_off(); ax2.set_axis_off()
sp.plot(ax=ax1, rotation='0x,0y,0z')
sp.plot(ax=ax2, rotation='-75x,15y,0z')
```

This acts as a wrapper around `ase.visualize.plot`'s `plot_atoms` function, sampling adsorption points within the given bounds and displaying them on the requested surface. `SamplePlotter.plot()` passes through keyword arguments to this function, allowing for properties such as the rotation of each plot to be modified:

![Adsorption points sampled above high-symmetry sites on a gold FCC{111} surface](/examples/sampled_points.svg)

In this case, the sampled XY position noise may need to be decreased slightly, as the areas covered by the bridge and fcc sites have begun to overlap, which may reduce predictive accuracy.

### Model Validation

The training can be validated before performing any predictions on real systems:

```python
sf.validate(
    samples_per_site=2000,
    surf_mults=[(1,1,1), (3,3,1), (5,5,1)],
    ads_xy_noise=0.25,
    ads_z_bounds=(1.4, 2.7)
)
```

This performs the same adsorbate position sampling procedure as was used to generate local atomic environments in the training of the classifier, but different values of the sampling volume can be passed to test the limits of applicability of the current model. In this case, a few failed predictions are likely: although the validation heights (1.4 to 2.7) lie inside the trained range (1.0 to 2.75), the larger `ads_xy_noise` (0.25 against 0.2) forces the predictor to extrapolate to local environments further from each site than those it was trained on.

Additionally, the `surf_mults` argument can be used to scale up the unit cell of each surface facet before sampling to ensure that adsorbates on larger surface slabs are correctly predicted. In this example, since the provided facets were all 3x3x3 unit cells, this will produce sampled adsorbate positions on 3x3x3, 9x9x3 and 15x15x3 surface slabs.
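
The slab sizes quoted above follow from multiplying each base size elementwise by each entry of `surf_mults`. A minimal sketch of that arithmetic (the `scaled_sizes` helper is hypothetical, and the package may apply the multipliers differently internally):

```python
def scaled_sizes(base_size, surf_mults):
    """Multiply a slab size elementwise by each multiplier."""
    return [tuple(b * m for b, m in zip(base_size, mult)) for mult in surf_mults]


sizes = scaled_sizes((3, 3, 3), [(1, 1, 1), (3, 3, 1), (5, 5, 1)])
print(sizes)  # [(3, 3, 3), (9, 9, 3), (15, 15, 3)]
```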

ASESurfaceFinder outputs a summary of the validation results once it has predicted labels for the newly generated positions and compared them with their true values. If the classifier has been trained correctly, the summary will look something like:

```
66000/66000 sites classified correctly (accuracy = 1.0).
```

However, if this validation fails for any sites, ASESurfaceFinder prints them in a table where `h` is the sampled adsorbate height and `d` is the distance in Angstroms away from the ideal adsorption site:

```
63559/66000 sites classified correctly (accuracy = 0.9630151515151515).

True                                            | Predicted
-------------------------------------------------------------------
Au_fcc111_bridge (5, 5, 1) (h = 2.48, d = 0.44) | Au_fcc111_hcp
Au_fcc111_bridge (5, 5, 1) (h = 1.42, d = 0.67) | Au_fcc111_ontop
Au_fcc111_bridge (5, 5, 1) (h = 1.60, d = 0.42) | Au_fcc111_hcp
Au_fcc111_fcc (5, 5, 1) (h = 2.43, d = 0.52)    | Au_fcc111_bridge
Au_fcc111_fcc (5, 5, 1) (h = 2.40, d = 0.63)    | Au_fcc111_bridge
Au_fcc111_fcc (5, 5, 1) (h = 1.98, d = 0.54)    | Au_fcc111_bridge
...
```

This indicates that at least one of the following holds:
1. the training samples were not varied enough to account for the sample space covered by the validation samples,
2. the validation samples covered too wide an area and overlapped between neighbouring sites, causing samples to be mislabelled,
3. the training samples accounted for too wide a sample space, causing overlap between neighbouring sites and misclassification.

In this case, we showed above that it is likely that all of these problems are arising - `SamplePlotter` showed that there was overlap in the training samples between some sites of `Au_fcc111`, and `ads_xy_noise` in the validation samples was even greater, creating mislabelled samples. In a production use case, these parameters should be tweaked such that validation returns no errors while sampling as broad a volume above each adsorption site as possible.
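
The totals in the summaries above are consistent with the example setup: 11 sites across the three facets (3 on FCC{100}, 4 on FCC{110} and 4 on FCC{111}), each sampled 2000 times on three slab sizes. A short check of that arithmetic, assuming one batch of samples per site per multiplier and with variable names chosen for illustration:

```python
n_sites = 3 + 4 + 4          # FCC{100} + FCC{110} + FCC{111} sites
samples_per_site = 2000
n_surf_mults = 3             # (1,1,1), (3,3,1), (5,5,1)

total = n_sites * samples_per_site * n_surf_mults
correct = 63559              # from the failing example summary
accuracy = correct / total

print(total)                 # 66000
print(f"{accuracy:.4f}")     # 0.9630
print(total - correct)       # 2441 misclassified samples
```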

### Model Prediction

To predict the high-symmetry adsorption sites and surface facets of real surface/adsorbate systems, these can simply be fed into a trained `SurfaceFinder`.

Taking an example system of 3 methanol molecules adsorbed to 'fcc' sites of gold FCC{111}, with a free methanol molecule above the surface:

![Four methanol molecules on gold FCC{111}](/examples/4MeO_Au_fcc111.svg)

The adsorbates and gas phase molecule can be separated from the surface and classified by their adsorption sites (or lack thereof) with the `SurfaceFinder.predict()` method:

```python
from ase.io import read

real_surface_1 = read('examples/4MeO_Au_fcc111.xyz')
slab, molecules, labels = sf.predict(real_surface_1)
```

This returns three outputs:
1. `slab`: The clean surface geometry as an `ase.Atoms`, without any adsorbates.
2. `molecules`: A list of `ase.Atoms` representing the adsorbates and gas phase molecules found on and above the surface respectively.
3. `labels`: A list of dictionaries corresponding to the entries in `molecules` containing the atoms of the respective adsorbate that are adsorbed onto the surface and their predicted sites.

For example, given the above system:

```python
for molecule, label in zip(molecules, labels):
    print(molecule, label)
```

```
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3}}
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3}}
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3}}
Atoms(symbols='CHOH3', pbc=False, tags=...) {}
```

The first three entries are the adsorbed methanol molecules, correctly classified as being bound to the 'fcc' high-symmetry site on a gold FCC{111} surface. Each molecule's `label` dictionary is keyed by the index of the bound atom in the corresponding `molecule`; if a molecule is bound to the surface by multiple atoms, the dictionary will contain an entry for each. In this case, all adsorbates are bound by their respective 0th atoms - their oxygens, indicated by the `'bonded_elem'` key.

Each `label` also provides the coordination of the adsorption site as detected by ASE's connectivity tools. Since the 'fcc' site sits between three surface atoms, its `'coordination'` value is 3.

The fourth molecule returned is the gas phase methanol, which is not adsorbed onto the surface. This is reflected in its `label` dictionary, which is empty to indicate no atoms are bound.
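
Because each entry of `labels` is a plain dictionary, the results are easy to summarise. The sketch below uses hand-copied dictionaries matching the output above rather than a live `predict()` call, and the `summarise` helper is hypothetical:

```python
# Label dictionaries copied from the example output above.
labels = [
    {0: {"site": "Au_fcc111_fcc", "bonded_elem": "O", "coordination": 3}},
    {0: {"site": "Au_fcc111_fcc", "bonded_elem": "O", "coordination": 3}},
    {0: {"site": "Au_fcc111_fcc", "bonded_elem": "O", "coordination": 3}},
    {},  # gas phase molecule: no bound atoms
]


def summarise(label):
    """Describe one molecule's binding from its label dictionary."""
    if not label:
        return "not adsorbed"
    return ", ".join(
        f"atom {idx} ({info['bonded_elem']}) on {info['site']}"
        for idx, info in sorted(label.items())
    )


for i, label in enumerate(labels):
    print(i, summarise(label))

n_adsorbed = sum(1 for label in labels if label)
print(n_adsorbed)  # 3
```

Testing the dictionary for emptiness is enough to tell adsorbed molecules from gas phase ones, since unbound molecules carry no entries.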

A more complex example with different adsorbates bound on a range of sites follows:

![A selection of molecules on a range of gold FCC{111} adsorption sites](/examples/CO+H2O+NHCH3_Au_fcc111.svg)

Running the same prediction workflow reveals carbon monoxide on an 'ontop' site, hydroxide on a 'fcc' site, and methylamine on a 'bridge' site:

```python
real_surface_2 = read('examples/CO+H2O+NHCH3_Au_fcc111.xyz')
slab, molecules, labels = sf.predict(real_surface_2)

for molecule, label in zip(molecules, labels):
    print(molecule, label)
```

```
Atoms(symbols='CO', pbc=False) {0: {'site': 'Au_fcc111_ontop', 'bonded_elem': 'C', 'coordination': 1}}
Atoms(symbols='OH', pbc=False) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3}}
Atoms(symbols='NHCH3', pbc=False) {0: {'site': 'Au_fcc111_bridge', 'bonded_elem': 'N', 'coordination': 2}}
```

#### A note on adsorbate identification

ASESurfaceFinder does not currently contain a nice, clean way of separating *all* adsorbates from surfaces. In the first instance, it searches for a set of per-atom 'tags' in the underlying `ase.Atoms` object (accessed by `atoms.get_tags()`). These are generated by ASE when using its adsorbate-adding functionality, and represent the layer of the surface that each atom is part of - adsorbates are tagged at layer 0, with increasing layer tags indicating increasing depth.

However, input surface/adsorbate systems are unlikely to be tagged like this: if they had been generated with ASE, you would already know the sites on which the adsorbates sit. Instead, ASESurfaceFinder currently relies on separation by element. Since it knows which elements the surfaces are made of, it simply masks these elements off to separate the adsorbates from the surface. A warning is displayed when this is performed.

Since the majority of surface/adsorbate systems are organic molecules on metal surfaces, this approach will work most of the time. However, when dealing with surfaces that contain the same elements as adsorbates (e.g. diamond, silica, etc.), this approach will fail. A more concrete approach for adsorbate separation is being developed, but will likely require analysis of the training surfaces as a whole.
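
The element-based separation can be sketched as a simple mask over chemical symbols. This is an illustration of the idea rather than the package's code; `split_by_element` is a hypothetical helper operating on plain symbol lists instead of `ase.Atoms`:

```python
def split_by_element(symbols, surface_elements):
    """Return (surface indices, adsorbate indices) using an element mask."""
    surface_idx = [i for i, s in enumerate(symbols) if s in surface_elements]
    adsorbate_idx = [i for i, s in enumerate(symbols) if s not in surface_elements]
    return surface_idx, adsorbate_idx


# CO on gold: the metal and the adsorbate share no elements, so the mask works.
print(split_by_element(["Au", "Au", "Au", "C", "O"], {"Au"}))
# ([0, 1, 2], [3, 4])

# CO on a carbon surface: the adsorbate's carbon is wrongly assigned to the surface.
print(split_by_element(["C", "C", "C", "C", "O"], {"C"}))
# ([0, 1, 2, 3], [4])
```

The second call shows the failure mode described above: once the surface and adsorbate share an element, the mask alone cannot tell them apart.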

            
