# ElMD

- Version: 0.5.14
- Summary: An implementation of the Element Movers Distance for chemical similarity of ionic compositions
- Home page: https://github.com/lrcfmd/ElMD/
- Author: Cameron Hargreaves
- License: GPL3
- Upload time: 2024-10-26 02:14:16
- Keywords: cheminformatics, materials science, machine learning, materials representation

![A drawing of ants moving earth](https://i.imgur.com/fg8Nrma.png)

The Element Movers Distance (ElMD) is a similarity measure for chemical compositions. This distance between two compositions is calculated from the minimal amount of work taken to transform one distribution of elements to another along the modified Pettifor scale. 
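
As a rough illustration of "minimal work along a linear scale", the sketch below computes a one-dimensional Earth Mover's Distance by sweeping along the scale and carrying any surplus mass on to the next position. The function name `emd_1d` and the scale positions are hypothetical placeholders: they are not values from the modified Pettifor scale, and this is not the library's implementation.

```python
def emd_1d(positions, source, target):
    """Work needed to turn `source` into `target` on a shared 1D scale."""
    # Visit the bins in order of their position on the scale.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    work = 0.0
    carried = 0.0  # surplus (or deficit) of mass pushed on to the next bin
    for here, nxt in zip(order, order[1:]):
        carried += source[here] - target[here]
        work += abs(carried) * (positions[nxt] - positions[here])
    return work


# Illustrative positions only (NOT real modified Pettifor numbers).
scale = {"Sr": 0, "Ca": 1, "Ti": 2, "O": 3}
elements = list(scale)
positions = [scale[e] for e in elements]

ca_ti_o3 = [0.0, 0.2, 0.2, 0.6]  # CaTiO3 as fractions of Sr, Ca, Ti, O
sr_ti_o3 = [0.2, 0.0, 0.2, 0.6]  # SrTiO3 on the same axis

distance = emd_1d(positions, ca_ti_o3, sr_ti_o3)
print(round(distance, 6))
```

With these made-up positions Ca and Sr are one step apart, so moving a fraction of 0.2 across that step costs 0.2 units of work.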

This repository provides the reference implementations as described in our paper "[The Earth Movers Distance as a metric for the space of inorganic compositions](https://chemrxiv.org/articles/preprint/The_Earth_Mover_s_Distance_as_a_Metric_for_the_Space_of_Inorganic_Compositions/12777566)". 

If you need to compute this metric between many compositions, the high-performance ElM2D library may be more suitable. It can be found at www.github.com/lrcfmd/ElM2D.

We recommend installation via pip:

```
pip install ElMD
```

For Python 3.8+, due to [known library conflicts](https://github.com/materialsproject/matbench/issues/172), it is recommended to install ElMD separately from its dependencies:

```
pip install ElMD --no-deps
pip install numpy # if necessary
pip install numba # Gives significant speedup, but can cause dependency issues with other libraries
```
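After installing with `--no-deps`, it can help to confirm which of the optional packages are importable before starting a long run. The sketch below uses only the standard library; the helper name `has_package` is hypothetical.

```python
from importlib.util import find_spec


def has_package(name):
    """Return True if a top-level package called `name` can be imported."""
    return find_spec(name) is not None


for package in ("numpy", "numba"):
    status = "available" if has_package(package) else "missing"
    print(f"{package}: {status}")
```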

## Usage
For simple usage, instantiate an object with a compositional formula:

```python
> from ElMD import ElMD
> x = ElMD("CaTiO3")
```
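To make the idea of a composition as a distribution of elements concrete, here is a minimal sketch that turns a simple formula into normalized element fractions. The function `simple_fractions` is hypothetical and only handles flat formulas without brackets or variables; the library's own parser is more capable.

```python
import re


def simple_fractions(formula):
    """Map a flat formula such as 'CaTiO3' to normalized element fractions."""
    amounts = {}
    # An element symbol is one capital letter plus an optional lowercase one,
    # followed by an optional (possibly decimal) amount.
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + float(amount or 1)
    total = sum(amounts.values())
    return {symbol: value / total for symbol, value in amounts.items()}


fractions = simple_fractions("CaTiO3")
print(fractions)
```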

Calculate the distance to a second object with the `elmd` method. 

```python
> x.elmd("SrTiO3")
0.2
```

If the assignment plan (how each element in the source composition is mapped to the target composition) is required, this may be returned by setting the `return_assignments` flag in the `elmd` method.

```python
> x.elmd("SrTiO3", return_assignments=True)
(0.2, array([0.2, 0. , 0. , 0. , 0.2, 0. , 0. , 0. , 0.6]))
```
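The nine values returned above are consistent with a flattened 3 x 3 table of flows. Assuming row-major order, with rows for the source elements (Ca, Ti, O) and columns for the target elements (Sr, Ti, O), the plan can be laid out as in the sketch below. That ordering is an assumption and is not confirmed by this README.

```python
plan = [0.2, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.6]  # copied from the output above
source = ["Ca", "Ti", "O"]  # assumed row order
target = ["Sr", "Ti", "O"]  # assumed column order

# Rebuild the table row by row.
n = len(target)
table = [plan[i * n:(i + 1) * n] for i in range(len(source))]

for src, row in zip(source, table):
    for tgt, flow in zip(target, row):
        if flow > 0:
            print(f"{src} -> {tgt}: {flow}")
```

Read this way, all of the Ca is mapped onto Sr while Ti and O stay in place.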

If the `mod_petti` elemental scale is suitable and no assignment plan is required, a significantly faster EMD algorithm may be used by setting `metric="fast"`
```python
> x = ElMD("CaTiO3", metric="fast")
> x.elmd("SrTiO3")
0.2
```

The compositional parser can substitute a user-defined value of `x` into formulas that contain it.

```python
latp_02 = ElMD("Li1+xAlxTi2-x(PO4)3", x=0.2) # Li1.2Al0.2Ti1.8(PO4)3
latp_03 = ElMD("Li1+xAlxTi2-x(PO4)3", x=0.3) # Li1.3Al0.3Ti1.7(PO4)3
```
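The arithmetic behind this substitution can be checked by hand. The sketch below uses a hypothetical helper, `latp_amounts`, that hard-codes this one formula rather than parsing it, then normalizes the amounts into fractions.

```python
def latp_amounts(x):
    """Element amounts for Li1+xAlxTi2-x(PO4)3 at a given x."""
    return {
        "Li": 1 + x,
        "Al": x,
        "Ti": 2 - x,
        "P": 3,       # (PO4)3 contributes 3 P
        "O": 4 * 3,   # and 12 O
    }


amounts = latp_amounts(0.2)
total = sum(amounts.values())
fractions = {el: amt / total for el, amt in amounts.items()}

for el, amt in amounts.items():
    print(f"{el}: amount={amt:.1f}, fraction={fractions[el]:.4f}")
```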

Alternative chemical scales may be selected via the `metric` argument, e.g.

```python
> x = ElMD("CaTiO3", metric="atomic")
> x.elmd("SrTiO3")
3.6
```

The `elmd()` function can also be imported directly and called with two formula strings. The metric is chosen with the `metric` argument.

```python
> from ElMD import elmd
> elmd("NaCl", "LiCl")
0.5
> elmd("NaCl", "LiCl", metric="magpie_sc")
0.688539
```

The `EMD` function can also be called directly, with the input being two vectors of distributions and the associated distance matrix between them.

```python
> from ElMD import EMD
> EMD([0.5, 0.5], [0.5, 0.5], [[1, 90], [89, 0]])
0.5
```
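For two bins on each side, the transport problem has a single free variable, so the minimum can be found by checking the two ends of its feasible range. The sketch below uses a hypothetical helper, `emd_2x2`, to work through the example above; it is not the library's solver, which handles general sizes.

```python
def emd_2x2(a, b, d):
    """Minimal transport cost between two 2-bin distributions of equal mass."""
    # t is the mass moved from a[0] to b[0]; the other three flows follow from it.
    lo = max(0.0, b[0] - a[1])
    hi = min(a[0], b[0])

    def cost(t):
        return (d[0][0] * t
                + d[0][1] * (a[0] - t)
                + d[1][0] * (b[0] - t)
                + d[1][1] * (a[1] - b[0] + t))

    # The cost is linear in t, so the minimum sits at an endpoint.
    return min(cost(lo), cost(hi))


result = emd_2x2([0.5, 0.5], [0.5, 0.5], [[1, 90], [89, 0]])
print(result)
```

Here the cheap plan keeps each half in its own bin (cost 1 x 0.5 + 0 x 0.5), while the alternative of swapping the bins would cost 90 x 0.5 + 89 x 0.5.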

## Elemental Similarity
You may use either traditional discrete scales or machine learnt representations for each element. For the vector-based representations, each element is described by a vector, and the distance between elements (not compositions!) is the Euclidean distance between their vectors.

Due to the disparity in magnitudes of some of these values, a select few have additionally been scaled.

Linear:
- [mendeleev](https://www.sciencedirect.com/science/article/abs/pii/S0925838803008004)
- [petti](https://www.sciencedirect.com/science/article/abs/pii/S0925838803008004)
- [atomic](https://www.sciencedirect.com/science/article/abs/pii/S0925838803008004)
- [mod_petti](https://iopscience.iop.org/article/10.1088/1367-2630/18/9/093011/meta)

Chemically Derived:
- [oliynyk](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [oliynyk_sc](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [jarvis](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [jarvis_sc](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [magpie](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [magpie_sc](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)

Machine Learnt:
- [cgcnn](https://github.com/CompRhys/roost/tree/master/data/embeddings)
- [elemnet](https://github.com/CompRhys/roost/tree/master/data/embeddings)
- [mat2vec](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)
- [matscholar](https://github.com/CompRhys/roost/tree/master/data/embeddings)
- [megnet16](https://github.com/CompRhys/roost/tree/master/data/embeddings)

Random Numbers:
- [random_200](https://github.com/anthony-wang/CrabNet/tree/master/data/element_properties)

The Euclidean distance between these vectors is taken as the measure of elemental similarity. 

```python
> x = ElMD("NaCl", metric="magpie")
> x.elmd("LiCl")
46.697806

> x = ElMD("NaCl", metric="magpie_sc")
> x.elmd("LiCl")
0.688539
```
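As a sketch of the elemental distance, the snippet below takes the first three `magpie` values listed in this README for Na and Cl and computes their Euclidean distance with the standard library. Truncating to three features is purely for brevity, so the number printed is not a real elemental distance.

```python
import math

# First three magpie features for each element, copied from this README.
na = [2.0, 22.98976928, 370.87]
cl = [94.0, 35.453, 171.6]

distance = math.dist(na, cl)
print(distance)

# The third feature dominates because its magnitude is much larger,
# which is the motivation for the scaled (`_sc`) variants.
contributions = [(x - y) ** 2 for x, y in zip(na, cl)]
print(contributions)
```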

The feature dictionary can be accessed through the `periodic_tab` attribute:

```python
> featurizingDict = ElMD(metric="magpie").periodic_tab
> featurizingDict["Na"]
[2.0, 22.98976928, 370.87, 1.0, 3.0, 166.0, 0.93, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 29.2433333333, 0.0, 0.0, 229.0]
```

## Featurizing
Whilst not the initial purpose, a composition-based feature vector may be generated from ElMD objects should you require it. This is a mean pooling of the weighted composition feature matrix.

Note that this vector representation is not used at any point during the ElMD distance calculation and is provided solely for convenience.

We construct this by taking the dot product of the ratios of each element with the features of these elements. Pass the argument `feature_pooling="mean"` to divide by the total number of elements in the compound.

```python
feature_vector = np.dot(ratio_vector, element_feature_matrix)
```

This is accessed through the `feature_vector` attribute.

```python
# For single element compositions, equivalent to x.periodic_tab["Cl"]
> x = ElMD("Cl", metric="magpie")
> x.feature_vector
array([ 94.    ,  35.453 , 171.6   ,  17.    ,   3.    , 102.    ,
         3.16  ,   2.    ,   5.    ,   0.    ,   0.    ,   7.    ,
         0.    ,   1.    ,   0.    ,   0.    ,   1.    ,  24.4975,
         2.493 ,   0.    ,  64.    ])

# Aggregate vector by each element's contribution
> ElMD("NaCl", metric="magpie").feature_vector
array([ 48.        ,  29.22138464, 271.235     ,   9.        ,
         3.        , 134.        ,   2.045     ,   1.5       ,
         2.5       ,   0.        ,   0.        ,   4.        ,
         0.5       ,   0.5       ,   0.        ,   0.        ,
         1.        ,  26.87041667,   1.2465    ,   0.        ,
       146.5       ])

```
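The NaCl values above can be reproduced by hand from the per-element vectors. This sketch uses only the first three `magpie` features of Na and Cl as printed in this README, and plain Python in place of `np.dot`.

```python
ratios = [0.5, 0.5]  # NaCl: equal fractions of Na and Cl

# Rows are elements, columns are features (first three magpie values only).
element_features = [
    [2.0, 22.98976928, 370.87],   # Na
    [94.0, 35.453, 171.6],        # Cl
]

# Dot product of the ratio vector with the feature matrix.
feature_vector = [
    sum(r * row[j] for r, row in zip(ratios, element_features))
    for j in range(len(element_features[0]))
]
print(feature_vector)
```

The three results match the first three entries of the NaCl array shown above (48, 29.22138464, and 271.235) up to floating-point rounding.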

A feature vector of length 8076 can be generated by calling the `full_feature_vector()` method. It concatenates the weighted mean, min, max, range, and standard deviation of every available elemental feature, across all featurizing dictionaries, for the elements in the composition.

```python
> x = ElMD("NaCl").full_feature_vector()
```
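The pooling described above can be sketched for a single small feature table. The helper `pooled_stats` is hypothetical, and the details are assumptions: the mean is weighted by element ratio, while min, max, range, and (population) standard deviation are taken over the elements present without weighting. The library may differ on these points.

```python
from statistics import pstdev


def pooled_stats(ratios, element_features):
    """Concatenate weighted mean, min, max, range and std for each feature."""
    columns = list(zip(*element_features))  # one tuple per feature
    mean = [sum(r * v for r, v in zip(ratios, col)) for col in columns]
    low = [min(col) for col in columns]
    high = [max(col) for col in columns]
    spread = [h - l for h, l in zip(high, low)]
    std = [pstdev(col) for col in columns]
    return mean + low + high + spread + std


features = [
    [2.0, 22.98976928, 370.87],   # Na (first three magpie values from this README)
    [94.0, 35.453, 171.6],        # Cl
]
vector = pooled_stats([0.5, 0.5], features)
print(len(vector))  # 5 statistics x 3 features
```

Repeating this over every featurizing dictionary and concatenating the results is how a long vector such as the 8076-value one would be built up under these assumptions.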

When using 1D unpooled elemental vectors, these may be mapped to the associated chemical formula using the `vec_to_formula` method:

```python
x = ElMD("CaTiO3")
y = ElMD("NaCl")

print(x.pretty_formula)
print(x.vec_to_formula(x.feature_vector)) # Same as above
print(y.vec_to_formula(x.feature_vector)) # Same as above
```

## Citing

If you would like to cite this code in your work, please use the Chemistry of Materials reference:

```
@article{doi:10.1021/acs.chemmater.0c03381,
    author = {Hargreaves, Cameron J. and Dyer, Matthew S. and Gaultois, Michael W. and Kurlin, Vitaliy A. and Rosseinsky, Matthew J.},
    title = {The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions},
    journal = {Chemistry of Materials},
    volume = {32},
    number = {24},
    pages = {10610-10620},
    year = {2020},
    doi = {10.1021/acs.chemmater.0c03381},
    URL = {https://doi.org/10.1021/acs.chemmater.0c03381},
    eprint = {https://doi.org/10.1021/acs.chemmater.0c03381}
}
```

## Issues

Please feel free to post any questions or comments as issues on this GitHub page.

            
