scivae

Name: scivae
Version: 1.1.0
Home page: https://github.com/ArianeMora/scivae
Author: Ariane Mora
License: GPL3
Requires Python: >=3.8
Keywords: util
Upload time: 2022-12-02 02:16:57
# scivae

Check out our docs: https://arianemora.github.io/scivae/  

If you use this please cite: https://doi.org/10.1101/2021.06.22.449386

scivae is a wrapper around Keras that lets you build, save, and visualise a variational autoencoder (VAE).

Blogs & notebooks used as references are noted in the code and a couple at the end of this README.

The primary difference between a VAE and a normal AE is in how the loss function is computed. Here the loss 
has been abstracted out to the Loss class (in *loss.py*), where a number of distance metrics (MMD, KL divergence) 
can be combined with MSE or correlation (COR) loss.
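For intuition, the MMD term compares latent samples against draws from a standard-normal prior using a kernel two-sample statistic. Below is a minimal numpy sketch of an RBF-kernel MMD; this is an illustration of the idea, not the exact implementation in *loss.py*:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel between the rows of x and y
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd(x, y, sigma=1.0):
    # MMD^2 estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2))       # stand-in for latent samples
prior = rng.normal(size=(100, 2))   # samples from N(0, I)
score = mmd(z, prior)               # small when z matches the prior
```

A larger `mmd_weight` in the config pushes this term harder, pulling the latent distribution towards the prior at the cost of reconstruction quality.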

The VAE (in *vae.py*) class has the general VAE structure.

Saving of the VAE state has been implemented so that you can re-use your trained model on the same data and get 
the same latent space (or apply the trained VAE to new data).  

Optimiser started as a side experiment: you pass in a VAE structure and the optimisation class uses an 
evolutionary algorithm to search for the best VAE structure, which is then returned.
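The general shape of such an evolutionary search can be sketched generically; the `fitness` and `mutate` functions below are hypothetical stand-ins (in practice fitness would mean training a VAE and scoring it), not the package's actual Optimiser class:

```python
import random

def evolve(fitness, init_pop, n_gens=10, mutate=None):
    # Simple loop: keep the best half each generation, mutate survivors to refill
    pop = list(init_pop)
    for _ in range(n_gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

# Hypothetical search over the node counts of two encoding layers
def fitness(layers):
    # Stand-in score: prefer a 32 -> 16 funnel
    return -abs(layers[0] - 32) - abs(layers[1] - 16)

def mutate(layers):
    # Nudge each layer size up or down, keeping it at least 2
    return tuple(max(2, n + random.choice([-4, 4])) for n in layers)

best = evolve(fitness, [(8, 8), (64, 64), (16, 32), (40, 12)], n_gens=20, mutate=mutate)
```

Because the best individual always survives, the returned structure is never worse than the best starting candidate.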

Validate allows for running simple validations using scikit-learn. If your primary interest is a meaningful
 latent space that captures the key features of the dataset, it can be good to compare how much "information" 
 about your classes has been captured. A good way of measuring this is to pass the latent space and a set 
 of labels to a simple classifier and see whether it can distinguish your classes better than with the raw data.
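That comparison can be sketched with scikit-learn; the data below is a synthetic stand-in, and in practice `encoded` would come from the trained VAE's latent space:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 20))                     # stand-in for raw features
labels = (raw[:, :5].sum(axis=1) > 0).astype(int)    # stand-in class labels
encoded = raw[:, :2]                                 # stand-in for a 2-D latent space

clf = LogisticRegression(max_iter=1000)
raw_score = cross_val_score(clf, raw, labels, cv=5).mean()
latent_score = cross_val_score(clf, encoded, labels, cv=5).mean()
# If latent_score approaches raw_score, the latent space retains class-relevant signal
```

A latent score well below the raw score suggests the compression is discarding class-relevant structure.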

## Users
Tested with Python 3.10 on a Mac (without an M1 chip - this won't work on an M1 Mac, since M1 chips don't work well with TensorFlow).

Check out the install page and documentation, or our package on PyPI: https://pypi.org/project/scivae
```
pip install scivae
```

### Documentation 

It is very easy to call the basic VAE. Simply install the package (or grab the raw code), then set up 
a config dictionary. The keys are fairly self-explanatory: 


```
from scivae import *
```

    - loss: loss dictionary; see the Loss class for input details
    - encoding: a dictionary of encoding layers (number of nodes and activation function)
    - decoding: same as above but for the decoding layers
    - latent: config for the latent space
    - optimiser: optimiser name and params; see `def optimiser(self, optimiser_name: str, params: dict)` in vae.py for details

```
config = {'scale': False,  # Whether to min-max scale your data; VAEs work best when data is pre-normalised & outliers removed for training
          'batch_norm': True,
          'loss': {'loss_type': 'mse',  # Mean squared error
                   'distance_metric': 'mmd',  # Maximum mean discrepancy (KL can be used but works worse)
                   'mmd_weight': 1},  # Weight of MMD vs MSE: > 1 weighs making the latent space normally distributed higher,
                                      # < 1 makes the reconstruction better
          'encoding': {'layers': [{'num_nodes': 32, 'activation_fn': 'selu'},   # First layer of encoding
                                  {'num_nodes': 16, 'activation_fn': 'selu'}]}, # Second layer of encoding
          'decoding': {'layers': [{'num_nodes': 16, 'activation_fn': 'selu'},   # First layer of decoding
                                  {'num_nodes': 32, 'activation_fn': 'selu'}]}, # Second layer of decoding
          'latent': {'num_nodes': 2},
          'optimiser': {'params': {}, 'name': 'adam'}}  # Empty params means use defaults
```

Run the VAE. Each column of the numeric data is expected to follow an approximately normal distribution.
The VAE expects a numpy array where each row is a list of features corresponding to some label. The labels carry 
no meaning during training - they just need to be a list of the same length, and are only used for downstream analyses (e.g. colouring).
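Since scaling matters, here is a minimal column-wise standardisation sketch (plain numpy, not part of scivae) that reuses the training statistics when scaling new data:

```python
import numpy as np

def standardise(train, new=None):
    # Column-wise z-scoring; reuse the training mean/std for any new data
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    scale = lambda x: (x - mu) / sd
    return scale(train) if new is None else scale(new)

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # stand-in training data
X_scaled = standardise(X)                          # each column now ~N(0, 1)
```

The same `standardise(X, new_data)` call would then be the right way to prepare any new data before encoding it, since the training statistics are reused.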

Note that for most configs we want input_data = output_data; however, this has been left modular so it can be 
upgraded to a denoising setup etc. in the future.
```
vae_mse = VAE(numpy_array, numpy_array, labels, config, 'vae_label')
# Set batch size and number of epochs
vae_mse.encode('default', epochs=500, batch_size=50, early_stop=True)
encoded_data_vae_mse = vae_mse.get_encoded_data()
```
The VAE can also be used to encode new data.
```
# Note this all needs to be normalised in the same way as you normalised the training data
new_data_encoded = vae_mse.encode_new_data(some_new_np_array)  # e.g. including your outliers
```

Visualisation works just as if we had the principal components back from PCA, i.e. the code below plots a scatter 
plot of the first and second latent nodes.

```
import matplotlib.pyplot as plt

plt.scatter(encoded_data_vae_mse[:, 0], encoded_data_vae_mse[:, 1])
```

### Real documentation is coming - if you want it, raise an issue for what you are interested in and give me a cheeky star 

## Tests
See tests for further examples.


## References
- https://github.com/pren1/keras-MMD-Variational-Autoencoder/blob/master/Keras_MMD_Variational_Autoencoder.ipynb
- https://github.com/s-omranpour/X-VAE-keras/blob/master/VAE/VAE_MMD.ipynb
- https://github.com/ShengjiaZhao/MMD-Variational-Autoencoder/blob/master/mmd_vae.py
- https://github.com/CancerAI-CL/IntegrativeVAEs/blob/master/code/models/mmvae.py

