bioencoder


| Field | Value |
| --- | --- |
| Name | bioencoder |
| Version | 0.2.1 |
| Home page | None |
| Summary | A metric learning toolkit |
| Upload time | 2024-04-30 21:07:54 |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| Requires Python | ==3.9.* |
| License | None |
| Keywords | metric learning, biology |
| Requirements | albumentations, bokeh, faiss-cpu, ipywidgets, matplotlib, numpy, pandas, pytorch-metric-learning, scikit-learn, streamlit-option-menu, tensorboard, timm, torch-ema, torch-lr-finder, torch-optimizer |

<div align="center">
    <p><img src="https://github.com/agporto/BioEncoder/raw/main/assets/bioencoder_logo.png" width="300"></p>
</div>

# BioEncoder

BioEncoder is a toolbox for image classification and trait discovery in organismal biology. It relies on image classification models trained with metric learning to learn species trait data (i.e., features) from images. This implementation is based on [SupCon](https://github.com/ivanpanshin/SupCon-Framework) and [timm-vis](https://github.com/novice03/timm-vis).
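
As a rough illustration of what metric learning optimizes, the sketch below computes a supervised contrastive (SupCon-style) loss on a toy batch in plain Python. It is not BioEncoder's implementation, which builds on PyTorch-based libraries; the function name `supcon_loss`, the `temperature` default, and the toy embeddings are assumptions made for this example.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss for a small batch (illustrative only).

    embeddings: list of equal-length float vectors, one per image
    labels:     list of class labels, one per image
    """
    # L2-normalise every embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner is skipped

        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability of picking a same-class sample.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2D (hypothetical values).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_clustered = supcon_loss(embeddings, labels=[0, 0, 1, 1])
loss_mixed = supcon_loss(embeddings, labels=[0, 1, 0, 1])
```

When the labels agree with the clusters, the loss is small; when same-class images sit far apart in embedding space, it is large. Training pushes the encoder toward the first situation.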

Preprint on bioRxiv: [https://doi.org/10.1101/2024.04.03.587987](https://doi.org/10.1101/2024.04.03.587987)

## Features
- Taxon-agnostic dataloaders (making it applicable to any dataset - not just biological ones)
- Support for [timm models](https://github.com/rwightman/pytorch-image-models) and [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer)
- Access to state-of-the-art metric losses, such as [SupCon](https://arxiv.org/abs/2004.11362) and [Sub-center ArcFace](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123560715.pdf).
- [Exponential Moving Average](https://github.com/fadel/pytorch_ema) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance.
- [LRFinder](https://github.com/davidtvs/pytorch-lr-finder) for the second stage of the training.
- Easy customization of hyperparameters, including augmentations, through `YAML` configs (check the [config-templates](config-templates) folder for examples)
- Custom augmentation techniques via [albumentations](https://github.com/albumentations-team/albumentations)
- TensorBoard logs and checkpoints (soon to come: WandB integration)
- Streamlit app with rich model visualizations (e.g., [Grad-CAM](https://arxiv.org/abs/1610.02391))
- Interactive [t-SNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) and [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) plots using [Bokeh](https://bokeh.org/)
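
The two weight-averaging ideas in the list above differ in how they weight past model states. The sketch below contrasts them using plain Python lists as stand-ins for model weights. The helper names, the `decay` value, and the toy checkpoints are assumptions for illustration; this does not show how BioEncoder or `torch-ema` implement either technique.

```python
def ema_update(shadow, weights, decay=0.9):
    """One exponential-moving-average step: recent weights count more."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


def average_checkpoints(checkpoints):
    """Equal-weight average of several checkpoints (the idea behind SWA)."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]


# Three hypothetical checkpoints of a two-parameter "model".
checkpoints = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# EMA is updated continuously during training, starting from the first state.
shadow = checkpoints[0]
for weights in checkpoints[1:]:
    shadow = ema_update(shadow, weights, decay=0.9)

# Checkpoint averaging is applied once, after training, to saved states.
swa = average_checkpoints(checkpoints)
```

With a high `decay`, the EMA weights trail the raw weights and smooth out step-to-step noise, while the equal-weight average treats every saved checkpoint the same.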

<div align="center">
    <p><img src="https://github.com/agporto/BioEncoder/raw/main/assets/bioencoder-interactive-plot.gif" width="500"></p>
</div>

## Quickstart

For more detailed information, consult [the help files](help).

1\. Install BioEncoder (into a virtual environment with PyTorch/CUDA):
````
pip install bioencoder
````

2\. Download the example dataset from the data repo: [https://zenodo.org/records/10909614/files/BioEncoder-data.zip](https://zenodo.org/records/10909614/files/BioEncoder-data.zip?download=1&preview=1). 
This archive contains the images and configuration files needed for steps 3 and 4, as well as the final model checkpoints and a script to reproduce the results and figures presented in the paper. To play around with the interactive figures and the model explorer, you can also skip the training / SWA steps.
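
If you prefer to fetch and unpack the archive from Python rather than the browser, something along these lines should work with the standard library alone. The helper names and the target paths in the usage note are assumptions for this sketch; only the URL comes from the step above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the download step above.
DATA_URL = "https://zenodo.org/records/10909614/files/BioEncoder-data.zip"


def download_archive(url, target):
    """Download `url` to `target` unless the file is already there."""
    target = Path(target).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def extract_archive(archive, out_dir):
    """Unpack a zip archive into `out_dir` and return the member names."""
    out_dir = Path(out_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())
```

A typical call would be `archive = download_archive(DATA_URL, "~/Downloads/BioEncoder-data.zip")` followed by `extract_archive(archive, "~/Downloads")`. Inspect the returned member names to see which folder holds the images and which holds the configs.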

3\. Start an interactive session (e.g., in Spyder or VS Code) and run the following commands one by one:

```python
## use overwrite=True to redo a step

import bioencoder

## global setup
bioencoder.configure(root_dir=r"~/bioencoder_wd", run_name="v1")

## split dataset
bioencoder.split_dataset(image_dir=r"~/Downloads/damselflies-aligned-trai_val", max_ratio=6, random_seed=42)

## train stage 1
bioencoder.train(config_path=r"bioencoder_configs/train_stage1.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage1.yml")

## explore embedding space and model from stage 1
bioencoder.interactive_plots(config_path=r"bioencoder_configs/plot_stage1.yml")
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage1.yml")

## (optional) learning rate finder for stage 2
bioencoder.lr_finder(config_path=r"bioencoder_configs/lr_finder.yml")

## train stage 2
bioencoder.train(config_path=r"bioencoder_configs/train_stage2.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage2.yml")

## explore model from stage 2
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage2.yml")

```
4\. Alternatively, you can directly use the command line interface: 

```bash
## use the flag "--overwrite" to redo a step

bioencoder_configure --root-dir "~/bioencoder_wd" --run-name v1
bioencoder_split_dataset --image-dir "~/Downloads/damselflies-aligned-trai_val" --max-ratio 6 --random-seed 42
bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"
bioencoder_interactive_plots --config-path "bioencoder_configs/plot_stage1.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage1.yml"
bioencoder_lr_finder --config-path "bioencoder_configs/lr_finder.yml"
bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage2.yml"

```
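
To run the command-line steps unattended, one option is a small Python driver that executes them in order and stops at the first failure. This is a sketch rather than part of BioEncoder: `run_pipeline` and its `runner` parameter are hypothetical names, and the commands are copied from the list above on the assumption that the `bioencoder_*` entry points are on your `PATH`.

```python
import subprocess

# Commands copied verbatim from the quickstart, split into argv lists.
COMMANDS = [
    ["bioencoder_configure", "--root-dir", "~/bioencoder_wd", "--run-name", "v1"],
    ["bioencoder_split_dataset", "--image-dir",
     "~/Downloads/damselflies-aligned-trai_val",
     "--max-ratio", "6", "--random-seed", "42"],
    ["bioencoder_train", "--config-path", "bioencoder_configs/train_stage1.yml"],
    ["bioencoder_swa", "--config-path", "bioencoder_configs/swa_stage1.yml"],
    ["bioencoder_interactive_plots", "--config-path",
     "bioencoder_configs/plot_stage1.yml"],
    ["bioencoder_model_explorer", "--config-path",
     "bioencoder_configs/explore_stage1.yml"],
    ["bioencoder_lr_finder", "--config-path", "bioencoder_configs/lr_finder.yml"],
    ["bioencoder_train", "--config-path", "bioencoder_configs/train_stage2.yml"],
    ["bioencoder_swa", "--config-path", "bioencoder_configs/swa_stage2.yml"],
    ["bioencoder_model_explorer", "--config-path",
     "bioencoder_configs/explore_stage2.yml"],
]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; a non-zero exit aborts the pipeline."""
    completed = []
    for argv in commands:
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(argv, check=True)
        completed.append(argv[0])
    return completed
```

Calling `run_pipeline(COMMANDS)` runs the whole workflow. The interactive steps (plots and model explorer) open visual tools, so you may want to drop them from the list for unattended runs.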

## Citation

Please cite BioEncoder as follows:

```bibtex

@UNPUBLISHED{Luerig2024-ov,
  title    = "{BioEncoder}: a metric learning toolkit for comparative
              organismal biology",
  author   = "Luerig, Moritz D and Di Martino, Emanuela and Porto, Arthur",
  journal  = "bioRxiv",
  pages    = "2024.04.03.587987",
  month    =  apr,
  year     =  2024,
  language = "en",
  doi      = "10.1101/2024.04.03.587987"
}

```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "bioencoder",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "==3.9.*",
    "maintainer_email": null,
    "keywords": "metric learning, biology",
    "author": null,
    "author_email": "Arthur Porto <agporto@gmail.com>, Moritz L\u00fcrig <moritz.luerig@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/94/b5/41660b076628ddbfddcacec6de6a5a754eb4be5431baa9974037b57a92c5/bioencoder-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A metric learning toolkit",
    "version": "0.2.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/agporto/BioEncoder/issues",
        "Homepage": "https://github.com/agporto/BioEncoder"
    },
    "split_keywords": [
        "metric learning",
        " biology"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "051095fb87f1c88a5428204117c5cc27bf2e16f91d21240e72bdc0d9b32e2f34",
                "md5": "4f5a58018a8c749ec6922288c736808c",
                "sha256": "d44d3e9d2ee82bbd03bdd490ca3a35778c2f3e0e0903c49813fb839989f61d43"
            },
            "downloads": -1,
            "filename": "bioencoder-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4f5a58018a8c749ec6922288c736808c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "==3.9.*",
            "size": 47914,
            "upload_time": "2024-04-30T21:07:52",
            "upload_time_iso_8601": "2024-04-30T21:07:52.208957Z",
            "url": "https://files.pythonhosted.org/packages/05/10/95fb87f1c88a5428204117c5cc27bf2e16f91d21240e72bdc0d9b32e2f34/bioencoder-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "94b541660b076628ddbfddcacec6de6a5a754eb4be5431baa9974037b57a92c5",
                "md5": "dc7b46a98963b3a30201aa0a0f9e945b",
                "sha256": "32839ce556258e6c1b72cd943946b05ab43e74647f42cd0f3451202b3dceac8f"
            },
            "downloads": -1,
            "filename": "bioencoder-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "dc7b46a98963b3a30201aa0a0f9e945b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "==3.9.*",
            "size": 38307,
            "upload_time": "2024-04-30T21:07:54",
            "upload_time_iso_8601": "2024-04-30T21:07:54.130147Z",
            "url": "https://files.pythonhosted.org/packages/94/b5/41660b076628ddbfddcacec6de6a5a754eb4be5431baa9974037b57a92c5/bioencoder-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-30 21:07:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "agporto",
    "github_project": "BioEncoder",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "albumentations",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "bokeh",
            "specs": []
        },
        {
            "name": "faiss-cpu",
            "specs": []
        },
        {
            "name": "ipywidgets",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "pytorch-metric-learning",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "streamlit-option-menu",
            "specs": []
        },
        {
            "name": "tensorboard",
            "specs": []
        },
        {
            "name": "timm",
            "specs": []
        },
        {
            "name": "torch-ema",
            "specs": [
                [
                    "==",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "torch-lr-finder",
            "specs": []
        },
        {
            "name": "torch-optimizer",
            "specs": [
                [
                    "==",
                    "0.3.0"
                ]
            ]
        }
    ],
    "lcname": "bioencoder"
}
        