sigmoid-contrastive-learning

Name	sigmoid-contrastive-learning
Version	0.1.0
home_page	https://github.com/filipbasara0/sigmoid-contrastive-learning
Summary	Implementation of modulated sigmoid pairwise contrastive loss for self-supervised learning on images
upload_time	2024-04-08 06:45:54
maintainer	None
docs_url	None
author	Filip Basara
requires_python	None
license	MIT
keywords	machine learning pytorch self-supervised learning representation learning contrastive learning
requirements	No requirements were recorded.

            
# Sigmoid Contrastive Learning on Images

A PyTorch implementation of the sigmoid pairwise loss for contrastive self-supervised learning on images. The training architecture consists of an online encoder and a target encoder (an EMA of the online encoder) with a simple critic MLP projector, and is based on [Representation Learning via Invariant Causal Mechanisms (ReLIC)](https://arxiv.org/abs/2010.07922). The loss function is a sigmoid contrastive loss adapted from [SigLIP](https://arxiv.org/abs/2303.15343), with the addition of a confidence penalty gamma that balances the ratio of positive and negative samples per batch, amplifying learning from harder examples and improving training stability. The loss function also supports a KL divergence regularization term that acts as an invariance penalty: it forces the representations to stay invariant under data augmentations and amplifies intra-class distances.
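To make the base loss concrete, here is a minimal, dependency-free sketch of the plain SigLIP-style pairwise sigmoid loss between online and target embeddings. It is an illustration, not this package's implementation: the function names, the `temperature=10.0` and `bias=-10.0` defaults (borrowed from the initialization in the SigLIP paper), and the use of plain Python lists instead of tensors are all assumptions. The gamma confidence penalty and the KL regularizer are deliberately left out, because their exact form is not spelled out in this README.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def sigmoid_pairwise_loss(online, target, temperature=10.0, bias=-10.0):
    """Plain SigLIP-style loss (hypothetical helper, not the package API).

    online[i] and target[i] are embeddings of two views of image i.
    Pair (i, i) is a positive (label +1); every other pair is a negative (label -1).
    """
    online = [l2_normalize(v) for v in online]
    target = [l2_normalize(v) for v in target]
    total = 0.0
    for i, u in enumerate(online):
        for j, v in enumerate(target):
            logit = temperature * sum(a * b for a, b in zip(u, v)) + bias
            label = 1.0 if i == j else -1.0
            # Each pair is an independent binary classification term.
            total -= log_sigmoid(label * logit)
    return total / len(online)  # sum over pairs, averaged over the batch


views = [[1.0, 0.0], [0.0, 1.0]]
aligned = sigmoid_pairwise_loss(views, views)        # positives match
swapped = sigmoid_pairwise_loss(views, views[::-1])  # positives mismatched
print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

Matching views yield a much lower loss than mismatched ones. Because every pair contributes its own binary term, no softmax normalization over the batch is needed.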

When using larger batch sizes (e.g. larger than 128), enabling gamma scheduling is recommended. It acts as a form of curriculum learning that balances learning from positive and negative samples during the early stages of training, enabling faster convergence and better overall results. Initially, the model learns more from positive samples, but as training progresses the signal from negative samples becomes more prevalent. The result of training for 100 epochs on STL-10 is shown in the table below, where gamma is specified as `1.0 + schedule`. During that run, gamma was initialized at 1.0 and decayed to 0.0 over 20,000 steps using a cosine schedule.
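The decay described above can be sketched as a half-cosine curve. This is only an assumed form: the README says gamma follows a cosine schedule from its initial value to 0, but not the exact formula, and the function name `cosine_gamma` and its parameter names are hypothetical.

```python
import math


def cosine_gamma(step, initial_gamma=1.0, scaling_steps=20_000):
    """Half-cosine decay from initial_gamma at step 0 to 0.0 at scaling_steps.

    Assumed formula, not taken from the package source.
    """
    if step >= scaling_steps:
        return 0.0  # after the scaling window gamma stays at zero
    progress = step / scaling_steps
    return initial_gamma * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5_000, 10_000, 15_000, 20_000):
    print(step, round(cosine_gamma(step), 4))
```

With this shape, gamma changes slowly at the start and end of the window and fastest in the middle, so the shift from positive-dominated to negative-dominated learning is gradual.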


The repo includes multi-crop augmentation, and the loss function is extended to support an arbitrary number of small (local) and large (global) views. Using this technique generally results in more robust and higher-quality representations.


# Results

Models are pretrained on the training subsets: 50,000 images for `CIFAR10` and 100,000 images for `STL10`. For evaluation, I trained and tested LogisticRegression on frozen features from:
1. `CIFAR10` - 50,000 train images
2. `STL10` - features were learned on 100k unlabeled images. LogReg was trained on 5k train images and evaluated on 8k test images.

Linear probing was performed with the scikit-learn LogisticRegression model on features extracted from the frozen encoders.
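As a rough sketch of the linear-probing step, the snippet below fits scikit-learn's `LogisticRegression` on feature vectors and reports top-1 accuracy. The features here are synthetic stand-ins generated with NumPy, not encoder outputs, and the helper name `linear_probe`, the 16-dimensional features, and `max_iter=1000` are assumptions made for illustration; the linked notebooks contain the actual evaluation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def linear_probe(train_feats, train_labels, test_feats, test_labels, max_iter=1000):
    """Fit a linear classifier on frozen features and return top-1 accuracy."""
    clf = LogisticRegression(max_iter=max_iter)
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)


# Synthetic stand-in for frozen encoder features: 3 well-separated classes.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 5.0


def fake_features(n_per_class):
    labels = np.repeat(np.arange(3), n_per_class)
    feats = centers[labels] + rng.normal(size=(labels.size, 16))
    return feats, labels


train_x, train_y = fake_features(100)
test_x, test_y = fake_features(50)
top1 = linear_probe(train_x, train_y, test_x, test_y)
print(f"top-1 accuracy: {top1:.3f}")
```

In a real evaluation, `train_x` and `test_x` would be replaced by features extracted once from the frozen encoder, so only the linear classifier is trained.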

More detailed evaluation steps and results for [CIFAR10](https://github.com/filipbasara0/relic/blob/main/notebooks/linear-probing-cifar.ipynb) and [STL10](https://github.com/filipbasara0/relic/blob/main/notebooks/linear-probing-stl.ipynb) can be found in the notebooks directory. 

| Evaluation model    | Dataset | Architecture | Encoder   | Feature dim | Proj. head dim | Epochs | Gamma          | Top1 % |
|---------------------|---------|--------------|-----------|-------------|----------------|--------|----------------|--------|
| LogisticRegression  | STL10   | ReLIC        | ResNet-50 | 2048        | 64             | 100    | 1.0            | 85.42  |
| LogisticRegression  | STL10   | ReLIC        | ResNet-50 | 2048        | 64             | 100    | 1.0 + schedule | 86.06  |


# Usage

### Installation

```bash
$ pip install sigmoid-contrastive-learning
```

Code currently supports ResNet18, ResNet50 and an experimental version of the EfficientNet model. Supported datasets are STL10, CIFAR10 and ImageNet-1k.

All training is done from scratch.

### Examples
The `CIFAR10` ResNet-18 model was trained with this command:

`scl_train --dataset_name "cifar10" --encoder_model_name resnet18 --fp16_precision --beta 0.99 --alpha 1.0`

The `STL10` ResNet-50 model was trained with this command:

`scl_train --dataset_name "stl10" --encoder_model_name resnet50 --fp16_precision --beta 0.99 --gamma 1.0 --gamma_scaling_steps 20_000 --use_gamma_scaling`

### Detailed options
Once the code is set up, run the following command with the options listed below:
`scl_train [args...]`

```
Sigmoid Contrastive Learning

options:
  -h, --help            show this help message and exit
  --dataset_path DATASET_PATH
                        Path where datasets will be saved
  --dataset_name {stl10,cifar10,tiny_imagenet,food101,imagenet1k}
                        Dataset name
  -m {resnet18,resnet50,efficientnet}, --encoder_model_name {resnet18,resnet50,efficientnet}
                        model architecture: resnet18, resnet50 or efficientnet (default: resnet18)
  -save_model_dir SAVE_MODEL_DIR
                        Path where models
  --num_epochs NUM_EPOCHS
                        Number of epochs for training
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size
  -lr LEARNING_RATE, --learning_rate LEARNING_RATE
  -wd WEIGHT_DECAY, --weight_decay WEIGHT_DECAY
  --fp16_precision      Whether to use 16-bit precision for GPU training
  --proj_out_dim PROJ_OUT_DIM
                        Projector MLP out dimension
  --proj_hidden_dim PROJ_HIDDEN_DIM
                        Projector MLP hidden dimension
  --log_every_n_steps LOG_EVERY_N_STEPS
                        Log every n steps
  --beta BETA           Initial EMA coefficient
  --alpha ALPHA         Regularization loss factor
  --update_beta_after_step UPDATE_BETA_AFTER_STEP
                        Update EMA beta after this step
  --update_beta_every_n_steps UPDATE_BETA_EVERY_N_STEPS
                        Update EMA beta after this many steps
  --gamma GAMMA         Initial confidence penalty
  --gamma_scaling_steps GAMMA_SCALING_STEPS
                        Number of first N steps during which gamma will be scaled from the inital value to 0
  --use_gamma_scaling   Whether to use gamma(conf penalty) cosine scaling
  --ckpt_path CKPT_PATH
                        Specify path to scl_model.pth to resume training
  --num_global_views NUM_GLOBAL_VIEWS
                        Number of global (large) views to generate through augmentation
  --num_local_views NUM_LOCAL_VIEWS
                        Number of local (small) views to generate through augmentation
```
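The `--beta` option above sets the EMA coefficient used to update the target encoder from the online encoder. As a rough illustration, the sketch below applies the standard EMA rule, `target = beta * target + (1 - beta) * online`, to plain lists of numbers. The helper name `ema_update` is hypothetical, the package's own update code may differ, and the beta schedule controlled by `--update_beta_after_step` and `--update_beta_every_n_steps` is not modeled.

```python
def ema_update(target_params, online_params, beta=0.99):
    """Return new target parameters (hypothetical helper, not the package API).

    Each target value moves a fraction (1 - beta) of the way toward the
    corresponding online value.
    """
    return [beta * t + (1.0 - beta) * o for t, o in zip(target_params, online_params)]


target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(3):
    target = ema_update(target, online, beta=0.99)
print(target)
```

With `beta` close to 1 the target encoder changes slowly, which gives the online encoder a stable set of targets to match.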

# Citation

```
@misc{mitrovic2020representation,
      title={Representation Learning via Invariant Causal Mechanisms}, 
      author={Jovana Mitrovic and Brian McWilliams and Jacob Walker and Lars Buesing and Charles Blundell},
      year={2020},
      eprint={2010.07922},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{zhai2023sigmoid,
      title={Sigmoid Loss for Language Image Pre-Training}, 
      author={Xiaohua Zhai and Basil Mustafa and Alexander Kolesnikov and Lucas Beyer},
      year={2023},
      eprint={2303.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/filipbasara0/sigmoid-contrastive-learning",
    "name": "sigmoid-contrastive-learning",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "machine learning, pytorch, self-supervised learning, representation learning, contrastive learning",
    "author": "Filip Basara",
    "author_email": "basarafilip@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/59/d4/0dd1d055b85f5050fc34d6eb0ae9364aa52a27770508c44f79d255d4818c/sigmoid-contrastive-learning-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Implementation of modulated sigmoid pairwise contrastive loss for self-supervised learning on images",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/filipbasara0/sigmoid-contrastive-learning"
    },
    "split_keywords": [
        "machine learning",
        " pytorch",
        " self-supervised learning",
        " representation learning",
        " contrastive learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed861d8f373fb559e7829daae42607115349b38f6738861b227e2eaf7e857f60",
                "md5": "c9fb1599bb7002403abff9ff9a66a422",
                "sha256": "6278d09b1368368e6eb9023f77d292bc420e5b93fa30f0bd319d338e6a06ec13"
            },
            "downloads": -1,
            "filename": "sigmoid_contrastive_learning-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c9fb1599bb7002403abff9ff9a66a422",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 13407,
            "upload_time": "2024-04-08T06:45:50",
            "upload_time_iso_8601": "2024-04-08T06:45:50.771779Z",
            "url": "https://files.pythonhosted.org/packages/ed/86/1d8f373fb559e7829daae42607115349b38f6738861b227e2eaf7e857f60/sigmoid_contrastive_learning-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "59d40dd1d055b85f5050fc34d6eb0ae9364aa52a27770508c44f79d255d4818c",
                "md5": "e476f83c12b6f50e316d9f55ffa9f74e",
                "sha256": "a4f2fa57244864bce01d7387b416337e81ae8b4f536f2c1fc03ad1438c64fa58"
            },
            "downloads": -1,
            "filename": "sigmoid-contrastive-learning-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e476f83c12b6f50e316d9f55ffa9f74e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12974,
            "upload_time": "2024-04-08T06:45:54",
            "upload_time_iso_8601": "2024-04-08T06:45:54.968427Z",
            "url": "https://files.pythonhosted.org/packages/59/d4/0dd1d055b85f5050fc34d6eb0ae9364aa52a27770508c44f79d255d4818c/sigmoid-contrastive-learning-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-08 06:45:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "filipbasara0",
    "github_project": "sigmoid-contrastive-learning",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "sigmoid-contrastive-learning"
}
