cmonge

Name: cmonge
Version: 0.1.2
Home page: https://github.com/AI4SCR/conditional-monge
Summary: Extension of the Monge Gap to learn conditional optimal transport maps
Upload time: 2025-09-01 13:35:19
Author: Alice Driessen
Requires Python: <3.12,>=3.10
Keywords: machine learning, optimal transport, neural OT, Monge Gap, conditional distribution learning
Requirements: flax, optax, ott-jax, scikit-learn, typer, loguru, optuna, pandas, seaborn, dotmap, umap-learn, anndata, scanpy, chex, rdkit, jax, isort, black, ruff, types-pyyaml, scipy
# Conditional Monge Gap

[![CI](https://github.com/AI4SCR/conditional-monge/actions/workflows/ci.yml/badge.svg)](https://github.com/AI4SCR/conditional-monge/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Contents
- [Overview](#overview)
- [Requirements](#systems-and-software-requirements)
- [Installation](#installation-from-pypi)
- [Development installation](#development-setup--installation)
- [Data](#data)
- [Example](#example-usage)
- [Own data instructions](#instructions-for-running-on-your-own-data)
- [Legacy checkpoint loading](#older-checkpoints-loading)
- [Citation](#citation)

## Overview

![](assets/overview.jpg)

The Conditional Monge Gap is an extension of the [Monge Gap](https://proceedings.mlr.press/v202/uscidda23a.html) that estimates transport maps conditioned on arbitrary context vectors. It is based on a two-step training procedure combining an encoder-decoder architecture with an OT estimator. The model is applied to [4i](https://pubmed.ncbi.nlm.nih.gov/30072512/) and [scRNA-seq](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289078/) datasets.

## Systems and software requirements
Software package requirements and version information can be found in `requirements.txt` and/or `pyproject.toml`. This package has been tested on Python 3.10 and 3.11.
The only hardware requirement is enough memory (RAM) to process the data and batches. A GPU is not needed but accelerates computation. This software has been tested on HPC clusters and local machines (macOS).

## Installation from PyPI

You can install this package as follows:
```sh
pip install cmonge
```
which should take about two minutes on a laptop.
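
To quickly confirm that the installation worked, you can import the package and read its installed version from the pip metadata (a minimal sketch; it assumes nothing beyond a successful `pip install cmonge`):
```py
# Minimal install check: import the package and report the version pip registered.
from importlib.metadata import version

import cmonge  # noqa: F401  # raises ImportError if the install failed

print(f"cmonge {version('cmonge')} installed")
```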

## Development setup & installation
The package environment is managed by [poetry](https://python-poetry.org/docs/managing-environments/). 
```sh
pip install poetry
git clone git@github.com:AI4SCR/conditional-monge.git
cd cmonge
poetry install -v
```

If the installation was successful, you can run the tests using pytest:
```sh
poetry shell # activate env
pytest
```

## Data

The preprocessed version of the Sciplex3 and 4i datasets can be downloaded [here](https://www.research-collection.ethz.ch/handle/20.500.11850/609681).
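
The datasets are distributed as AnnData `.h5ad` files, so they can be inspected with `anndata` before training. A minimal sketch, assuming you point the path at your downloaded copy (the demo file `data/dummy_data.h5ad` is used here as a placeholder):
```py
# Inspect an .h5ad dataset; replace the path with the location of your downloaded file.
import anndata as ad

adata = ad.read_h5ad("data/dummy_data.h5ad")
print(adata)             # summary: n_obs x n_vars plus obs/var annotation columns
print(adata.obs.head())  # per-cell metadata, e.g. perturbation conditions
```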


## Example usage

You can find a demo config in `configs/demo_config.yml`.
To train an autoencoder and CMonge you can use the following script (also provided in `scripts/demo_train.py`). Make sure that the paths in `scripts/demo_train.py`, `configs/demo_config.yml`, and `configs/autoencoder-demo.yml` point to the correct data reading and saving locations, as well as the model checkpoint locations.
```py
from loguru import logger

from cmonge.datasets.conditional_loader import ConditionalDataModule
from cmonge.trainers.ae_trainer import AETrainerModule
from cmonge.trainers.conditional_monge_trainer import ConditionalMongeTrainer
from cmonge.utils import load_config


logger_path = "logs/demo_logs.yml"
config_path = "configs/demo_config.yml"

config = load_config(config_path)
logger.info(f"Experiment: Training model on {config.condition.conditions}")


# Train an AE model to reduce data dimension
config.data.ae = True
config.data.reduction = None
datamodule = ConditionalDataModule(config.data, config.condition)
ae_trainer = AETrainerModule(config.ae)
ae_trainer.train(datamodule)

# Train conditional monge model
config.data.ae = False
config.ae.model.act_fn = "gelu"
config.data.reduction = "ae"
datamodule = ConditionalDataModule(config.data, config.condition, ae_config=config.ae)
trainer = ConditionalMongeTrainer(
    jobid=1, logger_path=logger_path, config=config.model, datamodule=datamodule
)
trainer.train(datamodule)
trainer.evaluate(datamodule)

```
This demo trains a CMonge model on an in-distribution data split. First, an autoencoder is trained to reduce the dimensionality of the data, which can be found in `data/dummy_data.h5ad`. The autoencoder is checkpointed and the resulting checkpoint can be found in `models/demo`. This example uses the RDKit fingerprint embedding, which uses the SMILES information provided in the `data` directory to compute a numerical embedding, saved in `models/embed/rdkit`. Next, the conditional Monge model is trained and evaluated; the results of training and evaluation are saved in the logger file defined by the `logger_path` variable, so for this example the logs can be found in `logs/demo_logs.yml`. On a 2025 MacBook Air with an M4 chip, this demo takes only a few minutes.
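
The logger file has a `.yml` extension, so it can be loaded afterwards to inspect what was recorded. A minimal sketch, assuming only that the file is standard YAML; the exact keys and structure depend on what the trainers log:
```py
# Read the training/evaluation log written by the demo.
import yaml

with open("logs/demo_logs.yml") as f:
    entries = list(yaml.safe_load_all(f))  # handles one or several YAML documents

for entry in entries:
    print(entry)
```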

## Instructions for running on your own data
To run CMonge on your own data, you will probably need to adapt the data loading to ensure your data is handled correctly. At a minimum, you need to implement your own single data loader; examples can be found in `cmonge/datasets/single_loader.py`. This single loader is passed to the conditional dataloader in `cmonge/datasets/conditional_loader.py`, where some adaptations might also be necessary. The conditional dataloader interacts with the CMonge model (and, if needed, the autoencoder). The hyperparameters of the CMonge neural network and the autoencoder are defined in the config files.
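
The demo script above already shows the intended pattern: configs are loaded once and their fields can be overridden in Python before the data module and trainer are constructed. A minimal sketch reusing only names that appear in the demo (`load_config`, `config.data`, `config.ae`, `config.model`, `config.condition`); the hyperparameter keys inside your own config file are up to you:
```py
# Load a config and adjust fields programmatically, mirroring the demo script.
from cmonge.utils import load_config

config = load_config("configs/demo_config.yml")  # point this at your own config

# The demo overrides fields the same way, e.g. for the autoencoder training stage:
config.data.ae = True
config.data.reduction = None

# config.ae and config.model hold the autoencoder and CMonge hyperparameters
# that you would otherwise edit directly in the YAML file.
print(config.condition.conditions)
```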

## Older checkpoints loading
If you want to load model weights from older checkpoints (cmonge-{moa, rdkit}-ood or cmonge-{moa, rdkit}-homogeneous), make sure you are on the `cmonge_checkpoint_loading` tag:

```sh
git checkout cmonge_checkpoint_loading
```

## Citation
If you use the package, please cite:
```bib
@article{driessen2025towards,
  title={Towards generalizable single-cell perturbation modeling via the Conditional Monge Gap},
  author={Driessen, Alice and Harsanyi, Benedek and Rapsomaniki, Marianna and Born, Jannis},
  journal={arXiv preprint arXiv:2504.08328},
  note={Preliminary version at ICLR 2024 Workshop on Machine Learning for Genomics Explorations},
  year={2025}
}
```

            
