causalbench


| Field | Value |
| --- | --- |
| Name | causalbench |
| Version | 1.1.1 |
| Home page | https://www.gsk.ai/causalbenchchallenge |
| Upload time | 2023-11-06 16:04:59 |
| Author | see README.txt |
| Requires Python | >=3.8 |
| License | Apache-2.0 |
| Requirements | No requirements were recorded. |

# CausalBench
![Python version](https://img.shields.io/badge/Python-3.8-blue)
![Library version](https://img.shields.io/badge/Version-1.1.1-blue)

## Introduction

Mapping biological mechanisms in cellular systems is a fundamental step in early-stage drug discovery that serves to generate hypotheses about which disease-relevant molecular targets may effectively be modulated by pharmacological interventions. With the advent of high-throughput methods for measuring single-cell gene expression under genetic perturbations, we now have effective means for generating evidence for causal gene-gene interactions at scale. However, inferring graphical networks of the size typically encountered in real-world gene-gene interaction networks is difficult, both in achieving and in evaluating faithfulness to the true underlying causal graph. Moreover, standardised benchmarks for comparing methods for causal discovery in perturbational single-cell data do not yet exist. Here, we introduce CausalBench - a comprehensive benchmark suite for evaluating network inference methods on large-scale perturbational single-cell gene expression data. CausalBench introduces several biologically meaningful performance metrics and operates on two large, curated and openly available benchmark datasets for evaluating methods on the inference of gene regulatory networks from single-cell data generated under perturbations. With real-world datasets consisting of over 200 000 training samples under interventions, CausalBench could help facilitate advances in causal network inference by providing what is - to the best of our knowledge - the largest openly available test bed for causal discovery from real-world perturbation data to date.

## CausalBench ICLR-23 Challenge

Learn more about the CausalBench challenge for graph inference on perturbational gene expression data [here](https://www.gsk.ai/causalbench-challenge/).

## Datasets

- RPE1 day 7 Perturb-seq (RD7): targeting DepMap essential genes at day 7 after transduction
- K562 day 6 Perturb-seq (KD6): targeting DepMap essential genes at day 6 after transduction


## Training Regimes

- Observational: only observational data is given as training data to the model.
- Observational and partial interventional: observational as well as interventional data for part of the variables is given as training data to the model.
- Observational and full interventional: observational as well as interventional data for all the variables is given as training data to the model.
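
The three regimes differ only in which samples a model is allowed to see. The sketch below illustrates that idea on a per-sample list of perturbation labels. It is not the library's implementation: the `Regime` enum, the `select_sample_indices` helper, the `fraction_intervened` parameter, and the `"non-targeting"` control label are all assumptions made for illustration.

```python
import random
from enum import Enum
from typing import List


class Regime(Enum):
    """Hypothetical stand-in for the three training regimes listed above."""

    OBSERVATIONAL = "observational"
    PARTIAL_INTERVENTIONAL = "partial_interventional"
    INTERVENTIONAL = "interventional"


# Assumed label for unperturbed (observational) samples.
CONTROL_LABEL = "non-targeting"


def select_sample_indices(
    interventions: List[str],
    regime: Regime,
    fraction_intervened: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Return the indices of the samples visible to a model under a regime."""
    perturbed_genes = sorted({g for g in interventions if g != CONTROL_LABEL})
    if regime is Regime.OBSERVATIONAL:
        allowed = set()  # no interventional samples at all
    elif regime is Regime.INTERVENTIONAL:
        allowed = set(perturbed_genes)  # interventions on every variable
    else:
        # Partial regime: keep interventions for a random subset of the genes.
        rng = random.Random(seed)
        k = int(len(perturbed_genes) * fraction_intervened)
        allowed = set(rng.sample(perturbed_genes, k))
    return [
        i
        for i, gene in enumerate(interventions)
        if gene == CONTROL_LABEL or gene in allowed
    ]


labels = ["non-targeting", "A", "B", "non-targeting", "C", "D"]
observational_only = select_sample_indices(labels, Regime.OBSERVATIONAL)
everything = select_sample_indices(labels, Regime.INTERVENTIONAL)
partial = select_sample_indices(labels, Regime.PARTIAL_INTERVENTIONAL)
```

Observational samples are always kept, so the regimes form a nested sequence: the observational selection is a subset of the partial one, which is a subset of the full interventional one.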

## Install

```bash
pip install causalbench
```

## How to run the benchmark?

Example command to run the `pc` model on the K562 dataset in the observational regime:

```bash
causalbench_run \
    --dataset_name weissmann_k562 \
    --output_directory /path/to/output/ \
    --data_directory /path/to/data/storage \
    --training_regime observational \
    --model_name pc \
    --subset_data 1.0 \
    --model_seed 0 \
    --do_filter \
    --max_path_length -1 \
    --omission_estimation_size 500
```

Results are written to the folder at `/path/to/output/`, and processed datasets will be cached at `/path/to/data/storage`. See the MainApp class for more hyperparameter options, especially in the (partial) interventional setting.
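
To repeat a run over several model seeds, the command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the example command; the `build_command` and `run_seeds` helpers are hypothetical and not part of the package, and the paths are the same placeholders as above.

```python
import subprocess
from typing import Iterable, List


def build_command(
    model_name: str,
    training_regime: str,
    seed: int,
    output_directory: str,
    data_directory: str,
) -> List[str]:
    """Assemble the argument list for one causalbench_run invocation."""
    return [
        "causalbench_run",
        "--dataset_name", "weissmann_k562",
        "--output_directory", output_directory,
        "--data_directory", data_directory,
        "--training_regime", training_regime,
        "--model_name", model_name,
        "--subset_data", "1.0",
        "--model_seed", str(seed),
        "--do_filter",
        "--max_path_length", "-1",
        "--omission_estimation_size", "500",
    ]


def run_seeds(seeds: Iterable[int], dry_run: bool = True) -> List[List[str]]:
    """Build one command per seed and execute them unless dry_run is set."""
    commands = [
        build_command(
            model_name="pc",
            training_regime="observational",
            seed=seed,
            output_directory="/path/to/output/",
            data_directory="/path/to/data/storage",
        )
        for seed in seeds
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a run fails.
            subprocess.run(command, check=True)
    return commands


commands = run_seeds([0, 1, 2])  # dry run: nothing is executed
```

Passing the arguments as a list avoids shell quoting issues. Call `run_seeds(..., dry_run=False)` to actually launch the runs.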

## Add a model

New models can be added easily. The only requirement is that a model implements the `AbstractInferenceModel` class.

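```python
import random
from typing import List, Tuple

import numpy as np

from causalscbench.models.abstract_model import AbstractInferenceModel
from causalscbench.models.training_regimes import TrainingRegime


class FullyConnected(AbstractInferenceModel):
    """Baseline that predicts an edge in both directions between every pair of genes."""

    def __init__(self) -> None:
        super().__init__()

    def __call__(
        self,
        expression_matrix: np.array,
        interventions: List[str],
        gene_names: List[str],
        training_regime: TrainingRegime,
        seed: int = 0,
    ) -> List[Tuple]:
        # Seed the RNG for reproducibility; this baseline itself is deterministic.
        random.seed(seed)
        edges = set()
        for i in range(len(gene_names)):
            a = gene_names[i]
            for j in range(i + 1, len(gene_names)):
                b = gene_names[j]
                # Add both directions for each unordered gene pair.
                edges.add((a, b))
                edges.add((b, a))
        return list(edges)
```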
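
Before benchmarking a new model, it can help to sanity-check the edge list it returns. The helper below is a hypothetical sketch, not part of the package. It assumes that each edge is a `(source, target)` tuple of gene names drawn from `gene_names`, and it treats self-loops and duplicates as problems, which is an assumption rather than a documented requirement.

```python
from typing import List, Sequence, Tuple


def check_edge_list(
    edges: Sequence[Tuple[str, str]], gene_names: List[str]
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    known = set(gene_names)
    seen = set()
    problems = []
    for edge in edges:
        if not (isinstance(edge, tuple) and len(edge) == 2):
            problems.append(f"not a (source, target) pair: {edge!r}")
            continue
        source, target = edge
        if source not in known or target not in known:
            problems.append(f"unknown gene in edge: {edge!r}")
        if source == target:
            problems.append(f"self-loop: {edge!r}")
        if edge in seen:
            problems.append(f"duplicate edge: {edge!r}")
        seen.add(edge)
    return problems


genes = ["A", "B", "C"]
# Same output a fully connected baseline would give for three genes.
fully_connected = [(a, b) for a in genes for b in genes if a != b]
clean_report = check_edge_list(fully_connected, genes)
faulty_report = check_edge_list(
    [("A", "A"), ("A", "Z"), ("A", "B"), ("A", "B")], genes
)
```

For `n` genes, a fully connected output contains `n * (n - 1)` directed edges, so the three-gene case above yields six edges and an empty report.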

## Citation

If you reference or use our methodology, code or results in your work, please consider citing:

    @article{chevalley2022causalbench,
        title={{CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data}},
        author={Chevalley, Mathieu and Roohani, Yusuf and Mehrjou, Arash and Leskovec, Jure and Schwab, Patrick},
        journal={arXiv preprint arXiv:2210.17283},
        year={2022}
    }


### License

[License](LICENSE.txt)

## External data

Data in the /data_access/data folder were aggregated from the following resource: "ChIP-Atlas © Shinya Oki (Kyoto University) licensed under CC Attribution-Share Alike 4.0 International" [link to license](https://dbarchive.biosciencedbc.jp/en/chip-atlas/lic.html). The adapted datasets, which are based on part of the original database, are redistributed here under the same license, which can be found in the LICENSE.txt file.

This codebase also links to multiple data sources to be downloaded. The associated licenses are summarized here:

Replogle et al (perturb-seq screen): https://gwps.wi.mit.edu/ (LICENSE: CC-BY-4.0)

CORUM: http://mips.helmholtz-muenchen.de/corum/ (LICENSE: CC-BY-NC)

StringDB: https://string-db.org/cgi/download.pl (LICENSE: CC-BY-4.0)

CellTalkDB: http://tcm.zju.edu.cn/celltalkdb/download.php (LICENSE: GNU General Public License v3.0)

### Authors

Mathieu Chevalley, GSK plc<br/>
Yusuf H Roohani, GSK plc and Stanford University<br/>
Arash Mehrjou, GSK plc<br/>
Jure Leskovec, Stanford University<br/>
Patrick Schwab, GSK plc<br/>

### Acknowledgements

MC, YR, AM and PS are employees and shareholders of GlaxoSmithKline plc.

            

Raw data

            {
    "_id": null,
    "home_page": "https://www.gsk.ai/causalbenchchallenge",
    "name": "causalbench",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "see README.txt",
    "author_email": "biomedical-ai-external@gsk.com",
    "download_url": "https://files.pythonhosted.org/packages/b7/e7/017712cccbdc403f12832edf0842f1414be8e3a232ab444d647ec38ec6eb/causalbench-1.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "",
    "version": "1.1.1",
    "project_urls": {
        "Homepage": "https://www.gsk.ai/causalbenchchallenge"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "69e4824c1eb495170a5bd98abf4abe65f577fbfeeade0ba3054b289766559f0b",
                "md5": "718df495f84097670bc7f2d82bea1f31",
                "sha256": "710a02684a42cf02c2cc428db7d0296dbfdf4c5c55fdeed581c3480c9c02a314"
            },
            "downloads": -1,
            "filename": "causalbench-1.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "718df495f84097670bc7f2d82bea1f31",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 14524538,
            "upload_time": "2023-11-06T16:04:55",
            "upload_time_iso_8601": "2023-11-06T16:04:55.087573Z",
            "url": "https://files.pythonhosted.org/packages/69/e4/824c1eb495170a5bd98abf4abe65f577fbfeeade0ba3054b289766559f0b/causalbench-1.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b7e7017712cccbdc403f12832edf0842f1414be8e3a232ab444d647ec38ec6eb",
                "md5": "ccc4be16b48bfb14e4a543c5d829dd06",
                "sha256": "261b05806d6fa79d2e789550a8ae63271fbd1cc122bc08a32e1d854b6743a0f5"
            },
            "downloads": -1,
            "filename": "causalbench-1.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ccc4be16b48bfb14e4a543c5d829dd06",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 14445923,
            "upload_time": "2023-11-06T16:04:59",
            "upload_time_iso_8601": "2023-11-06T16:04:59.595602Z",
            "url": "https://files.pythonhosted.org/packages/b7/e7/017712cccbdc403f12832edf0842f1414be8e3a232ab444d647ec38ec6eb/causalbench-1.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-06 16:04:59",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "causalbench"
}
        