mcbackend


* Name: mcbackend
* Version: 0.5.2
* Home page: https://github.com/michaelosthege/mcbackend
* Summary: Framework agnostic backends for MCMC sample storage
* Upload time: 2024-01-17 23:33:51
* Author: Michael Osthege
* Requires Python: >=3.7
* License: AGPLv3
* Requirements: betterproto, hagelkorn, numpy, pandas

[![PyPI version](https://img.shields.io/pypi/v/mcbackend)](https://pypi.org/project/mcbackend)
[![pipeline](https://github.com/michaelosthege/mcbackend/workflows/pipeline/badge.svg)](https://github.com/michaelosthege/mcbackend/actions)
[![coverage](https://codecov.io/gh/michaelosthege/mcbackend/branch/main/graph/badge.svg)](https://codecov.io/gh/michaelosthege/mcbackend)

Where do _you_ want to store your MCMC draws?
In memory?
On disk?
Or in a database running in a datacenter?

No matter where you want to put them, or which <abbr title="probabilistic programming language">PPL</abbr> generates them: McBackend takes care of your MCMC samples.

## Quickstart
The `mcbackend` package consists of three parts:

### Part 1: A schema for MCMC run & chain metadata
No matter which programming language your favorite PPL is written in, the [ProtocolBuffers](https://developers.google.com/protocol-buffers/) definitions from McBackend can be used to generate code in languages such as C++, C#, Python, and many more. The generated code represents commonly used metadata about MCMC runs, chains, and model variables.

The definitions in [`protobufs/meta.proto`](./protobufs/meta.proto) are designed to maximize compatibility with [`ArviZ`](https://github.com/arviz-devs/arviz) objects, making it easy to transform MCMC draws stored according to the McBackend schema to `InferenceData` objects for plotting & analysis.
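
To give a feel for the kind of information such a schema carries, here is a minimal sketch using plain Python dataclasses. The class and field names (`VariableMeta`, `RunMeta`, `rid`, `dims`, ...) are assumptions made for illustration only; this is not the code generated from `meta.proto`, and JSON merely stands in for the ProtocolBuffers wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class VariableMeta:
    """Hypothetical description of one model variable."""

    name: str
    dtype: str
    shape: List[int] = field(default_factory=list)
    dims: List[str] = field(default_factory=list)


@dataclass
class RunMeta:
    """Hypothetical description of one MCMC run."""

    rid: str
    variables: List[VariableMeta] = field(default_factory=list)
    coordinates: Dict[str, List[str]] = field(default_factory=dict)


def to_json(meta: RunMeta) -> str:
    """Serialize the metadata so that another process or language can read it."""
    return json.dumps(asdict(meta), sort_keys=True)


def from_json(text: str) -> RunMeta:
    """Rebuild the metadata objects from their serialized form."""
    raw = json.loads(text)
    variables = [VariableMeta(**v) for v in raw["variables"]]
    return RunMeta(rid=raw["rid"], variables=variables, coordinates=raw["coordinates"])


meta = RunMeta(
    rid="run-001",
    variables=[VariableMeta("slope", "float64", [3], ["group"])],
    coordinates={"group": ["A", "B", "C"]},
)
restored = from_json(to_json(meta))
```

Named dimensions and coordinates are included in the sketch because they are what labeled containers such as `InferenceData` need in order to annotate the draws.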

### Part 2: A storage backend interface
The `draws` and `stats` created by MCMC sampling algorithms at runtime need to be stored _somewhere_.

This "somewhere" is called the storage _backend_ in PPLs/MCMC frameworks like [PyMC](https://github.com/pymc-devs/pymc) or [emcee](https://github.com/dfm/emcee).

Most storage backends must be initialized with metadata about the model variables so they can, for example, pre-allocate memory for the `draws` and `stats` they're about to receive.
After receiving thousands of `draws` and `stats`, they must provide methods by which the `draws`/`stats` can be retrieved.

The `mcbackend.core` module has classes such as `Backend`, `Run`, and `Chain` to define these interfaces for any storage backend, no matter if it's an in-memory, filesystem or database storage.
Although this implementation is currently Python-only, the interface signatures should be portable to other languages, e.g. C++.
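
The division of labor between the three kinds of objects can be sketched as follows. This is a simplified, pure-Python illustration, not the actual `mcbackend.core` API: all class names, method names, and signatures below (`BaseBackend.init_run`, `BaseRun.init_chain`, `BaseChain.append`, ...) are assumptions chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class BaseChain(ABC):
    """Receives draws one at a time and returns them on request."""

    @abstractmethod
    def append(self, draw: Dict[str, float]) -> None:
        ...

    @abstractmethod
    def get_draws(self, var_name: str) -> List[float]:
        ...


class BaseRun(ABC):
    """Groups the chains that belong to one sampling run."""

    @abstractmethod
    def init_chain(self, chain_number: int) -> BaseChain:
        ...


class BaseBackend(ABC):
    """Entry point: creates runs from metadata about the model variables."""

    @abstractmethod
    def init_run(self, var_names: Sequence[str], expected_draws: int) -> BaseRun:
        ...


class InMemoryChain(BaseChain):
    def __init__(self, var_names: Sequence[str], expected_draws: int) -> None:
        # Knowing the variables up front allows pre-allocating the storage.
        self._data = {name: [0.0] * expected_draws for name in var_names}
        self._length = 0

    def append(self, draw: Dict[str, float]) -> None:
        for name, column in self._data.items():
            if self._length < len(column):
                column[self._length] = draw[name]
            else:
                column.append(draw[name])  # grow beyond the pre-allocated size
        self._length += 1

    def get_draws(self, var_name: str) -> List[float]:
        return self._data[var_name][: self._length]


class InMemoryRun(BaseRun):
    def __init__(self, var_names: Sequence[str], expected_draws: int) -> None:
        self._var_names = list(var_names)
        self._expected_draws = expected_draws
        self.chains: Dict[int, InMemoryChain] = {}

    def init_chain(self, chain_number: int) -> InMemoryChain:
        chain = InMemoryChain(self._var_names, self._expected_draws)
        self.chains[chain_number] = chain
        return chain


class InMemoryBackend(BaseBackend):
    def init_run(self, var_names: Sequence[str], expected_draws: int) -> InMemoryRun:
        return InMemoryRun(var_names, expected_draws)


backend = InMemoryBackend()
chain = backend.init_run(["mu"], expected_draws=2).init_chain(0)
for value in (0.1, 0.2, 0.3):
    chain.append({"mu": value})
```

A filesystem or database backend would implement the same three abstract classes and only change where `append` writes and where `get_draws` reads.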

Via `mcbackend.backends` the McBackend package then provides backend _implementations_.
Currently you may choose from:

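```python
import clickhouse_driver
import mcbackend

# In-memory storage backed by NumPy arrays
backend = mcbackend.NumPyBackend()
# Storage in a ClickHouse database (needs a running server)
backend = mcbackend.ClickHouseBackend(client=clickhouse_driver.Client("localhost"))

# All that matters:
isinstance(backend, mcbackend.Backend)
# >>> True
```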

### Part 3: PPL adapters
Anything that is a `Backend` can be wrapped by an [adapter](https://en.wikipedia.org/wiki/Adapter_pattern) that makes it compatible with your favorite PPL.

In the example below, a `ClickHouseBackend` is initialized to store MCMC draws from a PyMC model in a [ClickHouse](http://clickhouse.com/) database.
See below for [how to run it in Docker](#development).

```python
import clickhouse_driver
import mcbackend
import pymc as pm

# 1. Create _any_ kind of backend
ch_client = clickhouse_driver.Client("localhost")
backend = mcbackend.ClickHouseBackend(ch_client)

with pm.Model():
    # 2. Create your model
    ...
    # 3. Hit the inference button ™ while passing the backend!
    pm.sample(trace=backend)
```

In the case of PyMC, the adapter has lived in the PyMC codebase [since version 5.1.1](https://github.com/pymc-devs/pymc/releases/tag/v5.1.1),
so all you need to do is pass any `mcbackend.Backend` via the `pm.sample(trace=...)` parameter!

Instead of using PyMC's built-in NumPy backend, the MCMC draws now end up in ClickHouse.

### Retrieving the `draws` & `stats`
Continuing the example from above we can now retrieve draws from the backend.

Note that since this example wrote the draws to ClickHouse, we could run the code below on another machine, and even while the above model is still sampling!

```python
backend = mcbackend.ClickHouseBackend(ch_client)

# Fetch the run from the database (downloads just metadata).
# NOTE: `trace` is not defined in the snippet above; pass the ID of the run you want to load.
run = backend.get_run(trace.run_id)

# Get all draws from a chain
chain = run.get_chains()[0]
chain.get_draws("my favorite variable")
# >>> array([ ... ])

# Convert everything to `InferenceData`
idata = run.to_inferencedata()
print(idata)
# >>> Inference data with groups:
# >>> 	> posterior
# >>> 	> sample_stats
# >>> 	> observed_data
# >>> 	> constant_data
# >>>
# >>> Warmup iterations saved (warmup_*).
```

## Contributing and what's next
McBackend is a young project and is looking for contributions.
For example:
* Schema discussion: Which metadata is needed? (related: [PyMC #5160](https://github.com/pymc-devs/pymc/issues/5160))
* Interface discussion: How should `Backend`/`Run`/`Chain` evolve?
* Python Backends for disk storage (HDF5, `*.proto`, ...)
* C++ `Backend`/`Run`/`Chain` interfaces
* C++ ClickHouse backend (via [`clickhouse-cpp`](https://github.com/ClickHouse/clickhouse-cpp))

As the schema and API stabilize, a mid-term goal might be to replace PyMC's `BaseTrace`/`MultiTrace` entirely with `mcbackend`.

Getting rid of `MultiTrace` was a [long-term goal](https://github.com/pymc-devs/pymc/issues/4372#issuecomment-770100410) behind making `pm.sample(return_inferencedata=True)` the default.

## Development
First clone the repository and install `mcbackend` locally:

```bash
pip install -e .
```

To run the tests:

```bash
pip install -r requirements-dev.txt
pytest -v
```

Some tests need a ClickHouse database server running locally.
To start one in Docker:

```bash
docker run --detach --rm --name mcbackend-db -p 9000:9000 --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
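
Before running the database tests, it can be handy to confirm that something is listening on the published port. The helper below is a small sketch using only the Python standard library; the function name `is_port_open` is made up for this example, and port `9000` is taken from the `-p 9000:9000` mapping in the command above. Note that it only checks TCP reachability, not whether ClickHouse has finished starting up.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For the container started above, the call would be `is_port_open("localhost", 9000)`.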

### Compiling the ProtocolBuffers
If you don't already have them, first install the protobuf compiler and the `betterproto` compiler plugin:

```bash
conda install protobuf
pip install --pre "betterproto[compiler]"
```

To compile the `*.proto` files for languages other than Python, check the [ProtocolBuffers documentation](https://developers.google.com/protocol-buffers/docs/tutorials).

The following script compiles them for Python using the [`betterproto`](https://github.com/danielgtaylor/python-betterproto) compiler plugin to get nice-looking dataclasses.
It also copies the generated files to the right place in `mcbackend`.

```bash
python protobufs/generate.py
```

            

        