caterva2


- Name: caterva2
- Version: 2024.7.1
- Upload time: 2024-07-01 10:08:31
- Requires Python: >=3.10
- License: GNU Affero General Public License version 3
- Keywords: blosc2, pubsub

# Caterva2: On-demand access to Blosc2 data repositories

## What is it?

Caterva2 is a distributed system written in Python meant for sharing [Blosc2][] datasets (either native or converted on-the-fly from HDF5) among different hosts by using a [publish–subscribe][] messaging pattern.  Here, publishers categorize datasets into root groups that are announced to the broker and propagated to subscribers.  Also, every subscriber exposes a REST interface that allows clients to access the datasets.

<img src="./doc/_static/Caterva2-PubSub.png" alt="Figure: Caterva2 publish-subscribe" width="90%"/>

[Blosc2]: https://www.blosc.org/pages/blosc-in-depth/
    "What Is Blosc? (Blosc blog)"

[publish–subscribe]: https://en.wikipedia.org/wiki/Publish–subscribe_pattern
    "Publish–subscribe pattern (Wikipedia)"

Caterva2 subscribers perform on-demand data access with local caching (fit for re-publishing). This is particularly useful for sharing remote datasets efficiently within a local network, which saves communication and storage resources within work groups.

![Figure: Caterva2 on-demand data access](./doc/_static/Caterva2-Data-On-Demand.png)

## Components of Caterva2

A Caterva2 deployment includes:

- One **broker** service to enable the communication between publishers and subscribers.
- Several **publishers**, each one providing subscribers with access to one root and the datasets that it contains. The root may be a native Caterva2 directory with Blosc2 and plain files, or an HDF5 file (support for other formats may be added).
- Several **subscribers**, each one tracking changes in multiple roots and datasets from publishers, and caching them locally for efficient reuse.
- Several **clients**, each one asking a subscriber to track roots and datasets, and provide access to their data and metadata.

Publishers and subscribers may be far apart, in different networks with limited or expensive connectivity between them, while subscribers and clients will usually be close enough to have fast and cheap connectivity (e.g. a local network).
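
The division of roles above can be illustrated with a toy, in-memory model. This is only a sketch of the publish-subscribe and caching idea, not Caterva2's implementation: the class names, the `announce`/`get` methods and the sample payload are all hypothetical, and the real components are separate network services.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Owns exactly one root and serves the datasets inside it (toy model)."""
    root: str
    datasets: dict[str, bytes]
    reads: int = 0  # counts how often data actually leaves the publisher

    def read(self, name: str) -> bytes:
        self.reads += 1
        return self.datasets[name]


@dataclass
class Broker:
    """Only keeps track of which publisher announced which root."""
    roots: dict[str, Publisher] = field(default_factory=dict)

    def announce(self, publisher: Publisher) -> None:
        self.roots[publisher.root] = publisher


@dataclass
class Subscriber:
    """Fetches datasets on demand and keeps a local cache for reuse."""
    broker: Broker
    cache: dict[str, bytes] = field(default_factory=dict)

    def get(self, path: str) -> bytes:
        if path not in self.cache:  # first access: go to the publisher
            root, name = path.split("/", 1)
            self.cache[path] = self.broker.roots[root].read(name)
        return self.cache[path]  # later accesses: served from the cache


broker = Broker()
pub = Publisher("foo", {"ds-hello.b2frame": b"Hello world!"})  # hypothetical payload
broker.announce(pub)
sub = Subscriber(broker)

first = sub.get("foo/ds-hello.b2frame")
second = sub.get("foo/ds-hello.b2frame")
print(first, pub.reads)
```

The second `get` call never reaches the publisher, which is the point of placing subscribers close to clients when the link to the publisher is slow or expensive.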

The Caterva2 package includes all the aforementioned components, although its main role is to provide a very simple and lightweight library to build your own Caterva2 clients.

## Use with caution

Although this project is in advanced beta stage, it is not meant for production use yet.  In case you are interested in Caterva2, please contact us at <contact@blosc.org>.

## Installation

You may install Caterva2 in several ways:

- Pre-built wheel from PyPI:

  ```sh
  python -m pip install caterva2
  ```

- Wheel built from source code:

  ```sh
  git clone https://github.com/ironArray/Caterva2
  cd Caterva2
  python -m build
  python -m pip install dist/caterva2-*.whl
  ```

- Developer setup:

  ```sh
  git clone https://github.com/ironArray/Caterva2
  cd Caterva2
  python -m pip install -e .
  ```

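In any case, if you intend to run Caterva2 services, client programs, or the test suite, you need to enable the proper extra features by appending `[feature1,feature2...]` to the last argument of the `pip` commands above, for example `python -m pip install "caterva2[services,clients]"` (the quotes keep the shell from interpreting the brackets).  The following extras are supported: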

- `services` for running all Caterva2 services (broker, publisher, subscriber)
- `base-services` for running the Caterva2 broker or publisher services (lighter, fewer dependencies)
- `subscriber` for running the Caterva2 subscriber service specifically (heavier, more dependencies)
- `clients` to use Caterva2 client programs (command-line or terminal)
- `hdf5` to enable serving HDF5 files as Caterva2 roots at the publisher
- `blosc2-plugins` to enable extra Blosc2 features like Btune or JPEG 2000 support
- `plugins` to enable Web client features like the tomography display
- `tools` for additional utilities like `cat2import` and `cat2export` (see below)
- `tests` if you want to run the Caterva2 test suite

### Testing

After installing with the `[tests]` extra, you can quickly check that the package is sane by running the test suite (that comes with the package):

```sh
python -m caterva2.tests -v
```

You may also run tests from source code:

```sh
cd Caterva2
python -m pytest -v
```

Tests will use a copy of Caterva2's `root-example` directory.  After they finish, state files will be left under the `_caterva2_tests` directory for inspection (it will be re-created when tests are run again).

To run the tests against daemons that you already have running, set `CATERVA2_USE_EXTERNAL`:

```sh
env CATERVA2_USE_EXTERNAL=1 python -m caterva2.tests -v
```

Neither `root-example` nor `_caterva2_tests` will be used in this case.

## Quick start

(Find more detailed step-by-step [tutorials](Tutorials) in Caterva2 documentation.)

For the purpose of this quick start, let's use the datasets within the `root-example` folder:

```sh
cd Caterva2
ls -F root-example/
```

```
README.md               dir2/                   ds-1d-fields.b2nd       ds-2d-fields.b2nd       ds-sc-attr.b2nd
dir1/                   ds-1d-b.b2nd            ds-1d.b2nd              ds-hello.b2frame
```

First, create a virtual environment and install Caterva2 with the `[services,clients]` extras (see above).  Then fire up the broker, start publishing a root named `foo` with `root-example` datasets, and create a subscriber:

```sh
cat2bro &  # broker
cat2pub foo root-example &  # publisher
cat2sub &  # subscriber
```

(To stop them later on, bring each one to the foreground with `fg` and press Ctrl+C.)

### HDF5 roots

If you want to try and publish your own HDF5 file as a root, you need to include the `hdf5` extra in your Caterva2 installation.  Then you may just run:

```sh
cat2pub foo /path/to/your-file.h5 &
```

You can also get an example HDF5 file with some datasets by running:

```sh
python -m caterva2.services.hdf5root root-example.h5
```

You may want to test compatibility with [silx' HDF5 examples](https://www.silx.org/pub/h5web/) (`epics.h5` and `grove.h5` are quite illustrative).

### The command line client

Now that the services are running, we can use the `cat2cli` client to talk
to the subscriber. In another shell, let's list all the available roots in the system:

```sh
cat2cli roots
```

```
foo
```

We only have the `foo` root that we started publishing. If other publishers were running,
we would see them listed here too.

Let's ask our local subscriber to subscribe to the `foo` root:

```sh
cat2cli subscribe foo  # -> Ok
```

Now, one can list the datasets in the `foo` root:

```sh
cat2cli list foo
```

```
foo/README.md
...
foo/ds-hello.b2frame
...
foo/dir2/ds-4d.b2nd
```

Let's ask the subscriber for more info about the `foo/dir2/ds-4d.b2nd` dataset:

```sh
cat2cli info foo/dir2/ds-4d.b2nd
```

```
{
    'shape': [2, 3, 4, 5],
    'chunks': [1, 2, 3, 4],
    'blocks': [1, 2, 2, 2],
    'dtype': 'complex128',
    'schunk': {
        # ...
    }
}
```
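
As a rough reading aid for the `shape`, `chunks` and `blocks` fields, the sketch below counts how many chunks cover the array and how many blocks cover each chunk. It assumes the usual Blosc2 layout where the array is partitioned into chunks and each chunk into blocks, with partial partitions at the edges rounded up; the `grid` helper is a name introduced here for illustration.

```python
import math

# Values copied from the `cat2cli info` output above.
shape = [2, 3, 4, 5]
chunks = [1, 2, 3, 4]
blocks = [1, 2, 2, 2]


def grid(outer, inner):
    """Number of `inner` partitions needed to cover `outer` along each dimension."""
    return [math.ceil(o / i) for o, i in zip(outer, inner)]


chunks_per_dim = grid(shape, chunks)    # partitions of the array into chunks
blocks_per_dim = grid(chunks, blocks)   # partitions of one chunk into blocks

n_chunks = math.prod(chunks_per_dim)
n_blocks_per_chunk = math.prod(blocks_per_dim)

print(chunks_per_dim, n_chunks)              # [2, 2, 2, 2] 16
print(blocks_per_dim, n_blocks_per_chunk)    # [1, 1, 2, 2] 4
```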

Let's print data from a specified dataset:

```sh
cat2cli show 'foo/ds-hello.b2frame[:12]'  # -> Hello world!
```

The `show` command can also print a slice of a multidimensional dataset instead of the whole dataset:

```sh
cat2cli show 'foo/dir2/ds-4d.b2nd[1,2,3]'
```

```
[115.+115.j 116.+116.j 117.+117.j 118.+118.j 119.+119.j]
```
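
To see why index `[1,2,3]` yields the values 115 to 119, the following sketch computes the row-major offset of that slice in an array of shape `[2, 3, 4, 5]`. It assumes that the element at flat position `n` holds the value `n + nj`, which is consistent with the output above but is not stated by the Caterva2 documentation; `row_major_strides` is a helper written for this example.

```python
shape = (2, 3, 4, 5)   # from the `info` output of foo/dir2/ds-4d.b2nd
index = (1, 2, 3)      # the partial index used with `cat2cli show`


def row_major_strides(shape):
    """Elements to skip when moving one step along each dimension."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


strides = row_major_strides(shape)                    # (60, 20, 5, 1)
start = sum(i * s for i, s in zip(index, strides))    # 1*60 + 2*20 + 3*5

# Assumption: flat element n stores the complex value n + nj.
row = [complex(n, n) for n in range(start, start + shape[-1])]

print(strides, start)
print(row)
```

Since only three of the four dimensions are indexed, the result is the whole last axis: five consecutive elements starting at flat position 115.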

Finally, we can tell the subscriber to download the dataset:

```sh
cat2cli download foo/dir2/ds-4d.b2nd
```

```
Dataset saved to foo/dir2/ds-4d.b2nd
```

### Using a configuration file

All the services mentioned above (and clients, to some limited extent) may get their configuration from a `caterva2.toml` file at the current directory (or an alternative file given with the `--conf` option).  Caterva2 source code includes a fully documented `caterva2.sample.toml` file (see also [caterva2.toml](caterva2.toml) in Caterva2 tutorials).

### Experimental user authentication

The Caterva2 subscriber includes some initial and incomplete support for authenticating users.  To enable it, run the subscriber with the environment variable `CATERVA2_AUTH_SECRET` set to some non-empty, secure string that will be used for various user management operations.  After that, accessing the subscriber's Web client will only be possible after logging in with an email address and a password.  New accounts may be registered, but their addresses are not verified.  Password recovery does not work either.
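
One way to obtain a suitable secret is Python's standard `secrets` module. The sketch below generates a random string and prepares an environment that carries it in `CATERVA2_AUTH_SECRET`; launching the subscriber with that environment is only hinted at in a comment, and the choice of 32 random bytes is an assumption rather than a Caterva2 requirement.

```python
import os
import secrets

# 32 random bytes encoded as URL-safe text (43 characters).
secret = secrets.token_urlsafe(32)

# Copy of the current environment with the authentication secret added.
env = dict(os.environ, CATERVA2_AUTH_SECRET=secret)

# The subscriber could then be started with this environment, e.g.:
#   subprocess.Popen(["cat2sub"], env=env)
print(len(secret))
```

Keep the generated value somewhere safe if accounts need to survive subscriber restarts, since a new secret is produced on every run of this snippet.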

To tell the command line client to authenticate against a subscriber, add the `--username` and `--password` options:

```sh
cat2cli --username "user@example.com" --password "foobar" info foo/README.md
```

## Tools

Although Caterva2 allows publishing an HDF5 file directly as a root (with datasets converted to Blosc2 arrays on-the-fly), it also includes a simple script that can import its full hierarchy to a new Caterva2 root directory.  You may use it like:

```sh
cat2import existing-hdf5-file.h5 new-caterva2-root
```

The tool is still quite limited in the input that it supports and the output that it generates; please invoke it with `--help` for more information (see also [cat2import](cat2import) in Caterva2 utilities documentation).

Caterva2 also ships a complementary tool to export a Caterva2 root directory to an HDF5 file; see [cat2export](cat2export) in Caterva2 utilities documentation.  You may use it like:

```sh
cat2export existing-caterva2-root new-hdf5-file.h5
```

That's all folks!

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "caterva2",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "blosc2, pubsub",
    "author": null,
    "author_email": "ironArray SLU <contact@ironarray.io>",
    "download_url": "https://files.pythonhosted.org/packages/db/ad/55d89c7a9be18af26fe473e76debd1fed08641d5f5ab78a4c30e398dbc40/caterva2-2024.7.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU Affero General Public License version 3",
    "summary": null,
    "version": "2024.7.1",
    "project_urls": {
        "Home": "https://github.com/ironArray/Caterva2"
    },
    "split_keywords": [
        "blosc2",
        " pubsub"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d32a7373f478f030565b0b8b00b3ad7a354ead52f13e1edd1563ca7fbe7460a7",
                "md5": "d0c62de45fb0cae5ef73935608834c7c",
                "sha256": "8daaa6d5e286b26c10290c87892a079a4108a38855aebef46cf2e66086d6c753"
            },
            "downloads": -1,
            "filename": "caterva2-2024.7.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d0c62de45fb0cae5ef73935608834c7c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 825182,
            "upload_time": "2024-07-01T10:08:29",
            "upload_time_iso_8601": "2024-07-01T10:08:29.571832Z",
            "url": "https://files.pythonhosted.org/packages/d3/2a/7373f478f030565b0b8b00b3ad7a354ead52f13e1edd1563ca7fbe7460a7/caterva2-2024.7.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dbad55d89c7a9be18af26fe473e76debd1fed08641d5f5ab78a4c30e398dbc40",
                "md5": "8be3ff58f4e0a3b814950804ef733d45",
                "sha256": "4e6debcc06f0274f2ac897956cadc409031bcf46ba374ca9887eabd7a792e31b"
            },
            "downloads": -1,
            "filename": "caterva2-2024.7.1.tar.gz",
            "has_sig": false,
            "md5_digest": "8be3ff58f4e0a3b814950804ef733d45",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 2747547,
            "upload_time": "2024-07-01T10:08:31",
            "upload_time_iso_8601": "2024-07-01T10:08:31.579703Z",
            "url": "https://files.pythonhosted.org/packages/db/ad/55d89c7a9be18af26fe473e76debd1fed08641d5f5ab78a4c30e398dbc40/caterva2-2024.7.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-01 10:08:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ironArray",
    "github_project": "Caterva2",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "caterva2"
}
        