epic-bitstore

Name: epic-bitstore
Version: 1.1.0
Home page: https://github.com/epic-framework/epic-bitstore
Summary: Multi-tiered cloud-backed blob storage system
Upload time: 2025-01-09 18:30:04
Author: Yonatan Perry
Requires Python: <4.0,>=3.10
License: MIT
# Epic bitstore &mdash; Multi-tiered cloud-backed blob storage system
[![Epic-bitstore CI](https://github.com/epic-framework/epic-bitstore/actions/workflows/ci.yml/badge.svg)](https://github.com/epic-framework/epic-bitstore/actions/workflows/ci.yml)

## What is it?

The **epic-bitstore** Python library provides client access to a multi-tiered blob storage system built on cloud
backends, with optional API sources and caching layers.

## Usage

For example, let's assume you store blobs in the following locations:
1. In an AWS bucket: `s3://aws_customer_data/files/<sha1>`
2. In a GCP bucket: `gs://gcp_customer_data/blobs/<sha1>`
3. In another GCP bucket: `gs://my_project/more_files/<sha1>`

Using epic-bitstore, you can fetch a blob from any of these stores with a single `get` call. The library iterates
over the sources in order and retrieves the data from the first source that contains the blob. Retrieval can also run
in parallel over multiple blobs, backed by the [ultima](https://github.com/Cybereason/ultima) parallelization library.

To implement the above strategy, you create a `Composite` store, and add each of your sources in order:
```python
from epic.bitstore import Sha1Composite, Sha1Store, S3Raw, GSRaw

blob_store = Sha1Composite()
blob_store.append_source(Sha1Store(S3Raw(), "s3://aws_customer_data/files/"))
blob_store.append_source(Sha1Store(GSRaw(), "gs://gcp_customer_data/blobs/"))
blob_store.append_source(Sha1Store(GSRaw(), "gs://my_project/more_files/"))

data = blob_store.get("4bc39c7d87318382feb3cc5a684c767fbd913968")
```

You can then use parallelization to efficiently map an iterator of hashes into an iterator of byte buffers:
```python
from ultima import ultimap

data_iter = ultimap(blob_store.get, iter_hashes, backend='threading', n_workers=16)
```

## API sources and caching layers

Let's also assume that you have an API that can retrieve blobs given their SHA1.
You would like to use it for fetching blobs, but only when they're not found in the above "passive" sources.

You can implement a `Sha1APISource` for your API, and add it to the `Composite` object:
```python
from epic.bitstore import Sha1APISource

class MyAPIStore(Sha1APISource):
    def __init__(self, api_client):
        super().__init__()
        self.api_client = api_client
    
    def api_get(self, sha1):
        # return None for a blob that can't be found
        return self.api_client.get_bytes(sha1)

# continued, after adding the three passive stores above
blob_store.append_source(MyAPIStore(my_api_client))
```

API sources are often expensive, whether in monetary cost or in latency.
You can add a caching store and configure the API source to write fetched blobs into the cache.
It is important to append the cache *before* the API source, so that cached blobs take precedence.

Append the cache and the API source:
```python
from epic.bitstore import Sha1Cache, GSRaw

# continued, after adding the three passive stores above
blob_store.append_cache(Sha1Cache(GSRaw(), "gs://cache_for_api/"))
blob_store.append_source(MyAPIStore(my_api_client))
```

Now, the first time you retrieve a blob that is missing from the passive sources, the API is used; subsequent retrievals of the same blob are served from the cache.
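The resolution order just described (passive sources first, then the cache, then the API, with the API's result written back to the cache) can be sketched in plain Python. This is only an illustration of the lookup precedence, not the library's actual internals; the sources here are stand-in dicts mapping a SHA1 string to bytes:

```python
def composite_get(sha1, passive_sources, cache, api_get):
    """Illustrative sketch of composite lookup precedence."""
    # 1. Passive sources are tried in order; the first match wins.
    for source in passive_sources:
        if sha1 in source:
            return source[sha1]
    # 2. The cache is consulted before the (expensive) API source.
    if sha1 in cache:
        return cache[sha1]
    # 3. Fall back to the API, writing the result into the cache
    #    so subsequent lookups never reach the API again.
    data = api_get(sha1)
    if data is not None:
        cache[sha1] = data
    return data
```

Calling `composite_get` twice for the same missing blob hits the hypothetical `api_get` only once; the second call is served from the cache, which is exactly why the cache must sit before the API source in the lookup order.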

            
