======
blooms
======

:Version: 2.0.0
:Author: Andrei Lapets
:License: MIT
:Requires-Python: >=3.7
:Uploaded: 2024-09-29 23:17:26

Lightweight Bloom filter data structure derived from the built-in bytearray type.

|pypi| |readthedocs| |actions| |coveralls|

.. |pypi| image:: https://badge.fury.io/py/blooms.svg#
   :target: https://badge.fury.io/py/blooms
   :alt: PyPI version and link.

.. |readthedocs| image:: https://readthedocs.org/projects/blooms/badge/?version=latest
   :target: https://blooms.readthedocs.io/en/latest/?badge=latest
   :alt: Read the Docs documentation status.

.. |actions| image:: https://github.com/nthparty/blooms/workflows/lint-test-cover-docs/badge.svg#
   :target: https://github.com/nthparty/blooms/actions/workflows/lint-test-cover-docs.yml
   :alt: GitHub Actions status.

.. |coveralls| image:: https://coveralls.io/repos/github/nthparty/blooms/badge.svg?branch=main
   :target: https://coveralls.io/github/nthparty/blooms?branch=main
   :alt: Coveralls test coverage summary.

Purpose
-------

.. |bytearray| replace:: ``bytearray``
.. _bytearray: https://docs.python.org/3/library/stdtypes.html#bytearray

This library provides a simple and lightweight data structure for representing `Bloom filters <https://en.wikipedia.org/wiki/Bloom_filter>`__ that is derived from the built-in |bytearray|_ type. The data structure supports insertion, membership testing, union, and subset checks. It also provides methods for estimating saturation and capacity and for converting to and from Base64 strings.

Installation and Usage
----------------------
This library is available as a `package on PyPI <https://pypi.org/project/blooms>`__:

.. code-block:: bash

    python -m pip install blooms

The library can be imported in the usual ways:

.. code-block:: python

    import blooms
    from blooms import blooms

Examples
^^^^^^^^
This library makes it possible to concisely create, populate, and query simple `Bloom filters <https://en.wikipedia.org/wiki/Bloom_filter>`__. The example below constructs a Bloom filter that is 32 bits (*i.e.*, four bytes) in size:

.. code-block:: python

    >>> from blooms import blooms
    >>> b = blooms(4)

.. |insertion_operator| replace:: insertion operator ``@=``
.. _insertion_operator: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.__imatmul__

A bytes-like object can be inserted into an instance using the |insertion_operator|_. Hashing and truncating the bytes-like object before insertion is the responsibility of the user of the library; only the bytes that remain after truncation determine whether the object is a member of the Bloom filter:

.. code-block:: python

    >>> from hashlib import sha256
    >>> x = 'abc' # Value to insert.
    >>> h = sha256(x.encode()).digest() # Hash of value.
    >>> t = h[:2] # Truncated hash.
    >>> b @= t # Insert the value into the Bloom filter.
    >>> b.hex()
    '00000004'
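
The README leaves it to the library how a truncated hash selects a bit. The sketch below shows one mapping that is consistent with the ``'00000004'`` output above; it is an assumption, not the library's implementation. It reads the bytes in (assumed) four-byte chunks as little-endian integers, uses the low three bits of each integer as the bit index, and uses the remaining bits, modulo the filter length, as the byte index. The helpers ``bit_position`` and ``insert`` are hypothetical:

.. code-block:: python

    # Hypothetical sketch of one bit mapping consistent with the output above;
    # it is not taken from the blooms source.
    from hashlib import sha256

    def bit_position(chunk: bytes, size: int) -> tuple:
        """Map a chunk to a (byte index, bit index) pair in a filter of `size` bytes."""
        index = int.from_bytes(chunk, 'little')  # Assumed little-endian.
        return ((index // 8) % size, index % 8)

    def insert(filter_: bytearray, item: bytes) -> None:
        """Set one bit per (assumed) four-byte chunk of `item`."""
        for i in range(0, len(item), 4):
            (byte_index, bit_index) = bit_position(item[i:i + 4], len(filter_))
            filter_[byte_index] |= 1 << bit_index

    t = sha256('abc'.encode()).digest()[:2]  # b'\xba\x78'
    print(bit_position(t, 4))  # (3, 2)
    f = bytearray(4)           # 32 bits, like blooms(4).
    insert(f, t)
    print(f.hex())             # 00000004

Under this mapping, the bytes ``ba 78`` of the SHA-256 digest of ``'abc'`` select bit 2 of byte 3, which is the ``04`` in the last byte of the output above.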

.. |membership_operator| replace:: membership operator ``@``
.. _membership_operator: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.__rmatmul__

When testing whether a bytes-like object is a member using the |membership_operator|_ of an instance, apply the same hashing and truncation that were used during insertion:

.. code-block:: python

    >>> sha256('abc'.encode()).digest()[:2] @ b
    True
    >>> sha256('xyz'.encode()).digest()[:2] @ b
    False
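
A membership test checks that every bit an object would set is already set. The sketch below uses hypothetical helpers (``positions``, ``insert``, and ``contains``) and the same assumed chunking and bit mapping as above; it illustrates why a membership test can return a false positive but never a false negative:

.. code-block:: python

    # Hypothetical sketch; not the blooms implementation.
    from hashlib import sha256

    def positions(item: bytes, size: int):
        """Yield (byte index, bit index) pairs for each assumed four-byte chunk."""
        for i in range(0, len(item), 4):
            index = int.from_bytes(item[i:i + 4], 'little')
            yield ((index // 8) % size, index % 8)

    def insert(filter_: bytearray, item: bytes) -> None:
        for (byte_index, bit_index) in positions(item, len(filter_)):
            filter_[byte_index] |= 1 << bit_index

    def contains(filter_: bytearray, item: bytes) -> bool:
        # Every bit the item would set must already be set. One clear bit
        # proves the item was never inserted (no false negatives), but bits
        # set by other items can make an absent item look present.
        return all(
            filter_[byte_index] & (1 << bit_index)
            for (byte_index, bit_index) in positions(item, len(filter_))
        )

    h = sha256('abc'.encode()).digest()[:2]
    f = bytearray(4)
    print(contains(f, h))  # False (empty filter)
    insert(f, h)
    print(contains(f, h))  # True

    g = bytearray(4)
    insert(g, bytes([2]))
    print(contains(g, bytes([34])))  # True: a false positive (same bit as bytes([2]))

In a four-byte filter, the indices ``2`` and ``34`` both select bit 2 of byte 0 under this mapping, which is how the false positive in the last line arises.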

The |insertion_operator|_ also accepts an iterable of bytes-like objects and inserts each of them:

.. code-block:: python

    >>> x = sha256('x'.encode()).digest()[:2]
    >>> y = sha256('y'.encode()).digest()[:2]
    >>> z = sha256('z'.encode()).digest()[:2]
    >>> b @= [x, y, z]
    >>> b.hex()
    '02200006'
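
An operator that accepts both single bytes-like objects and containers has to check for bytes-like types first, because ``bytes`` is itself iterable (over integers). A minimal sketch of that dispatch, using a hypothetical ``insert`` function and the same assumed bit mapping as the earlier sketches:

.. code-block:: python

    # Hypothetical sketch; not the blooms implementation.
    BYTES_LIKE = (bytes, bytearray, memoryview)

    def insert(filter_: bytearray, argument) -> bytearray:
        """Insert one bytes-like object, or every element of an iterable of them."""
        if isinstance(argument, BYTES_LIKE):
            data = bytes(argument)
            for i in range(0, len(data), 4):  # Assumed four-byte chunks.
                index = int.from_bytes(data[i:i + 4], 'little')
                filter_[(index // 8) % len(filter_)] |= 1 << (index % 8)
        else:
            # Checked only after the bytes-like case, because iterating over
            # bytes would yield integers rather than bytes-like objects.
            for item in argument:
                insert(filter_, item)
        return filter_

    f = insert(bytearray(4), [bytes([0]), bytes([9])])
    print(f.hex())  # 01020000

The same function also accepts tuples, sets, and generators, since the fallback branch only requires that the argument be iterable.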

.. |union_operator| replace:: built-in ``|`` operator
.. _union_operator: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.__or__

The union of two Bloom filters of the same size can be computed using the |union_operator|_:

.. code-block:: python

    >>> c = blooms(4)
    >>> c @= sha256('xyz'.encode()).digest()[:2]
    >>> d = c | b
    >>> sha256('abc'.encode()).digest()[:2] @ d
    True
    >>> sha256('xyz'.encode()).digest()[:2] @ d
    True
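
For filters of equal size, the union of two Bloom filters is their bytewise OR: any bit set in either filter is set in the result, so every object that tests as a member of either input also tests as a member of the union. A sketch of this on plain ``bytearray`` values (the ``union`` helper, its ``ValueError`` on mismatched sizes, and the filter contents are illustrative assumptions, not the library's behavior):

.. code-block:: python

    # Hypothetical sketch; not the blooms implementation.
    def union(left: bytearray, right: bytearray) -> bytearray:
        """Return the bytewise OR of two Bloom filters of the same size."""
        if len(left) != len(right):
            raise ValueError('Bloom filters must have the same size')
        return bytearray(a | b for (a, b) in zip(left, right))

    b = bytearray.fromhex('02200006')
    c = bytearray.fromhex('00004000')
    print(union(b, c).hex())  # 02204006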

It is also possible to check whether the members of one Bloom filter `are a subset <https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.issubset>`__ of the members of another Bloom filter:

.. code-block:: python

    >>> b.issubset(c)
    False
    >>> b.issubset(d)
    True
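
At the bit level, one filter is a subset of another when every bit set in the first is also set in the second, *i.e.*, ``a & b == a`` for every pair of corresponding bytes. The hypothetical ``issubset`` helper below sketches that check on plain ``bytearray`` values; the filter contents are illustrative:

.. code-block:: python

    # Hypothetical sketch; not the blooms implementation.
    def issubset(left: bytearray, right: bytearray) -> bool:
        """Return True if every bit set in `left` is also set in `right`."""
        if len(left) != len(right):
            raise ValueError('Bloom filters must have the same size')
        return all((a & b) == a for (a, b) in zip(left, right))

    b = bytearray.fromhex('02200006')
    c = bytearray.fromhex('00004000')
    d = bytearray(x | y for (x, y) in zip(b, c))  # Union of b and c.
    print(issubset(b, c))  # False
    print(issubset(b, d))  # True

This compares bits rather than the original objects: if ``issubset(b, d)`` holds, every object that ``b`` reports as a member is also reported as a member by ``d``.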

.. |saturation| replace:: ``saturation``
.. _saturation: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.saturation

.. |float| replace:: ``float``
.. _float: https://docs.python.org/3/library/functions.html#float

The |saturation|_ method calculates the saturation of a Bloom filter. The *saturation* is a |float|_ value (between ``0.0`` and ``1.0``) that represents an upper bound on the rate at which false positives occur when testing bytes-like objects of a specific length for membership within the Bloom filter. Because the example below inserts random values, the exact output may vary between runs:

.. code-block:: python

    >>> b = blooms(32)
    >>> from secrets import token_bytes
    >>> for _ in range(8):
    ...     b @= token_bytes(4)
    >>> b.saturation(4)
    0.03125
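
One formula that reproduces ``0.03125`` is the fraction of set bits raised to the number of four-byte chunks in an object of the given length: eight distinct bits out of 256 give ``8 / 256 == 0.03125`` for four-byte objects. The sketch below assumes that formula (and that each chunk sets one bit); it is not taken from the library source, and the ``saturation`` function here is hypothetical:

.. code-block:: python

    # Hypothetical sketch; the formula is an assumption, not the blooms source.
    import math

    def saturation(filter_: bytearray, length: int) -> float:
        """Chance that a random object of `length` bytes hits only set bits,
        assuming each four-byte chunk of the object selects one bit."""
        ones = sum(bin(byte).count('1') for byte in filter_)
        chunks = math.ceil(length / 4)
        return (ones / (len(filter_) * 8)) ** chunks

    f = bytearray(32)  # 256 bits, like blooms(32).
    f[0] = 0xff        # Eight distinct bits set.
    print(saturation(f, 4))  # 0.03125
    print(saturation(f, 8))  # 0.0009765625

Under this assumption, longer objects have more chunks that must all land on set bits, so the same filter is less saturated with respect to them.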

.. |capacity| replace:: ``capacity``
.. _capacity: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.capacity

It is also possible to determine the approximate maximum capacity of a Bloom filter for a given saturation limit using the |capacity|_ method. For example, the output below indicates that a saturation of ``0.05`` will likely be reached after more than ``28`` insertions of bytes-like objects of length ``8``:

.. code-block:: python

    >>> b = blooms(32)
    >>> b.capacity(8, 0.05)
    28
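
Under the same assumed saturation formula, and additionally ignoring collisions (so that each chunk of each inserted object sets a new bit), the capacity can be estimated by solving the formula for the number of insertions. This rough, hypothetical ``capacity`` function happens to agree with the output above:

.. code-block:: python

    # Hypothetical estimate; assumptions are stated in the docstring.
    import math

    def capacity(size: int, length: int, limit: float) -> int:
        """Estimate how many objects of `length` bytes can be inserted into a
        filter of `size` bytes before its saturation reaches `limit`.

        Assumes saturation is (fraction of set bits) ** chunks, where chunks is
        the number of four-byte chunks per object, and ignores collisions, so
        that every chunk of every insertion sets a new bit.
        """
        bits = size * 8
        chunks = math.ceil(length / 4)
        ones = bits * limit ** (1 / chunks)  # Set bits at which the limit is hit.
        return int(ones // chunks)

    print(capacity(32, 8, 0.05))  # 28

Because collisions leave some chunks without a new bit, more insertions than this estimate are typically needed before the limit is actually reached.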

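In addition, conversion methods to and from Base64 strings are included to support concise encoding and decoding. Because ``b`` was reassigned in the preceding examples, the example below first rebuilds the four-byte filter populated earlier:

.. code-block:: python

    >>> b = blooms(4)
    >>> b @= [sha256(s.encode()).digest()[:2] for s in ('abc', 'x', 'y', 'z')]
    >>> b.to_base64()
    'AiAABg=='
    >>> sha256('abc'.encode()).digest()[:2] @ blooms.from_base64('AiAABg==')
    True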
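
The string ``'AiAABg=='`` is the standard Base64 encoding of the four bytes ``02 20 00 06`` from the iterable insertion example. Assuming that ``to_base64`` encodes the raw bytes with the standard (not URL-safe) alphabet, the round trip can be checked with the ``base64`` module from the standard library:

.. code-block:: python

    import base64

    raw = bytes.fromhex('02200006')  # Filter contents from the iterable example.
    encoded = base64.b64encode(raw).decode('ascii')
    print(encoded)                           # AiAABg==
    print(base64.b64decode(encoded) == raw)  # True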

.. |specialize| replace:: ``specialize``
.. _specialize: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.specialize

If it is preferable to have a Bloom filter data structure that encapsulates a particular serialization, hashing, and truncation scheme, the recommended approach is to define a derived class. The |specialize|_ method makes it possible to do so in a concise way:

.. code-block:: python

    >>> encode = lambda x: sha256(x).digest()[:2]
    >>> blooms_custom = blooms.specialize(name='blooms_custom', encode=encode)
    >>> b = blooms_custom(4)
    >>> b @= bytes([1, 2, 3])
    >>> bytes([1, 2, 3]) @ b
    True
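
One way a method such as ``specialize`` could work is sketched below. ``MiniBloom`` and its methods are hypothetical stand-ins, not part of the library: the sketch builds a derived class with the built-in ``type`` function and stores the ``encode`` function as a static method, so that insertion and membership apply it automatically. The bit mapping is the same assumption as in the earlier sketches:

.. code-block:: python

    # Hypothetical sketch; MiniBloom is illustrative and not part of blooms.
    from hashlib import sha256
    from typing import Callable

    class MiniBloom(bytearray):
        """Minimal bytearray-derived Bloom filter (assumed bit mapping)."""

        @staticmethod
        def encode(item: bytes) -> bytes:
            return bytes(item)  # Base class: the caller hashes and truncates.

        def _positions(self, item: bytes):
            data = self.encode(item)
            for i in range(0, len(data), 4):
                index = int.from_bytes(data[i:i + 4], 'little')
                yield ((index // 8) % len(self), index % 8)

        def __imatmul__(self, item: bytes) -> 'MiniBloom':
            for (byte_index, bit_index) in self._positions(item):
                self[byte_index] |= 1 << bit_index
            return self

        def __rmatmul__(self, item: bytes) -> bool:
            return all(self[i] & (1 << j) for (i, j) in self._positions(item))

        @classmethod
        def specialize(cls, name: str, encode: Callable[[bytes], bytes]) -> type:
            # Derived class whose `encode` step is fixed at class level.
            return type(name, (cls,), {'encode': staticmethod(encode)})

    def truncated_sha256(data: bytes) -> bytes:
        return sha256(data).digest()[:2]

    MiniCustom = MiniBloom.specialize(name='MiniCustom', encode=truncated_sha256)
    b = MiniCustom(4)
    b @= bytes([1, 2, 3])        # Hashed and truncated automatically.
    print(bytes([1, 2, 3]) @ b)  # True

Since ``bytes`` defines no ``@`` operator, the expression ``bytes([1, 2, 3]) @ b`` falls back to the filter's ``__rmatmul__`` method, which is what lets membership be written with the object on the left.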

.. |from_base64| replace:: ``from_base64``
.. _from_base64: https://blooms.readthedocs.io/en/2.0.0/_source/blooms.html#blooms.blooms.blooms.from_base64

The user of the library is responsible for ensuring that Base64-encoded Bloom filters are converted back into an instance of the appropriate derived class by using the |from_base64|_ method of that derived class:

.. code-block:: python

    >>> isinstance(blooms_custom.from_base64(b.to_base64()), blooms_custom)
    True
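
Calling the derived class's own method matters because a class method typically constructs its result with ``cls``, the class on which it was called. A hypothetical sketch of that pattern (the ``MiniBloom`` and ``Custom`` classes are illustrative only):

.. code-block:: python

    # Hypothetical sketch; MiniBloom and Custom are illustrative only.
    import base64

    class MiniBloom(bytearray):
        def to_base64(self) -> str:
            return base64.b64encode(self).decode('ascii')

        @classmethod
        def from_base64(cls, s: str) -> 'MiniBloom':
            # `cls` is the class the method was called on, so a derived class
            # gets back an instance of itself.
            return cls(base64.b64decode(s))

    class Custom(MiniBloom):
        pass

    s = Custom(bytes.fromhex('02200006')).to_base64()
    print(type(Custom.from_base64(s)).__name__)     # Custom
    print(type(MiniBloom.from_base64(s)).__name__)  # MiniBloom

Decoding with the base class's method yields a base-class instance, which would not apply the derived class's encoding during later insertions or membership tests.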

Development
-----------
All installation and development dependencies are fully specified in ``pyproject.toml``. The ``project.optional-dependencies`` object is used to `specify optional requirements <https://peps.python.org/pep-0621>`__ for various development tasks. This makes it possible to specify additional options (such as ``docs``, ``lint``, and so on) when performing installation using `pip <https://pypi.org/project/pip>`__:

.. code-block:: bash

    python -m pip install ".[docs,lint]"

Documentation
^^^^^^^^^^^^^
The documentation can be generated automatically from the source files using `Sphinx <https://www.sphinx-doc.org>`__:

.. code-block:: bash

    python -m pip install ".[docs]"
    cd docs
    sphinx-apidoc -f -E --templatedir=_templates -o _source .. && make html

Testing and Conventions
^^^^^^^^^^^^^^^^^^^^^^^
All unit tests are executed and their coverage is measured when using `pytest <https://docs.pytest.org>`__ (see the ``pyproject.toml`` file for configuration details):

.. code-block:: bash

    python -m pip install ".[test]"
    python -m pytest

The subset of the unit tests included in the module itself can be executed using `doctest <https://docs.python.org/3/library/doctest.html>`__:

.. code-block:: bash

    python src/blooms/blooms.py -v

Style conventions are enforced using `Pylint <https://pylint.readthedocs.io>`__:

.. code-block:: bash

    python -m pip install ".[lint]"
    python -m pylint src/blooms test/test_blooms.py

Contributions
^^^^^^^^^^^^^
In order to contribute to the source code, open an issue or submit a pull request on the `GitHub page <https://github.com/nthparty/blooms>`__ for this library.

Versioning
^^^^^^^^^^
The version number format for this library and the changes to the library associated with version number increments conform with `Semantic Versioning 2.0.0 <https://semver.org/#semantic-versioning-200>`__.

Publishing
^^^^^^^^^^
This library can be published as a `package on PyPI <https://pypi.org/project/blooms>`__ via the GitHub Actions workflow found in ``.github/workflows/build-publish-sign-release.yml`` that follows the `recommendations found in the Python Packaging User Guide <https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/>`__.

Ensure that the correct version number appears in ``pyproject.toml``, and that any links in this README document to the Read the Docs documentation of this package (or its dependencies) have appropriate version numbers. Also ensure that the Read the Docs project for this library has an `automation rule <https://docs.readthedocs.io/en/stable/automation-rules.html>`__ that activates each tagged version and sets it as the default.

To publish the package, create and push a tag for the version being published (replacing ``?.?.?`` with the version number):

.. code-block:: bash

    git tag ?.?.?
    git push origin ?.?.?
