roaringbitmap


Nameroaringbitmap JSON
Version 0.7.2 PyPI version JSON
download
home_pagehttp://roaringbitmap.readthedocs.io
SummaryRoaring Bitmap
upload_time2022-12-29 15:02:16
maintainer
docs_urlNone
authorAndreas van Cranenburgh
requires_python
licenseGPL
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            Roaring Bitmap in Cython
========================

A roaring bitmap is an efficient compressed datastructure to store a set
of integers. A Roaring bitmap stores a set of 32-bit integers in a series of
arrays and bitmaps, whichever takes the least space (which is always
``2 ** 16`` bits or less).

This datastructure is useful for storing a large number of integers, e.g., for
an inverted index used by search engines and databases. In particular, it is
possible to quickly compute the intersection of a series of sets, which can be
used to implement a query as the conjunction of subqueries.

This implementation is based on the Java and C implementations at
https://github.com/lemire/RoaringBitmap
and https://github.com/lemire/CRoaring

Additional features of this implementation:

- Inverted list representation: blocks that are mostly full are stored
  compactly as an array of non-members (instead of as an array of members or a
  fixed-size bitmap).
- Collections of immutable roaring bitmaps can be efficiently serialized with
  ``mmap`` in a single file.

Missing features w.r.t. CRoaring:

- Run-length encoded blocks
- Various AVX2 / SSE optimizations

See also PyRoaringBitmap, a Python wrapper of CRoaring:
https://github.com/Ezibenroc/PyRoaringBitMap

License, requirements
---------------------
The code is licensed under GNU GPL v2, or any later version at your option.

- Python 2.7+/3.3+  http://www.python.org (headers required, e.g. python-dev package)
- Cython 0.20+      http://www.cython.org

Installation, usage
-------------------

::

    $ git clone https://github.com/andreasvc/roaringbitmap.git
    $ cd roaringbitmap
    $ make

(or ``make py2`` for Python 2)

A ``RoaringBitmap()`` can be used as a replacement for a normal (mutable)
Python set containing (unsigned) 32-bit integers:

.. code-block:: python

    >>> from roaringbitmap import RoaringBitmap
    >>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))
    RoaringBitmap({5, 6, 7, 8, 9})

``ImmutableRoaringBitmap`` is an immutable variant (analogous to ``frozenset``)
which is stored compactly as a contiguous block of memory.

A sequence of immutable RoaringBitmaps can be stored in a single file and
accessed efficiently with ``mmap``, without needing to copy or deserialize:

.. code-block:: python

    >>> from roaringbitmap import MultiRoaringBitmap
    >>> mrb = MultiRoaringBitmap([range(n, n + 5) for n in range(10)], filename='index')

    >>> mrb = MultiRoaringBitmap.fromfile('index')
    >>> mrb[5]
    ImmutableRoaringBitmap({5, 6, 7, 8, 9})

For API documentation cf. http://roaringbitmap.readthedocs.io

Benchmarks
----------
Output of ``$ make bench``::

    small sparse set
    100 runs with sets of 200 random elements n s.t. 0 <= n < 40000
                    set()  RoaringBitmap()    ratio
    init         0.000834          0.00138    0.603
    initsort      0.00085         0.000394     2.16
    and           0.00102         8.49e-05     12.1
    or            0.00171         0.000169     10.1
    xor           0.00152         0.000213     7.11
    sub          0.000934         0.000197     4.74
    iand         1.29e-05         2.97e-06     4.35
    ior           9.7e-06         3.26e-06     2.98
    ixor         8.98e-06         3.43e-06     2.62
    isub         6.83e-06          3.3e-06     2.07
    eq           0.000438         1.17e-05     37.6
    neq          6.37e-06         7.81e-06    0.816
    jaccard        0.0029         0.000126     23.1

    medium load factor
    100 runs with sets of 59392 random elements n s.t. 0 <= n < 118784
                    set()  RoaringBitmap()    ratio
    init            0.564            0.324     1.74
    initsort        0.696            0.273     2.55
    and             0.613         0.000418     1466
    or              0.976         0.000292     3344
    xor             0.955         0.000294     3250
    sub             0.346         0.000316     1092
    iand          0.00658         1.14e-05      575
    ior           0.00594         1.08e-05      548
    ixor          0.00434         1.12e-05      385
    isub          0.00431         1.09e-05      397
    eq             0.0991         0.000116      851
    neq          9.62e-06         1.29e-05    0.743
    jaccard          1.62          0.00025     6476

    dense set / high load factor
    100 runs with sets of 39800 random elements n s.t. 0 <= n < 40000
                    set()  RoaringBitmap()    ratio
    init             0.33           0.0775     4.26
    initsort        0.352            0.148     2.38
    and              0.24         0.000223     1078
    or               0.45         0.000165     2734
    xor             0.404         0.000161     2514
    sub             0.169         0.000173      973
    iand          0.00287         6.02e-06      477
    ior           0.00179         6.34e-06      282
    ixor          0.00195         5.53e-06      353
    isub           0.0017         6.35e-06      267
    eq             0.0486         4.65e-05     1045
    neq          1.01e-05         1.13e-05    0.888
    jaccard         0.722         0.000118     6136

See https://github.com/Ezibenroc/roaring_analysis/ for a performance comparison
of PyRoaringBitmap and this library.

References
----------
- http://roaringbitmap.org/
- Chambi, S., Lemire, D., Kaser, O., & Godin, R. (2016). Better bitmap
  performance with Roaring bitmaps. Software: practice and experience, 46(5),
  pp. 709-719. http://arxiv.org/abs/1402.6407
- The idea of using the inverted list representation is based on
  https://issues.apache.org/jira/browse/LUCENE-5983

            

Raw data

            {
    "_id": null,
    "home_page": "http://roaringbitmap.readthedocs.io",
    "name": "roaringbitmap",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Andreas van Cranenburgh",
    "author_email": "A.W.van.Cranenburgh@rug.nl",
    "download_url": "https://files.pythonhosted.org/packages/ca/52/335b9551b15c535ef78a0833e6b4e2279ef176694e3ea1de824b67e9bf2d/roaringbitmap-0.7.2.tar.gz",
    "platform": "Many",
    "description": "Roaring Bitmap in Cython\n========================\n\nA roaring bitmap is an efficient compressed datastructure to store a set\nof integers. A Roaring bitmap stores a set of 32-bit integers in a series of\narrays and bitmaps, whichever takes the least space (which is always\n``2 ** 16`` bits or less).\n\nThis datastructure is useful for storing a large number of integers, e.g., for\nan inverted index used by search engines and databases. In particular, it is\npossible to quickly compute the intersection of a series of sets, which can be\nused to implement a query as the conjunction of subqueries.\n\nThis implementation is based on the Java and C implementations at\nhttps://github.com/lemire/RoaringBitmap\nand https://github.com/lemire/CRoaring\n\nAdditional features of this implementation:\n\n- Inverted list representation: blocks that are mostly full are stored\n  compactly as an array of non-members (instead of as an array of members or a\n  fixed-size bitmap).\n- Collections of immutable roaring bitmaps can be efficiently serialized with\n  ``mmap`` in a single file.\n\nMissing features w.r.t. CRoaring:\n\n- Run-length encoded blocks\n- Various AVX2 / SSE optimizations\n\nSee also PyRoaringBitmap, a Python wrapper of CRoaring:\nhttps://github.com/Ezibenroc/PyRoaringBitMap\n\nLicense, requirements\n---------------------\nThe code is licensed under GNU GPL v2, or any later version at your option.\n\n- Python 2.7+/3.3+  http://www.python.org (headers required, e.g. python-dev package)\n- Cython 0.20+      http://www.cython.org\n\nInstallation, usage\n-------------------\n\n::\n\n    $ git clone https://github.com/andreasvc/roaringbitmap.git\n    $ cd roaringbitmap\n    $ make\n\n(or ``make py2`` for Python 2)\n\nA ``RoaringBitmap()`` can be used as a replacement for a normal (mutable)\nPython set containing (unsigned) 32-bit integers:\n\n.. code-block:: python\n\n    >>> from roaringbitmap import RoaringBitmap\n    >>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))\n    RoaringBitmap({5, 6, 7, 8, 9})\n\n``ImmutableRoaringBitmap`` is an immutable variant (analogous to ``frozenset``)\nwhich is stored compactly as a contiguous block of memory.\n\nA sequence of immutable RoaringBitmaps can be stored in a single file and\naccessed efficiently with ``mmap``, without needing to copy or deserialize:\n\n.. code-block:: python\n\n    >>> from roaringbitmap import MultiRoaringBitmap\n    >>> mrb = MultiRoaringBitmap([range(n, n + 5) for n in range(10)], filename='index')\n\n    >>> mrb = MultiRoaringBitmap.fromfile('index')\n    >>> mrb[5]\n    ImmutableRoaringBitmap({5, 6, 7, 8, 9})\n\nFor API documentation cf. http://roaringbitmap.readthedocs.io\n\nBenchmarks\n----------\nOutput of ``$ make bench``::\n\n    small sparse set\n    100 runs with sets of 200 random elements n s.t. 0 <= n < 40000\n                    set()  RoaringBitmap()    ratio\n    init         0.000834          0.00138    0.603\n    initsort      0.00085         0.000394     2.16\n    and           0.00102         8.49e-05     12.1\n    or            0.00171         0.000169     10.1\n    xor           0.00152         0.000213     7.11\n    sub          0.000934         0.000197     4.74\n    iand         1.29e-05         2.97e-06     4.35\n    ior           9.7e-06         3.26e-06     2.98\n    ixor         8.98e-06         3.43e-06     2.62\n    isub         6.83e-06          3.3e-06     2.07\n    eq           0.000438         1.17e-05     37.6\n    neq          6.37e-06         7.81e-06    0.816\n    jaccard        0.0029         0.000126     23.1\n\n    medium load factor\n    100 runs with sets of 59392 random elements n s.t. 0 <= n < 118784\n                    set()  RoaringBitmap()    ratio\n    init            0.564            0.324     1.74\n    initsort        0.696            0.273     2.55\n    and             0.613         0.000418     1466\n    or              0.976         0.000292     3344\n    xor             0.955         0.000294     3250\n    sub             0.346         0.000316     1092\n    iand          0.00658         1.14e-05      575\n    ior           0.00594         1.08e-05      548\n    ixor          0.00434         1.12e-05      385\n    isub          0.00431         1.09e-05      397\n    eq             0.0991         0.000116      851\n    neq          9.62e-06         1.29e-05    0.743\n    jaccard          1.62          0.00025     6476\n\n    dense set / high load factor\n    100 runs with sets of 39800 random elements n s.t. 0 <= n < 40000\n                    set()  RoaringBitmap()    ratio\n    init             0.33           0.0775     4.26\n    initsort        0.352            0.148     2.38\n    and              0.24         0.000223     1078\n    or               0.45         0.000165     2734\n    xor             0.404         0.000161     2514\n    sub             0.169         0.000173      973\n    iand          0.00287         6.02e-06      477\n    ior           0.00179         6.34e-06      282\n    ixor          0.00195         5.53e-06      353\n    isub           0.0017         6.35e-06      267\n    eq             0.0486         4.65e-05     1045\n    neq          1.01e-05         1.13e-05    0.888\n    jaccard         0.722         0.000118     6136\n\nSee https://github.com/Ezibenroc/roaring_analysis/ for a performance comparison\nof PyRoaringBitmap and this library.\n\nReferences\n----------\n- http://roaringbitmap.org/\n- Chambi, S., Lemire, D., Kaser, O., & Godin, R. (2016). Better bitmap\n  performance with Roaring bitmaps. Software: practice and experience, 46(5),\n  pp. 709-719. http://arxiv.org/abs/1402.6407\n- The idea of using the inverted list representation is based on\n  https://issues.apache.org/jira/browse/LUCENE-5983\n",
    "bugtrack_url": null,
    "license": "GPL",
    "summary": "Roaring Bitmap",
    "version": "0.7.2",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "2253891280e88d438f23b613daf87d18",
                "sha256": "bfe049fd9f200a011081ea5e4b2746d0c35256c755cd78886e375a0ef473fa0d"
            },
            "downloads": -1,
            "filename": "roaringbitmap-0.7.2.tar.gz",
            "has_sig": false,
            "md5_digest": "2253891280e88d438f23b613daf87d18",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 233062,
            "upload_time": "2022-12-29T15:02:16",
            "upload_time_iso_8601": "2022-12-29T15:02:16.683173Z",
            "url": "https://files.pythonhosted.org/packages/ca/52/335b9551b15c535ef78a0833e6b4e2279ef176694e3ea1de824b67e9bf2d/roaringbitmap-0.7.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-29 15:02:16",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "roaringbitmap"
}
        
Elapsed time: 0.03851s