bloomfilter-lite


Namebloomfilter-lite JSON
Version 1.0.7 PyPI version JSON
download
home_pagehttps://github.com/lorenzomaiuri-dev/bloomfilter-py
SummaryA space-efficient probabilistic data structure for membership testing.
upload_time2025-02-05 12:16:35
maintainerNone
docs_urlNone
authorLorenzo maiuri
requires_python>=3.6
licenseNone
keywords bloom filter probabilistic data structures python
VCS
bugtrack_url
requirements bitarray bloomfilter-lite pytest coverage pytest-cov
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![Stargazers][stars-shield]][stars-url]
[![License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
[![PyPI version][pypi-shield]][pypi-url]
[![Coverage][coverage-shield]][coverage-url]
[![Workflow Status][workflow-shield]][workflow-url]

# Bloom Filter in Python

## Table of Contents

- [Introduction](#introduction)
- [Theory](#theory)
- [Installation](#installation)
- [Usage](#usage)
- [Benchmark](#benchmark)
- [Running Tests](#running-tests)
- [Contributing](#contributing)
- [License](#license)

## Introduction

A **Bloom Filter** is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It allows false positives but never false negatives. This makes it ideal for applications where memory efficiency is crucial, such as caching, networking, and databases.

This implementation is optimized for configurability and performance, allowing users to specify the expected number of elements and the desired false positive probability.

## Theory

A Bloom Filter consists of a **bit array** of size `m` and uses `k` different hash functions. When an element is inserted, all `k` hash functions generate indices in the bit array, and the corresponding bits are set to 1.

To check if an element is present:

- Compute its `k` hash values.
- If all bits at those positions are 1, the element **may be present** (with a certain probability of false positives).
- If at least one bit is 0, the element is **definitely not present**.

## Mathematical Formulas

- **Optimal bit array size**:  
  $$m = - \frac{n \log p}{(\log 2)^2}$$  
  where \( n \) is the number of expected elements and \( p \) is the false positive rate.

- **Optimal number of hash functions**:  
  $$k = \frac{m}{n} \log 2$$  


## Installation

To install the Bloom Filter package, run:

```sh
pip install bloomfilter-lite
```

### Installation from source

1. Clone this repository:

   ```bash
    git clone https://github.com/lorenzomaiuri-dev/bloomfilter-py.git
    cd bloomfilter-py
2. Create a virtual environment (optional but recommended):

    ```bash
    python -m venv venv

    source venv/bin/activate  # On Linux/Mac
    venv\Scripts\activate  # On Windows
3. Install the dependencies

    ```bash
    pip install -r requirements.txt

## Usage

### Basic Example

```python
from bloomfilter_lite import BloomFilter

# Create a Bloom Filter for 1000 expected elements with a 1% false positive rate
bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)

# Add elements
bf.add("hello")
bf.add("world")

# Check for membership
print(bf.check("hello"))  # True (probably)
print(bf.check("python")) # False (definitely not present)
```

## Benchmark

Performance testing for different dataset sizes:

| Elements | False Positive Rate | Memory (bits) | Time per Insert (ms) | Time per Lookup (ms) |
|----------|---------------------|--------------|--------------------|--------------------|
| 1,000    | 1%                  | ~9.6 KB      | 0.01               | 0.008              |
| 10,000   | 1%                  | ~96 KB       | 0.015              | 0.010              |
| 100,000  | 1%                  | ~960 KB      | 0.020              | 0.012              |

### Reproducing Benchmarks

To verify the benchmarks, run the following script:

```sh
python benchmarks/run_benchmark.py
```

This script tests insertions and lookups for varying dataset sizes and prints the execution time and memory usage.

## Running Tests

To run the unit tests using `pytest`:

```sh
pytest tests/
```

## Contributing

Contributions are welcome! If you'd like to contribute to this project, please follow these steps:

1. Fork the repository
2. Create a new branch (git checkout -b feature/your-feature)
3. Commit your changes (git commit -am 'Add new feature')
4. Push the branch (git push origin feature/your-feature)
5. Open a Pull Request

Please ensure all pull requests pass the existing tests and include new tests for any added functionality

## License

This project is licensed under the MIT License. See the LICENSE file for more details

<!-- LINKS & IMAGES -->
[stars-shield]: https://img.shields.io/github/stars/lorenzomaiuri-dev/bloomfilter-py?style=social
[stars-url]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/stargazers
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://opensource.org/licenses/MIT
[linkedin-shield]: https://img.shields.io/badge/LinkedIn-Profile-blue?logo=linkedin&logoColor=white
[linkedin-url]: https://www.linkedin.com/in/maiurilorenzo
[pypi-shield]: https://img.shields.io/pypi/v/bloomfilter-lite
[pypi-url]: https://pypi.org/project/bloomfilter-lite/
[coverage-shield]: https://codecov.io/gh/lorenzomaiuri-dev/bloomfilter-py/branch/main/graph/badge.svg
[coverage-url]: https://codecov.io/gh/lorenzomaiuri-dev/bloomfilter-py
[workflow-shield]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/actions/workflows/publish.yml/badge.svg
[workflow-url]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/actions

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lorenzomaiuri-dev/bloomfilter-py",
    "name": "bloomfilter-lite",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "bloom filter, probabilistic data structures, python",
    "author": "Lorenzo maiuri",
    "author_email": "maiurilorenzo@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/74/ed/df2ef53853b783a722ea54e1c05cf25356863762cbfa605499ee65b79313/bloomfilter_lite-1.0.7.tar.gz",
    "platform": null,
    "description": "[![Stargazers][stars-shield]][stars-url]\n[![License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url]\n[![PyPI version][pypi-shield]][pypi-url]\n[![Coverage][coverage-shield]][coverage-url]\n[![Workflow Status][workflow-shield]][workflow-url]\n\n# Bloom Filter in Python\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Theory](#theory)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Benchmark](#benchmark)\n- [Running Tests](#running-tests)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Introduction\n\nA **Bloom Filter** is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It allows false positives but never false negatives. This makes it ideal for applications where memory efficiency is crucial, such as caching, networking, and databases.\n\nThis implementation is optimized for configurability and performance, allowing users to specify the expected number of elements and the desired false positive probability.\n\n## Theory\n\nA Bloom Filter consists of a **bit array** of size `m` and uses `k` different hash functions. When an element is inserted, all `k` hash functions generate indices in the bit array, and the corresponding bits are set to 1.\n\nTo check if an element is present:\n\n- Compute its `k` hash values.\n- If all bits at those positions are 1, the element **may be present** (with a certain probability of false positives).\n- If at least one bit is 0, the element is **definitely not present**.\n\n## Mathematical Formulas\n\n- **Optimal bit array size**:  \n  $$m = - \\frac{n \\log p}{(\\log 2)^2}$$  \n  where \\( n \\) is the number of expected elements and \\( p \\) is the false positive rate.\n\n- **Optimal number of hash functions**:  \n  $$k = \\frac{m}{n} \\log 2$$  \n\n\n## Installation\n\nTo install the Bloom Filter package, run:\n\n```sh\npip install bloomfilter-lite\n```\n\n### Installation from source\n\n1. Clone this repository:\n\n   ```bash\n    git clone https://github.com/lorenzomaiuri-dev/bloomfilter-py.git\n    cd bloomfilter-py\n2. Create a virtual environment (optional but recommended):\n\n    ```bash\n    python -m venv venv\n\n    source venv/bin/activate  # On Linux/Mac\n    venv\\Scripts\\activate  # On Windows\n3. Install the dependencies\n\n    ```bash\n    pip install -r requirements.txt\n\n## Usage\n\n### Basic Example\n\n```python\nfrom bloomfilter_lite import BloomFilter\n\n# Create a Bloom Filter for 1000 expected elements with a 1% false positive rate\nbf = BloomFilter(expected_items=1000, false_positive_rate=0.01)\n\n# Add elements\nbf.add(\"hello\")\nbf.add(\"world\")\n\n# Check for membership\nprint(bf.check(\"hello\"))  # True (probably)\nprint(bf.check(\"python\")) # False (definitely not present)\n```\n\n## Benchmark\n\nPerformance testing for different dataset sizes:\n\n| Elements | False Positive Rate | Memory (bits) | Time per Insert (ms) | Time per Lookup (ms) |\n|----------|---------------------|--------------|--------------------|--------------------|\n| 1,000    | 1%                  | ~9.6 KB      | 0.01               | 0.008              |\n| 10,000   | 1%                  | ~96 KB       | 0.015              | 0.010              |\n| 100,000  | 1%                  | ~960 KB      | 0.020              | 0.012              |\n\n### Reproducing Benchmarks\n\nTo verify the benchmarks, run the following script:\n\n```sh\npython benchmarks/run_benchmark.py\n```\n\nThis script tests insertions and lookups for varying dataset sizes and prints the execution time and memory usage.\n\n## Running Tests\n\nTo run the unit tests using `pytest`:\n\n```sh\npytest tests/\n```\n\n## Contributing\n\nContributions are welcome! If you'd like to contribute to this project, please follow these steps:\n\n1. Fork the repository\n2. Create a new branch (git checkout -b feature/your-feature)\n3. Commit your changes (git commit -am 'Add new feature')\n4. Push the branch (git push origin feature/your-feature)\n5. Open a Pull Request\n\nPlease ensure all pull requests pass the existing tests and include new tests for any added functionality\n\n## License\n\nThis project is licensed under the MIT License. See the LICENSE file for more details\n\n<!-- LINKS & IMAGES -->\n[stars-shield]: https://img.shields.io/github/stars/lorenzomaiuri-dev/bloomfilter-py?style=social\n[stars-url]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/stargazers\n[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg\n[license-url]: https://opensource.org/licenses/MIT\n[linkedin-shield]: https://img.shields.io/badge/LinkedIn-Profile-blue?logo=linkedin&logoColor=white\n[linkedin-url]: https://www.linkedin.com/in/maiurilorenzo\n[pypi-shield]: https://img.shields.io/pypi/v/bloomfilter-lite\n[pypi-url]: https://pypi.org/project/bloomfilter-lite/\n[coverage-shield]: https://codecov.io/gh/lorenzomaiuri-dev/bloomfilter-py/branch/main/graph/badge.svg\n[coverage-url]: https://codecov.io/gh/lorenzomaiuri-dev/bloomfilter-py\n[workflow-shield]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/actions/workflows/publish.yml/badge.svg\n[workflow-url]: https://github.com/lorenzomaiuri-dev/bloomfilter-py/actions\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A space-efficient probabilistic data structure for membership testing.",
    "version": "1.0.7",
    "project_urls": {
        "Bug Tracker": "https://github.com/lorenzomaiuri-dev/bloomfilter-py/issues",
        "Homepage": "https://github.com/lorenzomaiuri-dev/bloomfilter-py",
        "Source": "https://github.com/lorenzomaiuri-dev/bloomfilter-py"
    },
    "split_keywords": [
        "bloom filter",
        " probabilistic data structures",
        " python"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8679b0b686d256f8937eea1022b3915deb92aaa3623a57b85fa24835f41c9398",
                "md5": "fb00857c3617be8c3a2685d633d191c8",
                "sha256": "764c460308b154609e1add0deefe53facbd7ae4ddd8130a72760a3ee8ccd2973"
            },
            "downloads": -1,
            "filename": "bloomfilter_lite-1.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fb00857c3617be8c3a2685d633d191c8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 4253,
            "upload_time": "2025-02-05T12:16:33",
            "upload_time_iso_8601": "2025-02-05T12:16:33.979133Z",
            "url": "https://files.pythonhosted.org/packages/86/79/b0b686d256f8937eea1022b3915deb92aaa3623a57b85fa24835f41c9398/bloomfilter_lite-1.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "74eddf2ef53853b783a722ea54e1c05cf25356863762cbfa605499ee65b79313",
                "md5": "9a76645780371b0ccdfeb5db38091503",
                "sha256": "aa17b5eb6edec103fcb738b718a6f6fb22c6a521e73f370e534e75fed064e0b3"
            },
            "downloads": -1,
            "filename": "bloomfilter_lite-1.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "9a76645780371b0ccdfeb5db38091503",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 4826,
            "upload_time": "2025-02-05T12:16:35",
            "upload_time_iso_8601": "2025-02-05T12:16:35.499439Z",
            "url": "https://files.pythonhosted.org/packages/74/ed/df2ef53853b783a722ea54e1c05cf25356863762cbfa605499ee65b79313/bloomfilter_lite-1.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-05 12:16:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lorenzomaiuri-dev",
    "github_project": "bloomfilter-py",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "bitarray",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "bloomfilter-lite",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.4"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": []
        },
        {
            "name": "pytest-cov",
            "specs": []
        }
    ],
    "lcname": "bloomfilter-lite"
}
        
Elapsed time: 0.48608s