ultraloglog


Name: ultraloglog
Version: 0.1.3
Summary: Python bindings for UltraLogLog, a space-efficient alternative to HyperLogLog for approximate distinct counting
Upload time: 2025-07-28 22:28:37
Author: Ruihang Xia <waynestxia@gmail.com>, Jianshu Zhao <jianshuzhao@yahoo.com>
Requires Python: >=3.8
License: Apache-2.0
Keywords: ultraloglog, algorithm, sketch, probabilistic, hyperloglog
Requirements: No requirements were recorded.

# UltraLogLog

Rust implementation of the [UltraLogLog algorithm](https://arxiv.org/pdf/2308.16862). UltraLogLog is more space-efficient than the widely used HyperLogLog, but can be slower. Either the FGRA estimator or the MLE estimator can be used.

## Usage

```rust
use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};

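// Create a sketch with precision parameter 6
let mut ull = UltraLogLog::new(6).unwrap();

// Add values; calls can be chained
ull.add_value("apple")
    .add_value("banana")
    .add_value("cherry")
    .add_value("033");

// Estimate the number of distinct values added
let est = ull.get_distinct_count_estimate();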
```

Enable the `serde` feature to save a sketch to disk and load it back:
```rust
use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};
use std::fs::{remove_file, File};
use std::io::{BufReader, BufWriter};

let file_path = "test_ultraloglog.bin";

// Create UltraLogLog and add data
let mut ull = UltraLogLog::new(5).expect("Failed to create ULL");
ull.add(123456789);
ull.add(987654321);
let original_estimate = ull.get_distinct_count_estimate();

// Save to file using writer
let file = File::create(file_path).expect("Failed to create file");
let writer = BufWriter::new(file);
ull.save(writer).expect("Failed to save UltraLogLog");

// Load from file using reader
let file = File::open(file_path).expect("Failed to open file");
let reader = BufReader::new(file);
let loaded_ull = UltraLogLog::load(reader).expect("Failed to load UltraLogLog");
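let loaded_estimate = loaded_ull.get_distinct_count_estimate();

// The round trip should preserve the sketch state, so both estimates match
assert_eq!(original_estimate, loaded_estimate);

// Clean up the temporary file
remove_file(file_path).expect("Failed to remove file");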
```

## Python Bindings
This crate also provides Python bindings for the UltraLogLog algorithm using [PyO3](https://pyo3.rs/). See [example.py](./example.py) for usage.

```python
import ultraloglog

# Create a new UltraLogLog sketch
ull = ultraloglog.PyUltraLogLog(12)  # precision parameter

# Add values
ull.add_str("hello")
ull.add_int(42)
ull.add_float(3.14)

# Get estimated count
print(f"Estimated distinct count: {ull.count()}")
```
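
The precision parameter passed to the constructor controls the size of the sketch. As a rough guide, and assuming the layout described in the UltraLogLog paper (2^p registers of one byte each), the size of the register array can be estimated as sketched below. The helper `ull_state_bytes` is hypothetical and is not part of the `ultraloglog` package; the bindings may add some overhead on top of this figure.

```python
def ull_state_bytes(precision: int) -> int:
    """Estimate the register-array size in bytes for a given precision.

    Assumes 2**precision one-byte registers, as in the UltraLogLog paper.
    """
    if precision < 0:
        raise ValueError("precision must be non-negative")
    return 1 << precision


for p in (6, 12):
    print(f"p={p}: {ull_state_bytes(p)} bytes")
```

Each increment of the precision doubles the memory used, so it is worth picking the smallest value that meets the accuracy you need.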

### Installation

#### Using pip

Installation with pip is on the way.

#### From Source

*`uv` is recommended to manage virtual environments.*

1. Install Rust and maturin (`pip install maturin`)
2. Build and install: `maturin develop --release`


## 64-bit hash function
As mentioned in the paper, a high-quality 64-bit hash function is key to the UltraLogLog algorithm. We tested several modern 64-bit hash libraries and found that xxhash-rust (the default) and wyhash-rs work well. However, users can easily replace the default xxhash-rust with polymurhash, komihash, ahash, or t1ha, among others. See the testing section for details.
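
To illustrate why hash quality matters, here is a minimal sketch of how HyperLogLog-style sketches typically consume a 64-bit hash: the top p bits select a register, and the remaining bits contribute a leading-zero count. It uses `hashlib.blake2b` from the Python standard library purely as a stand-in 64-bit hash; it is not the hash used by this crate. The helper names `hash64` and `split_hash` are hypothetical, and the exact bit layout used by this implementation may differ.

```python
import hashlib
from typing import Tuple


def hash64(data: bytes) -> int:
    """Return a 64-bit hash of `data` (BLAKE2b truncated to 8 bytes)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def split_hash(h: int, precision: int) -> Tuple[int, int]:
    """Split a 64-bit hash into (register index, leading-zero count).

    The top `precision` bits pick the register; the leading-zero count is
    taken over the remaining 64 - precision bits.
    """
    rest_bits = 64 - precision
    index = h >> rest_bits
    rest = h & ((1 << rest_bits) - 1)
    leading_zeros = rest_bits - rest.bit_length()
    return index, leading_zeros


index, nlz = split_hash(hash64(b"apple"), precision=12)
print(index, nlz)
```

Because every bit of the hash feeds either the register choice or the leading-zero count, a hash with poorly distributed bits would skew both, which is one way to see why the choice of hash function affects the accuracy of the estimate.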

## Reference
Ertl, O., 2024. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. Proceedings of the VLDB Endowment, 17(7), pp.1655-1668.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ultraloglog",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "ultraloglog, algorithm, sketch, probabilistic, hyperloglog",
    "author": "Ruihang Xia <waynestxia@gmail.com>, Jianshu Zhao <jianshuzhao@yahoo.com>",
    "author_email": "Ruihang Xia <waynestxia@gmail.com>, Jianshu Zhao <jianshuzhao@yahoo.com>",
    "download_url": "https://files.pythonhosted.org/packages/16/d1/f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a/ultraloglog-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Python bindings for UltraLogLog, a space-efficient alternative to HyperLogLog for approximate distinct counting",
    "version": "0.1.3",
    "project_urls": {
        "Documentation": "https://github.com/waynexia/ultraloglog/blob/master/README.md",
        "Homepage": "https://github.com/waynexia/ultraloglog",
        "Repository": "https://github.com/waynexia/ultraloglog"
    },
    "split_keywords": [
        "ultraloglog",
        " algorithm",
        " sketch",
        " probabilistic",
        " hyperloglog"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b0ee421c5d228f93e568db62f00a72049e77fcb365b608fce70d4c9dd16b868d",
                "md5": "df3c1383a84987866ceacd062638ff76",
                "sha256": "d6db4b9c2b44bc48f0099b8cd79afc5ab32b7ca6d434911463703861b385bd7a"
            },
            "downloads": -1,
            "filename": "ultraloglog-0.1.3-cp313-cp313-manylinux_2_39_x86_64.whl",
            "has_sig": false,
            "md5_digest": "df3c1383a84987866ceacd062638ff76",
            "packagetype": "bdist_wheel",
            "python_version": "cp313",
            "requires_python": ">=3.8",
            "size": 251137,
            "upload_time": "2025-07-28T22:28:33",
            "upload_time_iso_8601": "2025-07-28T22:28:33.381346Z",
            "url": "https://files.pythonhosted.org/packages/b0/ee/421c5d228f93e568db62f00a72049e77fcb365b608fce70d4c9dd16b868d/ultraloglog-0.1.3-cp313-cp313-manylinux_2_39_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "16d1f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a",
                "md5": "e108f6f3151e1f45c7d121a2792c9fbf",
                "sha256": "92823f459f6262193304bd2048418387783477dbb777a8c850c4862fba6e6e7b"
            },
            "downloads": -1,
            "filename": "ultraloglog-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "e108f6f3151e1f45c7d121a2792c9fbf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 28176,
            "upload_time": "2025-07-28T22:28:37",
            "upload_time_iso_8601": "2025-07-28T22:28:37.529988Z",
            "url": "https://files.pythonhosted.org/packages/16/d1/f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a/ultraloglog-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-28 22:28:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "waynexia",
    "github_project": "ultraloglog",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ultraloglog"
}
        