# UltraLogLog
Rust implementation of the [UltraLogLog algorithm](https://arxiv.org/pdf/2308.16862). UltraLogLog is more space-efficient than the widely used HyperLogLog, but can be slower. Either the FGRA estimator or the MLE estimator can be used.
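To make the space claim concrete, here is a rough sketch of how sketch size and expected error scale with the precision parameter `p`. It assumes, following the paper, that a sketch holds 2^p one-byte registers and that the relative standard error is roughly 0.782/sqrt(m) for FGRA and 0.761/sqrt(m) for MLE. These constants are an assumption taken from the paper, not values exported by this crate, and the helper names are hypothetical.

```rust
/// Number of registers for precision `p` (m = 2^p).
fn register_count(p: u32) -> u64 {
    1u64 << p
}

/// Sketch size in bytes, assuming one 8-bit register per bucket.
fn ull_bytes(p: u32) -> u64 {
    register_count(p)
}

/// Approximate relative standard error, `constant / sqrt(m)`.
fn relative_std_error(constant: f64, p: u32) -> f64 {
    constant / (register_count(p) as f64).sqrt()
}

fn main() {
    // Assumed asymptotic constants from the paper (not part of this crate's API).
    const FGRA: f64 = 0.782;
    const MLE: f64 = 0.761;

    for p in [6u32, 10, 12] {
        println!(
            "p={:2}  bytes={:5}  FGRA~{:.2}%  MLE~{:.2}%",
            p,
            ull_bytes(p),
            100.0 * relative_std_error(FGRA, p),
            100.0 * relative_std_error(MLE, p)
        );
    }
}
```

Each increment of `p` doubles the memory and reduces the expected error by a factor of about 1.41, so the precision is best chosen from the error budget rather than from the expected cardinality.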
## Usage
```rust
use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};
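// Create a sketch; the argument is the precision parameter.
let mut ull = UltraLogLog::new(6).unwrap();
// Calls to `add_value` can be chained.
ull.add_value("apple")
    .add_value("banana")
    .add_value("cherry")
    .add_value("033");
// Estimate the number of distinct values added so far.
let est = ull.get_distinct_count_estimate();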
```
Enable the `serde` feature to save a sketch to disk and load it back later.
```rust
use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};
use std::fs::{remove_file, File};
use std::io::{BufReader, BufWriter};
let file_path = "test_ultraloglog.bin";
// Create UltraLogLog and add data
let mut ull = UltraLogLog::new(5).expect("Failed to create ULL");
ull.add(123456789);
ull.add(987654321);
let original_estimate = ull.get_distinct_count_estimate();
// Save to file using writer
let file = File::create(file_path).expect("Failed to create file");
let writer = BufWriter::new(file);
ull.save(writer).expect("Failed to save UltraLogLog");
// Load from file using reader
let file = File::open(file_path).expect("Failed to open file");
let reader = BufReader::new(file);
let loaded_ull = UltraLogLog::load(reader).expect("Failed to load UltraLogLog");
let loaded_estimate = loaded_ull.get_distinct_count_estimate();
// Remove the temporary file
remove_file(file_path).expect("Failed to remove file");
```
## Python Bindings
This crate also provides Python bindings for the UltraLogLog algorithm using [PyO3](https://pyo3.rs/). See [example.py](./example.py) for usage.
```python
import ultraloglog
# Create a new UltraLogLog sketch
ull = ultraloglog.PyUltraLogLog(12) # precision parameter
# Add values
ull.add_str("hello")
ull.add_int(42)
ull.add_float(3.14)
# Get estimated count
print(f"Estimated distinct count: {ull.count()}")
```
### Installation
#### Using pip
Installation with pip is on the way.
#### From Source
*`uv` is recommended to manage virtual environments.*
1. Install Rust and maturin: `pip install maturin`
2. Build and install the bindings: `maturin develop --release`
## 64-bit hash function
As mentioned in the paper, a high-quality 64-bit hash function is key to the UltraLogLog algorithm. We tested several modern 64-bit hash libraries and found that xxhash-rust (the default) and wyhash-rs work well. Users can also replace the default xxhash-rust with polymurhash, komihash, ahash, t1ha, and others. See the testing section for details.
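The following sketch illustrates why hash quality matters. It shows how a HyperLogLog-style sketch typically consumes a 64-bit hash: the top `p` bits select a register, and the number of leading zeros in the remaining bits drives the register update. This is a simplified illustration, not this crate's code. `hash64` and `split_hash` are hypothetical helpers, and the standard library's `DefaultHasher` stands in for xxhash-rust only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash any `Hash` value to 64 bits (stand-in for a dedicated 64-bit hash).
fn hash64<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Split a hash into (register index, leading zeros of the remaining bits).
fn split_hash(hash: u64, p: u32) -> (usize, u32) {
    assert!((1..64).contains(&p), "precision must be in 1..64");
    // The top `p` bits pick one of 2^p registers.
    let index = (hash >> (64 - p)) as usize;
    // Shift the index bits out and pad the low end with ones,
    // so the leading-zero count is capped at 64 - p.
    let rest = (hash << p) | ((1u64 << p) - 1);
    (index, rest.leading_zeros())
}

fn main() {
    let h = hash64(&"apple");
    let (index, zeros) = split_hash(h, 6);
    println!("hash={:#018x} register={} leading_zeros={}", h, index, zeros);
}
```

If the hash bits are biased or correlated, some registers are hit more often than others or the leading-zero counts are skewed, and both effects distort the estimate directly.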
## Reference
Ertl, O., 2024. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. Proceedings of the VLDB Endowment, 17(7), pp.1655-1668.
Raw data
{
"_id": null,
"home_page": null,
"name": "ultraloglog",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "ultraloglog, algorithm, sketch, probabilistic, hyperloglog",
"author": "Ruihang Xia <waynestxia@gmail.com>, Jianshu Zhao <jianshuzhao@yahoo.com>",
"author_email": "Ruihang Xia <waynestxia@gmail.com>, Jianshu Zhao <jianshuzhao@yahoo.com>",
"download_url": "https://files.pythonhosted.org/packages/16/d1/f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a/ultraloglog-0.1.3.tar.gz",
"platform": null,
"description": "# UltraLogLog\n\nRust implementation of the [UltraLogLog algorithm](https://arxiv.org/pdf/2308.16862). Ultraloglog is more space efficient than the widely used HyperLogLog, but can be slower. FGRA estimator or MLE estimator can be used. \n\n## Usage\n\n```rust\nuse ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};\n\nlet mut ull = UltraLogLog::new(6).unwrap();\n\null.add_value(\"apple\")\n .add_value(\"banana\")\n .add_value(\"cherry\")\n .add_value(\"033\");\nlet est = ull.get_distinct_count_estimate();\n```\n\nThe serde feature can be activated so that the sketch can be saved to disk and then loaded.\n```rust\nuse ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};\nuse std::fs::{remove_file, File};\nuse std::io::{BufReader, BufWriter};\n\nlet file_path = \"test_ultraloglog.bin\";\n\n// Create UltraLogLog and add data\nlet mut ull = UltraLogLog::new(5).expect(\"Failed to create ULL\");\null.add(123456789);\null.add(987654321);\nlet original_estimate = ull.get_distinct_count_estimate();\n\n// Save to file using writer\nlet file = File::create(file_path).expect(\"Failed to create file\");\nlet writer = BufWriter::new(file);\null.save(writer).expect(\"Failed to save UltraLogLog\");\n\n// Load from file using reader\nlet file = File::open(file_path).expect(\"Failed to open file\");\nlet reader = BufReader::new(file);\nlet loaded_ull = UltraLogLog::load(reader).expect(\"Failed to load UltraLogLog\");\nlet loaded_estimate = loaded_ull.get_distinct_count_estimate();\n```\n\n## Python Bindings\nThis crate also provides Python bindings for the UltraLogLog algorithm using [PyO3](https://pyo3.rs/). 
See [example.py](./example.py) for usage.\n\n```python\nimport ultraloglog\n\n# Create a new UltraLogLog sketch\null = ultraloglog.PyUltraLogLog(12) # precision parameter\n\n# Add values\null.add_str(\"hello\")\null.add_int(42)\null.add_float(3.14)\n\n# Get estimated count\nprint(f\"Estimated distinct count: {ull.count()}\")\n```\n\n### Installation\n\n#### Using pip\n\nInstallation with pip is on the way.\n\n#### From Source\n\n*`uv` is recommended to manage virtual environments.*\n\n1. Install Rust, and maturin `pip install maturin`\n2. Build and install: `maturin develop --release`\n\n\n## 64-bit hash function\nAs mentioned in the paper, high quality 64-bit hash function is key to ultraloglog algorithm. We tested several modern 64-bit hash libraries and found that xxhash-rust (default) and wyhash-rs worked well. However, users can easily replace the default xxhash-rust with polymurhash, komihash, ahash and t1ha et.al. See testing section for details. \n\n## Reference\nErtl, O., 2024. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. Proceedings of the VLDB Endowment, 17(7), pp.1655-1668.\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Python bindings for UltraLogLog, a space-efficient alternative to HyperLogLog for approximate distinct counting",
"version": "0.1.3",
"project_urls": {
"Documentation": "https://github.com/waynexia/ultraloglog/blob/master/README.md",
"Homepage": "https://github.com/waynexia/ultraloglog",
"Repository": "https://github.com/waynexia/ultraloglog"
},
"split_keywords": [
"ultraloglog",
" algorithm",
" sketch",
" probabilistic",
" hyperloglog"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b0ee421c5d228f93e568db62f00a72049e77fcb365b608fce70d4c9dd16b868d",
"md5": "df3c1383a84987866ceacd062638ff76",
"sha256": "d6db4b9c2b44bc48f0099b8cd79afc5ab32b7ca6d434911463703861b385bd7a"
},
"downloads": -1,
"filename": "ultraloglog-0.1.3-cp313-cp313-manylinux_2_39_x86_64.whl",
"has_sig": false,
"md5_digest": "df3c1383a84987866ceacd062638ff76",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.8",
"size": 251137,
"upload_time": "2025-07-28T22:28:33",
"upload_time_iso_8601": "2025-07-28T22:28:33.381346Z",
"url": "https://files.pythonhosted.org/packages/b0/ee/421c5d228f93e568db62f00a72049e77fcb365b608fce70d4c9dd16b868d/ultraloglog-0.1.3-cp313-cp313-manylinux_2_39_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "16d1f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a",
"md5": "e108f6f3151e1f45c7d121a2792c9fbf",
"sha256": "92823f459f6262193304bd2048418387783477dbb777a8c850c4862fba6e6e7b"
},
"downloads": -1,
"filename": "ultraloglog-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "e108f6f3151e1f45c7d121a2792c9fbf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 28176,
"upload_time": "2025-07-28T22:28:37",
"upload_time_iso_8601": "2025-07-28T22:28:37.529988Z",
"url": "https://files.pythonhosted.org/packages/16/d1/f8d793d73d91f1ff6c57a55ca0b1019c68be198e8b03652c1b8a92c5f41a/ultraloglog-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-28 22:28:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "waynexia",
"github_project": "ultraloglog",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ultraloglog"
}