cmsketch


Name: cmsketch
Version: 0.1.10
Summary: High-performance Count-Min Sketch implementation with C++ and Python versions
Upload time: 2025-09-13 15:51:21
Requires Python: >=3.11
License: MIT License Copyright (c) 2025 Isaac Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: count-min-sketch, probabilistic, data-structure, streaming

# Count-Min Sketch

A high-performance C++ implementation of the Count-Min Sketch probabilistic data structure with Python bindings.

[![Python Package](https://img.shields.io/pypi/v/cmsketch)](https://pypi.org/project/cmsketch/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![C++17](https://img.shields.io/badge/C%2B%2B-17-blue.svg)](https://en.cppreference.com/w/cpp/17)

## Project Purpose

This project serves as an educational exploration of:

- **Python Package Development**: Building Python packages with C++ implementations using modern tools (pybind11, scikit-build-core, uv)
- **Performance Comparison**: Comparing the C++ implementation with a pure-Python implementation of the same algorithm
- **Build & Publishing Pipeline**: Complete workflow from C++ development to Python package distribution
- **Modern C++ Features**: Template-based design, thread safety, and CMake integration

The implementation is inspired by the [CMU 15-445/645 Database Systems course Project #0](https://15445.courses.cs.cmu.edu/fall2025/project0/), which focuses on implementing a Count-Min Sketch data structure. This project extends that educational foundation by exploring how to package C++ implementations for Python consumption and comparing performance characteristics.

## What is Count-Min Sketch?

The Count-Min Sketch is a probabilistic data structure that provides approximate frequency counts for items in a stream. It's particularly useful for:

- **Streaming data analysis** - Process large datasets without storing all items
- **Frequency estimation** - Get approximate counts with bounded error
- **Memory efficiency** - O(width × depth) space complexity
- **Real-time applications** - Fast insertions and queries

> **Learn more**: [Count-Min Sketch on Wikipedia](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch)
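
To make the idea concrete, here is a minimal pure-Python sketch of the data structure. It is an illustration only and not the `cmsketch` implementation: the class name `ToyCountMinSketch` and the choice of salted BLAKE2b as the per-row hash are assumptions made for this example.

```python
import hashlib


class ToyCountMinSketch:
    """Illustrative Count-Min Sketch (hypothetical, not the cmsketch code)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # One row of `width` counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumed hash family: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, item: str) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def count(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the
        # tightest available estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = ToyCountMinSketch(width=1000, depth=5)
for word in ["apple", "apple", "banana"]:
    sketch.insert(word)

print(sketch.count("apple"))   # at least 2
print(sketch.count("banana"))  # at least 1
```

Memory stays at `width * depth` counters no matter how many distinct items pass through, which is where the space bound in the list above comes from.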

## Features

- ⚡ **High Performance** - Optimized C++ with atomic operations for thread safety
- 🔧 **Template-Based** - Supports any hashable key type (strings, integers, etc.)
- 🐍 **Python Bindings** - Easy-to-use Python interface via pybind11
- 🧵 **Thread-Safe** - Concurrent access with atomic operations
- 🌍 **Cross-Platform** - Works on Linux, macOS, and Windows
- 📦 **Easy Installation** - Available on PyPI

## Quick Start

### Installation

```bash
# Using pip
pip install cmsketch

# Using uv (recommended)
uv add cmsketch
```

### Basic Usage

```python
import cmsketch

# Create a sketch for strings
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
print(f"cherry: {sketch.count('cherry')}")  # 0

# Get top-k items
candidates = ["apple", "banana", "cherry"]
top_k = sketch.top_k(2, candidates)
for item, count in top_k:
    print(f"{item}: {count}")
```

### C++ Usage

```cpp
#include "cmsketch/cmsketch.h"
#include <iostream>
#include <string>

int main() {
    // Create a sketch
    cmsketch::CountMinSketch<std::string> sketch(1000, 5);
    
    // Add elements
    sketch.Insert("apple");
    sketch.Insert("apple");
    sketch.Insert("banana");
    
    // Query frequencies
    std::cout << "apple: " << sketch.Count("apple") << std::endl;    // 2
    std::cout << "banana: " << sketch.Count("banana") << std::endl;  // 1
    std::cout << "cherry: " << sketch.Count("cherry") << std::endl;  // 0
    
    return 0;
}
```

## API Reference

### Python Classes

| Class | Description |
|-------|-------------|
| `CountMinSketchStr` | String-based sketch |
| `CountMinSketchInt` | Integer-based sketch |

### Key Methods

| Method | Description |
|--------|-------------|
| `insert(item)` | Insert an item into the sketch |
| `count(item)` | Get estimated count of an item |
| `top_k(k, candidates)` | Get top k items from candidates |
| `merge(other)` | Merge another sketch |
| `clear()` | Reset sketch to initial state |
| `get_width()` | Get sketch width |
| `get_depth()` | Get sketch depth |
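
The semantics of `merge` and `top_k` can be sketched in plain Python. This is a hedged illustration of how these operations are commonly defined for Count-Min Sketches, not a description of the library's internals: the helper names `merge_tables` and `top_k`, and the assumption that merging adds counters element-wise between sketches with identical dimensions and hash functions, are introduced here for the example.

```python
import heapq
from typing import Callable, Iterable


def merge_tables(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Element-wise sum of two counter tables with identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("sketches must have the same width and depth")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_k(
    k: int, candidates: Iterable[str], estimate: Callable[[str], int]
) -> list[tuple[str, int]]:
    """Return the k candidates with the largest estimated counts."""
    scored = ((item, estimate(item)) for item in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Two tiny 2x3 counter tables standing in for two sketches.
left = [[1, 0, 2], [0, 3, 0]]
right = [[0, 1, 1], [2, 0, 0]]
merged = merge_tables(left, right)
print(merged)  # [[1, 1, 3], [2, 3, 0]]

# A dict of estimates stands in for sketch.count.
estimates = {"apple": 2, "banana": 1, "cherry": 0}
print(top_k(2, estimates, estimates.get))  # [('apple', 2), ('banana', 1)]
```

Because a sketch does not store the items it has seen, `top_k` can only rank the candidates it is given, which is why the method takes a candidate list.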

## Configuration

The sketch is configured with two parameters:

- **Width**: Number of counters per hash function (a larger width means fewer collisions and smaller overestimates)
- **Depth**: Number of hash functions (a larger depth makes a large overestimate less likely)

```python
# More accurate but uses more memory
sketch = cmsketch.CountMinSketchStr(10000, 7)

# Less accurate but uses less memory  
sketch = cmsketch.CountMinSketchStr(1000, 3)
```
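
One way to choose the two parameters is to start from a target error. The helper below is a sketch based on the standard Count-Min Sketch analysis, which assumes pairwise-independent hash functions; whether the hash functions in `cmsketch` satisfy that assumption is not stated here, so treat the result as a rule of thumb. The function name `dimensions_for` is hypothetical. Under that analysis, a width of ⌈e/ε⌉ and a depth of ⌈ln(1/δ)⌉ keep the overestimate below ε times the total number of insertions with probability at least 1 − δ.

```python
import math


def dimensions_for(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) for error factor epsilon and failure probability delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    width = math.ceil(math.e / epsilon)      # controls the size of the overestimate
    depth = math.ceil(math.log(1 / delta))   # controls how often the bound fails
    return width, depth


# Overestimate at most 0.1% of all insertions, with 99% confidence.
width, depth = dimensions_for(epsilon=0.001, delta=0.01)
print(width, depth)  # 2719 5
```

The resulting pair can then be passed to the sketch constructor in the same order as the examples above.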

## Error Bounds

The Count-Min Sketch provides the following guarantees:

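- **Overestimate**: Estimates are always ≥ the actual frequency; hash collisions can only inflate a count, never reduce it
- **Error Bound**: The size of the overestimate is bounded in terms of the sketch width and the total number of insertions, and the bound holds with a probability that grows with depth
- **Memory**: O(width × depth) counters
- **Thread Safety**: Atomic counter updates make concurrent access safe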

## Performance

The C++ implementation provides significant performance improvements:

- **Insertion**: 10-50x faster than Python
- **Query**: 5-20x faster than Python  
- **Memory**: 2-5x more efficient than Python
- **Thread Safety**: Native atomic operations vs GIL limitations

### Benchmark Suite

The project includes a comprehensive benchmark suite that tests real-world scenarios:

#### Test Data
- **100,000 IP address samples** generated using Faker with weighted distribution (10 unique IPs)
- **Realistic frequency patterns** (most frequent IP appears ~10% of the time)
- **Threaded processing** with 10 concurrent workers and 1,000-item batches
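
The shape of such a workload can be sketched with the standard library alone. This is not the project's benchmark code: it uses `random.choices` instead of Faker, a lock-protected `Counter` as a stand-in for the sketch, addresses from the 192.0.2.0/24 documentation range, and an assumed set of weights that does not reproduce the exact distribution described above.

```python
import random
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SAMPLES = 100_000
NUM_WORKERS = 10
BATCH_SIZE = 1_000

rng = random.Random(42)  # fixed seed so the run is repeatable

# Ten hypothetical addresses with an assumed skew toward the first one.
unique_ips = [f"192.0.2.{i}" for i in range(1, 11)]
weights = list(range(10, 0, -1))
samples = rng.choices(unique_ips, weights=weights, k=NUM_SAMPLES)

counts: Counter[str] = Counter()  # stand-in for a sketch
lock = threading.Lock()           # Counter has no atomic counters, so guard it


def insert_batch(batch: list[str]) -> None:
    # Each worker inserts one batch; the lock serializes the updates.
    with lock:
        for ip in batch:
            counts[ip] += 1


batches = [samples[i:i + BATCH_SIZE] for i in range(0, NUM_SAMPLES, BATCH_SIZE)]
with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    # Consuming the iterator re-raises any exception from a worker.
    list(pool.map(insert_batch, batches))

print(sum(counts.values()))    # 100000
print(counts.most_common(3))   # three most frequent addresses
```

With a thread-safe sketch in place of the `Counter`, the lock would not be needed, which is the scenario the threaded benchmarks are meant to exercise.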

#### Benchmark Categories

| Category | Description | Tests |
|----------|-------------|-------|
| **Insert** | Bulk insertion performance | C++ vs Python with 100k threaded inserts |
| **Count** | Query performance | Frequency counting for all unique items |
| **Top-K** | Top-k retrieval | Finding top 3 most frequent items |
| **Streaming** | End-to-end workflows | Complete insert + top-k pipeline |

#### Running Benchmarks

```bash
# Run all benchmarks
uv run python ./benchmarks/run.py

# Save results to JSON
uv run python ./benchmarks/run.py --json

# Generate test data
uv run python ./benchmarks/generate_data.py
```

#### Benchmark Features
- **Synthetic data**: Uses Faker-generated IP addresses with realistic distributions
- **Threaded testing**: Tests concurrent access patterns
- **Comparative analysis**: Direct C++ vs Python performance comparison
- **Statistical accuracy**: Uses pytest-benchmark for reliable measurements
- **Automated data generation**: Creates test data if missing

## Building from Source

### Prerequisites

- C++17 compatible compiler
- CMake 3.15+
- Python 3.11+ (for Python bindings)
- pybind11 (for Python bindings)

### Quick Build

```bash
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch

# Build everything
make build

# Run tests
make test

# Run example
make example
```

### Development Setup

```bash
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch

# Install all dependencies (including dev dependencies)
uv sync --dev

# Build the C++ library and Python bindings
uv run python -m pip install -e .

# Run Python tests
uv run pytest pytests/

# Run C++ tests
make build-dev
cd build && make test

# Run benchmarks
uv run python ./benchmarks/run.py
```

## GitHub Actions

This project uses GitHub Actions for automated CI/CD workflows:

### Workflows

- **`test.yml`**: Runs C++ and Python tests on all platforms
- **`wheels.yml`**: Builds wheels for Windows, Linux, and macOS using [cibuildwheel](https://github.com/pypa/cibuildwheel)
- **`release.yml`**: Automatically publishes wheels to PyPI on release

### Supported Platforms

- **Python Versions**: 3.11 and 3.12
- **Architectures**: 
  - Windows: x86_64
  - Linux: x86_64  
  - macOS: Intel (x86_64) and Apple Silicon (arm64)

### Triggering Workflows

```bash
# Push to trigger tests and wheel builds
git push origin main

# Create a release to upload all wheels to PyPI
git tag v0.1.0
git push origin v0.1.0
```

### Workflow Features

- **Cross-Platform Compilation**: Uses [cibuildwheel](https://github.com/pypa/cibuildwheel) for consistent wheel building
- **Dependency Management**: Automated dependency installation and caching
- **Test Coverage**: Comprehensive testing across all supported platforms
- **Automated Publishing**: PyPI upload on release

## Project Structure

```
count-min-sketch/
├── include/cmsketch/                   # C++ header files
│   ├── cmsketch.h                      # Main header (include this)
│   ├── count_min_sketch.h              # Core Count-Min Sketch template class
│   └── hash_util.h                     # Hash utility functions
├── src/cmsketchcpp/                    # C++ source files
│   └── count_min_sketch.cc             # Core implementation
├── src/cmsketch/                       # Python package source
│   ├── __init__.py                     # Package initialization
│   ├── base.py                         # Base classes and interfaces
│   ├── _core.pyi                       # Type stubs for C++ bindings
│   ├── _version.py                     # Version information
│   ├── py.typed                        # Type checking marker
│   └── py/                             # Pure Python implementations
│       ├── count_min_sketch.py         # Python Count-Min Sketch implementation
│       └── hash_util.py                # Python hash utilities
├── src/                                # Additional source files
│   ├── main.cc                         # Example C++ application
│   └── python_bindings.cc              # Python bindings (pybind11)
├── tests/                              # C++ unit tests
│   ├── CMakeLists.txt                  # Test configuration
│   ├── test_count_min_sketch.cc        # Core functionality tests
│   ├── test_hash_functions.cc          # Hash function tests
│   └── test_sketch_config.cc           # Configuration tests
├── pytests/                            # Python tests
│   ├── __init__.py                     # Test package init
│   ├── conftest.py                     # Pytest configuration
│   ├── test_count_min_sketch.py        # Core Python tests
│   ├── test_hash_util.py               # Hash utility tests
│   ├── test_mixins.py                  # Mixin class tests
│   └── test_py_count_min_sketch.py     # Pure Python implementation tests
├── benchmarks/                         # Performance benchmarks
│   ├── __init__.py                     # Benchmark package init
│   ├── generate_data.py                # Data generation utilities
│   ├── run.py                          # Benchmark runner
│   └── test_benchmarks.py              # Benchmark validation tests
├── examples/                           # Example scripts
│   └── example.py                      # Python usage example
├── scripts/                            # Build and deployment scripts
│   ├── build.sh                        # Production build script
│   └── build-dev.sh                    # Development build script
├── data/                               # Sample data files
│   ├── ips.txt                         # IP address sample data
│   └── unique-ips.txt                  # Unique IP sample data
├── build/                              # Build artifacts (generated)
│   ├── _core.cpython-*.so              # Compiled Python extensions
│   ├── cmsketch_example                # Compiled C++ example
│   ├── libcmsketch.a                   # Static library
│   └── tests/                          # Compiled test binaries
├── dist/                               # Distribution packages (generated)
│   └── cmsketch-*.whl                  # Python wheel packages
├── CMakeLists.txt                      # Main CMake configuration
├── pyproject.toml                      # Python package configuration
├── uv.lock                             # uv lock file
├── Makefile                            # Convenience make targets
├── LICENSE                             # MIT License
└── README.md                           # This file
```

## Educational Value

This project demonstrates several important software engineering concepts:

### 1. Python Package Development with C++ Extensions
- **pybind11 Integration**: Seamless C++ to Python binding generation
- **scikit-build-core**: Modern Python build system for C++ extensions
- **uv Package Management**: Fast, modern Python package management
- **Type Stubs**: Complete type information for Python IDEs

### 2. Performance Engineering
- **C++ vs Python**: Direct performance comparison between implementations
- **Memory Efficiency**: Optimized data structures and memory usage patterns
- **Thread Safety**: Atomic operations and concurrent access patterns
- **Benchmarking**: Comprehensive performance testing and profiling

### 3. Build System Integration
- **CMake**: Cross-platform C++ build configuration
- **Python Packaging**: Complete pip-installable package creation
- **CI/CD**: Automated testing and publishing workflows
- **Cross-Platform**: Support for multiple operating systems and architectures

### 4. Modern C++ Practices
- **Template Metaprogramming**: Generic, type-safe implementations
- **RAII**: Resource management and exception safety
- **STL Integration**: Standard library containers and algorithms
- **Google Style Guide**: Consistent, readable code formatting

## Contributing

1. Fork the repository
2. Create a feature branch
3. Follow the Google C++ Style Guide
4. Add tests for new features
5. Ensure all tests pass
6. Submit a pull request

## License

MIT License - see [LICENSE](LICENSE) file for details.
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cmsketch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "isaac-fei <isaac.omega.fei@gmail.com>",
    "keywords": "count-min-sketch, probabilistic, data-structure, streaming",
    "author": null,
    "author_email": "isaac-fei <isaac.omega.fei@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/08/2c/116d045c7e044ead5b7a7d5026a0b3385527b22c1e32ff9b2c66084f4030/cmsketch-0.1.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n         \n         Copyright (c) 2025 Isaac\n         \n         Permission is hereby granted, free of charge, to any person obtaining a copy\n         of this software and associated documentation files (the \"Software\"), to deal\n         in the Software without restriction, including without limitation the rights\n         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n         copies of the Software, and to permit persons to whom the Software is\n         furnished to do so, subject to the following conditions:\n         \n         The above copyright notice and this permission notice shall be included in all\n         copies or substantial portions of the Software.\n         \n         THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n         SOFTWARE.\n         ",
    "summary": "High-performance Count-Min Sketch implementation with C++ and Python versions",
    "version": "0.1.10",
    "project_urls": {
        "Documentation": "https://github.com/isaac-fate/count-min-sketch#readme",
        "Homepage": "https://github.com/isaac-fate/count-min-sketch",
        "Issues": "https://github.com/isaac-fate/count-min-sketch/issues",
        "Repository": "https://github.com/isaac-fate/count-min-sketch"
    },
    "split_keywords": [
        "count-min-sketch",
        "probabilistic",
        "data-structure",
        "streaming"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1dce555026a9e9c7d13dce6be79ceb820094889a318a680a2bb0e0b92b472035",
                "md5": "1d05177f5d69a464179a6f44df2d0e02",
                "sha256": "dec0f68c23c8a90d1487589193c1c3458fbc3d29ec0e12357796125950e6e6fc"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp311-cp311-macosx_10_15_x86_64.whl",
            "has_sig": false,
            "md5_digest": "1d05177f5d69a464179a6f44df2d0e02",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 110436,
            "upload_time": "2025-09-13T15:51:12",
            "upload_time_iso_8601": "2025-09-13T15:51:12.602178Z",
            "url": "https://files.pythonhosted.org/packages/1d/ce/555026a9e9c7d13dce6be79ceb820094889a318a680a2bb0e0b92b472035/cmsketch-0.1.10-cp311-cp311-macosx_10_15_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ee21f07375f734a28a78e018c5bd949401a51bfc3d7e92dc97813917e9fd7fe1",
                "md5": "566dd812877e0de73eaec38edff9be86",
                "sha256": "823eb0d0557fa9c2095d937de37a5af204f50f856aa3a9ff2aeb87c8ec6a218b"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "566dd812877e0de73eaec38edff9be86",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 103702,
            "upload_time": "2025-09-13T15:51:13",
            "upload_time_iso_8601": "2025-09-13T15:51:13.908348Z",
            "url": "https://files.pythonhosted.org/packages/ee/21/f07375f734a28a78e018c5bd949401a51bfc3d7e92dc97813917e9fd7fe1/cmsketch-0.1.10-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6d3c8d36f72dadb31a14bb165859450ab0a74520f3566838d143a6e6ed88f415",
                "md5": "686058f578d14d6ee9f0c4fa817f2cea",
                "sha256": "34ac3afed28389aff5795c0fc4f6dfbb5f5801a397b63c0a7bae0f88fd0d35b5"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "686058f578d14d6ee9f0c4fa817f2cea",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 113074,
            "upload_time": "2025-09-13T15:51:14",
            "upload_time_iso_8601": "2025-09-13T15:51:14.793300Z",
            "url": "https://files.pythonhosted.org/packages/6d/3c/8d36f72dadb31a14bb165859450ab0a74520f3566838d143a6e6ed88f415/cmsketch-0.1.10-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4e976cd52dd74a09781b5d47df453eb7663015f844592bc0242692a424574f53",
                "md5": "364fc93dc7a94556e02a1de1b867f62f",
                "sha256": "819c4d41189f251bb3ddb495b32cfe3a00abfb9db246b453dd60bac465b23ea6"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "364fc93dc7a94556e02a1de1b867f62f",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 289202,
            "upload_time": "2025-09-13T15:51:15",
            "upload_time_iso_8601": "2025-09-13T15:51:15.727183Z",
            "url": "https://files.pythonhosted.org/packages/4e/97/6cd52dd74a09781b5d47df453eb7663015f844592bc0242692a424574f53/cmsketch-0.1.10-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cb48a2bf9cdafaee877bbdcdd077c60ab826dc843b01c1683481256a7da5a0d4",
                "md5": "6747b720e95039110d07095edb3506e1",
                "sha256": "64f6e5dcb8a50dea8e44c14079096d8b4fbc5c478b1e3092a815ca29624ba6e5"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp312-cp312-macosx_10_15_x86_64.whl",
            "has_sig": false,
            "md5_digest": "6747b720e95039110d07095edb3506e1",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.11",
            "size": 111921,
            "upload_time": "2025-09-13T15:51:16",
            "upload_time_iso_8601": "2025-09-13T15:51:16.786570Z",
            "url": "https://files.pythonhosted.org/packages/cb/48/a2bf9cdafaee877bbdcdd077c60ab826dc843b01c1683481256a7da5a0d4/cmsketch-0.1.10-cp312-cp312-macosx_10_15_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c3ba6c6795bab231b267c89c59f0c42c7d39dde3447069fb3bd9fca9b7cb261b",
                "md5": "7020758b9a06f16a549eeb11c7c61fd7",
                "sha256": "5aa12596277532b35b71c0a684e2bb8da8568287e90efb9191d3afbfd9969c6c"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp312-cp312-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "7020758b9a06f16a549eeb11c7c61fd7",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.11",
            "size": 104529,
            "upload_time": "2025-09-13T15:51:17",
            "upload_time_iso_8601": "2025-09-13T15:51:17.831401Z",
            "url": "https://files.pythonhosted.org/packages/c3/ba/6c6795bab231b267c89c59f0c42c7d39dde3447069fb3bd9fca9b7cb261b/cmsketch-0.1.10-cp312-cp312-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d8cd1e9b6c2f858dc016523c21477e7961ba2e0825bb233537dccd6024a04a35",
                "md5": "dce5c1a3f2a846f57c1419bec617f5f2",
                "sha256": "90a4e02591ef23c964e75f6f70b0b7025256a2e5a36e8ca087cfaf2384a60ae9"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "dce5c1a3f2a846f57c1419bec617f5f2",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.11",
            "size": 113331,
            "upload_time": "2025-09-13T15:51:19",
            "upload_time_iso_8601": "2025-09-13T15:51:19.111881Z",
            "url": "https://files.pythonhosted.org/packages/d8/cd/1e9b6c2f858dc016523c21477e7961ba2e0825bb233537dccd6024a04a35/cmsketch-0.1.10-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "380c722640d0cc9ee1a8f03552dda95be9119e208c22324643aa8eeb414c3cbc",
                "md5": "3e1d36e7c83e48903b13c4b7beaa5211",
                "sha256": "b3aea397bccc72c455443551691fe514402f1260f091f95ca598d775dc06ff61"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10-cp312-cp312-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "3e1d36e7c83e48903b13c4b7beaa5211",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.11",
            "size": 290015,
            "upload_time": "2025-09-13T15:51:20",
            "upload_time_iso_8601": "2025-09-13T15:51:20.423679Z",
            "url": "https://files.pythonhosted.org/packages/38/0c/722640d0cc9ee1a8f03552dda95be9119e208c22324643aa8eeb414c3cbc/cmsketch-0.1.10-cp312-cp312-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "082c116d045c7e044ead5b7a7d5026a0b3385527b22c1e32ff9b2c66084f4030",
                "md5": "690eeb1d52bee4ab0aedc7bcd1b1fe67",
                "sha256": "f9fa6f7cf490b8cf0560d4b0bf602daa24a7b72e3bf3e4f03e7e09a3194a406d"
            },
            "downloads": -1,
            "filename": "cmsketch-0.1.10.tar.gz",
            "has_sig": false,
            "md5_digest": "690eeb1d52bee4ab0aedc7bcd1b1fe67",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 70021,
            "upload_time": "2025-09-13T15:51:21",
            "upload_time_iso_8601": "2025-09-13T15:51:21.782459Z",
            "url": "https://files.pythonhosted.org/packages/08/2c/116d045c7e044ead5b7a7d5026a0b3385527b22c1e32ff9b2c66084f4030/cmsketch-0.1.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-13 15:51:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "isaac-fate",
    "github_project": "count-min-sketch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cmsketch"
}
        