shared_hashmap

| Field | Value |
|-------|-------|
| Name | shared_hashmap |
| Version | 0.1.0 |
| Summary | High-performance cross-process shared memory hashmap for Python multiprocessing |
| Upload time | 2025-10-23 05:18:05 |
| Requires Python | >=3.9 |
| License | MIT |
| Keywords | multiprocessing, shared-memory, hashmap, concurrent, atomic |
| Requirements | No requirements were recorded. |

# SharedHashMap

A high-performance, thread-safe and process-safe hashmap implementation for Python multiprocessing using shared memory and atomic operations.

## Features

- **Process-safe**: Uses atomic operations from the `atomics` package for lock-free synchronization
- **Shared memory**: Built on Python's `multiprocessing.shared_memory` for efficient cross-process data sharing
- **Optimized serialization**: Avoids pickle overhead for common types (strings, bytes, integers, None)
- **Dict-like interface**: Familiar Python dictionary API
- **Open addressing**: Linear probing for collision resolution
- **Fully tested**: Comprehensive test suite including multiprocess stress tests
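
To make the open-addressing bullet concrete, here is a minimal single-process sketch of linear probing over a plain Python list. It is an illustration of the general technique only: the class name `LinearProbingTable`, the tombstone marker, and the use of Python's built-in `hash` are assumptions for this example, not SharedHashMap's actual shared-memory layout.

```python
from typing import Any, Iterator, List

EMPTY = None          # slot that has never held an entry
TOMBSTONE = object()  # slot whose entry was deleted; probing must continue past it


class LinearProbingTable:
    """Illustrative fixed-capacity hash table using linear probing."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.slots: List[Any] = [EMPTY] * capacity

    def _probe(self, key: Any) -> Iterator[int]:
        # Start at the hashed bucket and walk forward, wrapping around once.
        start = hash(key) % self.capacity
        for offset in range(self.capacity):
            yield (start + offset) % self.capacity

    def set(self, key: Any, value: Any) -> None:
        first_free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                if first_free is None:
                    first_free = index
                break  # key cannot live beyond a never-used slot
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = index  # remember it, but keep looking for the key
                continue
            if slot[0] == key:
                self.slots[index] = (key, value)  # overwrite existing entry
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def get(self, key: Any, default: Any = None) -> Any:
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return default
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return default

    def delete(self, key: Any) -> bool:
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[index] = TOMBSTONE
                return True
        return False
```

With `capacity=4`, the integer keys `1` and `5` hash to the same bucket, so the second insert lands in the next slot; deleting the first leaves a tombstone so the second key stays reachable.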

## Installation

```bash
# Install from PyPI
pip install shared_hashmap

# Install dependencies
pip install atomics

# Or install the entire project from a source checkout
pip install .
```

## Quick Start

```python
from shared_hashmap import SharedHashMap

# Create a shared hashmap
with SharedHashMap(name="my_hashmap", capacity=1024, create=True) as shm:
    # Set values
    shm["key1"] = "value1"
    shm["key2"] = 42

    # Get values
    print(shm["key1"])  # "value1"
    print(shm.get("key2"))  # 42

    # Check existence
    if "key1" in shm:
        print("key1 exists!")

    # Delete keys
    del shm["key1"]

    # Size
    print(f"Hashmap size: {shm.size()}")

    # Cleanup
    shm.unlink()  # Delete shared memory
```

## Multiprocess Usage

### Producer-Consumer Pattern

```python
import multiprocessing as mp
from shared_hashmap import SharedHashMap

def producer(hashmap_name, producer_id, num_items):
    # Attach to existing shared memory
    shm = SharedHashMap(name=hashmap_name, create=False)

    for i in range(num_items):
        shm[f"item_{producer_id}_{i}"] = f"data from producer {producer_id}"

    shm.close()

def consumer(hashmap_name, producer_id, num_items):
    shm = SharedHashMap(name=hashmap_name, create=False)

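    for i in range(num_items):
        # The producer runs concurrently, so get() returns None
        # for any key that has not been written yet.
        value = shm.get(f"item_{producer_id}_{i}")
        print(f"Consumed: {value}")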

    shm.close()

# Main process
if __name__ == "__main__":
    hashmap_name = "producer_consumer_example"

    # Create the shared hashmap
    with SharedHashMap(name=hashmap_name, capacity=256, create=True) as shm:
        # Start producer and consumer processes
        p1 = mp.Process(target=producer, args=(hashmap_name, 0, 10))
        p2 = mp.Process(target=consumer, args=(hashmap_name, 0, 10))

        p1.start()
        p2.start()

        p1.join()
        p2.join()

        shm.unlink()
```
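
Because the producer and consumer in the example run at the same time, the consumer can ask for a key before it has been written. One possible way to handle that is to poll until the key appears. The helper below is a hypothetical sketch (the name `wait_for_key` and its parameters are not part of the package); it relies only on a `get(key, default)` method, so it works with a plain `dict` as well.

```python
import time

_MISSING = object()  # sentinel so that stored None values are not mistaken for "absent"


def wait_for_key(mapping, key, timeout=5.0, poll_interval=0.01):
    """Poll mapping.get(key) until the key exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = mapping.get(key, _MISSING)
        if value is not _MISSING:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"key {key!r} not available after {timeout}s")
        time.sleep(poll_interval)
```

In the consumer, `shm.get(...)` could then be swapped for `wait_for_key(shm, ...)`, at the cost of some polling latency.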

## API Reference

### Constructor

```python
SharedHashMap(
    name: str,
    capacity: int = 1024,
    max_key_size: int = 256,
    max_value_size: int = 1024,
    create: bool = True
)
```

**Parameters:**
- `name`: Unique name for the shared memory block
- `capacity`: Number of buckets in the hashmap
- `max_key_size`: Maximum size in bytes for serialized keys
- `max_value_size`: Maximum size in bytes for serialized values
- `create`: If True, create new shared memory; if False, attach to existing

### Methods

#### `set(key, value)`
Set a key-value pair in the hashmap.

#### `get(key, default=None)`
Get a value from the hashmap. Returns `default` if key not found.

#### `delete(key)`
Delete a key from the hashmap. Returns `True` if deleted, `False` if key didn't exist.

#### `size()`
Return the number of key-value pairs in the hashmap.

#### `close()`
Close the shared memory handle (keeps shared memory alive for other processes).

#### `unlink()`
Delete the shared memory block (should be called by the last process using it).
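
The difference between `close()` and `unlink()` mirrors the standard library's `multiprocessing.shared_memory.SharedMemory`, which the Features section names as the foundation. The sketch below uses only that standard-library class, not SharedHashMap itself, to show that closing one handle leaves the block usable through another, while `unlink()` is the final cleanup step.

```python
from multiprocessing import shared_memory


def close_vs_unlink_demo():
    # The owner creates a small named block.
    owner = shared_memory.SharedMemory(create=True, size=16)
    try:
        owner.buf[:5] = b"hello"

        # A second handle attaches to the same block by name.
        reader = shared_memory.SharedMemory(name=owner.name)
        seen_by_reader = bytes(reader.buf[:5])
        reader.close()  # detaches this handle only; the block stays alive

        # The owner's handle still works after the reader closed its own.
        still_visible = bytes(owner.buf[:5])
    finally:
        owner.close()
        owner.unlink()  # request destruction of the block itself
    return seen_by_reader, still_visible
```

Here both handles live in one process for brevity; in practice the second handle would usually be opened in another process.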

### Dict-like Operations

```python
shm["key"] = "value"  # Set
value = shm["key"]     # Get (raises KeyError if not found)
del shm["key"]         # Delete (raises KeyError if not found)
"key" in shm           # Check existence
```
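
As a rough illustration of how such operators are commonly layered over `set`, `get`, and `delete`, the sketch below defines a hypothetical `DictLikeMixin` and a dict-backed `InMemoryMap` stand-in. Neither name comes from the package, and the real implementation may differ.

```python
_MISSING = object()  # sentinel distinguishing "absent" from a stored None


class DictLikeMixin:
    """Maps dict operators onto set/get/delete supplied by the host class."""

    def __setitem__(self, key, value):
        self.set(key, value)

    def __getitem__(self, key):
        value = self.get(key, _MISSING)
        if value is _MISSING:
            raise KeyError(key)
        return value

    def __delitem__(self, key):
        # delete() reports whether the key existed.
        if not self.delete(key):
            raise KeyError(key)

    def __contains__(self, key):
        return self.get(key, _MISSING) is not _MISSING


class InMemoryMap(DictLikeMixin):
    """Stand-in backend with the same method names as the API above."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        return self._data.pop(key, _MISSING) is not _MISSING
```

Using a sentinel default means a key whose stored value is `None` still counts as present.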

## Serialization

SharedHashMap optimizes serialization for common types:

| Type | Serialization Method | Notes |
|------|---------------------|-------|
| `str` | UTF-8 encoding | No pickle overhead |
| `bytes` | Direct storage | No pickle overhead |
| `int` | ASCII encoding | No pickle overhead |
| `None` | Empty bytes | No pickle overhead |
| Other | `pickle.dumps()` | Fallback for complex types |
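
The table can be sketched as a pair of functions. The one-byte type tag used here to tell the encodings apart on the way back is an assumption made for this example; the package's actual wire format is not documented above.

```python
import pickle
from typing import Any

# Hypothetical one-byte type tags, chosen for this sketch only.
TAG_NONE, TAG_STR, TAG_BYTES, TAG_INT, TAG_PICKLE = range(5)


def serialize(obj: Any) -> bytes:
    if obj is None:
        return bytes([TAG_NONE])                      # empty payload
    if isinstance(obj, str):
        return bytes([TAG_STR]) + obj.encode("utf-8")
    if isinstance(obj, bytes):
        return bytes([TAG_BYTES]) + obj               # stored as-is
    # bool is a subclass of int; send it to pickle so True does not return as 1.
    if isinstance(obj, int) and not isinstance(obj, bool):
        return bytes([TAG_INT]) + str(obj).encode("ascii")
    return bytes([TAG_PICKLE]) + pickle.dumps(obj)    # fallback for everything else


def deserialize(data: bytes) -> Any:
    tag, payload = data[0], data[1:]
    if tag == TAG_NONE:
        return None
    if tag == TAG_STR:
        return payload.decode("utf-8")
    if tag == TAG_BYTES:
        return bytes(payload)
    if tag == TAG_INT:
        return int(payload.decode("ascii"))
    if tag == TAG_PICKLE:
        return pickle.loads(payload)
    raise ValueError(f"unknown type tag: {tag}")
```

The fast paths avoid pickle entirely, which is why the Notes column reports no pickle overhead for those types.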

## Performance

Approximate figures reported for cross-process data sharing; actual results depend on hardware and workload:

**Key Metrics:**
- **String reads**: ~2,600 ops/sec (382μs mean)
- **String writes**: ~1,200 ops/sec (826μs mean)
- **Integer operations**: ~6,000+ ops/sec
- **Mixed workloads**: ~1,170 ops/sec (854μs mean)
- **Concurrent writers**: Scales to multiple processes with minimal contention
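
The throughput and latency figures in this list are two views of the same measurement: for a single caller, operations per second is the reciprocal of the mean latency. A quick check, using only the numbers quoted above:

```python
def ops_per_second(mean_latency_us: float) -> float:
    """Convert a mean per-operation latency in microseconds to ops/sec."""
    return 1_000_000 / mean_latency_us


for label, mean_us in [
    ("String reads", 382),
    ("String writes", 826),
    ("Mixed workloads", 854),
]:
    print(f"{label}: {ops_per_second(mean_us):,.0f} ops/sec")
# String reads: 2,618 ops/sec
# String writes: 1,211 ops/sec
# Mixed workloads: 1,171 ops/sec
```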

**Run benchmarks:**
```bash
pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v
```

## Performance Considerations

1. **Capacity**: Choose a capacity larger than your expected number of items to minimize collisions
2. **Max sizes**: Set `max_key_size` and `max_value_size` appropriately for your data
3. **Alignment**: Buckets are automatically aligned to 8-byte boundaries for optimal atomic operations
4. **Serialization**: Use strings, bytes, or integers when possible for best performance
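
To get a feel for how capacity and the size limits interact, the sketch below estimates a memory footprint and suggests a capacity for a target load factor. It assumes fixed-size buckets rounded up to 8 bytes, and the `header_size` of 16 bytes is a purely hypothetical placeholder, since the real bucket layout is not described here. Treat the result as an order-of-magnitude estimate.

```python
import math


def align_up(n: int, alignment: int = 8) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def estimate_bytes(capacity: int = 1024, max_key_size: int = 256,
                   max_value_size: int = 1024, header_size: int = 16) -> int:
    """Rough footprint, assuming every bucket reserves room for the maxima.

    header_size is a hypothetical per-bucket overhead, not a documented value.
    """
    bucket = align_up(header_size + max_key_size + max_value_size)
    return capacity * bucket


def suggest_capacity(expected_items: int, max_load: float = 0.5) -> int:
    """Buckets needed to keep the load factor at or below max_load."""
    return math.ceil(expected_items / max_load)


print(estimate_bytes())        # 1327104 bytes, about 1.27 MiB, under these assumptions
print(suggest_capacity(1000))  # 2000
```

Linear probing slows down as the table fills, so leaving headroom (here a load factor of 0.5, an arbitrary illustrative target) trades memory for shorter probe sequences.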

## Thread Safety

SharedHashMap uses atomic compare-and-swap operations to stay safe across both threads and processes:
- Multiple processes can safely read and write concurrently
- Lock-free design: no locks or mutexes are required, even under high concurrency
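
The compare-and-swap pattern itself can be sketched without any third-party package. In the example below, `EmulatedAtomicInt` is a stand-in that uses a `threading.Lock` to emulate the atomicity a hardware instruction would provide; it is not lock-free and is not how SharedHashMap or the `atomics` package work internally. What it does show is the retry loop that callers build on top of compare-and-swap.

```python
import threading


class EmulatedAtomicInt:
    """Stand-in for an atomic integer; a lock emulates hardware atomicity."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, desired: int) -> bool:
        # Store desired only if the current value still equals expected.
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False


def cas_increment(cell: EmulatedAtomicInt) -> int:
    """Increment via a CAS retry loop and return the new value."""
    while True:
        current = cell.load()
        if cell.compare_and_swap(current, current + 1):
            return current + 1
        # Another thread changed the value first; reload and try again.


def run_demo(num_threads: int = 4, increments: int = 1000) -> int:
    cell = EmulatedAtomicInt()

    def worker() -> None:
        for _ in range(increments):
            cas_increment(cell)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

No update is lost because a failed swap means another writer got there first, and the loser simply retries with the fresh value.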

## Limitations

1. **Fixed capacity**: The hashmap size is fixed at creation time
2. **No iteration**: Currently doesn't support iterating over keys/values
3. **No resizing**: Cannot dynamically grow the hashmap
4. **Size limits**: Keys and values must fit within configured max sizes

## Examples

See `examples/basic_usage.py` for complete examples including:
- Basic operations
- Producer-consumer pattern
- Distributed computation
- Stress testing

Run the examples:
```bash
python examples/basic_usage.py
```

## Testing

Run the test suite:
```bash
pytest tests/test_shared_hashmap.py -v
```

Tests include:
- Basic operations (set, get, delete, contains)
- Concurrent writes from multiple processes
- Concurrent reads and writes
- Stress testing with random operations
- Edge cases (collisions, empty strings, None values)

Run benchmarks:
```bash
pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v
```

Benchmarks include:
- Single-process baseline performance
- Cross-process concurrent operations
- High contention stress tests
- Large dataset performance
- Memory churn with deletions

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "shared_hashmap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "multiprocessing, shared-memory, hashmap, concurrent, atomic",
    "author": null,
    "author_email": "Raymond Chastain <RaymondLC92@protonmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ac/f2/643e5057266c9fcde4f42921a72701ee7521cb5f689e64aa0eaba6346dbb/shared_hashmap-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "High-performance cross-process shared memory hashmap for Python multiprocessing",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "multiprocessing",
        " shared-memory",
        " hashmap",
        " concurrent",
        " atomic"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b6a7d48cb81ce86372b2e3788f254b374c8ec86c141dea292f0f770569f23c9c",
                "md5": "9929daacd6d307d2c2e251312131cab7",
                "sha256": "627f371c8d68a3fd2ddaddb1a3604a202e4078b66fd104a23a40aa7f19bee8d8"
            },
            "downloads": -1,
            "filename": "shared_hashmap-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9929daacd6d307d2c2e251312131cab7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 8927,
            "upload_time": "2025-10-23T05:18:04",
            "upload_time_iso_8601": "2025-10-23T05:18:04.556405Z",
            "url": "https://files.pythonhosted.org/packages/b6/a7/d48cb81ce86372b2e3788f254b374c8ec86c141dea292f0f770569f23c9c/shared_hashmap-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "acf2643e5057266c9fcde4f42921a72701ee7521cb5f689e64aa0eaba6346dbb",
                "md5": "30777cb5053f04a4c21a02695c81339b",
                "sha256": "64b530dc6b4859aa01f0b5765f8de77e8e30b6b2a18516d74d65f9a58ad68798"
            },
            "downloads": -1,
            "filename": "shared_hashmap-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "30777cb5053f04a4c21a02695c81339b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 15843,
            "upload_time": "2025-10-23T05:18:05",
            "upload_time_iso_8601": "2025-10-23T05:18:05.777173Z",
            "url": "https://files.pythonhosted.org/packages/ac/f2/643e5057266c9fcde4f42921a72701ee7521cb5f689e64aa0eaba6346dbb/shared_hashmap-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-23 05:18:05",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "shared_hashmap"
}
        