# SharedHashMap
A high-performance, thread-safe and process-safe hashmap implementation for Python multiprocessing using shared memory and atomic operations.
## Features
- **Process-safe**: Uses atomic operations from the `atomics` package for lock-free synchronization
- **Shared memory**: Built on Python's `multiprocessing.shared_memory` for efficient cross-process data sharing
- **Optimized serialization**: Avoids pickle overhead for common types (strings, bytes, integers, None)
- **Dict-like interface**: Familiar Python dictionary API
- **Open addressing**: Linear probing for collision resolution
- **Fully tested**: Comprehensive test suite including multiprocess stress tests
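Open addressing with linear probing means a key that collides simply tries the next bucket until a free or matching slot is found. A minimal, stdlib-only sketch of the probe sequence (illustrative only; the real map stores its buckets in shared memory):

```python
def probe_indices(key_hash, capacity):
    """Yield bucket indices starting at the hashed slot, wrapping around.

    Every bucket is visited exactly once before the sequence ends, so a
    full table terminates the probe rather than looping forever.
    """
    start = key_hash % capacity
    for offset in range(capacity):
        yield (start + offset) % capacity

# A key hashing to slot 6 in an 8-bucket table probes 6, 7, 0, 1, ...
assert list(probe_indices(14, 8))[:4] == [6, 7, 0, 1]
```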
## Installation
```bash
# Install dependencies
pip install atomics
# Or install the entire project
pip install .
```
## Quick Start
```python
from shared_hashmap import SharedHashMap
# Create a shared hashmap
with SharedHashMap(name="my_hashmap", capacity=1024, create=True) as shm:
    # Set values
    shm["key1"] = "value1"
    shm["key2"] = 42

    # Get values
    print(shm["key1"])      # "value1"
    print(shm.get("key2"))  # 42

    # Check existence
    if "key1" in shm:
        print("key1 exists!")

    # Delete keys
    del shm["key1"]

    # Size
    print(f"Hashmap size: {shm.size()}")

    # Cleanup
    shm.unlink()  # Delete shared memory
```
## Multiprocess Usage
### Producer-Consumer Pattern
```python
import multiprocessing as mp
from shared_hashmap import SharedHashMap
def producer(hashmap_name, producer_id, num_items):
    # Attach to existing shared memory
    shm = SharedHashMap(name=hashmap_name, create=False)

    for i in range(num_items):
        shm[f"item_{producer_id}_{i}"] = f"data from producer {producer_id}"

    shm.close()

def consumer(hashmap_name, producer_id, num_items):
    shm = SharedHashMap(name=hashmap_name, create=False)

    for i in range(num_items):
        value = shm.get(f"item_{producer_id}_{i}")
        print(f"Consumed: {value}")

    shm.close()

# Main process
if __name__ == "__main__":
    hashmap_name = "producer_consumer_example"

    # Create the shared hashmap
    with SharedHashMap(name=hashmap_name, capacity=256, create=True) as shm:
        # Start producer and consumer processes
        p1 = mp.Process(target=producer, args=(hashmap_name, 0, 10))
        p2 = mp.Process(target=consumer, args=(hashmap_name, 0, 10))

        p1.start()
        p2.start()

        p1.join()
        p2.join()

        shm.unlink()
```
## API Reference
### Constructor
```python
SharedHashMap(
    name: str,
    capacity: int = 1024,
    max_key_size: int = 256,
    max_value_size: int = 1024,
    create: bool = True
)
```
**Parameters:**
- `name`: Unique name for the shared memory block
- `capacity`: Number of buckets in the hashmap
- `max_key_size`: Maximum size in bytes for serialized keys
- `max_value_size`: Maximum size in bytes for serialized values
- `create`: If True, create new shared memory; if False, attach to existing
### Methods
#### `set(key, value)`
Set a key-value pair in the hashmap.
#### `get(key, default=None)`
Get a value from the hashmap. Returns `default` if key not found.
#### `delete(key)`
Delete a key from the hashmap. Returns `True` if deleted, `False` if key didn't exist.
#### `size()`
Return the number of key-value pairs in the hashmap.
#### `close()`
Close the shared memory handle (keeps shared memory alive for other processes).
#### `unlink()`
Delete the shared memory block (should be called by the last process using it).
### Dict-like Operations
```python
shm["key"] = "value" # Set
value = shm["key"] # Get (raises KeyError if not found)
del shm["key"] # Delete (raises KeyError if not found)
"key" in shm # Check existence
```
## Serialization
SharedHashMap optimizes serialization for common types:
| Type | Serialization Method | Notes |
|------|---------------------|-------|
| `str` | UTF-8 encoding | No pickle overhead |
| `bytes` | Direct storage | No pickle overhead |
| `int` | ASCII encoding | No pickle overhead |
| `None` | Empty bytes | No pickle overhead |
| Other | `pickle.dumps()` | Fallback for complex types |
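The dispatch described in the table can be sketched as follows. The tag constants and helper names here are hypothetical and chosen for illustration; the library's actual wire format may differ:

```python
import pickle

# Hypothetical type tags; the library's actual encoding may differ.
TAG_STR, TAG_BYTES, TAG_INT, TAG_NONE, TAG_PICKLE = range(5)

def serialize(obj):
    """Encode a value, avoiding pickle for common types."""
    if isinstance(obj, str):
        return TAG_STR, obj.encode("utf-8")
    if isinstance(obj, bytes):
        return TAG_BYTES, obj
    if isinstance(obj, bool):
        # bool is a subclass of int; pickle it so True doesn't come back as 1
        return TAG_PICKLE, pickle.dumps(obj)
    if isinstance(obj, int):
        return TAG_INT, str(obj).encode("ascii")
    if obj is None:
        return TAG_NONE, b""
    return TAG_PICKLE, pickle.dumps(obj)  # fallback for complex types

def deserialize(tag, data):
    """Decode a value produced by serialize()."""
    if tag == TAG_STR:
        return data.decode("utf-8")
    if tag == TAG_BYTES:
        return data
    if tag == TAG_INT:
        return int(data)
    if tag == TAG_NONE:
        return None
    return pickle.loads(data)
```

Tagging each entry with its type lets reads skip `pickle.loads()` entirely for the common cases, which is where the overhead savings come from.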
## Performance
SharedHashMap delivers exceptional performance for cross-process data sharing:
**Key Metrics:**
- **String reads**: ~2,600 ops/sec (382μs mean)
- **String writes**: ~1,200 ops/sec (826μs mean)
- **Integer operations**: ~6,000+ ops/sec
- **Mixed workloads**: ~1,170 ops/sec (854μs mean)
- **Concurrent writers**: Scales to multiple processes with minimal contention
**Run benchmarks:**
```bash
pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v
```
## Performance Considerations
1. **Capacity**: Choose a capacity larger than your expected number of items to minimize collisions
2. **Max sizes**: Set `max_key_size` and `max_value_size` appropriately for your data
3. **Alignment**: Buckets are automatically aligned to 8-byte boundaries for optimal atomic operations
4. **Serialization**: Use strings, bytes, or integers when possible for best performance
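For point 1, a simple sizing heuristic is to keep the load factor well below 1, since linear probing degrades sharply as the table fills. The helper below is illustrative and not part of the library API:

```python
def suggested_capacity(expected_items, max_load_factor=0.7):
    """Pick a power-of-two capacity that keeps the load factor
    below the target (illustrative heuristic, not library API)."""
    needed = int(expected_items / max_load_factor) + 1
    capacity = 1
    while capacity < needed:
        capacity *= 2
    return capacity

# e.g. 10,000 expected items -> capacity 16384 (~61% full)
assert suggested_capacity(10_000) == 16384
```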
## Thread Safety
SharedHashMap uses atomic compare-and-swap operations to ensure thread safety:
- Multiple processes can safely read and write concurrently
- No locks or mutexes required
- Lock-free design for high concurrency
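The write path can be modeled as a small per-bucket state machine: a writer claims an empty bucket by compare-and-swapping its state word, and since at most one CAS can win, no lock is needed. The sketch below simulates CAS in pure Python purely for illustration; the real implementation performs hardware atomics (via the `atomics` package) on 8-byte-aligned words in shared memory, and the state names here are hypothetical:

```python
# Hypothetical bucket states for illustration.
EMPTY, WRITING, OCCUPIED, DELETED = range(4)

def compare_and_swap(cell, expected, desired):
    """Simulated CAS: succeed only if the cell holds the expected value.
    The real version is a single hardware atomic instruction."""
    if cell[0] == expected:
        cell[0] = desired
        return True
    return False

def try_claim_bucket(cell):
    # Only one process can move the bucket EMPTY -> WRITING;
    # a loser sees False and probes the next bucket instead.
    return compare_and_swap(cell, EMPTY, WRITING)

cell = [EMPTY]
assert try_claim_bucket(cell)      # first writer wins the bucket
assert not try_claim_bucket(cell)  # second writer must probe onward
```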
## Limitations
1. **Fixed capacity**: The hashmap size is fixed at creation time
2. **No iteration**: Currently doesn't support iterating over keys/values
3. **No resizing**: Cannot dynamically grow the hashmap
4. **Size limits**: Keys and values must fit within configured max sizes
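For limitation 4, a caller can guard against oversized values before inserting. This sketch mirrors the serialization table above; the `fits` helper is hypothetical, not part of the API:

```python
import pickle

def fits(value, max_value_size=1024):
    """Return True if value's serialized form fits within the limit.
    Uses the same fast paths as the serialization table: str, bytes,
    int, and None avoid pickle (illustrative helper, not library API)."""
    if value is None:
        data = b""
    elif isinstance(value, str):
        data = value.encode("utf-8")
    elif isinstance(value, bytes):
        data = value
    elif isinstance(value, int) and not isinstance(value, bool):
        data = str(value).encode("ascii")
    else:
        data = pickle.dumps(value)
    return len(data) <= max_value_size

assert fits("x" * 1024)       # exactly at the default limit
assert not fits("x" * 1025)   # one byte over
```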
## Examples
See `examples/basic_usage.py` for complete examples including:
- Basic operations
- Producer-consumer pattern
- Distributed computation
- Stress testing
Run the examples:
```bash
python examples/basic_usage.py
```
## Testing
Run the test suite:
```bash
pytest tests/test_shared_hashmap.py -v
```
Tests include:
- Basic operations (set, get, delete, contains)
- Concurrent writes from multiple processes
- Concurrent reads and writes
- Stress testing with random operations
- Edge cases (collisions, empty strings, None values)
Run benchmarks:
```bash
pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v
```
Benchmarks include:
- Single-process baseline performance
- Cross-process concurrent operations
- High contention stress tests
- Large dataset performance
- Memory churn with deletions
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.