dynamic-prefetching-cache

Name: dynamic-prefetching-cache
Version: 0.1.2
Summary: A predictive caching framework that dynamically pre-loads data items with minimal latency
Author email: Rasmus Rynell <rynell.rasmus@gmail.com>
Repository: https://github.com/rasmusrynell/dynamic-prefetching-cache
Upload time: 2025-07-11 20:24:47
Requires Python: >=3.8
License: MIT
Keywords: cache, prefetch, prediction, performance, data
Requirements: No requirements were recorded.
# Dynamic Prefetching Cache for Python

A Python library for memory-efficient file reading through speculative precaching. Instead of loading an entire dataset into memory, the framework uses a user-defined predictive function to anticipate data access patterns, then proactively reads and caches the items most likely to be needed next.

## Use Cases

This library is designed for scenarios where you need to process large files or datasets sequentially or with predictable access patterns, but cannot or do not want to load everything into memory at once.

**Primary use case**: Video frame analysis and MOT (Multiple Object Tracking) data processing, where users typically navigate through frames sequentially but may jump to specific positions. The library includes optimized providers and predictors for this scenario.

**Other applications**: Any situation where you can predict future data access patterns: time series analysis, log file processing, document processing pipelines, or any sequential workload whose memory usage needs to be controlled.

## How It Works

Rather than reactive caching (loading data only after it's requested), this system implements **speculative precaching**:

1. **Predict**: Uses a user-defined predictive function to identify the most likely next items
2. **Prefetch**: Loads predicted data in a background thread before it is needed
3. **Serve**: Returns cached data when requested (if the prediction was correct) or loads synchronously as a fallback
4. **Manage**: Automatically evicts old data to stay within memory limits
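
Conceptually, every `get()` runs that cycle. The sketch below is illustrative pseudocode of the cycle, not the library's internals; helper names such as `_cache`, `_schedule_prefetch`, and `_evict_oldest` are hypothetical:

```python
# Illustrative sketch of the predict/prefetch/serve/manage cycle.
# Not the library's actual implementation; helper names are hypothetical.
def get(self, key):
    # Serve: use the prefetched value if the prediction was right,
    # otherwise fall back to a synchronous load.
    if key in self._cache:
        self._stats['hits'] += 1
    else:
        self._stats['misses'] += 1
        self._cache[key] = self.provider.load(key)

    # Predict: ask the user-defined predictor for likely next keys.
    likelihoods = self.predictor.get_likelihoods(key, self._history)

    # Prefetch: queue top-scoring keys for the background worker.
    for next_key, _ in sorted(likelihoods.items(), key=lambda kv: -kv[1]):
        if next_key not in self._cache:
            self._schedule_prefetch(next_key)

    # Manage: evict the oldest entries to respect the memory limit.
    while len(self._cache) > self.max_keys_cached:
        self._evict_oldest()

    return self._cache[key]
```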

## Quick Start

```python
from dynamic_prefetching_cache.predictors import DynamicDataPredictor
from dynamic_prefetching_cache.providers import MOTDataProvider
from dynamic_prefetching_cache.cache import DynamicPrefetchingCache

provider = MOTDataProvider("examples/data/large_data.txt")  # Generate this file with scripts/generate_large_mot_data.py (see "Generating Test Data" below)
predictor = DynamicDataPredictor(possible_jumps=[-5, -1, 1, 5, 15])

# Create cache with automatic resource management
with DynamicPrefetchingCache(provider, predictor, max_keys_cached=512) as cache:
    for key in range(100):
        data = cache.get(key)  # Returns immediately if prefetched, else loads synchronously
        print(data)
        
    # Monitor performance
    stats = cache.stats()
    print(f"Hit rate: {stats['hits'] / (stats['hits'] + stats['misses']):.2%}")
```

## Core Protocols

### DataProvider Protocol
Implement this interface to connect your data source:

```python
from typing import Any

class MyDataProvider:
    def load(self, key: int) -> Any:
        """Load data for the given key. Must be thread-safe."""
        # e.g. read and parse line(s) from a file, or query a database;
        # see src/dynamic_prefetching_cache/providers.py for a real example.
        # fetch_from_file_or_database is a placeholder for your own loader.
        return fetch_from_file_or_database(key)
    
    def get_available_frames(self) -> set[int]:
        """Return set of valid keys."""
        return {1, 2, 3, 4, 5}
    
    def get_total_frames(self) -> int:
        """Return total number of available keys."""
        return 5
    
    def get_stats(self) -> dict:
        """Return provider statistics."""
        return {"status": "ok"}
```

### AccessPredictor Protocol
Implement this interface to define prediction logic:

```python
class MyAccessPredictor:
    def get_likelihoods(self, current_key: int, history: list[int]) -> dict[int, float]:
        """Return likelihood scores for potential next keys."""
        return {
            current_key + 1: 0.8,  # High likelihood
            current_key + 2: 0.3,  # Medium likelihood
            current_key - 1: 0.1,  # Low likelihood
        }
```
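
Because the full access `history` is passed in, a predictor can also adapt to observed behavior. Here is a hedged sketch of a history-aware variant of the same protocol; the stride inference is illustrative, not a library feature:

```python
class StridePredictor:
    """Illustrative only: infer the user's recent stride from history
    and bet on it continuing. Not part of the library."""

    def get_likelihoods(self, current_key: int, history: list[int]) -> dict[int, float]:
        stride = 1
        if len(history) >= 2:
            recent = history[-1] - history[-2]
            if recent != 0:
                stride = recent
        scores = {
            current_key + stride: 0.8,      # continue the observed stride
            current_key + 2 * stride: 0.4,  # two strides ahead
        }
        scores.setdefault(current_key + 1, 0.2)  # hedge on sequential access
        return scores
```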

## Built-in Components

### Access Predictors

The library includes several ready-to-use predictors:

- **`DistanceDecayPredictor`**: Simple distance-based prediction with configurable decay rates
- **`DynamicDistanceDecayPredictor`**: Forward-biased predictor optimized for media playback
- **`DynamicDataPredictor`**: Advanced predictor with jump detection and history analysis

```python
from dynamic_prefetching_cache.predictors import DynamicDataPredictor

# Optimized for video/media navigation patterns
predictor = DynamicDataPredictor(
    possible_jumps=[-15, -5, -1, 1, 5, 15, 30],  # Common seek distances
    forward_bias=2.0,     # Favor forward progression
    jump_boost=5.0,       # Boost exact jump targets
    proximity_boost=2.0   # Boost areas near jump targets
)
```
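
For intuition, a distance-decay scheme of the kind `DistanceDecayPredictor` implements assigns scores that fall off with distance from the current key. The sketch below is illustrative only; the parameter names and exact formula are assumptions, not the library's code:

```python
def distance_decay_likelihoods(current_key: int, decay: float = 0.5,
                               window: int = 5) -> dict[int, float]:
    # Score each neighbor by decay**distance, so nearer keys score higher.
    return {
        current_key + offset: decay ** abs(offset)
        for offset in range(-window, window + 1)
        if offset != 0
    }
```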

### MOT Data Provider

High-performance provider for MOT (Multiple Object Tracking) data files:

```python
from dynamic_prefetching_cache.providers import MOTDataProvider

# Optimized for MOT format files with built-in indexing and caching
provider = MOTDataProvider('data/tracking_results.txt', cache_size=100)

# Includes comprehensive statistics
stats = provider.get_stats()
print(f"Provider cache hit rate: {stats['cache_hit_rate']:.2%}")
```

## Configuration

```python
from dynamic_prefetching_cache.cache import DynamicPrefetchingCache
# EvictionPolicyOldest also ships with the library; import it from the
# module that defines it in your installed version (path not shown here).

cache = DynamicPrefetchingCache(
    provider=my_provider,
    predictor=my_predictor,
    max_keys_cached=1000,                   # Maximum items held in the cache
    max_keys_prefetched=8,                  # Max concurrent prefetch tasks
    history_size=30,                        # Access-history length used for prediction
    eviction_policy=EvictionPolicyOldest,   # Cache eviction strategy
    on_event=my_event_handler               # Optional event-monitoring callback
)
```

## Event Monitoring

Monitor cache operations for debugging and optimization:

```python
import logging

logger = logging.getLogger(__name__)

def handle_cache_events(event_name: str, **kwargs):
    if event_name == 'prefetch_error':
        logger.warning(f"Prefetch failed for key {kwargs['key']}: {kwargs['error']}")
    elif event_name == 'cache_evict':
        logger.debug(f"Evicted key {kwargs['key']} from cache")

cache = DynamicPrefetchingCache(provider, predictor, on_event=handle_cache_events)
```

Available events: `cache_load_start/complete/error`, `prefetch_start/success/error`, `cache_evict`, `worker_error`
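
Since the callback receives every event by name, a handler can also aggregate counts for quick diagnostics. A minimal sketch, assuming `provider` and `predictor` are set up as in the Quick Start:

```python
from collections import Counter

event_counts: Counter = Counter()

def count_events(event_name: str, **kwargs):
    # Tally every cache event; inspect event_counts after a workload.
    event_counts[event_name] += 1

cache = DynamicPrefetchingCache(provider, predictor, on_event=count_events)
# ... run your workload, then: print(event_counts.most_common())
```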

## Performance Monitoring

```python
stats = cache.stats()
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hits'] / (stats['hits'] + stats['misses']):.2%}")
print(f"Active prefetch tasks: {stats['active_prefetch_tasks']}")
```

## Thread Safety

- `get()` method is thread-safe for concurrent access
- Background worker thread handles all prefetching operations
- `close()` method ensures clean resource cleanup
- All internal state is properly synchronized
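
For example, multiple threads can share one cache instance. A minimal sketch, again assuming `provider` and `predictor` from the Quick Start:

```python
from concurrent.futures import ThreadPoolExecutor

from dynamic_prefetching_cache.cache import DynamicPrefetchingCache

with DynamicPrefetchingCache(provider, predictor, max_keys_cached=256) as cache:
    # get() is documented as thread-safe, so concurrent reads are fine.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cache.get, range(100)))
```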

## Examples

### Basic Usage Example

```python
from dynamic_prefetching_cache.cache import DynamicPrefetchingCache
from dynamic_prefetching_cache.predictors import DynamicDataPredictor
from dynamic_prefetching_cache.providers import MOTDataProvider

# Set up for video frame analysis
provider = MOTDataProvider('examples/data/example_data.txt')
predictor = DynamicDataPredictor(possible_jumps=[-5, -1, 1, 5, 15])

with DynamicPrefetchingCache(provider, predictor, max_keys_cached=200) as cache:
    for frame_id in range(100):
        detections = cache.get(frame_id)
        print(f"Frame {frame_id}: {len(detections.detections)} objects detected")
```

### Visual Interactive Demo

```bash
python examples/visable_example.py
```

### Performance Profiling

```bash
python examples/profile_example.py
```

## Installation

```bash
# Install from PyPI
pip install dynamic-prefetching-cache

# Or install from source
git clone https://github.com/rasmusrynell/dynamic-prefetching-cache.git
cd dynamic-prefetching-cache
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

# Install with example dependencies (for GUI demo)
pip install -e ".[examples]"
```

## Getting Started

1. **Implement DataProvider** - Connect to your data source
2. **Choose or implement AccessPredictor** - Define prediction logic for your use case
3. **Configure cache parameters** - Set memory limits and prefetch behavior
4. **Use `cache.get(key)`** - The system handles prefetching automatically

The library abstracts away the complexity of memory management, concurrent prefetching, and prediction logic, allowing you to focus on your core data processing tasks.
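
Putting the four steps together, here is a self-contained sketch with a toy in-memory provider and a sequential predictor. `DictProvider` and `SequentialPredictor` are illustrative names, not part of the library:

```python
from typing import Any

from dynamic_prefetching_cache.cache import DynamicPrefetchingCache

class DictProvider:
    """Toy provider backed by an in-memory dict (illustrative only)."""
    def __init__(self, data: dict):
        self.data = data
    def load(self, key: int) -> Any:
        return self.data[key]  # plain dict reads are thread-safe
    def get_available_frames(self) -> set:
        return set(self.data)
    def get_total_frames(self) -> int:
        return len(self.data)
    def get_stats(self) -> dict:
        return {"status": "ok"}

class SequentialPredictor:
    """Assumes mostly-forward access: the next two keys are most likely."""
    def get_likelihoods(self, current_key: int, history: list) -> dict:
        return {current_key + 1: 0.9, current_key + 2: 0.5}

provider = DictProvider({i: f"item-{i}" for i in range(50)})
with DynamicPrefetchingCache(provider, SequentialPredictor(), max_keys_cached=16) as cache:
    for key in range(50):
        cache.get(key)
    print(cache.stats())
```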

## Generating Test Data

The repository includes a script to generate realistic MOT (Multiple Object Tracking) format test data for testing and development. This eliminates the need to download or upload large data files.

```bash
# Generate small test file (100 tracks, 1000 frames, ~1MB)
python scripts/generate_large_mot_data.py -o examples/data/test_data.txt -t 100 -f 1000

# Generate medium test file (500 tracks, 10000 frames, ~50MB)
python scripts/generate_large_mot_data.py -o examples/data/medium_data.txt -t 500 -f 10000

# Generate large test file (1000 tracks, 100000 frames, ~500MB)
python scripts/generate_large_mot_data.py -o examples/data/large_data.txt -t 1000 -f 100000

# Custom data generation with full options:
#   --tracks            number of object tracks
#   --frames            number of frames
#   --width, --height   image dimensions (pixels)
#   --min-track-length  minimum track duration
#   --max-track-length  maximum track duration
#   --seed              random seed for reproducibility
# (Comments cannot follow a trailing backslash, so they are listed here.)
python scripts/generate_large_mot_data.py \
    --output examples/data/custom_data.txt \
    --tracks 200 \
    --frames 5000 \
    --width 1920 \
    --height 1080 \
    --min-track-length 10 \
    --max-track-length 200 \
    --seed 42
```

            
