<div align="center">
<img src="./assets/logo.png" alt="CacheFuse Logo" width="100"/>
# CacheFuse
**Enterprise-grade caching framework for LLM responses and embeddings**
[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[PyPI](https://badge.fury.io/py/cachefuse)
*Dramatically reduce LLM API costs and latency with intelligent caching*
</div>
---
## 🚀 Why CacheFuse?
CacheFuse transforms expensive LLM applications into lightning-fast, cost-effective systems through intelligent caching.
### 💰 **Massive Cost Savings**
- **60-90% API cost reduction** in typical applications
- **100-1000x faster responses** for cached queries (<3ms vs 2-5 seconds)
- **Smart invalidation** prevents stale results
### ⚡ **Enterprise-Ready Features**
- **Deterministic cache keys** - Same inputs always produce same cache keys
- **Stampede protection** - Concurrent requests handled intelligently
- **Multi-backend support** - SQLite (local) or Redis (distributed)
- **Privacy-compliant** - Hash-only mode with optional redaction hooks
- **Production monitoring** - Hit rates, latency metrics, and CLI tools
### 🔧 **Developer-First Design**
- **Drop-in decorators** - Add `@llm` or `@embed` to existing functions
- **Zero configuration** - Works out of the box with sensible defaults
- **Flexible invalidation** - TTL, tags, and template versioning
- **Thread-safe** - Handles concurrency without race conditions
## 📦 Installation
### Production
```bash
pip install cachefuse
```
### Development
```bash
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
```
### Optional Dependencies
```bash
pip install cachefuse[redis] # For Redis backend support
```
## ⚡ Quickstart
### Basic LLM Caching
```python
from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
import openai
# Initialize cache (works out of the box)
cache = Cache.from_env()
@llm(cache=cache, ttl="7d", tag="summarize-v1", template_version="1")
def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content
# First call: API request (slow + costs money)
summary1 = summarize("CacheFuse speeds up LLM applications") # ~2-5 seconds
# Second call: Cache hit (fast + free)
summary2 = summarize("CacheFuse speeds up LLM applications") # ~3ms
print(f"Results identical: {summary1 == summary2}") # True
print(f"Cache stats: {cache.stats()}") # Hit rate, latency, savings
```
### Embedding Caching
```python
from cachefuse.api.decorators import embed
@embed(cache=cache, ttl="30d", tag="embeddings-v1")
def get_embeddings(texts: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    response = openai.embeddings.create(
        model=model,
        input=texts
    )
    return [item.embedding for item in response.data]
# Expensive embedding calls cached automatically
vectors = get_embeddings(["Hello world", "Goodbye world"])
```
### CLI Management
```bash
# View cache statistics
cachefuse stats
# Clear specific tags
cachefuse purge --tag summarize-v1
# Compact SQLite database
cachefuse vacuum
# View help
cachefuse --help
```
### Real-World Example
```python
# RAG application with caching
@llm(cache=cache, ttl="1h", tag="rag-v1", template_version="2")
def answer_question(question: str, context: str, model: str = "gpt-4") -> str:
    return openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer based on the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    ).choices[0].message.content
# Same questions with same context = instant responses + no API costs
answer = answer_question("What is CacheFuse?", "CacheFuse is a caching framework...")
```
## 🏗️ Architecture
CacheFuse is built on a clean, modular architecture designed for enterprise-scale applications:
```
┌─────────────────────────────────────────┐
│              @llm / @embed              │
│                Decorators               │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│               Cache Facade              │
│  • Deterministic fingerprinting         │
│  • Stampede protection (per-key locks)  │
│  • Metrics collection                   │
│  • Privacy mode handling                │
└────────────────────┬────────────────────┘
                     │
      ┌──────────────▼───────────────┐
      │           Backends           │
      ├──────────────┬───────────────┤
      │    SQLite    │     Redis     │
      │   (local)    │ (distributed) │
      └──────────────┴───────────────┘
```
### Key Components
- **Decorators** - Simple `@llm` and `@embed` decorators for drop-in caching
- **Cache Facade** - Intelligent cache management with fingerprinting and concurrency control
- **Multi-Backend** - SQLite for local development, Redis for production scale
- **Metrics System** - Real-time performance tracking and cost analysis
## ⚙️ Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `CF_BACKEND` | `sqlite` | Backend type (`sqlite` or `redis`) |
| `CF_SQLITE_PATH` | `~/.cache/cachefuse/cache.db` | SQLite database file path |
| `CF_REDIS_URL` | - | Redis connection string (e.g., `redis://localhost:6379/0`) |
| `CF_MODE` | `normal` | Privacy mode (`normal` or `hash_only`) |
| `CF_LOCK_TIMEOUT` | `30` | Per-key lock timeout in seconds |
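For example, the backend and privacy mode can be switched without touching code. A minimal sketch, assuming only the variables in the table above and the `Cache.from_env()` constructor shown in the next section:

```python
import os
from cachefuse.api.cache import Cache

# Select the Redis backend and hash-only privacy mode through the environment
# (variable names taken from the table above).
os.environ["CF_BACKEND"] = "redis"
os.environ["CF_REDIS_URL"] = "redis://localhost:6379/0"
os.environ["CF_MODE"] = "hash_only"
os.environ["CF_LOCK_TIMEOUT"] = "10"

cache = Cache.from_env()  # reads the CF_* variables
```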
### Configuration Methods
```python
# Method 1: Environment-based (recommended)
from cachefuse.api.cache import Cache
cache = Cache.from_env()
# Method 2: Explicit configuration
from cachefuse.config import CacheConfig
config = CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0",
    mode="hash_only"
)
cache = Cache.from_config(config)
```
## 🗄️ Storage Backends
### SQLite Backend (Default)
Perfect for local development, single-machine deployments, and applications requiring file-based persistence.
**Features:**
- Single-file storage with WAL mode for optimal performance
- Built-in ACID transactions
- Automatic schema migration
- Vacuum support for space reclamation
- Zero external dependencies
```python
# Automatic (default)
cache = Cache.from_env()
# Explicit configuration
cache = Cache.from_config(CacheConfig(
    backend="sqlite",
    sqlite_path="/custom/path/cache.db"
))
```
### Redis Backend
Ideal for distributed applications, horizontal scaling, and shared cache scenarios.
**Features:**
- Distributed caching across multiple instances
- Built-in TTL expiration
- Atomic operations with Redis transactions
- Tag-based bulk operations using sets
- High availability and clustering support
```python
cache = Cache.from_config(CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0"
))
```
**Redis Key Layout:**
- `cf:entry:<key>` - Cache entry data
- `cf:tag:<tag>` - Set of keys with specific tag
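To illustrate how this layout supports tag-based bulk invalidation, here is a minimal sketch using plain `redis-py`. It mirrors the key layout above but is not CacheFuse's internal code, and the exact contents of the tag set are an assumption:

```python
import redis

def purge_tag_sketch(r: redis.Redis, tag: str) -> int:
    """Delete every entry referenced by cf:tag:<tag> (illustrative only)."""
    tag_key = f"cf:tag:{tag}"
    deleted = 0
    for member in r.smembers(tag_key):           # assumed: members are cache keys
        key = member.decode() if isinstance(member, bytes) else member
        deleted += r.delete(f"cf:entry:{key}")   # remove the entry itself
    r.delete(tag_key)                            # drop the tag index set
    return deleted

# Usage: purge_tag_sketch(redis.Redis.from_url("redis://localhost:6379/0"), "summarize-v1")
```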
## 🎛️ Advanced Features
### TTL (Time-To-Live)
Flexible expiration control with human-readable formats:
```python
@llm(cache=cache, ttl="7d") # 7 days
@llm(cache=cache, ttl="2h") # 2 hours
@llm(cache=cache, ttl="30m") # 30 minutes
@llm(cache=cache, ttl="300s") # 300 seconds
@llm(cache=cache, ttl=0) # No expiration
```
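A duration string like `"7d"` simply resolves to a number of seconds. A minimal sketch of such a parser (illustrative only, not CacheFuse's actual implementation):

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_ttl(ttl) -> int:
    """Convert '7d', '2h', '30m', '300s', or a bare int into seconds (0 = no expiration)."""
    if isinstance(ttl, int):
        return ttl
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if not match:
        raise ValueError(f"Unrecognized TTL format: {ttl!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]

assert parse_ttl("7d") == 604_800
assert parse_ttl("30m") == 1_800
assert parse_ttl(0) == 0
```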
### Tags & Bulk Invalidation
Group related cache entries for easy management:
```python
# Tag entries by version, feature, or use case
@llm(cache=cache, ttl="1h", tag="summarize-v2")
def summarize_v2(text: str) -> str: ...
@llm(cache=cache, ttl="1h", tags=["rag", "qa-v1"])
def answer_question(question: str, context: str) -> str: ...
# Bulk invalidation
cache.purge_tag("summarize-v2") # Clear all v2 summaries
```
```bash
# CLI bulk operations
cachefuse purge --tag rag # Clear all RAG cache entries
cachefuse purge --tag qa-v1 # Clear v1 Q&A entries
```
### Template Versioning
Automatic cache invalidation when prompts change:
```python
# Version 1
@llm(cache=cache, ttl="1d", template_version="1")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment: {text}"

# Version 2 - automatically uses different cache keys
@llm(cache=cache, ttl="1d", template_version="2")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment with context: {text}"
```
### Deterministic Cache Keys
Cache keys are generated from the following inputs (see the sketch after this list):
- **Function type** (`llm` or `embed`)
- **Model parameters** (model name, temperature, etc.)
- **Template version**
- **Input hash** (SHA256 of processed input)
- **Provider info** (optional)
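A minimal sketch of how such a fingerprint can be computed: serialize the ingredients above as canonical JSON and digest the result. The field names here are hypothetical, not CacheFuse's exact schema:

```python
import hashlib
import json

def fingerprint_sketch(kind: str, params: dict, template_version: str, raw_input: str) -> str:
    """Combine the ingredients listed above into a stable, order-independent key."""
    material = {
        "kind": kind,                                  # "llm" or "embed"
        "params": params,                              # model name, temperature, ...
        "template_version": template_version,
        "input_hash": hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
    }
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

key = fingerprint_sketch("llm", {"model": "gpt-4o-mini", "temperature": 0.0}, "1",
                         "Summarize: CacheFuse speeds up LLM applications")
```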
## 🔒 Privacy & Security
### Hash-Only Mode
For privacy-sensitive applications, store only hashes instead of raw content:
```python
from cachefuse.config import CacheConfig
# Enable privacy mode
config = CacheConfig(backend="sqlite", mode="hash_only")
cache = Cache.from_config(config)
@llm(cache=cache, ttl="1h")
def process_sensitive_data(user_input: str) -> str:
    # Raw input never stored, only hash-based cache keys
    return llm_provider_call(user_input)
```
### Content Redaction
Automatically redact sensitive information before hashing:
```python
def redactor(text: str) -> str:
    # Custom redaction logic
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")

cache = Cache(backend=cache._backend, config=config, redactor=redactor)

# `process_data` is any @llm-decorated function bound to this cache.
# Both calls hit the same cache entry (identical after redaction):
result1 = process_data("User SECRET_TOKEN abc123")
result2 = process_data("User [REDACTED] abc123")  # Cache hit!
```
### Security Features
- **No sensitive data storage** in hash-only mode
- **Deterministic redaction** ensures consistent cache hits
- **Configurable redaction functions** for custom privacy needs
- **Thread-safe operations** prevent race conditions
## 📊 Performance Monitoring
### Real-Time Metrics
Track cache performance and cost savings:
```python
stats = cache.stats()
print(f"""
Cache Performance:
  Entries:      {stats['entries']}
  Total Calls:  {stats['total_calls']}
  Cache Hits:   {stats['hits']}
  Hit Rate:     {stats['hit_rate']:.2%}
  Avg Latency:  {stats['avg_latency_ms']:.1f}ms
  Cost Saved:   ${stats['cost_saved']:.2f}
""")
```
### CLI Monitoring
```bash
# Detailed performance stats
cachefuse stats
# Output:
# entries: 150
# total_calls: 1000
# hits: 850
# hit_rate: 0.85
# avg_latency_ms: 2.3
# cost_saved: 127.50
```
### Production Monitoring
```python
# Log metrics for monitoring systems
import logging
logger = logging.getLogger("cachefuse.metrics")
stats = cache.stats()
logger.info("cache_metrics", extra={
"hit_rate": stats["hit_rate"],
"avg_latency": stats["avg_latency_ms"],
"cost_saved": stats["cost_saved"]
})
```
## 🔄 Concurrency & Reliability
### Stampede Protection
Prevents duplicate expensive operations when multiple requests arrive simultaneously:
```python
import asyncio

# 100 concurrent requests for the same uncached item
# Result: only 1 provider call; the other 99 wait on the per-key lock and get cache hits
results = await asyncio.gather(*[
    asyncio.to_thread(summarize, "same input") for _ in range(100)  # summarize() is sync
])
# All results identical, massive cost/latency savings
```
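The mechanism behind this is a per-key lock with a second cache check after the lock is acquired (double-checked locking). A minimal in-memory sketch of the pattern, not CacheFuse's internal code:

```python
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)   # one lock per cache key
_store = {}                            # stand-in for the real backend

def get_or_compute(key: str, compute):
    """Only one caller computes a missing value; the rest wait and read the cached result."""
    if key in _store:                  # fast path: cache hit, no lock needed
        return _store[key]
    with _locks[key]:
        if key in _store:              # re-check: another thread may have filled it while we waited
            return _store[key]
        value = compute()              # exactly one expensive provider call
        _store[key] = value
        return value
```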
### Thread Safety
- **Per-key file locks** prevent race conditions
- **ACID transactions** ensure data consistency
- **Atomic operations** for concurrent access
- **Lock timeout handling** prevents deadlocks
### Reliability Features
- **Graceful degradation** when the cache is unavailable (see the sketch below)
- **Automatic retry logic** for transient failures
- **Connection pooling** for Redis backend
- **WAL mode** for SQLite performance
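Graceful degradation amounts to treating a failing cache backend as a miss. A minimal sketch of that pattern, with hypothetical `cache_get`/`cache_set` callables rather than CacheFuse's internal API:

```python
import logging

logger = logging.getLogger("cachefuse.fallback")

def call_with_fallback(cache_get, cache_set, key, compute):
    """If the cache backend errors out, log it and fall through to the provider call."""
    try:
        hit = cache_get(key)
        if hit is not None:
            return hit
    except Exception as exc:            # e.g. Redis connection refused
        logger.warning("cache read failed, degrading to direct call: %s", exc)
    value = compute()                   # the real (expensive) provider call
    try:
        cache_set(key, value)
    except Exception as exc:
        logger.warning("cache write failed, result not cached: %s", exc)
    return value
```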
## 🧪 Testing & Development
### Running Tests
```bash
# Install development dependencies
uv pip install -e ".[dev]"
# Run unit tests (fast)
uv run pytest -q -m "not integration" --cov=cachefuse
# Run integration tests (requires Redis for some tests)
uv run pytest -q -m integration
# Run all tests
uv run pytest --cov=cachefuse
```
### Performance Benchmarks
- **Cache hit latency**: < 3ms (SQLite), < 1ms (Redis)
- **Stampede protection**: 1 provider call regardless of concurrency
- **Memory overhead**: ~50MB typical usage
- **Storage efficiency**: Configurable compression and cleanup
### Examples & Demos
```bash
# RAG application demo
uv run python -m cachefuse.examples.rag_demo
# Embedding caching demo
uv run python -m cachefuse.examples.embed_demo
```
## 🗺️ Roadmap
### v0.2.0 - Advanced Caching
- [ ] Semantic similarity caching
- [ ] Batch operations API
- [ ] Enhanced metrics (p95/p99 latencies)
### v0.3.0 - Enterprise Features
- [ ] Prometheus metrics export
- [ ] Distributed locking with Redis
- [ ] Advanced compression algorithms
### v0.4.0 - Provider Integration
- [ ] Native OpenAI SDK integration
- [ ] Anthropic Claude SDK support
- [ ] Automatic cost tracking by provider
### Future Releases
- [ ] Web dashboard for cache management
- [ ] Circuit breaker patterns
- [ ] Multi-tier caching strategies
## 📈 Performance Comparison
| Scenario | Without CacheFuse | With CacheFuse | Improvement |
|----------|------------------|----------------|-------------|
| Repeated queries | 2-5 seconds | < 3ms | **100-1000x faster** |
| API costs | $0.02 per call | $0.00 (cached) | **90%+ savings** |
| Concurrency | N × API calls | 1 API call | **Perfect deduplication** |
| Memory usage | Negligible | ~50MB | **Minimal overhead** |
## 🤝 Contributing

### Development Setup
```bash
# Clone the repository
git clone https://github.com/Yasserelhaddar/CacheFuse.git
cd CacheFuse
# Set up development environment
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Run tests
uv run pytest
```
### Areas for Contribution
- 🐛 Bug fixes and stability improvements
- ⚡ Performance optimizations
- 📚 Documentation and examples
- 🔌 New backend implementations
- 🧪 Test coverage improvements
## 📄 License
MIT License - see [LICENSE](LICENSE) file for details.
---
<div align="center">
<p>
<strong>Built with ❤️ for the AI community</strong>
</p>
<p>
<em>Star ⭐ this repo if CacheFuse helps you build better LLM applications!</em>
</p>
</div>