https://www.notion.so/ml-infra/mega-base-cache-24291d247273805b8e20fe26677b7b0f
# B10 Transfer
PyTorch compilation cache for Baseten deployments.
## Usage
### Synchronous Operations (Blocking)
```python
import torch
import b10_transfer

# Inside model.load() function
def load():
    # Load cache before torch.compile()
    status = b10_transfer.load_compile_cache()

    # ...

    # Your model compilation
    model = torch.compile(model)

    # Warm up the model with dummy prompts and the arguments your requests
    # will typically use (e.g. resolutions)
    dummy_input = "What is the capital of France?"
    model(dummy_input)

    # ...

    # Save cache after compilation
    if status != b10_transfer.LoadStatus.SUCCESS:
        b10_transfer.save_compile_cache()
```
### Asynchronous Operations (Non-blocking)
```python
import time

import torch
import b10_transfer

def load_with_async_cache():
    # Start async cache load (returns immediately with operation ID)
    operation_id = b10_transfer.load_compile_cache_async()

    # Check status periodically
    while not b10_transfer.is_transfer_complete(operation_id):
        status = b10_transfer.get_transfer_status(operation_id)
        print(f"Cache load status: {status.status}")
        time.sleep(1)

    # Get final status
    final_status = b10_transfer.get_transfer_status(operation_id)
    if final_status.status == b10_transfer.AsyncTransferStatus.SUCCESS:
        print("Cache loaded successfully!")

    # Your model compilation...
    model = torch.compile(model)

    # Async save
    save_op_id = b10_transfer.save_compile_cache_async()

    # Continue with other work while the save happens in the background,
    # or wait for completion if needed
    b10_transfer.wait_for_completion(save_op_id, timeout=300)  # 5 minute timeout

# With progress callback
def on_progress(operation_id: str):
    status = b10_transfer.get_transfer_status(operation_id)
    print(f"Transfer {operation_id}: {status.status}")

operation_id = b10_transfer.load_compile_cache_async(progress_callback=on_progress)
```
### Generic Async Operations
You can also use the generic async system for custom transfer operations:
```python
import shutil
from pathlib import Path

import b10_transfer

def my_custom_callback(source: Path, dest: Path):
    # Your custom transfer logic here
    # This could be any file operation, compression, etc.
    shutil.copy2(source, dest)

# Start a generic async transfer
operation_id = b10_transfer.start_transfer_async(
    source=Path("/source/file.txt"),
    dest=Path("/dest/file.txt"),
    callback=my_custom_callback,
    operation_name="custom_file_copy",
    monitor_local=True,
    monitor_b10fs=False,
)

# Use the same progress tracking as torch cache operations
b10_transfer.wait_for_completion(operation_id)
```
## Configuration
Configure via environment variables:
```bash
# Cache directories
export TORCH_CACHE_DIR="/tmp/torchinductor_root" # Default
export B10FS_CACHE_DIR="/cache/model/compile_cache" # Default
export LOCAL_WORK_DIR="/app" # Default
# Cache limits
export MAX_CACHE_SIZE_MB="1024" # 1GB default
```
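At runtime you can mirror these settings when inspecting or logging your configuration. A minimal sketch, reading the same variables with the defaults copied from above (this does not call the library itself):

```python
import os

# Read the environment variables the library documents, falling back to
# the documented defaults when they are unset.
torch_cache_dir = os.environ.get("TORCH_CACHE_DIR", "/tmp/torchinductor_root")
b10fs_cache_dir = os.environ.get("B10FS_CACHE_DIR", "/cache/model/compile_cache")
local_work_dir = os.environ.get("LOCAL_WORK_DIR", "/app")
max_cache_size_mb = int(os.environ.get("MAX_CACHE_SIZE_MB", "1024"))

print(f"torch cache: {torch_cache_dir}, limit: {max_cache_size_mb} MB")
```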
## How It Works
### Environment-Specific Caching
The library automatically creates unique cache keys based on your environment:
```
torch-2.1.0_cuda-12.1_cc-8.6_triton-2.1.0 → cache_a1b2c3d4e5f6.latest.tar.gz
torch-2.0.1_cuda-11.8_cc-7.5_triton-2.0.1 → cache_x9y8z7w6v5u4.latest.tar.gz
torch-2.1.0_cpu_triton-none → cache_m1n2o3p4q5r6.latest.tar.gz
```
**Components used:**
- **PyTorch version** (e.g., `torch-2.1.0`)
- **CUDA version** (e.g., `cuda-12.1` or `cpu`)
- **GPU compute capability** (e.g., `cc-8.6` for A100)
- **Triton version** (e.g., `triton-2.1.0` or `triton-none`)
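A hypothetical sketch of how such a key could be assembled and hashed into a cache filename; the helper names and the hash truncation are illustrative, not the library's actual implementation:

```python
import hashlib

def make_env_key(torch_version, cuda_version, compute_capability, triton_version):
    """Join the environment components into a single cache key string."""
    parts = [f"torch-{torch_version}"]
    if cuda_version:
        parts.append(f"cuda-{cuda_version}")
        parts.append(f"cc-{compute_capability}")
    else:
        parts.append("cpu")
    parts.append(f"triton-{triton_version or 'none'}")
    return "_".join(parts)

def cache_filename(env_key):
    """Hash the key so the filename stays short and filesystem-safe."""
    digest = hashlib.sha256(env_key.encode()).hexdigest()[:12]
    return f"cache_{digest}.latest.tar.gz"

key = make_env_key("2.1.0", "12.1", "8.6", "2.1.0")
print(key)  # torch-2.1.0_cuda-12.1_cc-8.6_triton-2.1.0
print(cache_filename(key))
```

Because the key includes every component, a cache produced under one PyTorch/CUDA/Triton combination is never loaded into a different one.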
### Cache Workflow
1. **Load Phase** (startup): Generate environment key, check for matching cache in B10FS, extract to local directory
2. **Save Phase** (after compilation): Create archive, atomic copy to B10FS with environment-specific filename
### Lock-Free Race Prevention
Uses a journal pattern with atomic filesystem operations so that parallel cache saves are safe without locks.
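The save side of that pattern can be sketched as: write to a uniquely named temporary file in the destination directory, then atomically rename it into place, so concurrent savers never expose a partially written archive. This is an illustrative sketch, not the library's code:

```python
import os
import shutil
import tempfile
from pathlib import Path

def atomic_publish(archive_src: Path, dest: Path) -> None:
    """Copy an archive into a shared directory without exposing partial writes.

    The temp file is created in dest's own directory so os.replace() is a
    same-filesystem rename, which is atomic on POSIX: readers see either
    the old file or the complete new one, never a partial copy.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".incomplete")
    os.close(fd)
    try:
        shutil.copy2(archive_src, tmp_name)
        os.replace(tmp_name, dest)  # atomic rename into place
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial temp file on failure
        raise
```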
## API Reference
### Synchronous Functions
- `load_compile_cache() -> LoadStatus`: Load cache from B10FS for current environment
- `save_compile_cache() -> SaveStatus`: Save cache to B10FS with environment-specific filename
- `clear_local_cache() -> bool`: Clear local cache directory
- `get_cache_info() -> Dict[str, Any]`: Get cache status information for current environment
- `list_available_caches() -> Dict[str, Any]`: List all cache files with environment details
### Generic Asynchronous Functions
- `start_transfer_async(source, dest, callback, operation_name, **kwargs) -> str`: Start any async transfer operation
- `get_transfer_status(operation_id: str) -> TransferProgress`: Get current status of async operation
- `is_transfer_complete(operation_id: str) -> bool`: Check if async operation has completed
- `wait_for_completion(operation_id: str, timeout=None) -> bool`: Wait for async operation to complete
- `cancel_transfer(operation_id: str) -> bool`: Attempt to cancel running operation
- `list_active_transfers() -> Dict[str, TransferProgress]`: Get all active transfer operations
### Torch Cache Async Functions
- `load_compile_cache_async(progress_callback=None) -> str`: Start async cache load, returns operation ID
- `save_compile_cache_async(progress_callback=None) -> str`: Start async cache save, returns operation ID
### Status Enums
- `LoadStatus`: SUCCESS, ERROR, DOES_NOT_EXIST, SKIPPED
- `SaveStatus`: SUCCESS, ERROR, SKIPPED
- `AsyncTransferStatus`: NOT_STARTED, IN_PROGRESS, SUCCESS, ERROR, INTERRUPTED, CANCELLED
### Data Classes
- `TransferProgress`: Contains operation_id, status, started_at, completed_at, error_message
### Exceptions
- `CacheError`: Base exception for cache operations
- `CacheValidationError`: Path validation or compatibility check failed
- `CacheOperationInterrupted`: Operation interrupted due to insufficient disk space
## Performance Impact
### Debugging
Enable debug logging:
```python
import logging
logging.getLogger('b10_transfer').setLevel(logging.DEBUG)
```
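If your environment never configures a root handler, those debug records fall back to Python's last-resort handler, which only emits WARNING and above. Attaching a handler explicitly is plain standard-library logging, nothing library-specific:

```python
import logging

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

logger = logging.getLogger("b10_transfer")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("debug logging enabled")
```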