deltaglider

- **Version**: 4.2.2
- **Summary**: Store 4TB in 5GB: S3-compatible storage with 99.9% compression for versioned files
- **Upload time**: 2025-10-06 21:13:00
- **Requires Python**: >=3.11
- **Author**: Beshu Tech <info@beshu.tech>
- **Maintainer**: Beshu Tech Team <info@beshu.tech>
- **Keywords**: s3, compression, delta, storage, backup, deduplication, xdelta3, binary-diff, artifact-storage, version-control, minio, aws, cost-optimization, devops

# DeltaGlider

[![PyPI version](https://badge.fury.io/py/deltaglider.svg)](https://pypi.org/project/deltaglider/)
[![GitHub Repository](https://img.shields.io/badge/github-deltaglider-blue.svg)](https://github.com/beshu-tech/deltaglider)
[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![xdelta3](https://img.shields.io/badge/powered%20by-xdelta3-green.svg)](https://github.com/jmacd/xdelta)

<div align="center">
  <img src="https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png" alt="DeltaGlider Logo" width="500"/>
</div>

**Store 4TB of similar files in 5GB. No, that's not a typo.**

DeltaGlider is a drop-in S3 replacement that may achieve 99.9% size reduction for versioned compressed artifacts, backups, and release archives through intelligent binary delta compression (via xdelta3).

## The Problem We Solved

You're storing hundreds of versions of your software releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.

Sound familiar?

## Real-World Impact

From our [ReadOnlyREST case study](docs/case-study-readonlyrest.md):
- **Before**: 201,840 files, 3.96TB storage, $1,120/year
- **After**: Same files, 4.9GB storage, $1.32/year
- **Compression**: 99.9% (not a typo)
- **Integration time**: 5 minutes

## Quick Start

The quickest way to get started is the GUI:
* https://github.com/sscarduzio/dg_commander/

### CLI Installation

```bash
# Via pip (Python 3.11+)
pip install deltaglider

# Via uv (faster)
uv pip install deltaglider

# Via Docker
docker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help
```

### Basic Usage

```bash
# Upload a file (automatic delta compression)
deltaglider cp my-app-v1.0.0.zip s3://releases/

# Download a file (automatic delta reconstruction)
deltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip

# List objects
deltaglider ls s3://releases/

# Sync directories
deltaglider sync ./dist/ s3://releases/v1.0.0/
```

**That's it!** DeltaGlider automatically detects similar files and stores them as deltas, which can reach 99%+ compression for closely related versions. For more commands and options, see [CLI Reference](#cli-reference).

## Core Concepts

### How It Works

```
Traditional S3:
  v1.0.0.zip (100MB) → S3: 100MB
  v1.0.1.zip (100MB) → S3: 100MB (200MB total)
  v1.0.2.zip (100MB) → S3: 100MB (300MB total)

With DeltaGlider:
  v1.0.0.zip (100MB) → S3: 100MB reference + 0KB delta
  v1.0.1.zip (100MB) → S3: 98KB delta (100.1MB total)
  v1.0.2.zip (100MB) → S3: 97KB delta (100.2MB total)
```

DeltaGlider stores the first file as a reference and subsequent similar files as tiny deltas (differences). When you download, it reconstructs the original file perfectly using the reference + delta.
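To make the reference-plus-delta idea concrete, here is a toy Python sketch. It is not DeltaGlider's code: it swaps xdelta3 for the standard library's `difflib.SequenceMatcher`, encodes a delta as a hypothetical list of `copy`/`add` instructions against the reference, and verifies the rebuilt bytes with SHA256. The names `make_delta` and `apply_delta` are illustrative only.

```python
import difflib
import hashlib

# Hypothetical delta format:
#   ("copy", start, end) -> reuse reference[start:end]
#   ("add", data)        -> literal bytes not found in the reference
Op = tuple


def make_delta(reference: bytes, target: bytes) -> list[Op]:
    """Encode `target` as copy/add instructions relative to `reference`."""
    matcher = difflib.SequenceMatcher(None, reference, target, autojunk=False)
    ops: list[Op] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes already stored in the reference
        elif j2 > j1:
            ops.append(("add", target[j1:j2]))  # "replace"/"insert": new bytes
        # "delete" needs no instruction: those reference bytes are simply not copied
    return ops


def apply_delta(reference: bytes, ops: list[Op], expected_sha256: str) -> bytes:
    """Rebuild the target from reference + delta and verify its SHA256."""
    parts = [reference[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops]
    result = b"".join(parts)
    if hashlib.sha256(result).hexdigest() != expected_sha256:
        raise ValueError("reconstructed file failed SHA256 verification")
    return result


v1 = b"header|" + b"unchanged payload " * 50 + b"|version=1.0.0"
v2 = b"header|" + b"unchanged payload " * 50 + b"|version=1.0.1"

delta = make_delta(v1, v2)
restored = apply_delta(v1, delta, hashlib.sha256(v2).hexdigest())
literal = sum(len(op[1]) for op in delta if op[0] == "add")
print(f"{len(v2)}-byte file rebuilt from {len(delta)} ops, {literal} literal byte(s)")
```

`SequenceMatcher` is cubic in the worst case and quadratic in the expected case, which is why real tools use hash-based matchers such as xdelta3 for large binaries.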

### Intelligent File Type Detection

DeltaGlider automatically detects file types and applies the optimal strategy:

| File Type | Strategy | Typical Compression | Why It Works |
|-----------|----------|---------------------|--------------|
| `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions | Archive structure remains consistent between versions |
| `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions | Package formats with predictable structure |
| `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds | Java archives with mostly unchanged classes |
| `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) | Compiled code changes unpredictably |
| `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) | Text files benefit more from standard compression |
| `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) | Hash files are unique by design |
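
The routing in this table amounts to a lookup on the file extension. The following is a hypothetical sketch that mirrors the table; it is not DeltaGlider's detection code, which may use additional signals beyond the extension.

```python
from pathlib import PurePosixPath

# Extensions that the table lists under "Binary delta" (illustrative mapping only)
DELTA_EXTENSIONS = {
    ".zip", ".tar", ".gz",
    ".dmg", ".deb", ".rpm",
    ".jar", ".war", ".ear",
}


def choose_strategy(key: str) -> str:
    """Return "delta" for archive-like files and "direct" for everything else."""
    suffix = PurePosixPath(key).suffix.lower()  # only the last suffix, e.g. ".gz" for .tar.gz
    return "delta" if suffix in DELTA_EXTENSIONS else "direct"


for name in ["my-app-v1.0.1.zip", "app.tar.gz", "lib.so", "config.json", "my-app.zip.sha512"]:
    print(f"{name:22} -> {choose_strategy(name)}")
```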

### Key Features

- **AWS CLI Replacement**: Same commands as `aws s3` with automatic compression
- **boto3-Compatible SDK**: Works with existing boto3 code with minimal changes
- **Zero Configuration**: No databases, no manifest files, no complex setup
- **Data Integrity**: SHA256 verification on every operation
- **S3 Compatible**: Works with AWS S3, MinIO, Cloudflare R2, and any S3-compatible storage

## CLI Reference

### All Commands

```bash
# Copy files to/from S3 (automatic delta compression for archives)
deltaglider cp my-app-v1.0.0.zip s3://releases/
deltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip

# Recursive directory operations
deltaglider cp -r ./dist/ s3://releases/v1.0.0/
deltaglider cp -r s3://releases/v1.0.0/ ./local-copy/

# List buckets and objects
deltaglider ls                                    # List all buckets
deltaglider ls s3://releases/                     # List objects
deltaglider ls -r s3://releases/                  # Recursive listing
deltaglider ls -h --summarize s3://releases/      # Human-readable with summary

# Remove objects
deltaglider rm s3://releases/old-version.zip      # Remove single object
deltaglider rm -r s3://releases/old/              # Recursive removal
deltaglider rm --dryrun s3://releases/test.zip    # Preview deletion

# Sync directories (only transfers changes)
deltaglider sync ./local-dir/ s3://releases/      # Sync to S3
deltaglider sync s3://releases/ ./local-backup/   # Sync from S3
deltaglider sync --delete ./src/ s3://backup/     # Mirror exactly
deltaglider sync --exclude "*.log" ./src/ s3://backup/  # Exclude patterns

# Works with MinIO, R2, and S3-compatible storage
deltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000
```

### Command Flags

```bash
# All standard AWS flags work
deltaglider cp file.zip s3://bucket/ \
  --endpoint-url http://localhost:9000 \
  --profile production \
  --region us-west-2

# DeltaGlider-specific flags
deltaglider cp file.zip s3://bucket/ --no-delta       # Disable delta compression for this file
deltaglider cp file.zip s3://bucket/ --max-ratio 0.8  # Only use delta if compression > 20%
```
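
The `--max-ratio 0.8` comment implies a simple rule: keep the delta only if it is smaller than 80% of the original, otherwise upload the file directly. A minimal sketch of that rule, assuming the ratio is delta size divided by original size (the exact definition and boundary handling in the CLI are assumptions here):

```python
def should_store_delta(original_size: int, delta_size: int, max_ratio: float = 0.8) -> bool:
    """Hypothetical rule: keep the delta only if delta_size / original_size < max_ratio."""
    if original_size <= 0:
        return False  # empty input: nothing to gain, upload directly
    return delta_size / original_size < max_ratio


# 100 MB file with a 98 KB delta: ratio ~0.001, so the delta is kept
print(should_store_delta(100_000_000, 98_000))
# 100 MB file with a 90 MB delta: ratio 0.9, so the file is uploaded directly
print(should_store_delta(100_000_000, 90_000_000))
```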

### CI/CD Integration

#### GitHub Actions

```yaml
- name: Upload release archives with DeltaGlider
  run: |
    pip install deltaglider
    # Upload each archive individually
    for f in dist/*.zip; do
      deltaglider cp "$f" "s3://releases/${{ github.ref_name }}/"
    done
    # Or upload the entire directory recursively instead:
    # deltaglider cp -r dist/ "s3://releases/${{ github.ref_name }}/"
```

#### Daily Backup Script

```bash
#!/bin/bash
set -euo pipefail
# Daily backup with automatic deduplication
BACKUP="backup-$(date +%Y%m%d).tar.gz"
tar -czf "$BACKUP" /data
deltaglider cp "$BACKUP" "s3://backups/$(date +%Y)/"
# Only the delta against the stored reference is kept, not a full copy

# Clean up old backups
deltaglider rm -r s3://backups/2023/
```

## Python SDK

**[📚 Full SDK Documentation](docs/sdk/README.md)** | **[API Reference](docs/sdk/api.md)** | **[Examples](docs/sdk/examples.md)** | **[boto3 Compatibility Guide](BOTO3_COMPATIBILITY.md)**

### boto3-Compatible API (Recommended)

DeltaGlider provides a **boto3-compatible API** for core S3 operations (21 methods covering 80% of use cases):

```python
from deltaglider import create_client

# Drop-in replacement for boto3.client('s3')
client = create_client()  # Uses AWS credentials automatically

# Same call as boto3's put_object; delta compression is applied transparently
with open('my-app-v2.0.0.zip', 'rb') as f:
    response = client.put_object(
        Bucket='releases',
        Key='v2.0.0/my-app.zip',
        Body=f
    )
print(f"Stored with ETag: {response['ETag']}")

# Standard boto3 get_object - handles delta reconstruction automatically
response = client.get_object(Bucket='releases', Key='v2.0.0/my-app.zip')
with open('downloaded.zip', 'wb') as f:
    f.write(response['Body'].read())

# Smart list_objects with optimized performance
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
# ... process the first page here ...
while response.is_truncated:
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
        ContinuationToken=response.next_continuation_token
    )
    # ... process each subsequent page here ...

# Delete and inspect objects
client.delete_object(Bucket='releases', Key='old-version.zip')
client.head_object(Bucket='releases', Key='v2.0.0/my-app.zip')
```

### Bucket Management

**No boto3 required!** DeltaGlider provides bucket management (create, list, delete):

```python
from deltaglider import create_client

client = create_client()

# Create buckets
client.create_bucket(Bucket='my-releases')

# Create bucket in specific region (AWS only)
client.create_bucket(
    Bucket='my-regional-bucket',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
)

# List all buckets
response = client.list_buckets()
for bucket in response['Buckets']:
    print(f"{bucket['Name']} - {bucket['CreationDate']}")

# Delete bucket (must be empty)
client.delete_bucket(Bucket='my-old-bucket')
```

See [examples/bucket_management.py](examples/bucket_management.py) for a complete example.

### Simple API (Alternative)

For simpler use cases, DeltaGlider also provides a streamlined API:

```python
from deltaglider import create_client

client = create_client()

# Simple upload with automatic compression detection
summary = client.upload("my-app-v2.0.0.zip", "s3://releases/v2.0.0/")
print(f"Compressed from {summary.original_size_mb:.1f}MB to {summary.stored_size_mb:.1f}MB")
print(f"Saved {summary.savings_percent:.0f}% storage space")

# Simple download with automatic delta reconstruction
client.download("s3://releases/v2.0.0/my-app-v2.0.0.zip", "local-app.zip")
```

### Real-World Examples

#### Software Release Storage

```python
from deltaglider import create_client

client = create_client()

# Upload multiple versions
versions = ["v1.0.0", "v1.0.1", "v1.0.2", "v1.1.0"]
for version in versions:
    with open(f"dist/my-app-{version}.zip", 'rb') as f:
        response = client.put_object(
            Bucket='releases',
            Key=f'{version}/my-app-{version}.zip',
            Body=f,
            Metadata={'version': version, 'build': 'production'}
        )

    # Check compression stats (DeltaGlider extension)
    if 'DeltaGliderInfo' in response:
        info = response['DeltaGliderInfo']
        if info.get('IsDelta'):
            print(f"{version}: Stored as {info['StoredSizeMB']:.1f}MB delta "
                  f"(saved {info['SavingsPercent']:.0f}%)")
        else:
            print(f"{version}: Stored as reference ({info['OriginalSizeMB']:.1f}MB)")

# Result:
# v1.0.0: Stored as reference (100.0MB)
# v1.0.1: Stored as 0.2MB delta (saved 99.8%)
# v1.0.2: Stored as 0.3MB delta (saved 99.7%)
# v1.1.0: Stored as 5.2MB delta (saved 94.8%)
```

#### Automated Database Backup

```python
from datetime import datetime
from deltaglider import create_client

client = create_client(endpoint_url="http://minio.internal:9000")

def backup_database():
    """Daily database backup with automatic deduplication."""
    date = datetime.now().strftime("%Y%m%d")
    dump_file = f"backup-{date}.sql.gz"

    # Upload using boto3-compatible API
    with open(dump_file, 'rb') as f:
        response = client.put_object(
            Bucket='backups',
            Key=f'postgres/{date}/{dump_file}',
            Body=f,
            Tagging='type=daily&database=production',
            Metadata={'date': date, 'source': 'production'}
        )

    # Check compression effectiveness
    if 'DeltaGliderInfo' in response:
        info = response['DeltaGliderInfo']
        if info.get('IsDelta') and info.get('DeltaRatio', 0) > 0.1:
            print(f"Warning: Low compression ({info['SavingsPercent']:.0f}%), "
                  "database might have significant changes")
        print(f"Backup stored: {info['StoredSizeMB']:.1f}MB "
              f"(compressed from {info['OriginalSizeMB']:.1f}MB)")

backup_database()
```

For more examples and detailed API documentation, see the [SDK Documentation](docs/sdk/README.md).

## Performance & Benchmarks

### Real-World Results

Testing with 513 Elasticsearch plugin releases (82.5MB each):

```
Original size:       42.3 GB
DeltaGlider size:    115 MB
Compression:         99.7%
Upload speed:        3-4 files/second
Reconstruction time: <100ms
```

### The Math

For `N` versions of an `S` MB file, where consecutive versions differ by `D%`:

- **Traditional S3**: `N × S` MB
- **DeltaGlider**: `S + (N-1) × S × D%` MB

Example: 100 versions of 100MB files with 1% difference:
- **Traditional**: 10,000 MB
- **DeltaGlider**: 199 MB
- **Savings**: 98%
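
The formulas can be checked with a few lines of Python. `D` is passed as a fraction (1% = 0.01), and, like the formula, the helper assumes every delta is exactly `S × D` MB and ignores metadata overhead.

```python
def traditional_mb(n: int, s: float) -> float:
    """Every version is stored in full: N x S."""
    return n * s


def deltaglider_mb(n: int, s: float, d: float) -> float:
    """One full reference plus N-1 deltas of S x D each."""
    return s + (n - 1) * s * d


n, s, d = 100, 100.0, 0.01
full = traditional_mb(n, s)
delta = deltaglider_mb(n, s, d)
print(f"Traditional: {full:,.0f} MB")
print(f"DeltaGlider: {delta:,.0f} MB")
print(f"Savings:     {1 - delta / full:.0%}")
```

Running it prints 10,000 MB, 199 MB, and 98%, matching the example above.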

### Comparison

| Solution | Compression | Speed | Integration | Cost |
|----------|------------|-------|-------------|------|
| **DeltaGlider** | 99%+ | Fast | Drop-in | Open source |
| S3 Versioning | 0% | Native | Built-in | $$ per version |
| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |
| Git LFS | Good | Slow | Git-only | $ per GB |
| Restic/Borg | 80-90% | Medium | Backup-only | Open source |

## Architecture & Technical Deep Dive

### Why xdelta3 Excels at Archive Compression

Traditional diff algorithms (like `diff` or `git diff`) work line-by-line on text files. Binary diff tools like `bsdiff` or `courgette` are optimized for executables. **xdelta3**, by contrast, is particularly well suited to compressed archives because:

1. **Block-level matching**: xdelta3 uses a rolling hash algorithm to find matching byte sequences at any offset, not just line boundaries. This is crucial for archives where small file changes can shift all subsequent byte positions.

2. **Large window support**: xdelta3 can use reference windows up to 2GB, allowing it to find matches even when content has moved significantly within the archive. Other delta algorithms typically use much smaller windows (64KB-1MB).

3. **Compression-aware**: When you update one file in a ZIP or TAR archive, most of the archive stays byte-for-byte identical: ZIP compresses each entry independently, so unchanged entries usually produce the same compressed bytes, and TAR stores unchanged members verbatim. xdelta3 copies these unchanged regions from the reference instead of storing them again.

4. **Format agnostic**: Unlike specialized tools (e.g., `courgette` for Chrome updates), xdelta3 works on raw bytes without understanding the file format, so it applies to any archive type.
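
Point 1 is the heart of the technique. Below is a simplified, hypothetical sketch of rolling-hash block matching in Python, in the spirit of xdelta3 but not its actual algorithm: the reference is indexed in fixed-size blocks, a Rabin-Karp style polynomial hash slides over the new file one byte at a time, and any window whose hash and bytes match a reference block becomes a copy, however far it has shifted. `BLOCK`, `BASE`, and `MOD` are arbitrary choices for the example.

```python
BLOCK = 16            # block size used to index the reference (arbitrary)
BASE = 257            # polynomial hash base
MOD = (1 << 61) - 1   # large prime modulus


def block_hash(data: bytes) -> int:
    """Polynomial hash of a full window, computed from scratch."""
    h = 0
    for b in data:
        h = (h * BASE + b) % MOD
    return h


def find_copies(reference: bytes, target: bytes) -> list[tuple[int, int]]:
    """Return (target_offset, reference_offset) pairs for blocks found in the reference."""
    index: dict[int, int] = {}
    for off in range(0, len(reference) - BLOCK + 1, BLOCK):
        index.setdefault(block_hash(reference[off:off + BLOCK]), off)

    top = pow(BASE, BLOCK - 1, MOD)  # weight of the byte leaving the window
    matches: list[tuple[int, int]] = []
    i = 0
    h = block_hash(target[:BLOCK]) if len(target) >= BLOCK else None
    while h is not None:
        ref_off = index.get(h)
        # Compare bytes too, so a hash collision can never produce a false copy
        if ref_off is not None and reference[ref_off:ref_off + BLOCK] == target[i:i + BLOCK]:
            matches.append((i, ref_off))
            i += BLOCK  # jump past the matched block and rehash the next window
            h = block_hash(target[i:i + BLOCK]) if i + BLOCK <= len(target) else None
        elif i + BLOCK < len(target):
            # Roll the hash one byte: drop target[i], append target[i + BLOCK]
            h = ((h - target[i] * top) * BASE + target[i + BLOCK]) % MOD
            i += 1
        else:
            h = None
    return matches


reference = bytes(range(256)) * 2     # 512 bytes standing in for an archive
target = b"NEW HEADER" + reference    # same content, shifted by 10 bytes
copies = find_copies(reference, target)
covered = len(copies) * BLOCK
print(f"{len(copies)} blocks ({covered} of {len(target)} bytes) copied from the reference")
```

Prepending 10 bytes shifts every later byte, yet all 32 reference blocks are still found at their new offsets, which is the situation point 1 describes.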

#### Real-World Example

When you rebuild a JAR file with one class changed:
- **Text diff**: 100% different (it's binary data!)
- **bsdiff**: ~30-40% of original size (optimized for executables, not archives)
- **xdelta3**: ~0.1-1% of original size (finds the unchanged parts regardless of position)

This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta3 can identify that 99% of the archive structure and content remains identical between versions.

### System Architecture

DeltaGlider uses a clean hexagonal architecture:

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Your App  │────▶│ DeltaGlider  │────▶│  S3/MinIO   │
│   (CLI/SDK) │     │    Core      │     │   Storage   │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────▼───────┐
                    │ Local Cache  │
                    │ (References) │
                    └──────────────┘
```

**Key Components:**
- **Binary diff engine**: xdelta3 for optimal compression
- **Intelligent routing**: Automatic file type detection
- **Integrity verification**: SHA256 on every operation
- **Local caching**: Fast repeated operations
- **No external services**: No database, no manifest files
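
The local cache is what keeps repeated operations fast: a reference that has already been downloaded does not need to be fetched again. The sketch below shows one way such a cache could work, keying entries by SHA256 and re-verifying them before reuse; the `ReferenceCache` class, its on-disk layout, and the `fetch` callback are hypothetical and do not describe DeltaGlider's actual cache format.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class ReferenceCache:
    """Hypothetical content-addressed cache for reference files."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, sha256: str, fetch: Callable[[], bytes]) -> bytes:
        """Return the reference with this SHA256, calling `fetch` only on a miss."""
        path = self.root / sha256
        if path.exists():
            data = path.read_bytes()
            if hashlib.sha256(data).hexdigest() == sha256:
                return data  # cache hit, integrity verified
            path.unlink()    # corrupted entry: discard and refetch
        data = fetch()
        if hashlib.sha256(data).hexdigest() != sha256:
            raise ValueError("downloaded reference failed SHA256 verification")
        path.write_bytes(data)
        return data


# Usage: count how often the simulated download runs
calls = []
reference = b"reference archive bytes"
digest = hashlib.sha256(reference).hexdigest()


def fake_download() -> bytes:
    calls.append(1)
    return reference


with tempfile.TemporaryDirectory() as tmp:
    cache = ReferenceCache(Path(tmp))
    cache.get(digest, fake_download)
    cache.get(digest, fake_download)  # served from disk, no second download
print(f"downloads: {len(calls)}")
```

A production cache would also write entries atomically (for example, to a temporary file followed by a rename) so concurrent processes never read a half-written reference.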

### When to Use DeltaGlider

✅ **Perfect for:**
- Software releases and versioned artifacts
- Container images and layers
- Database backups and snapshots
- Machine learning model checkpoints
- Game assets and updates
- Any versioned binary data

❌ **Not ideal for:**
- Already compressed **unique** files
- Streaming or multimedia files
- Frequently changing unstructured data
- Files smaller than 1MB

## Migration from AWS CLI

Migrating from `aws s3` to `deltaglider` is as simple as changing the command name:

| AWS CLI | DeltaGlider | Compression Benefit |
|---------|------------|---------------------|
| `aws s3 cp file.zip s3://bucket/` | `deltaglider cp file.zip s3://bucket/` | ✅ 99% for similar files |
| `aws s3 cp -r dir/ s3://bucket/` | `deltaglider cp -r dir/ s3://bucket/` | ✅ 99% for archives |
| `aws s3 ls s3://bucket/` | `deltaglider ls s3://bucket/` | - |
| `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |
| `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | ✅ 99% incremental |

## Production Ready

- ✅ **Battle tested**: 200K+ files in production
- ✅ **Data integrity**: SHA256 verification on every operation
- ✅ **S3 compatible**: Works with AWS, MinIO, Cloudflare R2, etc.
- ✅ **Atomic operations**: No partial states
- ✅ **Concurrent safe**: Multiple clients supported
- ✅ **Well tested**: 95%+ code coverage

## Development

```bash
# Clone the repo
git clone https://github.com/beshu-tech/deltaglider
cd deltaglider

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest

# Run with local MinIO
docker-compose up -d
export AWS_ENDPOINT_URL=http://localhost:9000
deltaglider cp test.zip s3://test/
```

## FAQ

**Q: What if my reference file gets corrupted?**
A: Every operation includes SHA256 verification. Corruption is detected immediately.

**Q: How fast is reconstruction?**
A: Sub-100ms for typical files. The delta is applied in-memory using xdelta3.

**Q: Can I use this with existing S3 data?**
A: Yes! DeltaGlider can start optimizing new uploads immediately. Old data remains accessible.

**Q: What's the overhead for unique files?**
A: Zero. Files without similarity are uploaded directly.

**Q: Is this compatible with S3 encryption?**
A: Yes, DeltaGlider respects all S3 settings including SSE, KMS, and bucket policies.

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Key areas we're exploring:
- Cloud-native reference management
- Rust implementation for 10x speed
- Automatic similarity detection
- Multi-threaded delta generation
- WASM support for browser usage

## License

MIT - Use it freely in your projects.

## Success Stories

> "We reduced our artifact storage from 4TB to 5GB. This isn't hyperbole—it's math."
> — [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)

> "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds."
> — Platform Engineer at [redacted]

> "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year."
> — CTO at [stealth startup]

---

**Try it now**: Got versioned files in S3? See your potential savings:

```bash
# Analyze your S3 bucket
deltaglider analyze s3://your-bucket/
# Output: "Potential savings: 95.2% (4.8TB → 237GB)"
```

Built with ❤️ by engineers who were tired of paying to store the same bytes over and over.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "deltaglider",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "Beshu Tech Team <info@beshu.tech>",
    "keywords": "s3, compression, delta, storage, backup, deduplication, xdelta3, binary-diff, artifact-storage, version-control, minio, aws, cost-optimization, devops",
    "author": null,
    "author_email": "Beshu Tech <info@beshu.tech>",
    "download_url": "https://files.pythonhosted.org/packages/1f/30/29e0ac4f03b273b460b1faeb7c34b2c393b1d21e4e7af519bdbf1ebc1a8c/deltaglider-4.2.2.tar.gz",
    "platform": null,
    "description": "# DeltaGlider\n\n[![PyPI version](https://badge.fury.io/py/deltaglider.svg)](https://pypi.org/project/deltaglider/)\n[![GitHub Repository](https://img.shields.io/badge/github-deltaglider-blue.svg)](https://github.com/beshu-tech/deltaglider)\n[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![xdelta3](https://img.shields.io/badge/powered%20by-xdelta3-green.svg)](https://github.com/jmacd/xdelta)\n\n<div align=\"center\">\n  <img src=\"https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png\" alt=\"DeltaGlider Logo\" width=\"500\"/>\n</div>\n\n**Store 4TB of similar files in 5GB. No, that's not a typo.**\n\nDeltaGlider is a drop-in S3 replacement that may achieve 99.9% size reduction for versioned compressed artifacts, backups, and release archives through intelligent binary delta compression (via xdelta3).\n\n## The Problem We Solved\n\nYou're storing hundreds of versions of your software releases. Each 100MB build differs by <1% from the previous version. 
You're paying to store 100GB of what's essentially 100MB of unique data.\n\nSound familiar?\n\n## Real-World Impact\n\nFrom our [ReadOnlyREST case study](docs/case-study-readonlyrest.md):\n- **Before**: 201,840 files, 3.96TB storage, $1,120/year\n- **After**: Same files, 4.9GB storage, $1.32/year\n- **Compression**: 99.9% (not a typo)\n- **Integration time**: 5 minutes\n\n## Quick Start\n\nThe quickest way to start is using the GUI\n* https://github.com/sscarduzio/dg_commander/\n\n### CLI Installation\n\n```bash\n# Via pip (Python 3.11+)\npip install deltaglider\n\n# Via uv (faster)\nuv pip install deltaglider\n\n# Via Docker\ndocker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help\n```\n\n### Basic Usage\n\n```bash\n# Upload a file (automatic delta compression)\ndeltaglider cp my-app-v1.0.0.zip s3://releases/\n\n# Download a file (automatic delta reconstruction)\ndeltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip\n\n# List objects\ndeltaglider ls s3://releases/\n\n# Sync directories\ndeltaglider sync ./dist/ s3://releases/v1.0.0/\n```\n\n**That's it!** DeltaGlider automatically detects similar files and applies 99%+ compression. For more commands and options, see [CLI Reference](#cli-reference).\n\n## Core Concepts\n\n### How It Works\n\n```\nTraditional S3:\n  v1.0.0.zip (100MB) \u2192 S3: 100MB\n  v1.0.1.zip (100MB) \u2192 S3: 100MB (200MB total)\n  v1.0.2.zip (100MB) \u2192 S3: 100MB (300MB total)\n\nWith DeltaGlider:\n  v1.0.0.zip (100MB) \u2192 S3: 100MB reference + 0KB delta\n  v1.0.1.zip (100MB) \u2192 S3: 98KB delta (100.1MB total)\n  v1.0.2.zip (100MB) \u2192 S3: 97KB delta (100.3MB total)\n```\n\nDeltaGlider stores the first file as a reference and subsequent similar files as tiny deltas (differences). 
When you download, it reconstructs the original file perfectly using the reference + delta.\n\n### Intelligent File Type Detection\n\nDeltaGlider automatically detects file types and applies the optimal strategy:\n\n| File Type | Strategy | Typical Compression | Why It Works |\n|-----------|----------|---------------------|--------------|\n| `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions | Archive structure remains consistent between versions |\n| `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions | Package formats with predictable structure |\n| `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds | Java archives with mostly unchanged classes |\n| `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) | Compiled code changes unpredictably |\n| `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) | Text files benefit more from standard compression |\n| `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) | Hash files are unique by design |\n\n### Key Features\n\n- **AWS CLI Replacement**: Same commands as `aws s3` with automatic compression\n- **boto3-Compatible SDK**: Works with existing boto3 code with minimal changes\n- **Zero Configuration**: No databases, no manifest files, no complex setup\n- **Data Integrity**: SHA256 verification on every operation\n- **S3 Compatible**: Works with AWS S3, MinIO, Cloudflare R2, and any S3-compatible storage\n\n## CLI Reference\n\n### All Commands\n\n```bash\n# Copy files to/from S3 (automatic delta compression for archives)\ndeltaglider cp my-app-v1.0.0.zip s3://releases/\ndeltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip\n\n# Recursive directory operations\ndeltaglider cp -r ./dist/ s3://releases/v1.0.0/\ndeltaglider cp -r s3://releases/v1.0.0/ ./local-copy/\n\n# List buckets and objects\ndeltaglider ls                                    # List all buckets\ndeltaglider ls s3://releases/                     # List objects\ndeltaglider ls 
-r s3://releases/                  # Recursive listing\ndeltaglider ls -h --summarize s3://releases/      # Human-readable with summary\n\n# Remove objects\ndeltaglider rm s3://releases/old-version.zip      # Remove single object\ndeltaglider rm -r s3://releases/old/              # Recursive removal\ndeltaglider rm --dryrun s3://releases/test.zip    # Preview deletion\n\n# Sync directories (only transfers changes)\ndeltaglider sync ./local-dir/ s3://releases/      # Sync to S3\ndeltaglider sync s3://releases/ ./local-backup/   # Sync from S3\ndeltaglider sync --delete ./src/ s3://backup/     # Mirror exactly\ndeltaglider sync --exclude \"*.log\" ./src/ s3://backup/  # Exclude patterns\n\n# Works with MinIO, R2, and S3-compatible storage\ndeltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000\n```\n\n### Command Flags\n\n```bash\n# All standard AWS flags work\ndeltaglider cp file.zip s3://bucket/ \\\n  --endpoint-url http://localhost:9000 \\\n  --profile production \\\n  --region us-west-2\n\n# DeltaGlider-specific flags\ndeltaglider cp file.zip s3://bucket/ \\\n  --no-delta              # Disable compression for specific files\n  --max-ratio 0.8         # Only use delta if compression > 20%\n```\n\n### CI/CD Integration\n\n#### GitHub Actions\n\n```yaml\n- name: Upload Release with 99% compression\n  run: |\n    pip install deltaglider\n    deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/\n    # Or recursive for entire directories\n    deltaglider cp -r dist/ s3://releases/${{ github.ref_name }}/\n```\n\n#### Daily Backup Script\n\n```bash\n#!/bin/bash\n# Daily backup with automatic deduplication\ntar -czf backup-$(date +%Y%m%d).tar.gz /data\ndeltaglider cp backup-*.tar.gz s3://backups/\n# Only changes are stored, not full backup\n\n# Clean up old backups\ndeltaglider rm -r s3://backups/2023/\n```\n\n## Python SDK\n\n**[\ud83d\udcda Full SDK Documentation](docs/sdk/README.md)** | **[API Reference](docs/sdk/api.md)** | 
**[Examples](docs/sdk/examples.md)** | **[boto3 Compatibility Guide](BOTO3_COMPATIBILITY.md)**\n\n### boto3-Compatible API (Recommended)\n\nDeltaGlider provides a **boto3-compatible API** for core S3 operations (21 methods covering 80% of use cases):\n\n```python\nfrom deltaglider import create_client\n\n# Drop-in replacement for boto3.client('s3')\nclient = create_client()  # Uses AWS credentials automatically\n\n# Identical to boto3 S3 API - just works with 99% compression!\nresponse = client.put_object(\n    Bucket='releases',\n    Key='v2.0.0/my-app.zip',\n    Body=open('my-app-v2.0.0.zip', 'rb')\n)\nprint(f\"Stored with ETag: {response['ETag']}\")\n\n# Standard boto3 get_object - handles delta reconstruction automatically\nresponse = client.get_object(Bucket='releases', Key='v2.0.0/my-app.zip')\nwith open('downloaded.zip', 'wb') as f:\n    f.write(response['Body'].read())\n\n# Smart list_objects with optimized performance\nresponse = client.list_objects(Bucket='releases', Prefix='v2.0.0/')\n\n# Paginated listing for large buckets\nresponse = client.list_objects(Bucket='releases', MaxKeys=100)\nwhile response.is_truncated:\n    response = client.list_objects(\n        Bucket='releases',\n        MaxKeys=100,\n        ContinuationToken=response.next_continuation_token\n    )\n\n# Delete and inspect objects\nclient.delete_object(Bucket='releases', Key='old-version.zip')\nclient.head_object(Bucket='releases', Key='v2.0.0/my-app.zip')\n```\n\n### Bucket Management\n\n**No boto3 required!** DeltaGlider provides complete bucket management:\n\n```python\nfrom deltaglider import create_client\n\nclient = create_client()\n\n# Create buckets\nclient.create_bucket(Bucket='my-releases')\n\n# Create bucket in specific region (AWS only)\nclient.create_bucket(\n    Bucket='my-regional-bucket',\n    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}\n)\n\n# List all buckets\nresponse = client.list_buckets()\nfor bucket in response['Buckets']:\n    
print(f\"{bucket['Name']} - {bucket['CreationDate']}\")\n\n# Delete bucket (must be empty)\nclient.delete_bucket(Bucket='my-old-bucket')\n```\n\nSee [examples/bucket_management.py](examples/bucket_management.py) for complete example.\n\n### Simple API (Alternative)\n\nFor simpler use cases, DeltaGlider also provides a streamlined API:\n\n```python\nfrom deltaglider import create_client\n\nclient = create_client()\n\n# Simple upload with automatic compression detection\nsummary = client.upload(\"my-app-v2.0.0.zip\", \"s3://releases/v2.0.0/\")\nprint(f\"Compressed from {summary.original_size_mb:.1f}MB to {summary.stored_size_mb:.1f}MB\")\nprint(f\"Saved {summary.savings_percent:.0f}% storage space\")\n\n# Simple download with automatic delta reconstruction\nclient.download(\"s3://releases/v2.0.0/my-app-v2.0.0.zip\", \"local-app.zip\")\n```\n\n### Real-World Examples\n\n#### Software Release Storage\n\n```python\nfrom deltaglider import create_client\n\nclient = create_client()\n\n# Upload multiple versions\nversions = [\"v1.0.0\", \"v1.0.1\", \"v1.0.2\", \"v1.1.0\"]\nfor version in versions:\n    with open(f\"dist/my-app-{version}.zip\", 'rb') as f:\n        response = client.put_object(\n            Bucket='releases',\n            Key=f'{version}/my-app-{version}.zip',\n            Body=f,\n            Metadata={'version': version, 'build': 'production'}\n        )\n\n    # Check compression stats (DeltaGlider extension)\n    if 'DeltaGliderInfo' in response:\n        info = response['DeltaGliderInfo']\n        if info.get('IsDelta'):\n            print(f\"{version}: Stored as {info['StoredSizeMB']:.1f}MB delta \"\n                  f\"(saved {info['SavingsPercent']:.0f}%)\")\n        else:\n            print(f\"{version}: Stored as reference ({info['OriginalSizeMB']:.1f}MB)\")\n\n# Result:\n# v1.0.0: Stored as reference (100.0MB)\n# v1.0.1: Stored as 0.2MB delta (saved 99.8%)\n# v1.0.2: Stored as 0.3MB delta (saved 99.7%)\n# v1.1.0: Stored as 5.2MB delta (saved 
94.8%)\n```\n\n#### Automated Database Backup\n\n```python\nfrom datetime import datetime\nfrom deltaglider import create_client\n\nclient = create_client(endpoint_url=\"http://minio.internal:9000\")\n\ndef backup_database():\n    \"\"\"Daily database backup with automatic deduplication.\"\"\"\n    date = datetime.now().strftime(\"%Y%m%d\")\n    dump_file = f\"backup-{date}.sql.gz\"\n\n    # Upload using boto3-compatible API\n    with open(dump_file, 'rb') as f:\n        response = client.put_object(\n            Bucket='backups',\n            Key=f'postgres/{date}/{dump_file}',\n            Body=f,\n            Tagging='type=daily&database=production',\n            Metadata={'date': date, 'source': 'production'}\n        )\n\n    # Check compression effectiveness\n    if 'DeltaGliderInfo' in response:\n        info = response['DeltaGliderInfo']\n        if info['DeltaRatio'] > 0.1:\n            print(f\"Warning: Low compression ({info['SavingsPercent']:.0f}%), \"\n                  \"database might have significant changes\")\n        print(f\"Backup stored: {info['StoredSizeMB']:.1f}MB \"\n              f\"(compressed from {info['OriginalSizeMB']:.1f}MB)\")\n\nbackup_database()\n```\n\nFor more examples and detailed API documentation, see the [SDK Documentation](docs/sdk/README.md).\n\n## Performance & Benchmarks\n\n### Real-World Results\n\nTesting with 513 Elasticsearch plugin releases (82.5MB each):\n\n```\nOriginal size:       42.3 GB\nDeltaGlider size:    115 MB\nCompression:         99.7%\nUpload speed:        3-4 files/second\nDownload speed:      <100ms reconstruction\n```\n\n### The Math\n\nFor `N` versions of a `S` MB file with `D%` difference between versions:\n\n**Traditional S3**: `N \u00d7 S` MB\n**DeltaGlider**: `S + (N-1) \u00d7 S \u00d7 D%` MB\n\nExample: 100 versions of 100MB files with 1% difference:\n- **Traditional**: 10,000 MB\n- **DeltaGlider**: 199 MB\n- **Savings**: 98%\n\n### Comparison\n\n| Solution | Compression | Speed | Integration | 
Cost |\n|----------|------------|-------|-------------|------|\n| **DeltaGlider** | 99%+ | Fast | Drop-in | Open source |\n| S3 Versioning | 0% | Native | Built-in | $$ per version |\n| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |\n| Git LFS | None (full copy per version) | Slow | Git-only | $ per GB |\n| Restic/Borg | 80-90% | Medium | Backup-only | Open source |\n\n## Architecture & Technical Deep Dive\n\n### Why xdelta3 Excels at Archive Compression\n\nTraditional diff algorithms (like `diff` or `git diff`) work line-by-line on text files. Binary diff tools like `bsdiff` or `courgette` are optimized for executables. **xdelta3**, by contrast, is particularly well suited to compressed archives because:\n\n1. **Block-level matching**: xdelta3 uses a rolling hash to find matching byte sequences at any offset, not just at line boundaries. This matters for archives, where a small change to one file can shift the positions of all subsequent bytes.\n\n2. **Large window support**: xdelta3 can use reference windows up to 2GB, allowing it to find matches even when content has moved significantly within the archive. Other delta algorithms typically use much smaller windows (64KB-1MB).\n\n3. **Compression-aware**: When you update one file in a ZIP or TAR archive, the rest of the archive stays largely identical. In a ZIP, each entry is compressed independently, so unchanged entries keep byte-identical compressed data and the overall structure does not change. xdelta3 reuses these identical byte runs, which other algorithms might miss.\n\n4. 
**Format agnostic**: Unlike specialized tools (e.g., `courgette` for Chrome updates), xdelta3 works on raw bytes without understanding the file format, making it perfect for any archive type.\n\n#### Real-World Example\n\nWhen you rebuild a JAR file with one class changed:\n- **Text diff**: 100% different (it's binary data!)\n- **bsdiff**: ~30-40% of original size (optimized for executables, not archives)\n- **xdelta3**: ~0.1-1% of original size (finds the unchanged parts regardless of position)\n\nThis is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta3 can identify that 99% of the archive structure and content remains identical between versions.\n\n### System Architecture\n\nDeltaGlider uses a clean hexagonal architecture:\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502   Your App  \u2502\u2500\u2500\u2500\u2500\u25b6\u2502 DeltaGlider  \u2502\u2500\u2500\u2500\u2500\u25b6\u2502  S3/MinIO   \u2502\n\u2502   (CLI/SDK) \u2502     \u2502    Core      \u2502     \u2502   Storage   \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                           \u2502\n                    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                    \u2502 Local Cache  \u2502\n                    \u2502 (References) \u2502\n                    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n**Key Components:**\n- **Binary diff engine**: xdelta3 
for optimal compression\n- **Intelligent routing**: Automatic file type detection\n- **Integrity verification**: SHA256 on every operation\n- **Local caching**: Fast repeated operations\n- **No extra infrastructure**: No database, no manifest files\n\n### When to Use DeltaGlider\n\n\u2705 **Perfect for:**\n- Software releases and versioned artifacts\n- Container images and layers\n- Database backups and snapshots\n- Machine learning model checkpoints\n- Game assets and updates\n- Any versioned binary data\n\n\u274c **Not ideal for:**\n- Already compressed **unique** files\n- Streaming or multimedia files\n- Frequently changing unstructured data\n- Files smaller than 1MB\n\n## Migration from AWS CLI\n\nMigrating from `aws s3` to `deltaglider` is mostly a matter of changing the command name:\n\n| AWS CLI | DeltaGlider | Compression Benefit |\n|---------|------------|---------------------|\n| `aws s3 cp file.zip s3://bucket/` | `deltaglider cp file.zip s3://bucket/` | \u2705 99% for similar files |\n| `aws s3 cp --recursive dir/ s3://bucket/` | `deltaglider cp -r dir/ s3://bucket/` | \u2705 99% for archives |\n| `aws s3 ls s3://bucket/` | `deltaglider ls s3://bucket/` | - |\n| `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |\n| `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | \u2705 99% incremental |\n\n## Production Ready\n\n- \u2705 **Battle tested**: 200K+ files in production\n- \u2705 **Data integrity**: SHA256 verification on every operation\n- \u2705 **S3 compatible**: Works with AWS, MinIO, Cloudflare R2, etc.\n- \u2705 **Atomic operations**: No partial states\n- \u2705 **Concurrency-safe**: Multiple clients supported\n- \u2705 **Well tested**: 95%+ code coverage\n\n## Development\n\n```bash\n# Clone the repo\ngit clone https://github.com/beshu-tech/deltaglider\ncd deltaglider\n\n# Install with dev dependencies\nuv pip install -e \".[dev]\"\n\n# Run tests\nuv run pytest\n\n# Run with local MinIO\ndocker-compose up -d\nexport 
AWS_ENDPOINT_URL=http://localhost:9000\ndeltaglider cp test.zip s3://test/\n```\n\n## FAQ\n\n**Q: What if my reference file gets corrupted?**\nA: Every operation includes SHA256 verification. Corruption is detected immediately.\n\n**Q: How fast is reconstruction?**\nA: Sub-100ms for typical files. The delta is applied in-memory using xdelta3.\n\n**Q: Can I use this with existing S3 data?**\nA: Yes! DeltaGlider can start optimizing new uploads immediately. Old data remains accessible.\n\n**Q: What's the overhead for unique files?**\nA: Zero. Files without similarity are uploaded directly.\n\n**Q: Is this compatible with S3 encryption?**\nA: Yes, DeltaGlider respects all S3 settings including SSE, KMS, and bucket policies.\n\n## Contributing\n\nWe welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\nKey areas we're exploring:\n- Cloud-native reference management\n- Rust implementation for 10x speed\n- Automatic similarity detection\n- Multi-threaded delta generation\n- WASM support for browser usage\n\n## License\n\nMIT - Use it freely in your projects.\n\n## Success Stories\n\n> \"We reduced our artifact storage from 4TB to 5GB. This isn't hyperbole\u2014it's math.\"\n> \u2014 [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)\n\n> \"Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds.\"\n> \u2014 Platform Engineer at [redacted]\n\n> \"We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year.\"\n> \u2014 CTO at [stealth startup]\n\n---\n\n**Try it now**: Got versioned files in S3? See your potential savings:\n\n```bash\n# Analyze your S3 bucket\ndeltaglider analyze s3://your-bucket/\n# Output: \"Potential savings: 95.2% (4.8TB \u2192 237GB)\"\n```\n\nBuilt with \u2764\ufe0f by engineers who were tired of paying to store the same bytes over and over.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Store 4TB in 5GB: S3-compatible storage with 99.9% compression for versioned files",
    "version": "4.2.2",
    "project_urls": {
        "Case Study": "https://github.com/beshu-tech/deltaglider/blob/main/docs/case-study-readonlyrest.md",
        "Changelog": "https://github.com/beshu-tech/deltaglider/releases",
        "Documentation": "https://github.com/beshu-tech/deltaglider#readme",
        "Homepage": "https://github.com/beshu-tech/deltaglider",
        "Issues": "https://github.com/beshu-tech/deltaglider/issues",
        "Repository": "https://github.com/beshu-tech/deltaglider"
    },
    "split_keywords": [
        "s3",
        " compression",
        " delta",
        " storage",
        " backup",
        " deduplication",
        " xdelta3",
        " binary-diff",
        " artifact-storage",
        " version-control",
        " minio",
        " aws",
        " cost-optimization",
        " devops"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f03c132074b19f056d68f10e6f5056f4c6e7dd3dd10912ba7f1c9a6676066065",
                "md5": "7a738f6ace12bee84e276452fd831d38",
                "sha256": "4550a1d17c3bd9994f715d6639dd1820cb93f49155e324329058ec77571c8f00"
            },
            "downloads": -1,
            "filename": "deltaglider-4.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7a738f6ace12bee84e276452fd831d38",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 53977,
            "upload_time": "2025-10-06T21:12:59",
            "upload_time_iso_8601": "2025-10-06T21:12:59.545848Z",
            "url": "https://files.pythonhosted.org/packages/f0/3c/132074b19f056d68f10e6f5056f4c6e7dd3dd10912ba7f1c9a6676066065/deltaglider-4.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1f3029e0ac4f03b273b460b1faeb7c34b2c393b1d21e4e7af519bdbf1ebc1a8c",
                "md5": "47ead33ddbc016db834b5e86dabb4e6e",
                "sha256": "a473e091495073568ada183e4ddbffc17ce8bb2fc9ceb0a63bc1cf70e4b8c992"
            },
            "downloads": -1,
            "filename": "deltaglider-4.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "47ead33ddbc016db834b5e86dabb4e6e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 402368,
            "upload_time": "2025-10-06T21:13:00",
            "upload_time_iso_8601": "2025-10-06T21:13:00.538944Z",
            "url": "https://files.pythonhosted.org/packages/1f/30/29e0ac4f03b273b460b1faeb7c34b2c393b1d21e4e7af519bdbf1ebc1a8c/deltaglider-4.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-06 21:13:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "beshu-tech",
    "github_project": "deltaglider",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "deltaglider"
}
        