# AURA Compression Toolkit
AURA is an experimental, Python-first playground for hybrid compression. It mixes
template‑aware encoders, semantic heuristics, and audit-friendly metadata so you
can explore how structured traffic (API chatter, AI↔AI messages, log streams)
behaves under different strategies. The project is **not production-ready**, but
it now ships with a lean test suite and CLI tooling that make local experiments
straightforward.
---
## TL;DR
|                       | Status                                                                 |
|-----------------------|------------------------------------------------------------------------|
| Vision                | Efficient, auditable compression tuned for repetitive, structured text |
| Current maturity      | Alpha — safe for prototyping only                                      |
| Runtime support       | CPython ≥ 3.10 (pure Python, no native deps)                           |
| Test coverage         | ~44% (core pipelines + CLI smoke tests)                                |
| License               | Apache 2.0 (see LICENSE for patent notice)                             |
---
## Installation
```bash
git clone https://github.com/hendrixx-cnc/AURA.git
cd AURA
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
The `dev` extra installs `pytest`, coverage tooling, and linters.
---
## Quick Start (Python API)
```python
from aura_compression.compressor_refactored import ProductionHybridCompressor
compressor = ProductionHybridCompressor(
    enable_aura=False,          # disable background discovery worker
    enable_fast_path=True,
    enable_audit_logging=False,
    template_sync_interval_seconds=None,
)
message = "Order 42: status=ready"
payload, method, metadata = compressor.compress(message)
restored = compressor.decompress(payload)
assert restored == message
print(method.name, metadata["ratio"])
```
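The same API works in a loop. Here is a minimal sketch, using only the `compress`/`decompress` signatures shown above, that round-trips a small batch and tallies which strategy the selector picked for each message:

```python
from collections import Counter

from aura_compression.compressor_refactored import ProductionHybridCompressor

compressor = ProductionHybridCompressor(
    enable_aura=False,
    enable_fast_path=True,
    enable_audit_logging=False,
    template_sync_interval_seconds=None,
)

# Illustrative messages; substitute your own traffic sample.
messages = [
    "Order 42: status=ready",
    "Order 43: status=pending",
    '{"event": "login", "user": "alice"}',
]

methods = Counter()
for message in messages:
    payload, method, metadata = compressor.compress(message)
    assert compressor.decompress(payload) == message  # round-trip fidelity
    methods[method.name] += 1

print(dict(methods))  # strategy mix across the batch
```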
### When does it shine?
- You control both ends of the link (AI ↔ AI, microservices, etc.)
- Payloads are verbose but structured (logs, JSON, templated replies)
- You’re comfortable tuning template libraries / cache policy
### When to avoid it
- You need wire compatibility with gzip/zstd/brotli
- Your response-time budgets are tight (large-file compression is slow)
- You cannot ship persistent template state alongside payloads
---
## Large-File CLI
The `tools/compress_large_file.py` script provides a streaming container format.
It records chunk metadata (including template usage) so decompression works on a
fresh machine.
```bash
# Compress with a progress bar and write stats to JSON
python tools/compress_large_file.py compress \
  --input "/path/to/enwik8" \
  --output "/path/to/enwik8.aura" \
  --chunk-size 64K \
  --progress bar \
  --stats-format json \
  --stats-file stats/compress.json
# Round-trip integrity check without writing output
python tools/compress_large_file.py verify \
  --input "/path/to/enwik8.aura" \
  --progress percent
# Inspect container metadata (headers, sample chunks, template IDs)
python tools/compress_large_file.py info \
  --input "/path/to/enwik8.aura" \
  --max-chunks 5 \
  --stats-format table
```
Key switches:
| Flag               | Description                                            |
|--------------------|--------------------------------------------------------|
| `--chunk-size`     | Bytes or suffixed value (`256K`, `4M`, …)              |
| `--progress`       | `auto`, `bar`, `percent`, `none`                       |
| `--stats-format`   | `table` (default) or `json`                            |
| `--stats-file`     | Path to persist stats output (useful in CI)            |
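For reference, `--chunk-size` accepts either a plain byte count or a suffixed value. The sketch below shows a hypothetical `parse_chunk_size` helper that mirrors this convention; it is not the CLI's actual implementation, which may accept different suffixes or handle errors differently:

```python
def parse_chunk_size(value: str) -> int:
    """Parse '65536', '256K', or '4M' into a byte count.

    Hypothetical helper mirroring the --chunk-size convention.
    """
    suffixes = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in suffixes:
        return int(value[:-1]) * suffixes[value[-1]]
    return int(value)


assert parse_chunk_size("64K") == 65_536
assert parse_chunk_size("4M") == 4_194_304
```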
---
## Synthetic Network Smoke Test
To sanity-check the compressor against AI‑style traffic:
```bash
pytest tests/test_network_simulation_smoke.py -q
```
The generator streams ~120 messages (API calls, logs, chat replies, binary blobs)
and asserts:
- Round-trip fidelity for every payload
- Multiple compression strategies selected
- Binary semantic templates triggered at least once
- Average compression ratio stays sensible (>0.5)
Use this as a starting point when tailoring the system to your own message mix.
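A tailored variant can reuse the same pattern against your own corpus. The pytest-style sketch below relies only on the Quick Start API; the message list and the strategy-count threshold are placeholders to adjust for your workload:

```python
# test_my_traffic_smoke.py -- illustrative; adapt messages and thresholds
from aura_compression.compressor_refactored import ProductionHybridCompressor

MY_MESSAGES = [
    "GET /api/v1/orders/42 -> 200 OK",
    "2025-01-01T00:00:00Z INFO worker started",
    '{"role": "assistant", "content": "Order 42 is ready."}',
] * 40  # roughly 120 messages, mirroring the bundled smoke test


def test_round_trip_and_strategy_mix():
    compressor = ProductionHybridCompressor(
        enable_aura=False,
        enable_fast_path=True,
        enable_audit_logging=False,
        template_sync_interval_seconds=None,
    )
    methods = set()
    for message in MY_MESSAGES:
        payload, method, _ = compressor.compress(message)
        assert compressor.decompress(payload) == message
        methods.add(method.name)
    # Raise this once your mix is varied enough to exercise several strategies.
    assert len(methods) >= 1
```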
---
## Testing & Coverage
```bash
pytest -q                # fast path (~40 s)
pytest --cov=src --cov=tools --cov-report=term-missing
```
Current suite highlights:
- `tests/test_cli_utilities.py` — input parsing, progress modes, container inspection
- `tests/test_core_components.py` — basic round-trip compressor + template matching
- `tests/test_network_simulation_smoke.py` — synthetic AI/network workload
Large areas of the codebase remain untested (BRIO internals, ML selector, legacy
tools). Treat reported coverage as a proxy for explored functionality, not as a
production safety net.
---
## Roadmap Snapshot
- ✅ Streamlined large-file CLI with inspect/verify subcommands
- ✅ Lean regression tests to keep core behavior honest
- 🔜 Refactor BRIO and ML pipelines into testable, modular units
- 🔜 Benchmark suite vs. gzip/zstd/brotli on realistic corpora
- 🔜 Documentation on template discovery + SQLite persistence internals
---
## Contributing
1. Open an issue describing your proposal.
2. Fork the repo and create a feature branch.
3. Keep changes focused; add tests when practical.
4. Run `pytest -q` before submitting your PR.
Helpful areas:
- Improving template discovery robustness (error handling, logging)
- Instrumentation and profiling of large-file compression
- Type hints / static analysis for critical modules
- Benchmarks and data-driven comparisons
---
## License & Patents
Licensed under Apache 2.0. The project references patent-pending techniques; the
open-source distribution grants a royalty-free license for evaluation and
non-commercial use. See `LICENSE` for full text and obligations.
---
## Contact
- Author: Todd Hendricks — `todd@auraprotocol.org`
- Issues & discussions: [GitHub Issues](https://github.com/hendrixx-cnc/AURA/issues)
If you do end up using AURA in research or prototyping, feedback on data sets,
compression ratios, and pain points is greatly appreciated.