embedded-jsonl-db-engine


Name: embedded-jsonl-db-engine
Version: 0.1.0a5
Summary: Embedded JSONL DB with strict schema, in-file header (schema+taxonomies), in-memory indexes, and fast regex queries.
Upload time: 2025-08-24 04:23:43
Author: Mykola Rudenko
Requires Python: >=3.9
License: MIT
Keywords: jsonl, embedded, database, single-file, schema, taxonomies, regex, blobs
# Embedded JSONL DB Engine

Human-readable and human-editable storage for serializable objects. A single JSONL file acts as a database, with a typed schema and taxonomies stored in the file header. The append-only write model means the whole file is not rewritten on every change. The schema is enforced but can evolve freely: add new fields at any time, and the database adapts by materializing defaults and preserving existing data, with no manual migrations. Simple predicates are served by a fast regex plan; anything else falls back to full JSON parsing. The engine uses a single-writer model with explicit compaction, rolling and daily backups, and external BLOB storage.
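
To make the append-only idea concrete, here is a minimal sketch that uses only the standard library. It is not the engine's on-disk format: the `_deleted` tombstone key and the helper names `append_record` and `load_latest` are assumptions made for illustration. Every change is written as a new line, and the current state is recovered by replaying the file so that the last line for an id wins.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON document as a single line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_latest(path):
    """Replay the log: the last line for an id wins, a tombstone removes the id."""
    latest = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            if rec.get("_deleted"):  # hypothetical tombstone marker
                latest.pop(rec["id"], None)
            else:
                latest[rec["id"]] = rec
    return latest


path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
append_record(path, {"id": "a1", "name": "Alice", "age": 33})
append_record(path, {"id": "a1", "name": "Alice", "age": 34})  # update = new line
append_record(path, {"id": "b2", "name": "Bob", "age": 20})
append_record(path, {"id": "b2", "_deleted": True})  # logical delete

current = load_latest(path)
print(current)
```

The superseded lines stay in the file until something rewrites it, which is why a model like this is usually paired with an explicit compaction step.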

Status: alpha (0.1.0a5). Core features implemented: file I/O, in-memory indexes, CRUD, compaction, backups, taxonomy header + migrations, external BLOBs. Fast-regex plan is integrated for simple queries; complex queries fall back to full parse.

Test status
- Full test suite passes locally (CRUD, queries, performance, backups retention, corruption handling, schema migration, taxonomy ops, blobs GC).
- Note on parallelism: the current engine favors simplicity and deterministic ordering; the regex fast path and index build run single-threaded to avoid GIL/IPC overhead. Parallel scan/parse across CPU cores can be added later via multiprocessing if workloads demand it.
- Run: ./run-tests.sh
- Reference run (on a development machine):
  - 21 passed in ~9s
  - reopen and build indexes for 10000 records: ~0.47s
  - fast-plan query (>= 5000) matched=5000: ~0.63s
  - full-parse query (same predicate via $or) matched=5000: ~0.72s
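
The difference between the two query paths can be illustrated with a small standard-library sketch. This is not the engine's implementation: the pattern `AGE_RE`, the functions `match_fast` and `match_full`, and the generated data are assumptions chosen to resemble a `>= 5000` predicate over 10000 records. The fast path pulls one integer out of the raw line with a regex; the full path parses the whole document first.

```python
import json
import re

# Hypothetical pattern for a top-level integer field named "age".
AGE_RE = re.compile(r'"age"\s*:\s*(-?\d+)')


def match_fast(line, threshold):
    """Evaluate age >= threshold on the raw text, without parsing the JSON."""
    m = AGE_RE.search(line)
    return m is not None and int(m.group(1)) >= threshold


def match_full(line, threshold):
    """Evaluate the same predicate after a full json.loads."""
    doc = json.loads(line)
    age = doc.get("age")
    return isinstance(age, int) and age >= threshold


lines = [json.dumps({"id": str(i), "name": f"user{i}", "age": i}) for i in range(10000)]

fast = sum(match_fast(line, 5000) for line in lines)
full = sum(match_full(line, 5000) for line in lines)
print(fast, full)
```

A regex like this can be fooled by a nested key or a string value that happens to contain `"age":`, which is one reason a fast path of this kind is typically restricted to simple scalar predicates and backed by a full-parse fallback.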

Install
- pip install embedded_jsonl_db_engine


Quick start

Minimal example (quick_start.py)
```python
from embedded_jsonl_db_engine import Database

SCHEMA = {
    "id": {"type": "str", "mandatory": False, "index": True},
    "name": {"type": "str", "mandatory": True, "index": True},
    "age": {"type": "int", "mandatory": False, "default": 0, "index": True},
    "flags": {
        "type": "object",
        "fields": {
            "active": {"type": "bool", "mandatory": False, "default": True, "index": True},
        },
    },
    "createdAt": {"type": "datetime", "mandatory": False, "index": True},
}

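# Open the database file with the schema defined above.
db = Database(path="demo.jsonl", schema=SCHEMA, mode="+")

# Create a record, set its fields, and persist it.
rec = db.new()
rec["name"] = "Alice"
rec["age"] = 33
rec.save()

# Fetch the saved record back by its id.
loaded = db.get(rec.id)
print("Loaded:", loaded)

# Query: records with flags.active set and an age of at least 18.
for r in db.find({"flags": {"active": True}, "age": {"$gte": 18}}):
    print("Adult active:", r["name"], r["age"])
```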
```
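
The schema in the example declares `default` values, and the introduction says that missing fields are filled in by materializing defaults. The following is a rough sketch of what that could look like for a schema of the same shape. The function `materialize_defaults` is hypothetical and does not come from the package; it only illustrates the idea using plain dictionaries.

```python
import copy


def materialize_defaults(record, schema):
    """Return a copy of record with missing fields filled from schema defaults."""
    out = copy.deepcopy(record)
    for name, spec in schema.items():
        if spec.get("type") == "object":
            # Recurse into nested objects, creating them when absent.
            nested = out.get(name) if isinstance(out.get(name), dict) else {}
            out[name] = materialize_defaults(nested, spec.get("fields", {}))
        elif name not in out and "default" in spec:
            out[name] = copy.deepcopy(spec["default"])
    return out


schema = {
    "name": {"type": "str", "mandatory": True},
    "age": {"type": "int", "default": 0},
    "flags": {
        "type": "object",
        "fields": {"active": {"type": "bool", "default": True}},
    },
}

old_record = {"name": "Alice"}  # written before "age" and "flags" existed
upgraded = materialize_defaults(old_record, schema)
print(upgraded)
```

Existing values are never overwritten in this sketch, so a record saved under an older schema keeps its data and only gains the fields it was missing.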

Run the example
- python examples/quick_start.py

Contributing
- Development setup: run ./setup.sh to install dev extras, then run ruff and pytest locally.
- Roadmap: implement storage I/O, open/index build, CRUD, compaction/backups, taxonomy migrations, blobs.

Development bootstrap
- Initialize repository structure and minimal package scaffold:
  - embedded_jsonl_db_engine/ with core modules (database.py, storage.py, schema.py, taxonomy.py, index.py, query.py, fastregex.py, blobs.py, utils.py, progress.py, errors.py)
  - pyproject.toml with project metadata and ruff config
  - tests/ with placeholder tests
  - project_log.md to track decisions and progress
- Next steps:
  1) Implement FileStorage I/O primitives (open/lock, header R/W, append, scan, atomic replace).
  2) Implement Database._open() to build in-memory indexes from meta scan.
  3) Implement minimal CRUD: new/get/save/find (delete as a stub if needed).
  4) Add simple tests for open/new/save/get.

What has been implemented so far
- Low-level file I/O (FileStorage): cross-platform exclusive lock, header read/write/rewrite, append meta+data with fsync, meta scan with offsets, atomic replace.
- Database open with progress: lock, header init if missing, base meta index rebuild, secondary/reverse index build.
- In-memory indexes: secondary (scalar) and reverse (taxonomy) indexes; built on open and maintained on save()/delete(); prefilter in find().
- CRUD: new() with defaults, get() (with optional meta), save() with schema validation and canonical JSON, find() with predicate evaluation + index prefilter, update(), delete() (logical).
- Queries: field projection (fields=[...]), ordering (supports nested paths "a/b"), skip/limit; is_simple_query() helper; fast regex plan for simple scalar predicates with fallback to full json.loads.
- Maintenance: compact_now() (garbage ratio ≥ 0.30), backup_now() (rolling and daily .gz) with progress events.
- Taxonomies: header-only updates (rewrite_header), full migrations (rename/merge/delete detach) with progress; strict schema validation for taxonomy-backed fields.
- BLOBs: external CAS by sha256 with put/open/gc and Database wrappers.
- Utilities: ISO timestamps, epoch converters, canonical JSON, sha256, ULID-like ids.
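
The BLOB item above describes external content-addressed storage (CAS) keyed by sha256 with put/open/gc. As an illustration of the technique only, the sketch below implements a tiny store with those three operations. The class `BlobStore`, its directory layout, and its method signatures are assumptions and may differ from the package's `blobs.py`.

```python
import hashlib
import os
import tempfile


class BlobStore:
    """Hypothetical content-addressed store: the file name is the sha256 of the bytes."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, digest):
        # Shard by the first two hex characters to keep directories small.
        return os.path.join(self.root, digest[:2], digest)

    def put(self, data: bytes) -> str:
        """Store data and return its digest; identical content is stored once."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fh:
                fh.write(data)
        return digest

    def open(self, digest):
        """Open a stored blob for reading."""
        return open(self._path(digest), "rb")

    def gc(self, live):
        """Delete every blob whose digest is not in the live set; return the count."""
        removed = 0
        for sub in os.listdir(self.root):
            subdir = os.path.join(self.root, sub)
            for name in os.listdir(subdir):
                if name not in live:
                    os.remove(os.path.join(subdir, name))
                    removed += 1
        return removed


store = BlobStore(os.path.join(tempfile.mkdtemp(), "blobs"))
ref_a = store.put(b"hello")
ref_b = store.put(b"hello")  # same bytes, same digest, no second file
ref_c = store.put(b"other")

with store.open(ref_a) as fh:
    data = fh.read()

removed = store.gc(live={ref_a})  # ref_c is unreferenced and gets collected
print(ref_a == ref_b, data, removed)
```

In a design like this, records would hold only the digest string, and garbage collection would compute the live set by scanning the records that still reference a blob.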

License
MIT

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "embedded-jsonl-db-engine",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "jsonl, embedded, database, single-file, schema, taxonomies, regex, blobs",
    "author": "Mykola Rudenko",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/9a/cf/516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a/embedded_jsonl_db_engine-0.1.0a5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Embedded JSONL DB with strict schema, in-file header (schema+taxonomies), in-memory indexes, and fast regex queries.",
    "version": "0.1.0a5",
    "project_urls": {
        "Homepage": "https://github.com/mykolarudenko/embedded_jsonl_db_engine",
        "Issues": "https://github.com/mykolarudenko/embedded_jsonl_db_engine/issues",
        "Repository": "https://github.com/mykolarudenko/embedded_jsonl_db_engine"
    },
    "split_keywords": [
        "jsonl",
        " embedded",
        " database",
        " single-file",
        " schema",
        " taxonomies",
        " regex",
        " blobs"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "24b50ad569c7559103c463652cc1c022f59e8b09229374d8c0d5c7706f6d7f22",
                "md5": "8b6cad995e877ca81c2ff6949a545b59",
                "sha256": "71e4bf84c8d6901144c2cc233a61670762e392aa515bd4710075e2503d016801"
            },
            "downloads": -1,
            "filename": "embedded_jsonl_db_engine-0.1.0a5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8b6cad995e877ca81c2ff6949a545b59",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 30705,
            "upload_time": "2025-08-24T04:23:42",
            "upload_time_iso_8601": "2025-08-24T04:23:42.249322Z",
            "url": "https://files.pythonhosted.org/packages/24/b5/0ad569c7559103c463652cc1c022f59e8b09229374d8c0d5c7706f6d7f22/embedded_jsonl_db_engine-0.1.0a5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9acf516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a",
                "md5": "6fdedbd661c632920835c174a315a333",
                "sha256": "aea51f4a099d69d5274932508953c322392711c29ee432d02927965e61713a24"
            },
            "downloads": -1,
            "filename": "embedded_jsonl_db_engine-0.1.0a5.tar.gz",
            "has_sig": false,
            "md5_digest": "6fdedbd661c632920835c174a315a333",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 36323,
            "upload_time": "2025-08-24T04:23:43",
            "upload_time_iso_8601": "2025-08-24T04:23:43.589035Z",
            "url": "https://files.pythonhosted.org/packages/9a/cf/516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a/embedded_jsonl_db_engine-0.1.0a5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-24 04:23:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mykolarudenko",
    "github_project": "embedded_jsonl_db_engine",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "embedded-jsonl-db-engine"
}
        