# Embedded JSONL DB Engine
Human-readable and human-editable storage for serializable objects. A single JSONL file acts as a database, with a typed schema and taxonomies stored in the header. The append-only write model means the whole file is not rewritten on every change. The schema is enforced but can evolve freely: add new fields at any time, and the database adapts by materializing defaults and preserving existing data, with no manual migrations. Simple predicates use a fast regex plan; everything else falls back to full JSON parsing. Single-writer model with explicit compaction, rolling and daily backups, and external BLOB storage.
Status: alpha (0.1.0a2). Core features implemented: file I/O, in-memory indexes, CRUD, compaction, backups, taxonomy header + migrations, external BLOBs. Fast-regex plan is integrated for simple queries; complex queries fall back to full parse.
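The fast-regex idea can be illustrated with a small standalone sketch. This is not the engine's actual code: the helper name `make_fast_gte` and the pattern are assumptions made for illustration, and the sketch only handles a `>=` predicate on an integer field. It applies a regex to the raw line and falls back to `json.loads` whenever the regex result is missing or ambiguous.

```python
import json
import re


def make_fast_gte(field: str, threshold: int):
    """Build a matcher for `field >= threshold` over raw JSONL lines (hypothetical helper)."""
    # Matches `"field": <integer>` followed by a comma or closing brace.
    pattern = re.compile(r'"%s"\s*:\s*(-?\d+)(?=\s*[,}])' % re.escape(field))

    def match(line: str) -> bool:
        hits = pattern.findall(line)
        if len(hits) == 1:
            # Fast path: exactly one candidate, compare without parsing the line.
            # Caveat: assumes the key is not found only inside a nested object.
            return int(hits[0]) >= threshold
        # Slow path: key missing, repeated in a nested object, or not an integer.
        value = json.loads(line).get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and value >= threshold

    return match


lines = [
    '{"name": "Alice", "age": 33}',
    '{"name": "Bob", "age": 7}',
    '{"name": "Eve", "age": 18, "child": {"age": 2}}',  # two hits, so full parse
]
is_adult = make_fast_gte("age", 18)
adults = [json.loads(line)["name"] for line in lines if is_adult(line)]
```

The fast path pays off only when most lines resolve with a single regex hit; anything the pattern cannot decide safely is handed to the full parser.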
Test status
- Full test suite passes locally (CRUD, queries, performance, backups retention, corruption handling, schema migration, taxonomy ops, blobs GC).
- Note on parallelism: current engine favors simplicity and deterministic ordering; regex fast-path and index build operate single-threaded to avoid GIL/IPC overhead. Parallel scan/parse across CPU cores can be added later via multiprocessing if workloads demand it.
- Run: ./run-tests.sh
- Reference run (on a dev machine):
  - 21 passed in ~9s
  - reopen and build indexes for 10000 records: ~0.47s
  - fast-plan query (>= 5000) matched=5000: ~0.63s
  - full-parse query (same predicate via $or) matched=5000: ~0.72s
Install
- pip install embedded_jsonl_db_engine
Quick start
Minimal example (quick_start.py)
```python
from embedded_jsonl_db_engine import Database

SCHEMA = {
    "id": {"type": "str", "mandatory": False, "index": True},
    "name": {"type": "str", "mandatory": True, "index": True},
    "age": {"type": "int", "mandatory": False, "default": 0, "index": True},
    "flags": {
        "type": "object",
        "fields": {
            "active": {"type": "bool", "mandatory": False, "default": True, "index": True},
        },
    },
    "createdAt": {"type": "datetime", "mandatory": False, "index": True},
}

db = Database(path="demo.jsonl", schema=SCHEMA, mode="+")
rec = db.new()
rec["name"] = "Alice"
rec["age"] = 33
rec.save()

loaded = db.get(rec.id)
print("Loaded:", loaded)

for r in db.find({"flags": {"active": True}, "age": {"$gte": 18}}):
    print("Adult active:", r["name"], r["age"])
```
Run the example
- python examples/quick_start.py
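
The quick start relies on schema defaults (`age` defaults to 0, `flags.active` to True). As a rough illustration of what "materializing defaults and preserving existing data" can mean, here is a standalone sketch. The function `materialize_defaults` is hypothetical and is not part of the package API; it only mirrors the schema layout used in the example.

```python
from copy import deepcopy


def materialize_defaults(schema: dict, record: dict) -> dict:
    """Return a copy of `record` with missing fields filled from schema defaults."""
    out = deepcopy(record)
    for name, spec in schema.items():
        if spec.get("type") == "object":
            # Recurse into nested objects, creating them when absent.
            child = out.get(name)
            child = child if isinstance(child, dict) else {}
            out[name] = materialize_defaults(spec.get("fields", {}), child)
        elif name not in out and "default" in spec:
            # Only fill fields the record does not already carry.
            out[name] = deepcopy(spec["default"])
    return out


schema = {
    "name": {"type": "str", "mandatory": True},
    "age": {"type": "int", "default": 0},
    "flags": {
        "type": "object",
        "fields": {"active": {"type": "bool", "default": True}},
    },
}
old_record = {"name": "Alice"}  # written before `age` and `flags` existed
upgraded = materialize_defaults(schema, old_record)
```

Existing values always win over defaults, so a record written under an older schema keeps its data and gains only the fields it lacks.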
Contributing
- Development setup: run ./setup.sh to install dev extras, then run ruff and pytest locally.
- Roadmap: implement storage I/O, open/index build, CRUD, compaction/backups, taxonomy migrations, blobs.
Development bootstrap
- Initialize repository structure and minimal package scaffold:
  - embedded_jsonl_db_engine/ with core modules (database.py, storage.py, schema.py, taxonomy.py, index.py, query.py, fastregex.py, blobs.py, utils.py, progress.py, errors.py)
  - pyproject.toml with project metadata and ruff config
  - tests/ with placeholder tests
  - project_log.md to track decisions and progress
- Next steps:
  1) Implement FileStorage I/O primitives (open/lock, header R/W, append, scan, atomic replace).
  2) Implement Database._open() to build in-memory indexes from meta scan.
  3) Implement minimal CRUD: new/get/save/find (delete as a stub if needed).
  4) Add simple tests for open/new/save/get.
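
For readers unfamiliar with the I/O primitives named in step 1, the following standalone sketch shows append with fsync, an offset scan, and atomic replace using only the standard library. The function names are illustrative assumptions and do not reflect the real FileStorage interface; locking and the header line are left out.

```python
import json
import os
import tempfile


def append_record(path: str, record: dict) -> None:
    """Append one JSON line and force it to disk before returning."""
    line = json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def scan_offsets(path: str) -> list:
    """Return the byte offset at which each line starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def atomic_replace(path: str, lines: list) -> None:
    """Write `lines` to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
            f.writelines(lines)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Writing the temp file next to the target keeps `os.replace` on one filesystem, which is what makes the swap atomic; a compaction pass could use the same pattern to drop superseded lines.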
What has been implemented so far
- Low-level file I/O (FileStorage): cross-platform exclusive lock, header read/write/rewrite, append meta+data with fsync, meta scan with offsets, atomic replace.
- Database open with progress: lock, header init if missing, base meta index rebuild, secondary/reverse index build.
- In-memory indexes: secondary (scalar) and reverse (taxonomy) indexes; built on open and maintained on save()/delete(); prefilter in find().
- CRUD: new() with defaults, get() (with optional meta), save() with schema validation and canonical JSON, find() with predicate evaluation + index prefilter, update(), delete() (logical).
- Queries: field projection (fields=[...]), ordering (supports nested paths "a/b"), skip/limit; is_simple_query() helper; fast regex plan for simple scalar predicates with fallback to full json.loads.
- Maintenance: compact_now() (garbage ratio ≥ 0.30), backup_now() (rolling and daily .gz) with progress events.
- Taxonomies: header-only updates (rewrite_header), full migrations (rename/merge/delete detach) with progress; strict schema validation for taxonomy-backed fields.
- BLOBs: external CAS by sha256 with put/open/gc and Database wrappers.
- Utilities: ISO timestamps, epoch converters, canonical JSON, sha256, ULID-like ids.
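
The "external CAS by sha256" item can be pictured with a minimal content-addressed store. This is a sketch under assumptions: the two-character directory fan-out and the function names `put_blob`, `open_blob`, and `gc_blobs` are illustrative choices, not the package's documented layout or API.

```python
import hashlib
import os
from pathlib import Path


def put_blob(root: str, data: bytes) -> str:
    """Store `data` under its sha256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = Path(root) / digest[:2] / digest
    if not target.exists():
        # Identical content maps to the same path, so it is written only once.
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, target)
    return digest


def open_blob(root: str, digest: str):
    """Open a stored blob for reading by digest."""
    return open(Path(root) / digest[:2] / digest, "rb")


def gc_blobs(root: str, live: set) -> int:
    """Delete blobs whose digest is not in `live`; return how many were removed."""
    removed = 0
    for path in Path(root).glob("*/*"):
        if path.is_file() and path.name not in live:
            path.unlink()
            removed += 1
    return removed
```

In such a design, records would hold only the digest string, and garbage collection would pass the set of digests still referenced by live records.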
License
MIT
Raw data
{
"_id": null,
"home_page": null,
"name": "embedded-jsonl-db-engine",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "jsonl, embedded, database, single-file, schema, taxonomies, regex, blobs",
"author": "Mykola Rudenko",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/9a/cf/516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a/embedded_jsonl_db_engine-0.1.0a5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Embedded JSONL DB with strict schema, in-file header (schema+taxonomies), in-memory indexes, and fast regex queries.",
"version": "0.1.0a5",
"project_urls": {
"Homepage": "https://github.com/mykolarudenko/embedded_jsonl_db_engine",
"Issues": "https://github.com/mykolarudenko/embedded_jsonl_db_engine/issues",
"Repository": "https://github.com/mykolarudenko/embedded_jsonl_db_engine"
},
"split_keywords": [
"jsonl",
" embedded",
" database",
" single-file",
" schema",
" taxonomies",
" regex",
" blobs"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "24b50ad569c7559103c463652cc1c022f59e8b09229374d8c0d5c7706f6d7f22",
"md5": "8b6cad995e877ca81c2ff6949a545b59",
"sha256": "71e4bf84c8d6901144c2cc233a61670762e392aa515bd4710075e2503d016801"
},
"downloads": -1,
"filename": "embedded_jsonl_db_engine-0.1.0a5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8b6cad995e877ca81c2ff6949a545b59",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 30705,
"upload_time": "2025-08-24T04:23:42",
"upload_time_iso_8601": "2025-08-24T04:23:42.249322Z",
"url": "https://files.pythonhosted.org/packages/24/b5/0ad569c7559103c463652cc1c022f59e8b09229374d8c0d5c7706f6d7f22/embedded_jsonl_db_engine-0.1.0a5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9acf516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a",
"md5": "6fdedbd661c632920835c174a315a333",
"sha256": "aea51f4a099d69d5274932508953c322392711c29ee432d02927965e61713a24"
},
"downloads": -1,
"filename": "embedded_jsonl_db_engine-0.1.0a5.tar.gz",
"has_sig": false,
"md5_digest": "6fdedbd661c632920835c174a315a333",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 36323,
"upload_time": "2025-08-24T04:23:43",
"upload_time_iso_8601": "2025-08-24T04:23:43.589035Z",
"url": "https://files.pythonhosted.org/packages/9a/cf/516e49093aa782bf68b863a524a5d6ca511dc66f77a8371c630879379d0a/embedded_jsonl_db_engine-0.1.0a5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-24 04:23:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mykolarudenko",
"github_project": "embedded_jsonl_db_engine",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "embedded-jsonl-db-engine"
}