timewarp-llm 0.2.1

Summary: Event-sourced recording + deterministic replay + time‑travel debugger for LLM agents.
Author: Timewarp maintainers
Requires Python: >=3.11
Uploaded: 2025-09-09 08:46:38
Keywords: llm, langgraph, langchain, event-sourcing, replay, debugger, opentelemetry, cli, determinism
Homepage: https://github.com/aleks-apostle/timewarp
Issues: https://github.com/aleks-apostle/timewarp/issues
Changelog: https://github.com/aleks-apostle/timewarp/releases

Timewarp — Deterministic Replay & Time‑Travel Debugger for LLM Agent Workflows
===============================================================================

[![PyPI](https://img.shields.io/pypi/v/timewarp-llm.svg)](https://pypi.org/project/timewarp-llm/)
[![Python Versions](https://img.shields.io/pypi/pyversions/timewarp-llm.svg)](https://pypi.org/project/timewarp-llm/)
[![License](https://img.shields.io/pypi/l/timewarp-llm.svg)](./LICENSE)
[![CI](https://github.com/aleks-apostle/timewarp/actions/workflows/ci.yml/badge.svg)](https://github.com/aleks-apostle/timewarp/actions/workflows/ci.yml)
[![Publish](https://github.com/aleks-apostle/timewarp/actions/workflows/release.yml/badge.svg)](https://github.com/aleks-apostle/timewarp/actions/workflows/release.yml)

Record every step. Rewind any step. Reproduce any run.

Timewarp adds event‑sourced logging and deterministic replay to agent frameworks (LangGraph first, LangChain optional), plus an interactive REPL debugger for step‑through, diffs, and what‑if edits. It fills a well‑documented gap: mainstream tools visualize traces but don’t let you replay them exactly.

What’s Included (v0.1 core)
---------------------------

- Core models and helpers
  - `timewarp.events`: Pydantic v2 models (`Run`, `Event`, `BlobRef`), hashing, redaction
  - `timewarp.codec`: Canonical JSON (orjson), Zstandard compression
  - `timewarp.determinism`: RNG snapshot/restore
- Local store
  - `timewarp.store.LocalStore`: SQLite (WAL) for runs/events + filesystem blobs
  - Deterministic blob layout: `runs/<run_id>/events/<step>/<kind>.bin` (zstd)
  - Connection PRAGMAs applied per-connection: `journal_mode=WAL`, `synchronous=NORMAL`, configurable busy timeout
  - Monotonic steps: per-run event `step` must strictly increase; single-writer-per-run is recommended for correctness
- LangGraph recording
  - `timewarp.langgraph.LangGraphRecorder`: streams `updates|values|messages`, records `LLM|TOOL|DECISION|HITL|SNAPSHOT` events
  - Labels include `thread_id`, `namespace`, `node`, `checkpoint_id`, `anchor_id`
  - Privacy redaction via `privacy_marks`
- Diff engine
  - Anchor‑aware alignment + windowed realignment; DeepDiff/text diffs; first divergence
  - Delta debugging: minimal failing window via `diff --bisect`
- Replay scaffolding
  - `PlaybackLLM`/`PlaybackTool` inject recorded outputs with prompt/args validation
  - `LangGraphReplayer.resume()` re‑executes from nearest checkpoint using recorded outputs
  - What‑if overrides supported (one‑shot per step)
- CLI
  - `timewarp list|events|tools|diff|debug`, plus `resume`, `inject`, and `fsck` (see below)
  - `timewarp-repl` interactive debugger for browsing timelines, inspecting prompts/tools/memory, deterministic resume, and recording what‑if forks
  - `export langsmith <run_id>` to serialize runs/events for external tooling
- Telemetry (optional)
  - OpenTelemetry spans per event; replay spans link to originals via Span Links
  - Attributes use `tw.*` keys: `tw.run_id`, `tw.step`, `tw.action_type`, `tw.actor`, `tw.replay`,
    `tw.namespace`, `tw.thread_id`, `tw.checkpoint_id`, `tw.anchor_id`, `tw.branch_of`,
    `tw.hash.output|state|prompt`

Install & Dev
-------------

Requires Python 3.11+.

Install from PyPI:

```
pip install timewarp-llm
# optional extras
pip install langgraph langchain-core  # optional runtime dependencies for recording/replay
pip install 'timewarp-llm[otel]'
pip install 'timewarp-llm[dspy]'   # optional DSPy optimizers

# CLI entry points
timewarp --help
timewarp-repl --help
```

For local development:

```
uv venv && uv pip install -e .[dev]
ruff format && ruff check --fix
mypy --strict
pytest -q
```

Developer notes: see `docs/DEV.md` for CLI helper modules, canonical JSON path, store insert/observability details, and provenance consistency.

Optional dependencies
- LangGraph/LC core: `uv pip install langgraph langchain-core`
- Telemetry: `uv pip install -e .[otel]` (installs `opentelemetry-*`)
- DSPy: `uv pip install -e .[dspy]` (installs `dspy`) for optional prompt optimization

DSPy Dataset & Optimization (optional)
--------------------------------------

Timewarp can export per-agent datasets from recorded runs and optionally run DSPy optimizers
to produce improved prompt specifications.

Build a dataset from a run:

```
timewarp ./timewarp.sqlite3 ./blobs dspy build-dataset <run_id> --out ds.json
```

Run an optimizer (requires installing the `dspy` extra):

```
timewarp ./timewarp.sqlite3 ./blobs dspy optimize ds.json --optimizer bootstrap --out prompts.json
# or
timewarp ./timewarp.sqlite3 ./blobs dspy optimize ds.json --optimizer mipro --out prompts.json
# emit overrides JSON directly consumable by `dspy fork`
timewarp ./timewarp.sqlite3 ./blobs dspy optimize ds.json --emit-overrides --out overrides.json
```

Notes
- Dataset groups examples by agent (LangGraph node). Each example includes inputs (messages when
  available), the agent's memory snapshot at `step-1`, the recorded output, and step/thread metadata.
- Optimizers are optional. If DSPy is not installed or compilation fails, the CLI emits a
  heuristic prompt template per agent with basic metrics.
- This is pre‑release functionality and may evolve without backward compatibility guarantees.

Recording a Run (LangGraph)
---------------------------

Quickstart via facade:

```
from timewarp import wrap, messages_pruner
from examples.langgraph_demo.app import make_graph

graph = make_graph()

rec = wrap(
    graph,
    project="demo",
    name="my-run",
    stream_modes=("updates", "messages", "values"),
    snapshot_every=20,
    snapshot_on=("terminal", "decision"),
    state_pruner=messages_pruner(max_len=2000, max_items=200),
    enable_record_taps=True,  # robust prompt/tool args hashing
    event_batch_size=20,      # batch appends to reduce SQLite overhead
)
result = rec.invoke({"text": "hi"}, config={"configurable": {"thread_id": "t-1"}})
print("run_id=", rec.last_run_id)
```

Manual recorder usage:

```
from pathlib import Path
from timewarp.events import Run
from timewarp.store import LocalStore
from timewarp.langgraph import LangGraphRecorder
from timewarp import messages_pruner

store = LocalStore(db_path=Path("./timewarp.db"), blobs_root=Path("./blobs"))
run = Run(project="demo", name="my-run", framework="langgraph")
rec = LangGraphRecorder(
    graph=my_compiled_graph,
    store=store,
    run=run,
    stream_modes=("updates", "values"),  # also supports "messages"
    stream_subgraphs=True,
    snapshot_on={"terminal", "decision"},
    state_pruner=messages_pruner(max_len=2000, max_items=200),
)
result = rec.invoke({"text": "hi"}, config={"configurable": {"thread_id": "t-1"}})
```

Debugging & Diffs
-----------------

```
timewarp ./timewarp.db ./blobs list
timewarp ./timewarp.db ./blobs debug <run_id>              # basic inspector (legacy)
timewarp ./timewarp.db ./blobs diff <run_a> <run_b>       # first divergence / bisect
timewarp ./timewarp.db ./blobs events <run_id> --type LLM --node compose --thread t-1 --json

# NEW: interactive debugger (recommended)
timewarp-repl ./timewarp.sqlite3 ./blobs <run_id> \
  --app examples.langgraph_demo.app:make_graph \
  --thread t-1 --freeze-time
```

Interactive Debugger (REPL)
---------------------------

Timewarp ships a richer interactive REPL that unifies timeline browsing, prompt/tools/memory
inspection, deterministic replay, what‑if injections, prompt overrides, and diffs.

- Binary: `timewarp-repl` (installed by the package)
- Programmatic: `timewarp.interactive_debug.launch_debugger(...)`

CLI usage

```
timewarp-repl <db> <blobs> <run_id> [--app module:function] [--thread ID] \
  [--freeze-time] [--strict-meta] [--allow-diff] [--overrides overrides.json]
```

Inside the REPL

```
Commands
  app module:function         Load a compiled LangGraph via factory (enables resume/inject)
  thread T                    Set thread_id to use during resume/inject
  freeze | unfreeze           Toggle freeze-time replay
  strict | nonstrict          Toggle strict meta checks (provider/model/tools invariants)
  allowdiff | disallowdiff    Toggle allowing prompt diffs during replay (for overrides)
  overrides [file.json]       Load per-agent prompt overrides; empty to clear

Views
  list [type=.. node=.. thread=.. ns=..]   Timeline (filterable)
  event STEP                  Show a single event + blob size hints
  llm                         Show the last LLM before current position
  prompt [STEP]               Messages/tools head + estimated tokens for an LLM step
  tools [STEP]                Tools summary across run or details for one LLM step
  memory                      Memory summary by space
  memory_show STEP            Full memory snapshot up to step
  memory_diff A B [key=path]  Structural diff between two snapshots (optional dotted key)

Execution
  resume [FROM_STEP]          Deterministically resume from a checkpoint using recorded outputs
  inject STEP output.json [-r|--record]
                              One-shot override at STEP; optionally record fork immediately
  fork_prompts overrides.json [-r|--record]
                              Prepare/record a fork that applies prompt overrides
  diff OTHER_RUN_ID [--bisect] [--window N]
                              Show first divergence or minimal failing window
```

Programmatic launch

```
from timewarp.interactive_debug import launch_debugger
from examples.langgraph_demo.app import make_graph

launch_debugger(
    db="./timewarp.sqlite3",
    blobs="./blobs",
    run_id="<UUID>",
    graph=make_graph(),          # or pass --app module:function via CLI
    thread_id="t-1",
    freeze_time=True,
    strict_meta=False,
    allow_diff=False,
)
```

Integrity Check (fsck)
----------------------

Verify that all blobs referenced by a run exist on disk; optionally repair and garbage‑collect orphans. Emits JSON for easy automation.

```
# Basic verification (JSON output)
timewarp ./timewarp.db ./blobs fsck <run_id>
# Attempt repair by promoting any matching .tmp files to final .bin
timewarp ./timewarp.db ./blobs fsck <run_id> --repair
# Remove blob files on disk that are not referenced by the run (dangerous);
# use a grace period to avoid racing with in-flight writes
timewarp ./timewarp.db ./blobs fsck <run_id> --gc-orphans --grace 5
```

Output shape:

```
{"missing": ["runs/<id>/events/12/output.bin", ...],
 "repaired": ["runs/<id>/events/12/output.bin", ...],
"orphans_gc": ["runs/<id>/events/99/output.bin.tmp", ...]}
```
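
Since the report is JSON, it is straightforward to gate automation on it; a minimal sketch using the Python standard library (paths and the run ID are placeholders):

```
import json
import subprocess
import sys

# Run fsck and fail if any referenced blob is missing on disk.
proc = subprocess.run(
    ["timewarp", "./timewarp.db", "./blobs", "fsck", "RUN_ID"],
    capture_output=True, text=True,
)
report = json.loads(proc.stdout)
if report.get("missing"):
    print(f"{len(report['missing'])} missing blob(s)", file=sys.stderr)
    sys.exit(1)
```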

SQLite & Concurrency
--------------------

- Single-writer-per-run: Timewarp enforces strictly increasing `step` per `run_id`.
  The `events` table uses `(run_id, step)` as its primary key and `LocalStore` guards with a
  monotonic check against `MAX(step)`. Running multiple writers for the same `run_id` can cause
  UNIQUE violations or out-of-order errors. Recommended: one process per run ID.
- PRAGMAs: Each connection applies `journal_mode=WAL`, `synchronous=NORMAL`, a configurable
  `busy_timeout`, and best-effort `foreign_keys=ON`, `temp_store=MEMORY`, and a
  `journal_size_limit`. These trade-offs aim for durable, fast appends (a sketch of the
  equivalent settings follows this list).
- JSON1 indexes (optional): Additional indexes rely on SQLite JSON1 (`json_extract`). When JSON1
  isn’t available, index creation is skipped and a one-time warning is printed; queries still work
  but may fall back to table scans. Most modern Python builds ship SQLite with JSON1 enabled.
- Blob finalization: Blobs are written to `*.bin.tmp` and promoted to final `*.bin` on event
  append. Reading a blob may also finalize the file if a matching `.tmp` exists.
- Orphan GC: `fsck --gc-orphans` applies a grace window (`--grace`) to avoid deleting files
  created moments ago by in-flight writers.
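
For reference, the per-connection setup described above looks roughly like the following in plain `sqlite3` (illustrative values, not the `LocalStore` implementation):

```
import sqlite3

conn = sqlite3.connect("./timewarp.db")
# Mirror of the per-connection PRAGMAs described above; values are examples.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
conn.execute("PRAGMA busy_timeout=5000")            # configurable in LocalStore
conn.execute("PRAGMA foreign_keys=ON")              # best-effort
conn.execute("PRAGMA temp_store=MEMORY")            # best-effort
conn.execute("PRAGMA journal_size_limit=67108864")  # best-effort, example limit
```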

Delta Debugging (Minimal Failing Delta)
---------------------------------------

Find the smallest contiguous mismatching window between two runs, accounting for
anchor‑aware realignment to skip benign reorders:

```
# Text output
timewarp ./timewarp.db ./blobs diff <run_a> <run_b> --bisect

# JSON output (machine‑readable)
timewarp ./timewarp.db ./blobs diff <run_a> <run_b> --bisect --json
# => {"start_a": <int>, "end_a": <int>, "start_b": <int>, "end_b": <int>, "cause": <str>} | {"result": null}

# Tune anchor lookahead window (default 5)
timewarp ./timewarp.db ./blobs diff <run_a> <run_b> --bisect --window 3
```

Notes
- Causes: "output hash mismatch" | "anchor mismatch" | "adapter/schema mismatch".
- Benign reorders (by matching anchors) are excluded from the window.
- If all aligned pairs match but lengths differ, the trailing unmatched step is reported with cause "anchor mismatch".
- Exit codes: add `--fail-on-divergence` to return a non‑zero exit code when a divergence is found (applies to text and `--json` modes).
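
For example, a CI gate on replay equivalence can check that exit code (paths and run IDs are placeholders):

```
import subprocess
import sys

# --fail-on-divergence makes the CLI exit non-zero when the runs diverge.
result = subprocess.run(
    ["timewarp", "./timewarp.db", "./blobs", "diff", "RUN_A", "RUN_B",
     "--bisect", "--json", "--fail-on-divergence"],
    capture_output=True, text=True,
)
if result.returncode != 0:
    print("Runs diverged:", result.stdout.strip(), file=sys.stderr)
    sys.exit(1)
```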

Deterministic Replay & What‑ifs (CLI)
-------------------------------------

Provide an app factory that returns a compiled LangGraph (example shipped):

```
--app examples.langgraph_demo.app:make_graph
```

Resume deterministically from a prior checkpoint:

```
timewarp ./timewarp.db ./blobs resume <run_id> --from 42 --thread t-1 --app examples.langgraph_demo.app:make_graph
```

Inject an alternative output at step N and fork:

```
timewarp ./timewarp.db ./blobs inject <run_id> 23 \
  --output alt_23.json \
  --thread t-1 \
  --app examples.langgraph_demo.app:make_graph \
  --record-fork   # execute and persist new branch immediately
```

Notes
- The CLI binds playback wrappers via lightweight installers that intercept LangChain ChatModel/Tool calls during replay. Your graph runs without network/tool side‑effects in replay mode.
- For forks, you can either prepare and record later, or pass `--record-fork` to execute and persist the new branch immediately. The new run is labeled with `branch_of` and `override_step` for lineage.
- Snapshot knobs: `snapshot_every` controls cadence; `snapshot_on` can include `"terminal"` and/or `"decision"` to emit snapshots at run end and after routing decisions. You can also pass a `state_pruner` callable to trim large fields from state snapshots before persistence (a custom pruner sketch follows this list).
- REPL filters: inside `debug`, run `list type=LLM node=compose thread=t-1` to view a subset.
- Pretty state: `state --pretty` prints truncated previews with size hints.
- Save patch: `savepatch STEP file.json` writes the event’s output JSON for reuse with `inject`.
- Event batching: `event_batch_size` batches DB writes for throughput. For heavy runs, try `50` or `100`.
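
A minimal sketch of a custom pruner; the exact callable signature is an assumption here (a JSON-like state mapping in, a pruned mapping out), while `messages_pruner` remains the shipped helper:

```
from typing import Any

def truncate_big_strings(state: dict[str, Any], max_len: int = 2000) -> dict[str, Any]:
    """Return a copy of `state` with long string values truncated."""
    pruned: dict[str, Any] = {}
    for key, value in state.items():
        if isinstance(value, str) and len(value) > max_len:
            pruned[key] = value[:max_len] + "…[truncated]"
        else:
            pruned[key] = value
    return pruned

# Usage (assumed): pass it wherever you would pass messages_pruner(...), e.g.
# rec = wrap(graph, project="demo", state_pruner=truncate_big_strings)
```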

Tools, Prompt, and Memory Views
-------------------------------

Inspect tools available to the model, prompts, and memory/retrieval state reconstructed per step and agent:

```
# Tools summary across LLM steps
timewarp ./timewarp.sqlite3 ./blobs tools <run_id>

# Tools detail for a specific LLM step
timewarp ./timewarp.sqlite3 ./blobs tools <run_id> --step 42 --json

# Memory snapshots (per agent/space)
timewarp ./timewarp.sqlite3 ./blobs memory summary <run_id> --step 120
timewarp ./timewarp.sqlite3 ./blobs memory show <run_id> --step 120 --space planner --json
timewarp ./timewarp.sqlite3 ./blobs memory diff <run_id> 100 140 --space planner --scope working --key messages.0
```

Tips in the interactive REPL:

```
> tools            # summary across LLM steps
> tools 42         # detail for step 42
> prompt 42        # prompt parts (messages + tools), hashes, token estimate
> memory           # summary by agent at current step
> memory_show 120  # snapshot at step 120
> memory_diff 120 140 key=messages.0  # structural diff; optional dot path
```

Details
- Available tools: extracted from recorded prompt parts when present; `tools_digest` shows a stable hash when details aren’t recorded.
- Called tools: correlated by `thread_id` and node, scanning forward until the next LLM event on the same thread.
- Token estimate: provider-agnostic heuristic (≈ chars/4) to quickly gauge prompt size; for precise tokens/costs integrate a tokenizer (see the sketch after this list).
- Privacy: printed payloads respect `privacy_marks` redaction.
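
The chars/4 heuristic is easy to reproduce outside the CLI; a minimal sketch (not the library’s implementation):

```
def estimate_tokens(text: str) -> int:
    """Rough provider-agnostic token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

# Example: gauge a prompt's size before sending it to a model.
print(estimate_tokens("Summarize the following document in three bullet points."))
```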

Capturing LangGraph memory from values
-------------------------------------

To synthesize memory from the LangGraph `values` stream, configure the recorder with `memory_paths`:

```
from timewarp.langgraph import LangGraphRecorder

rec = LangGraphRecorder(
    graph=graph,
    store=store,
    run=run,
    stream_modes=("updates", "values"),
    memory_paths=("messages", "history", "scratch", "artifacts", "memory"),
)
```

Each new/changed key under these paths emits a MEMORY event (`mem_provider="LangGraphState"`) with stable `hashes.item`, `labels.anchor_id`, and inferred `mem_scope` from the path name. The CLI `memory` command reconstructs per-agent snapshots from these events.
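
For illustration, a node whose update touches one of the configured paths would be picked up as a MEMORY event; a sketch assuming plain dict-based LangGraph state (the node name and fields are hypothetical):

```
def researcher(state: dict) -> dict:
    # "scratch" is one of the memory_paths configured above, so this update is
    # synthesized into a MEMORY event (mem_provider="LangGraphState") with the
    # scope inferred from the path name.
    return {
        "scratch": {"notes": "found 3 relevant sources"},
        "messages": [{"role": "assistant", "content": "Gathered sources."}],
    }
```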

Record‑time taps (determinism)
------------------------------

For stronger determinism checks, Timewarp can compute and store `hashes.prompt` and `hashes.args` at call sites (LangChain core). When using installers directly, start a recording session to scope staged hashes to the current run:

```
from timewarp.bindings import begin_recording_session, bind_langgraph_record

# Assuming you are using LangGraphRecorder with a concrete Run object
end_session = begin_recording_session(run.run_id)
teardown = bind_langgraph_record()
try:
    # run your graph under the recorder
    ...
finally:
    # Ensure both session and patches are cleaned up
    end_session()
    teardown()
```

- The `wrap(...)` facade auto‑enables record taps with `enable_record_taps=True` and manages the session lifecycle for you.
- When not using `wrap(...)`, prefer `begin_recording_session(...)` to avoid any cross‑run leakage; global fallbacks are removed in dev.

Telemetry
---------

Enable OpenTelemetry by installing the extras and configuring an exporter in your app. Timewarp emits spans per event; replay spans use Span Links pointing to original spans. Attributes use the `tw.*` namespace.
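
For example, a minimal tracer-provider setup using the standard OpenTelemetry SDK (the console exporter and service name are placeholders; swap in an OTLP exporter for real deployments):

```
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure a provider before recording/replaying; Timewarp only needs a
# tracer provider to be set, the exporter choice is up to your app.
provider = TracerProvider(resource=Resource.create({"service.name": "my-agent"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```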

Examples
--------

- Example LangGraph factory: `examples/langgraph_demo/app.py` provides `make_graph()` for quick `--app` usage in CLI.
- CLI implementation: the console entrypoint is `timewarp.cli:main`, which dispatches to a decomposed CLI package under `timewarp/cli/` (commands/helpers). Commands remain stable across versions.
- Freeze-time example: `examples/langgraph_demo/time_freeze_app.py` provides `make_graph_time()` that writes `timewarp.determinism.now()` into state so you can verify identical timestamps on replay with `--freeze-time`.
- Parallel branches example: `examples/langgraph_demo/parallel_app.py` demonstrates fan-out and join with DECISION anchors.

Multi‑Agent Demo
----------------

A realistic multi‑agent LangGraph with mock tools, a fake LLM, human‑in‑the‑loop (HITL), subgraphs, and dynamic routing:

- File: `examples/langgraph_demo/multi_agent_full.py`
- Features:
  - LLM events with prompt hashing via record‑time taps (FakeListChatModel)
  - TOOL events with MCP‑style metadata (`tool_name`, `mcp_server`, `mcp_transport`) and args hashing
  - Human‑in‑the‑loop using `langgraph.types.interrupt` + `Command(goto=...)`
  - Subgraph for review (`draft_writer` → `light_edit`) and dynamic routing to skip review
  - Snapshots on terminal + decisions; message‑rich state for pruning/pretty printing
- Run it to record a run and demonstrate resume + what‑if:

```
uv run python -m examples.langgraph_demo.multi_agent_full

# Inspect with CLI (defaults: ./timewarp.sqlite3 ./blobs)
uv run timewarp ./timewarp.sqlite3 ./blobs list
uv run timewarp ./timewarp.sqlite3 ./blobs debug <run_id>
```

End‑to‑End Script (Record → Resume → Fork → Diff)
------------------------------------------------

For a single, repeatable flow that exercises most features, use:

- File: `examples/langgraph_demo/run_all.py`
- What it does:
  - Builds the multi‑agent graph and records a baseline run
  - Resumes deterministically from the nearest checkpoint (no side‑effects)
  - Forks the run by overriding the first TOOL/LLM output (what‑if), records the branch
  - Computes first divergence and minimal failing window between base and fork
- Uses a dedicated store by default to avoid local schema drift:
  - DB: `tw_runs/demo.sqlite3`
  - Blobs: `tw_runs/blobs/`
- Run:

```
uv run python -m examples.langgraph_demo.run_all
# => prints JSON with run IDs, first divergence, and minimal window
```

Notes
- Ensure optional deps installed: `uv pip install langgraph langchain-core`.
- For CLI `resume`/`inject`, pass your app factory (e.g., `examples.langgraph_demo.app:make_graph` or `examples.langgraph_demo.multi_agent_full:make_graph_multi`).
- Full multi‑agent example: `examples/langgraph_demo/multi_agent_full.py` exercises LLM, TOOL,
  DECISION, HITL, SNAPSHOT, subgraphs, parallel fan‑out with reducers, and async paths.
- Tests exercise recorder, diff alignment, replay state reconstruction, and playback installers.

Full Multi‑Agent Demo
---------------------

Record a representative multi‑agent workflow and exercise the debugger end‑to‑end:

```
python -m examples.langgraph_demo.multi_agent_full
# prints: Recorded run_id: <UUID>
# also records a what‑if fork and, if supported, an async run
```

The script builds a graph with:
- LLM nodes (`planner`, `review:draft_writer`) and staged prompt hashes.
- TOOL node (`tooling`) with MCP‑like metadata and privacy redaction on kwargs.
- Parallel branches (`planner`, `tooling`, optional `tooling_async`) merged via a
  reducer on `artifacts` to avoid concurrent update conflicts.
- A `human` HITL interrupt, DECISION events on routing, and periodic/terminal snapshots.
- A `review` subgraph that streams when `stream_subgraphs=True`.

After recording, explore via CLI (defaults write to `./timewarp.sqlite3` and `./blobs`):

```
timewarp ./timewarp.sqlite3 ./blobs list
timewarp ./timewarp.sqlite3 ./blobs debug <run_id>
```

Resume deterministically and run a what‑if injection (using this demo’s factory):

```
timewarp ./timewarp.sqlite3 ./blobs resume <run_id> \
  --app examples.langgraph_demo.multi_agent_full:make_graph_multi \
  --thread t-demo --freeze-time

timewarp ./timewarp.sqlite3 ./blobs inject <run_id> <step> \
  --output alt.json \
  --app examples.langgraph_demo.multi_agent_full:make_graph_multi \
  --thread t-demo --record-fork --freeze-time
```

Prompt overrides (DSPy-style)
-----------------------------

Provide a JSON mapping of agent/node name to an override spec. A spec can be a string (treated as a system message or appended to a raw prompt) or an object with a mode and text.

Example overrides.json

```json
{
  "planner": { "mode": "prepend_system", "text": "Be concise and accurate." },
  "review": "Prefer bullet points"
}
```

Apply overrides during a non-recorded resume (tolerate prompt-hash diffs with --allow-diff):

```
timewarp ./timewarp.sqlite3 ./blobs resume <run_id> \
  --app examples.langgraph_demo.multi_agent_full:make_graph_multi \
  --thread t-demo \
  --prompt-overrides ./overrides.json \
  --allow-diff
```

Fork and record a branch with overrides using inject:

```
timewarp ./timewarp.sqlite3 ./blobs inject <run_id> 0 \
  --prompt-overrides ./overrides.json \
  --app examples.langgraph_demo.multi_agent_full:make_graph_multi \
  --thread t-demo \
  --allow-diff \
  --record-fork
```

Or use the dedicated helper:

```
timewarp ./timewarp.sqlite3 ./blobs dspy fork <run_id> \
  --app examples.langgraph_demo.multi_agent_full:make_graph_multi \
  --overrides ./overrides.json \
  --thread t-demo \
  --allow-diff \
  --record-fork
```

Each forked run is labeled with `branch_of=<baseline>` and `override_step=prompt_overrides`, so you can diff the branch against the baseline:

```
timewarp ./timewarp.sqlite3 ./blobs diff <baseline_run_id> <fork_run_id> --json
```

Event Filters Cheatsheet
------------------------

Focus on specific slices of the run quickly:

```
# Only TOOL events (MCP) from the tooling node
timewarp ./timewarp.sqlite3 ./blobs events <run_id> \
  --type TOOL --tool-kind MCP --node tooling --json

# Only LLM events from the planner node
timewarp ./timewarp.sqlite3 ./blobs events <run_id> --type LLM --node planner --json

# HITL interrupts from the human node
timewarp ./timewarp.sqlite3 ./blobs events <run_id> --type HITL --node human --json

# LLM events emitted inside the review subgraph (match by namespace)
timewarp ./timewarp.sqlite3 ./blobs events <run_id> --type LLM --namespace review --json

# All DECISION anchors, useful to understand routing and joins
timewarp ./timewarp.sqlite3 ./blobs events <run_id> --type DECISION --json
```

Notes
-----

- The demo records one sync run (no async nodes) to keep `.invoke()` compatible,
  then tries an async run with `make_graph_multi(include_async=True)` using `.ainvoke()`.
- If your environment does not provide `graph.astream`, the async run is skipped.
- When using `wrap(...)` without an explicit `LocalStore`, the default DB path is
  `./timewarp.sqlite3` (examples above use that). Earlier examples may reference
  `./timewarp.db`; both are supported if you pass matching paths on the CLI.

MCP Example (optional)
----------------------

When `langgraph` and `langchain-mcp-adapters` are available, you can run the MCP demo app:

```
# Record a run using the MCP example app
python - <<'PY'
from pathlib import Path
from timewarp.store import LocalStore
from timewarp.events import Run
from timewarp.langgraph import LangGraphRecorder
from examples.langgraph_demo.mcp_app import make_graph_mcp

store = LocalStore(db_path=Path('./timewarp.db'), blobs_root=Path('./blobs'))
graph = make_graph_mcp()
run = Run(project='demo', name='mcp', framework='langgraph')
rec = LangGraphRecorder(graph=graph, store=store, run=run, stream_modes=("messages","updates"), stream_subgraphs=True)
_ = rec.invoke({"text":"hi"}, config={"configurable": {"thread_id": "t-1"}})
print('run_id=', run.run_id)
PY

# View TOOL events with MCP metadata
timewarp ./timewarp.db ./blobs events <run_id> --type TOOL --tool-kind MCP --json
```

Note: MCP metadata is best-effort and dependent on adapter/provider behavior. In environments
where the stream does not emit tool metadata, you may not observe TOOL events for MCP calls.

HITL & Privacy Docs
-------------------

- HITL patterns with LangGraph (DECISION anchors, snapshots, CLI tips): see `docs/hitl.md`.
- Privacy marks and redaction strategies (`redact`, `mask4`) with examples: see `docs/privacy.md`.

Time Provider & Freeze-Time
---------------------------

Use `timewarp.determinism.now()` in your graphs to obtain a deterministic clock.
Recording uses `now()` for `Run.started_at` and all `Event.ts`. During replay, you can
freeze time to the recorded event timestamps.

Programmatic replay:

```
from timewarp.bindings import bind_langgraph_playback
from timewarp.replay import LangGraphReplayer

replayer = LangGraphReplayer(graph=my_graph, store=store)

# Define an installer with the standard 3‑arg signature
def installer(llm, tool, memory) -> None:
    bind_langgraph_playback(my_graph, llm, tool, memory)

session = replayer.resume(
    run_id, from_step=None, thread_id="t-1", install_wrappers=installer, freeze_time=True
)
```

CLI replay with frozen time:

```
timewarp ./timewarp.db ./blobs resume <run_id> --app examples.langgraph_demo.time_freeze_app:make_graph_time --thread t-1 --freeze-time

timewarp ./timewarp.db ./blobs inject <run_id> <step> --output alt.json \
  --app examples.langgraph_demo.time_freeze_app:make_graph_time --thread t-1 --freeze-time --record-fork
```

The example graph writes the ISO timestamp to state (key `now_iso`). With `--freeze-time`,
replay preserves the exact value that was recorded.
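
A sketch of such a node (illustrative; the shipped version lives in `examples/langgraph_demo/time_freeze_app.py`, and it assumes `now()` returns a `datetime`):

```
from timewarp.determinism import now

def stamp(state: dict) -> dict:
    # now() is the deterministic clock: wall time while recording, the
    # recorded event timestamp during --freeze-time replay.
    return {"now_iso": now().isoformat()}
```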

Replay Convenience Facade
-------------------------

You can also resume deterministically via a one-call facade:

```
from timewarp import Replay

session = Replay.resume(
    store,
    app_factory="examples.langgraph_demo.app:make_graph",
    run_id=<UUID>,
    from_step=42,
    thread_id="t-1",
    strict_meta=True,
    freeze_time=True,
)
print(session.result)
```

Exporters
---------

Use the CLI to export a run in a LangSmith-friendly JSON bundle:

```
timewarp ./timewarp.db ./blobs export langsmith <run_id> --include-blobs
```

The module `timewarp.exporters.langsmith` also exposes `serialize_run(...)` and `export_run(...)` for programmatic use.
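
A minimal programmatic sketch; the argument names below are assumptions for illustration only, so check `timewarp.exporters.langsmith` for the actual signatures:

```
from pathlib import Path
from uuid import UUID
from timewarp.store import LocalStore
from timewarp.exporters.langsmith import serialize_run, export_run

store = LocalStore(db_path=Path("./timewarp.db"), blobs_root=Path("./blobs"))
run_id = UUID("00000000-0000-0000-0000-000000000000")  # placeholder run ID

bundle = serialize_run(store, run_id)             # assumed call shape
export_run(store, run_id, Path("./export.json"))  # assumed call shape
```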

OpenTelemetry Quickstart
------------------------

See `docs/otel-quickstart.md` for a minimal setup to emit spans per event and link replay spans to recorded ones.

CLI Internals (Contributors)
----------------------------

- Entry point: `timewarp.cli:main` dispatches to a decomposed CLI under `timewarp/cli/`.
- Commands: `timewarp/cli/commands/*` implement subcommands (list, events, tools, diff, resume, inject, export, fsck, debug).
- Helpers: `timewarp/cli/helpers/*` contains small utilities used by the CLI only:
  - `jsonio`: `print_json`, `dumps_text`, `loads_file` (orjson‑backed).
  - `state`: `format_state_pretty`, `dump_event_output_to_file`.
  - `events`: `filter_events`.
  - `filters`: `parse_list_filters`.
- Stability: these helper modules are implementation details for the CLI and are not part of the public API; they may change between versions.
- Programmatic use: prefer the core modules and top‑level exports (`timewarp.events`, `timewarp.store`, `timewarp.diff`, `timewarp.replay`, `timewarp.langgraph`, etc.).

            
