datum-cli


- **Name:** datum-cli
- **Version:** 0.1.2
- **Summary:** Schedule dbt projects locally with datum cloud reporting
- **Upload time:** 2025-10-20 01:48:16
- **Requires Python:** >=3.11
- **License:** Apache License, Version 2.0 (Copyright 2025 Datum Labs; http://www.apache.org/licenses/LICENSE-2.0)
- **Keywords:** analytics, data, dbt, scheduling, workflow

# Datum DBT Tool - Complete Specification & Architecture

## Executive Summary

**Product:** A user-friendly CLI tool that helps users schedule dbt projects on their local infrastructure with datum cloud as an optional UI/reporting layer.

**Goal:** Make scheduling dbt easy for local environments (Ubuntu/Linux), with extensibility to K8s later. Ship v1 tool first, cloud infrastructure second.

**Distribution:** Publish to PyPI as `datum-cli`. Users install it with `pip install datum-cli` and run it as the global `datum` command.

---

## Core Philosophy

- **Tool-first approach:** Build the CLI as a complete, self-contained product. Cloud is an optional reporting layer.
- **Minimal dependencies:** Use stdlib wherever possible, add only battle-tested packages.
- **Shipping speed:** v1 is feature-complete for local scheduling, not perfect for all edge cases.
- **Debugging-friendly:** Every action logs, every failure is debuggable, validation prevents mistakes.
- **Pythonic code:** Follow PEP 20, use modern Python 3.11+ typing (matching `requires-python = ">=3.11"`), Pydantic v2 for validation.

---

## v1 Feature Set

### Core Commands

#### `datum dbt init --repo-path <path>`
- Initialize a new dbt project for datum
- **Actions:**
  - Validate dbt_project.yml exists
  - Generate SSH key: `~/.datum/keys/{project-id}.pem` (600 permissions)
  - Generate public key: `~/.datum/keys/{project-id}.pub`
  - Create `~/.datum/config.yaml` with project config
  - Validate profiles.yml location (auto-detect ~/.dbt/profiles.yml or prompt)
  - Show next steps clearly
- **Output:** Confirmation of all setup steps, plus instructions for sharing the public key with datum cloud (a later step)

#### `datum dbt validate`
- Pre-flight check before scheduling
- **Checks:**
  - dbt_project.yml exists and valid
  - profiles.yml exists and readable
  - SSH key exists with correct permissions (600)
  - dbt target exists in profiles.yml
  - dbt command runs without errors (test connection)
- **Auto-repair mode:** Detect common issues (wrong permissions, missing files) and offer fixes
- **Output:** ✓ / ✗ status for each check, suggestions for fixes

#### `datum dbt run [--target <target>] [--profile <path>] [--dry-run]`
- Execute dbt locally with full logging
- **Actions:**
  - Load config from ~/.datum/config.yaml
  - Validate prerequisites (same as `validate`)
  - Execute: `dbt run --target <target> --profiles-dir <profiles-dir>`
  - Stream output to terminal in real-time with timestamps
  - Capture all stdout/stderr
  - Save run record with metadata
- **Options:**
  - `--dry-run`: Show what would run without executing
  - `--vars '{"key": "value"}'`: Pass dbt vars
  - `--timeout 3600`: Kill job after N seconds
- **Output:** Real-time streaming + success/failure summary

#### `datum dbt schedule --cron "<cron-expression>" [--target <target>]`
- Add a cron job to execute dbt on schedule
- **Actions:**
  - Validate cron expression syntax
  - Validate project prerequisites
  - Show what will be added to crontab (dry-run first)
  - Add entry to user's crontab: `0 10 * * * /usr/local/bin/datum dbt run --project {project-id} 2>&1`
  - Store schedule config in ~/.datum/config.yaml
- **Crontab entry behavior:**
  - Calls internal command: `datum dbt run --project {project-id}`
  - Pipes output to logging handler
  - Runs with user's environment (not root)
- **Output:** Confirmation + next steps (validate, check logs)
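
To make the "validate cron expression syntax" step concrete, here is a rough standard-library sketch. The spec assigns this job to `croniter`; the function below is a simplified stand-in, not croniter's API, and it only accepts five numeric fields (no names such as `MON`, no `@daily` shortcuts). The name `is_valid_cron` is hypothetical.

```python
import re

# (min, max) for minute, hour, day-of-month, month, day-of-week
_FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
# One comma-separated part: "*", "5", "1-5", optionally followed by "/step"
_PART = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expression: str) -> bool:
    """Check a 5-field numeric cron expression for syntax and value ranges."""
    fields = expression.split()
    if len(fields) != len(_FIELD_RANGES):
        return False
    for field, (low, high) in zip(fields, _FIELD_RANGES):
        for part in field.split(","):
            match = _PART.match(part)
            if match is None:
                return False
            base, step = match.groups()
            if step is not None and int(step) == 0:
                return False  # a step of zero never advances
            if base != "*":
                bounds = [int(n) for n in base.split("-")]
                if any(n < low or n > high for n in bounds):
                    return False
                if len(bounds) == 2 and bounds[0] > bounds[1]:
                    return False  # reversed range such as 5-1
    return True
```

A check like this would run before anything touches the crontab, so an invalid expression is rejected with the helpful error the spec asks for.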

#### `datum dbt schedule --webhook [--port 8080]`
- Start webhook server for external triggers (Airflow, Dagster)
- **Actions:**
  - Start FastAPI server on `0.0.0.0:{port}`
  - Listen for POST requests
  - Endpoint: `POST /trigger/{project_id}/{run_id}`
  - Optional payload: `{"target": "prod", "vars": {...}, "timeout": 3600}`
  - Validate request, execute dbt run, return status
  - Webhook runs in background (daemonize or use systemd)
- **Output:** "Webhook listening on http://localhost:8080/trigger/{project-id}/run-{id}"
- **Response format:**
  ```json
  {
    "run_id": "abc123",
    "status": "running|queued|error",
    "output_url": "file:///home/user/.datum/runs/abc123/metadata.json"
  }
  ```

#### `datum dbt schedule --status`
- Show current schedule configuration
- **Output:**
  - Project name + status (ACTIVE/INACTIVE)
  - All schedules (cron + webhook)
  - Last run time + status
  - Next scheduled run time
  - Crontab location + entry

#### `datum dbt logs [--last 10] [--status SUCCESS|FAILED|TIMEOUT] [--follow]`
- View execution history and logs
- **Actions:**
  - Read all run records from ~/.datum/runs/
  - Display in table format (Run ID, Time, Status, Duration, Exit Code)
  - Allow filtering by status
- **Output:** Table of recent runs

#### `datum dbt logs <run-id> [--raw]`
- View full log for a specific run
- **Output:**
  - Metadata: timestamp, duration, exit code, command
  - Formatted output (with timestamps) or raw output
  - Path to full log files
  - Suggestion based on error (e.g., "Database connection failed → check profiles.yml credentials")

#### `datum dbt config [--project-path] [--profiles-path] [--target]`
- Update configuration after init
- **Actions:**
  - Interactively or via flags update ~/.datum/config.yaml
  - Validate changes before saving
  - Show current config
- **Output:** Confirmation of changes

---

## Data Models (Pydantic v2)

```python
# All models live in src/datum/core/config.py
from datetime import datetime
from pathlib import Path
from typing import Literal
from uuid import uuid4

from pydantic import BaseModel, Field


class DbtProjectConfig(BaseModel):
    project_id: str = Field(default_factory=lambda: str(uuid4()))  # auto-generated UUID
    project_path: Path  # validated: must contain dbt_project.yml
    profiles_path: Path = Path("~/.dbt/profiles.yml")  # expand "~" on load
    target: str = "dev"
    dbt_version: str  # auto-detected


class ScheduleConfig(BaseModel):
    cron_expression: str  # validated with croniter
    enabled: bool = True
    created_at: datetime
    last_run_at: datetime | None = None
    next_run_at: datetime | None = None


class WebhookConfig(BaseModel):
    enabled: bool = False
    port: int = Field(default=8080, ge=1024, le=65535)
    host: str = "0.0.0.0"
    token: str | None = None  # reserved for future auth


class RunRecord(BaseModel):
    run_id: str  # unique run ID
    project_id: str
    timestamp: datetime
    command: str  # what was executed
    exit_code: int
    duration_seconds: float
    status: Literal["SUCCESS", "FAILED", "TIMEOUT"]
    stdout: str
    stderr: str


class DatumConfig(BaseModel):
    """Root model, persisted at ~/.datum/config.yaml."""

    version: str = "1.0"
    project: DbtProjectConfig
    schedule: ScheduleConfig | None = None
    webhook: WebhookConfig | None = None
    private_key_path: Path
    runs_dir: Path = Path("~/.datum/runs/")
```

---

## File Structure

```
datum-dbt/
├── src/datum/
│   ├── __init__.py
│   ├── __main__.py                 # Entry point for `python -m datum`
│   ├── cli/
│   │   ├── __init__.py
│   │   ├── main.py                 # Typer app, command routing
│   │   └── commands/
│   │       ├── __init__.py
│   │       ├── init.py             # datum dbt init
│   │       ├── auth.py             # datum dbt auth validate (future)
│   │       ├── config.py           # datum dbt config
│   │       ├── validate.py         # datum dbt validate
│   │       ├── run.py              # datum dbt run
│   │       ├── schedule.py         # datum dbt schedule (cron + webhook)
│   │       └── logs.py             # datum dbt logs
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py               # Pydantic models + file I/O
│   │   ├── auth.py                 # SSH key generation
│   │   ├── executor.py             # dbt execution + logging
│   │   ├── scheduler.py            # Cron job management
│   │   ├── webhook.py              # FastAPI webhook server
│   │   ├── storage.py              # Run storage/retrieval
│   │   ├── validators.py           # Pre-flight checks + auto-repair
│   │   └── utils.py                # Helpers (ID generation, etc)
│   └── errors.py                   # Custom exceptions
├── tests/
│   ├── __init__.py
│   ├── test_config.py
│   ├── test_auth.py
│   ├── test_executor.py
│   ├── test_scheduler.py
│   ├── test_webhook.py
│   └── test_cli.py
├── pyproject.toml                  # Project metadata + dependencies
├── README.md                        # User guide
├── LICENSE
└── .gitignore
```

---

## Dependencies (pyproject.toml)

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "datum-cli"
version = "0.1.0"
description = "Schedule dbt projects locally with datum cloud reporting"
readme = "README.md"
license = {text = "MIT"}
authors = [{name = "Datum Labs", email = "contact@datumlabs.io"}]
requires-python = ">=3.11"
keywords = ["dbt", "scheduling", "data", "analytics", "workflow"]

dependencies = [
    "typer[all]>=0.9,<1.0",           # CLI framework
    "pydantic>=2.0,<3.0",             # Data validation
    "pydantic-settings>=2.0,<3.0",    # Config from YAML/env
    "pyyaml>=6.0,<7.0",               # YAML parsing
    "python-crontab>=3.0,<4.0",       # Cron management
    "croniter>=2.0,<3.0",             # Cron parsing/validation
    "fastapi>=0.104,<1.0",            # Webhook server
    "uvicorn>=0.24,<1.0",             # ASGI server
    "cryptography>=41.0,<43.0",       # SSH key generation
    "rich>=13.0,<14.0",               # Terminal formatting
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4,<8.0",
    "pytest-cov>=4.1,<5.0",
    "ruff>=0.1,<1.0",
    "mypy>=1.6,<2.0",
    "black>=23.0,<24.0",
]

[project.scripts]
datum = "datum.cli.main:app"

[tool.hatch.build.targets.wheel]
packages = ["src/datum"]
```

---

## Key Implementation Details

### 1. SSH Key Generation (Secure)
- Generate a 2048-bit RSA key pair and write the private key to `~/.datum/keys/{project-id}.pem`
- Set the private key's permissions to 600 (read/write for the owner only)
- Store the public key next to it as `{project-id}.pub`
- Never ask user to generate keys manually

### 2. Config File Handling
- Location: `~/.datum/config.yaml`
- Always load with validation (Pydantic)
- Auto-migrate config if version changes
- Handle missing/corrupted files gracefully with clear error messages
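
One way to approach "auto-migrate config if version changes" is a chain of per-version upgrade functions applied to the raw dictionary before Pydantic validation. The sketch below is purely illustrative: the `"0.9"` layout with a top-level `target` key is invented for the example, and `MIGRATIONS` and `migrate_config` are hypothetical names.

```python
from collections.abc import Callable

CURRENT_VERSION = "1.0"


def _from_0_9(raw: dict) -> dict:
    """Hypothetical upgrade: move a top-level `target` into the `project` section."""
    migrated = dict(raw)
    project = dict(migrated.get("project", {}))
    if "target" in migrated:
        project.setdefault("target", migrated.pop("target"))
    migrated["project"] = project
    migrated["version"] = "1.0"
    return migrated


# Maps a config version to the function that upgrades it by one step.
MIGRATIONS: dict[str, Callable[[dict], dict]] = {"0.9": _from_0_9}


def migrate_config(raw: dict) -> dict:
    """Apply upgrade steps until the config reaches CURRENT_VERSION."""
    version = str(raw.get("version", CURRENT_VERSION))
    while version != CURRENT_VERSION:
        step = MIGRATIONS.get(version)
        if step is None:
            raise ValueError(
                f"Cannot migrate config version {version!r} to {CURRENT_VERSION!r}"
            )
        raw = step(raw)
        version = str(raw["version"])
    return raw
```

Unknown versions fail loudly rather than being silently rewritten, which fits the requirement for clear error messages on corrupted files.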

### 3. Run Execution & Logging
- Each run gets unique ID: `{project-id}-{timestamp}-{uuid[:8]}`
- Create directory: `~/.datum/runs/{run_id}/`
- Save files:
  - `metadata.json` (structured data)
  - `output.log` (stdout + stderr combined, timestamped)
  - `stdout.log` (stdout only)
  - `stderr.log` (stderr only)
- Stream output to terminal in real-time with `[HH:MM:SS]` timestamps
- Capture exit code + duration

### 4. Cron Integration
- Use `python-crontab` library (safer than shell scripts)
- Crontab entry format: `0 10 * * * /usr/local/bin/datum dbt run --project {project-id} 2>&1`
- Always show user what will be added before modifying crontab
- Allow user to review/edit crontab manually after
- Store schedule info in config for `--status` command
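
As an illustration of the entry format and of previewing a change before it is written, the sketch below works on crontab text directly. The spec calls for `python-crontab` to do the actual modification; this standard-library version only shows the string handling, and `build_crontab_line` and `upsert_entry` are hypothetical helpers.

```python
import shlex


def build_crontab_line(cron_expression: str, datum_bin: str, project_id: str) -> str:
    """Return one crontab line that runs `datum dbt run --project <id>`."""
    command = shlex.join([datum_bin, "dbt", "run", "--project", project_id])
    return f"{cron_expression} {command} 2>&1"


def _is_entry_for(line: str, project_id: str) -> bool:
    """True if the line contains the adjacent tokens `--project <project_id>`."""
    tokens = line.split()
    return any(a == "--project" and b == project_id for a, b in zip(tokens, tokens[1:]))


def upsert_entry(crontab_text: str, new_line: str, project_id: str) -> str:
    """Replace an existing entry for this project, or append; other lines are kept."""
    kept = [ln for ln in crontab_text.splitlines() if not _is_entry_for(ln, project_id)]
    return "\n".join(kept + [new_line]) + "\n"
```

Because `upsert_entry` returns the new text instead of installing it, the CLI can print a before/after preview and ask for confirmation first.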

### 5. Webhook Server
- FastAPI app with single endpoint: `POST /trigger/{project_id}/{run_id}`
- Request validation (basic auth can be v2)
- Response includes run_id, status, output location
- Run webhook as background process (systemd unit template for later)
- For v1: user must manually start daemon or use screen/tmux

### 6. Validation & Auto-Repair
- Pre-flight checks before any command:
  - dbt_project.yml exists + valid YAML
  - profiles.yml exists + accessible
  - SSH key exists with 600 permissions
  - dbt target exists in profiles.yml
  - Can connect to database (run `dbt debug`)
- Offer automated fixes for:
  - Permissions: `chmod 600 {key}`
  - Missing files: Show path + how to create
  - Profile issues: Show profiles.yml location
- `datum dbt validate` lists all issues + fixes
- `--auto-fix` flag applies fixes (with confirmation)
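
The permission check and its repair could be implemented roughly as follows. This sketch covers only the SSH key check on POSIX systems; the function names are hypothetical, and the interactive confirmation that `--auto-fix` should ask for is left out for brevity.

```python
import os
import stat
from pathlib import Path


def key_permissions_ok(key_path: Path) -> bool:
    """True when the key file's permission bits are exactly 0o600."""
    return stat.S_IMODE(key_path.stat().st_mode) == 0o600


def check_key(key_path: Path, auto_fix: bool = False) -> str:
    """Return a ✓/✗ status line; chmod to 600 only when auto_fix is set."""
    if not key_path.exists():
        return f"✗ SSH key missing: {key_path}. Recreate it with: datum dbt init"
    if key_permissions_ok(key_path):
        return f"✓ SSH key permissions are 600: {key_path}"
    if auto_fix:
        os.chmod(key_path, 0o600)
        return f"✓ Fixed permissions: chmod 600 {key_path}"
    return f"✗ SSH key too open. Fix: chmod 600 {key_path}"
```

Each pre-flight check can follow the same shape: detect, report with a suggested fix, and change the system only when the user has opted in.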

### 7. Debugging Experience
- Every command shows what it's doing (✓/✗ indicators)
- Errors include suggestions (not just "failed")
- `datum dbt logs` shows last 10 runs in table
- `datum dbt logs {run-id}` shows full formatted log with timestamps
- Log files are stored locally (easy to grep, inspect, parse)
- Clear indication of where logs are stored

---

## PyPI Publishing Setup

### Before Publishing
1. Create an account on pypi.org (or test.pypi.org for testing)
2. Generate API token: pypi.org → Account Settings → API tokens
3. Add token to `~/.pypirc` or use GitHub Actions secret

### Publishing Command
```bash
# Build
python -m build

# Upload to TestPyPI (test first)
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*
```

### Package Naming & Metadata
- Package name: `datum-cli`
- Import name: `datum`
- Console script: `datum` (global command)
- Keywords: "dbt", "scheduling", "data", "analytics"
- Homepage: https://datumlabs.io (or GitHub)
- Repository: https://github.com/datumlabs/datum-dbt
- License: MIT

### Post-Publishing
- Update PyPI page with README (auto from README.md)
- Add badges (build status, PyPI version, downloads)
- Set up GitHub releases to auto-publish on tags

---

## Development Instructions for Agent

### Phase 1: Foundation (2-3 hours)
1. **Project setup**
   - Create pyproject.toml with all dependencies
   - Set up directory structure
   - Create __init__.py files with version
   - Add .gitignore for Python

2. **Config system (Pydantic models)**
   - Write all Pydantic models in config.py
   - Implement config file I/O (load/save ~/.datum/config.yaml)
   - Add validation for all fields
   - Handle missing/invalid configs gracefully

3. **Auth system**
   - SSH key generation in auth.py
   - Public key extraction
   - Permission handling (chmod 600)
   - Key validation

### Phase 2: CLI Skeleton (1 hour)
1. **Main app (typer)**
   - Create main.py with Typer app
   - Add command routing (init, run, schedule, logs, validate, config)
   - Add global options (--debug, --project)
   - Set up logging

2. **Placeholder commands**
   - Create each command file with docstring + typer.echo("TODO")
   - All imports correct but functions empty

### Phase 3: Core Features (4-5 hours)
1. **Executor** (executor.py)
   - Run dbt subprocess with real-time streaming
   - Capture stdout/stderr
   - Handle timeouts + errors
   - Save run record

2. **Storage** (storage.py)
   - Save/load run records
   - Query runs by ID, timestamp, status
   - Clean up old runs (optional, v2)

3. **Validators** (validators.py)
   - Pre-flight checks (all 5 checks listed above)
   - Auto-repair suggestions
   - Clear error messages with fixes

4. **Scheduler** (scheduler.py)
   - Validate cron expressions
   - Add/remove crontab entries
   - Show crontab status

### Phase 4: CLI Commands (3-4 hours)
1. **init.py** - Full implementation with user prompts
2. **run.py** - Execute dbt with logging
3. **schedule.py** - Cron + webhook logic
4. **logs.py** - Display run history + details
5. **validate.py** - Pre-flight checks
6. **config.py** - Update configuration

### Phase 5: Webhook Server (2 hours)
1. **webhook.py**
   - FastAPI app with POST /trigger/{project_id}/{run_id}
   - Request validation
   - Execute dbt run in background
   - Return status response

2. **Integration**
   - Add `datum dbt schedule --webhook` command
   - Start server as subprocess or daemon
   - Show webhook URL to user

### Phase 6: Testing & Polish (2 hours)
1. **Unit tests** (tests/*.py)
   - Config loading/saving
   - Cron validation
   - Executor (mock dbt calls)
   - Storage (create/read runs)

2. **Integration tests**
   - Full CLI flow (init → run → logs)
   - Error handling (missing files, bad config)

3. **Edge cases**
   - Windows paths (defer for v2, document as Linux-only)
   - Permission errors
   - Crontab doesn't exist (create it)
   - Concurrent runs (same project, different run IDs)

---

## Testing Strategy

### Unit Tests
- Config models (validation, serialization)
- Auth (key generation, permissions)
- Validator (issue detection)
- Storage (save/load runs)

### Integration Tests
- Full command flow: `init` → `validate` → `run` → `logs`
- Error scenarios: missing files, bad cron, profile issues
- Dry-run modes

### Manual Testing Checklist
- [ ] `datum dbt init --repo-path ./` creates config + key
- [ ] `datum dbt validate` passes with valid project
- [ ] `datum dbt run` executes dbt + saves logs
- [ ] `datum dbt logs` shows recent runs
- [ ] `datum dbt schedule --cron "0 10 * * *"` adds crontab entry
- [ ] `datum dbt schedule --status` shows cron info
- [ ] Invalid cron rejected with helpful error
- [ ] Logs are saved to ~/.datum/runs/{run_id}/

---

## Error Handling

Every command should:
1. Catch exceptions early
2. Provide context (what were we doing?)
3. Suggest a fix (what should user do?)
4. Log full traceback to file (for debugging)
5. Exit with code 0 (success) or 1 (failure)

### Common Errors to Handle
- Missing dbt_project.yml → "No dbt project found at {path}. Initialize with: datum dbt init"
- Bad SSH key permissions → "SSH key too open. Fix: chmod 600 {key}"
- Database connection failed → "Could not connect to database. Check profiles.yml: {error}"
- Cron syntax invalid → "Invalid cron expression. Example: '0 10 * * *' (daily at 10 AM)"
- Permission denied → "Cannot write to {path}. Check file permissions."
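
The five rules above suggest a single wrapper around every command. A minimal sketch, assuming a custom exception that carries its own suggestion; `DatumError` and `run_command` are hypothetical names, not taken from the package.

```python
import logging
import sys
from collections.abc import Callable


class DatumError(Exception):
    """An expected failure that carries a suggested fix for the user."""

    def __init__(self, message: str, suggestion: str) -> None:
        super().__init__(message)
        self.suggestion = suggestion


def run_command(action: Callable[[], None], log: logging.Logger) -> int:
    """Run a command body and translate failures into an exit code (0 or 1)."""
    try:
        action()
    except DatumError as exc:
        log.exception("Command failed")  # full traceback goes to the log
        print(f"✗ {exc}\n  Fix: {exc.suggestion}", file=sys.stderr)
        return 1
    except Exception:
        log.exception("Unexpected error")
        print("✗ Unexpected error. See the log file for the full traceback.", file=sys.stderr)
        return 1
    return 0
```

A command would raise, for example, `DatumError("SSH key too open.", "chmod 600 <key>")`, and the entry point would pass the returned code to `sys.exit`.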

---

## Documentation for Users

### README.md Structure
1. **Installation** - `pip install datum-cli`
2. **Quick Start** - 5 min walkthrough
3. **Commands Reference** - All commands with examples
4. **Configuration** - ~/.datum/config.yaml explained
5. **Debugging** - How to view logs, common issues
6. **Future Roadmap** - K8s, cloud sync, etc

### Example Usage Flow
```bash
# Install
pip install datum-cli

# Initialize
cd my-dbt-project
datum dbt init --repo-path .

# Validate setup
datum dbt validate

# Test run locally
datum dbt run

# Schedule daily at 10 AM
datum dbt schedule --cron "0 10 * * *"

# Check logs
datum dbt logs --last 5
datum dbt logs abc123
```

---

## Future Roadmap (v2+)

- [ ] Cloud sync (push run results to datum cloud UI)
- [ ] K8s job support (`--executor kubernetes`)
- [ ] Dagster/Airflow plugins
- [ ] Web UI for local runs
- [ ] Advanced scheduling (retries, backoff, alerts)
- [ ] Multiple projects in one config
- [ ] Environment variable templating in config
- [ ] Webhook authentication (API keys)
- [ ] Windows support
- [ ] Docker containerization

---

## Success Criteria for v1

✅ User can install via PyPI  
✅ User can init, validate, run, schedule with single config file  
✅ All runs are logged locally with searchable history  
✅ Webhook works for basic external triggers  
✅ Clear error messages for 95% of failure cases  
✅ No cloud infrastructure required (cloud is optional)  
✅ Cron jobs run reliably  
✅ Debugging is straightforward (logs are easy to find/read)  

---

## Build Order Recommendation

1. **Config + Auth** → Foundation everything depends on
2. **Executor + Storage** → Core functionality
3. **Validators** → Quality + UX
4. **CLI Commands** → User interface
5. **Scheduler** → Scheduling logic
6. **Webhook** → External triggers
7. **Tests + Docs** → Ship-ready

**Estimated total time: 12-16 hours of focused coding**

---

## Code Quality Standards

- All code typed with Python 3.11+ type hints (the minimum version the package supports)
- Pydantic v2 for all data models
- Docstrings for all functions (Google style)
- No magic numbers (use named constants)
- Errors are informative (context + suggestion)
- Tests should cover happy path + 3 error scenarios per feature
- Use `rich` for beautiful terminal output (colors, tables, progress)
- Follow Black formatter (line length 100)
- Use Ruff for linting

---

## Ready to Code

Agent should now have everything needed to:
1. Build the project structure
2. Implement each feature with clear scope
3. Write tests as they go
4. Prepare for PyPI publishing

**All architectural decisions are finalized. Focus on shipping clean, tested, user-friendly code.**

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "datum-cli",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "analytics, data, dbt, scheduling, workflow",
    "author": null,
    "author_email": "Datum Labs <contact@datumlabs.io>",
    "download_url": "https://files.pythonhosted.org/packages/ec/57/6fbfd2afba945bae97ea2fa68bb7b2b925e5834b25524453434f290cb562/datum_cli-0.1.2.tar.gz",
    "platform": null,
    "description": "# Datum DBT Tool - Complete Specification & Architecture\n\n## Executive Summary\n\n**Product:** A user-friendly CLI tool that helps users schedule dbt projects on their local infrastructure with datum cloud as an optional UI/reporting layer.\n\n**Goal:** Make scheduling dbt easy for local environments (Ubuntu/Linux), with extensibility to K8s later. Ship v1 tool first, cloud infrastructure second.\n\n**Distribution:** Publish to PyPI as `datum-cli`. Users install via `pip install datum-cli` and run as `datum` command globally.\n\n---\n\n## Core Philosophy\n\n- **Tool-first approach:** Build the CLI as a complete, self-contained product. Cloud is optional reporting layer.\n- **Minimal dependencies:** Use stdlib wherever possible, add only battle-tested packages.\n- **Shipping speed:** v1 is feature-complete for local scheduling, not perfect for all edge cases.\n- **Debugging-friendly:** Every action logs, every failure is debuggable, validation prevents mistakes.\n- **Pythonic code:** Follow PEP 20, use modern Python 3.13+ typing, Pydantic v2 for validation.\n\n---\n\n## v1 Feature Set\n\n### Core Commands\n\n#### `datum dbt init --repo-path <path>`\n- Initialize a new dbt project for datum\n- **Actions:**\n  - Validate dbt_project.yml exists\n  - Generate SSH key: `~/.datum/keys/{project-id}.pem` (600 permissions)\n  - Generate public key: `~/.datum/keys/{project-id}.pub`\n  - Create `~/.datum/config.yaml` with project config\n  - Validate profiles.yml location (auto-detect ~/.dbt/profiles.yml or prompt)\n  - Show next steps clearly\n- **Output:** Confirmation of all setup steps, instructions to share public key to datum cloud (later)\n\n#### `datum dbt validate`\n- Pre-flight check before scheduling\n- **Checks:**\n  - dbt_project.yml exists and valid\n  - profiles.yml exists and readable\n  - SSH key exists with correct permissions (600)\n  - dbt target exists in profiles.yml\n  - dbt command runs without errors (test connection)\n- 
**Auto-repair mode:** Detect common issues (wrong permissions, missing files) and offer fixes\n- **Output:** \u2713 / \u2717 status for each check, suggestions for fixes\n\n#### `datum dbt run [--target <target>] [--profile <path>] [--dry-run]`\n- Execute dbt locally with full logging\n- **Actions:**\n  - Load config from ~/.datum/config.yaml\n  - Validate prerequisites (same as `validate`)\n  - Execute: `dbt run --target <target> --profiles-dir <profiles-dir>`\n  - Stream output to terminal in real-time with timestamps\n  - Capture all stdout/stderr\n  - Save run record with metadata\n- **Options:**\n  - `--dry-run`: Show what would run without executing\n  - `--vars '{\"key\": \"value\"}'`: Pass dbt vars\n  - `--timeout 3600`: Kill job after N seconds\n- **Output:** Real-time streaming + success/failure summary\n\n#### `datum dbt schedule --cron \"<cron-expression>\" [--target <target>]`\n- Add a cron job to execute dbt on schedule\n- **Actions:**\n  - Validate cron expression syntax\n  - Validate project prerequisites\n  - Show what will be added to crontab (dry-run first)\n  - Add entry to user's crontab: `0 10 * * * /usr/bin/datum-dbt-run {project-id} 2>&1`\n  - Store schedule config in ~/.datum/config.yaml\n- **Crontab entry behavior:**\n  - Calls internal command: `datum dbt run --project {project-id}`\n  - Pipes output to logging handler\n  - Runs with user's environment (not root)\n- **Output:** Confirmation + next steps (validate, check logs)\n\n#### `datum dbt schedule --webhook [--port 8080]`\n- Start webhook server for external triggers (Airflow, Dagster)\n- **Actions:**\n  - Start FastAPI server on `0.0.0.0:{port}`\n  - Listen for POST requests\n  - Endpoint: `POST /trigger/{project_id}/{run_id}`\n  - Optional payload: `{\"target\": \"prod\", \"vars\": {...}, \"timeout\": 3600}`\n  - Validate request, execute dbt run, return status\n  - Webhook runs in background (daemonize or use systemd)\n- **Output:** \"Webhook listening on 
http://localhost:8080/trigger/{project-id}/run-{id}\"\n- **Response format:**\n  ```json\n  {\n    \"run_id\": \"abc123\",\n    \"status\": \"running|queued|error\",\n    \"output_url\": \"file:///home/user/.datum/runs/abc123/run.json\"\n  }\n  ```\n\n#### `datum dbt schedule --status`\n- Show current schedule configuration\n- **Output:**\n  - Project name + status (ACTIVE/INACTIVE)\n  - All schedules (cron + webhook)\n  - Last run time + status\n  - Next scheduled run time\n  - Crontab location + entry\n\n#### `datum dbt logs [--last 10] [--status SUCCESS|FAILED|TIMEOUT] [--follow]`\n- View execution history and logs\n- **Actions:**\n  - Read all run records from ~/.datum/runs/\n  - Display in table format (Run ID, Time, Status, Duration, Exit Code)\n  - Allow filtering by status\n- **Output:** Table of recent runs\n\n#### `datum dbt logs <run-id> [--raw]`\n- View full log for a specific run\n- **Output:**\n  - Metadata: timestamp, duration, exit code, command\n  - Formatted output (with timestamps) or raw output\n  - Path to full log files\n  - Suggestion based on error (e.g., \"Database connection failed \u2192 check profiles.yml credentials\")\n\n#### `datum dbt config [--project-path] [--profiles-path] [--target]`\n- Update configuration after init\n- **Actions:**\n  - Interactively or via flags update ~/.datum/config.yaml\n  - Validate changes before saving\n  - Show current config\n- **Output:** Confirmation of changes\n\n---\n\n## Data Models (Pydantic v2)\n\n```python\n# All models in src/datum/core/config.py\n\nDbtProjectConfig:\n  - project_id: str (auto-generated UUID)\n  - project_path: Path (validated, must have dbt_project.yml)\n  - profiles_path: Path (default ~/.dbt/profiles.yml)\n  - target: str (default \"dev\")\n  - dbt_version: str (auto-detected)\n\nScheduleConfig:\n  - cron_expression: str (validated by croniter)\n  - enabled: bool (default True)\n  - created_at: datetime\n  - last_run_at: Optional[datetime]\n  - next_run_at: 
Optional[datetime]\n\nWebhookConfig:\n  - enabled: bool (default False)\n  - port: int (1024-65535)\n  - host: str (default \"0.0.0.0\")\n  - token: Optional[str] (for future auth)\n\nRunRecord:\n  - run_id: str (UUID)\n  - project_id: str\n  - timestamp: datetime\n  - command: str (what was executed)\n  - exit_code: int\n  - duration_seconds: float\n  - status: str (\"SUCCESS\" | \"FAILED\" | \"TIMEOUT\")\n  - stdout: str\n  - stderr: str\n\nDatumConfig (root, ~/.datum/config.yaml):\n  - version: str (\"1.0\")\n  - project: DbtProjectConfig\n  - schedule: Optional[ScheduleConfig]\n  - webhook: Optional[WebhookConfig]\n  - private_key_path: Path\n  - runs_dir: Path (default ~/.datum/runs/)\n```\n\n---\n\n## File Structure\n\n```\ndatum-dbt/\n\u251c\u2500\u2500 src/datum/\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 __main__.py                 # Entry point for `python -m datum`\n\u2502   \u251c\u2500\u2500 cli/\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u251c\u2500\u2500 main.py                 # Typer app, command routing\n\u2502   \u2502   \u2514\u2500\u2500 commands/\n\u2502   \u2502       \u251c\u2500\u2500 __init__.py\n\u2502   \u2502       \u251c\u2500\u2500 init.py             # datum dbt init\n\u2502   \u2502       \u251c\u2500\u2500 auth.py             # datum dbt auth validate (future)\n\u2502   \u2502       \u251c\u2500\u2500 config.py           # datum dbt config\n\u2502   \u2502       \u251c\u2500\u2500 validate.py         # datum dbt validate\n\u2502   \u2502       \u251c\u2500\u2500 run.py              # datum dbt run\n\u2502   \u2502       \u251c\u2500\u2500 schedule.py         # datum dbt schedule (cron + webhook)\n\u2502   \u2502       \u2514\u2500\u2500 logs.py             # datum dbt logs\n\u2502   \u251c\u2500\u2500 core/\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u251c\u2500\u2500 config.py               # Pydantic models + file I/O\n\u2502   \u2502   
\u251c\u2500\u2500 auth.py                 # SSH key generation\n\u2502   \u2502   \u251c\u2500\u2500 executor.py             # dbt execution + logging\n\u2502   \u2502   \u251c\u2500\u2500 scheduler.py            # Cron job management\n\u2502   \u2502   \u251c\u2500\u2500 webhook.py              # FastAPI webhook server\n\u2502   \u2502   \u251c\u2500\u2500 storage.py              # Run storage/retrieval\n\u2502   \u2502   \u251c\u2500\u2500 validators.py           # Pre-flight checks + auto-repair\n\u2502   \u2502   \u2514\u2500\u2500 utils.py                # Helpers (ID generation, etc)\n\u2502   \u2514\u2500\u2500 errors.py                   # Custom exceptions\n\u251c\u2500\u2500 tests/\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 test_config.py\n\u2502   \u251c\u2500\u2500 test_auth.py\n\u2502   \u251c\u2500\u2500 test_executor.py\n\u2502   \u251c\u2500\u2500 test_scheduler.py\n\u2502   \u251c\u2500\u2500 test_webhook.py\n\u2502   \u2514\u2500\u2500 test_cli.py\n\u251c\u2500\u2500 pyproject.toml                  # Project metadata + dependencies\n\u251c\u2500\u2500 README.md                        # User guide\n\u251c\u2500\u2500 LICENSE\n\u2514\u2500\u2500 .gitignore\n```\n\n---\n\n## Dependencies (pyproject.toml)\n\n```toml\n[build-system]\nrequires = [\"hatchling\"]\nbuild-backend = \"hatchling.build\"\n\n[project]\nname = \"datum\"\nversion = \"0.1.0\"\ndescription = \"Schedule dbt projects locally with datum cloud reporting\"\nreadme = \"README.md\"\nlicense = {text = \"MIT\"}\nauthors = [{name = \"Datum Labs\", email = \"contact@datumlabs.io\"}]\nrequires-python = \">=3.11\"\nkeywords = [\"dbt\", \"scheduling\", \"data\", \"analytics\", \"workflow\"]\n\ndependencies = [\n    \"typer[all]>=0.9,<1.0\",           # CLI framework\n    \"pydantic>=2.0,<3.0\",             # Data validation\n    \"pydantic-settings>=2.0,<3.0\",    # Config from YAML/env\n    \"pyyaml>=6.0,<7.0\",               # YAML parsing\n    
\"python-crontab>=3.0,<4.0\",       # Cron management\n    \"croniter>=2.0,<3.0\",             # Cron parsing/validation\n    \"fastapi>=0.104,<1.0\",            # Webhook server\n    \"uvicorn>=0.24,<1.0\",             # ASGI server\n    \"cryptography>=41.0,<43.0\",       # SSH key generation\n    \"rich>=13.0,<14.0\",               # Terminal formatting\n]\n\n[project.optional-dependencies]\ndev = [\n    \"pytest>=7.4,<8.0\",\n    \"pytest-cov>=4.1,<5.0\",\n    \"ruff>=0.1,<1.0\",\n    \"mypy>=1.6,<2.0\",\n    \"black>=23.0,<24.0\",\n]\n\n[project.scripts]\ndatum = \"datum.cli.main:app\"\n\n[tool.hatch.build.targets.wheel]\npackages = [\"src/datum\"]\n```\n\n---\n\n## Key Implementation Details\n\n### 1. SSH Key Generation (Secure)\n- Generate 2048-bit RSA key pair in `~/.datum/keys/{project-id}.pem`\n- Set permissions to 600 (readable only by user)\n- Store public key as `{project-id}.pub`\n- Never ask user to generate keys manually\n\n### 2. Config File Handling\n- Location: `~/.datum/config.yaml`\n- Always load with validation (Pydantic)\n- Auto-migrate config if version changes\n- Handle missing/corrupted files gracefully with clear error messages\n\n### 3. Run Execution & Logging\n- Each run gets unique ID: `{project-id}-{timestamp}-{uuid[:8]}`\n- Create directory: `~/.datum/runs/{run_id}/`\n- Save files:\n  - `metadata.json` (structured data)\n  - `output.log` (stdout + stderr combined, timestamped)\n  - `stdout.log` (stdout only)\n  - `stderr.log` (stderr only)\n- Stream output to terminal in real-time with `[HH:MM:SS]` timestamps\n- Capture exit code + duration\n\n### 4. Cron Integration\n- Use `python-crontab` library (safer than shell scripts)\n- Crontab entry format: `0 10 * * * /usr/local/bin/datum dbt run --project {project-id} 2>&1`\n- Always show user what will be added before modifying crontab\n- Allow user to review/edit crontab manually after\n- Store schedule info in config for `--status` command\n\n### 5. 
Webhook Server\n- FastAPI app with single endpoint: `POST /trigger/{project_id}/{run_id}`\n- Request validation (basic auth can be v2)\n- Response includes run_id, status, output location\n- Run webhook as background process (systemd unit template for later)\n- For v1: user must manually start daemon or use screen/tmux\n\n### 6. Validation & Auto-Repair\n- Pre-flight checks before any command:\n  - dbt_project.yml exists + valid YAML\n  - profiles.yml exists + accessible\n  - SSH key exists with 600 permissions\n  - dbt target exists in profiles.yml\n  - Can connect to database (run `dbt debug`)\n- Offer automated fixes for:\n  - Permissions: `chmod 600 {key}`\n  - Missing files: Show path + how to create\n  - Profile issues: Show profiles.yml location\n- `datum dbt validate` lists all issues + fixes\n- `--auto-fix` flag applies fixes (with confirmation)\n\n### 7. Debugging Experience\n- Every command shows what it's doing (\u2713/\u2717 indicators)\n- Errors include suggestions (not just \"failed\")\n- `datum dbt logs` shows last 10 runs in table\n- `datum dbt logs {run-id}` shows full formatted log with timestamps\n- Log files are stored locally (easy to grep, inspect, parse)\n- Clear indication of where logs are stored\n\n---\n\n## PyPI Publishing Setup\n\n### Before Publishing\n1. Create account on pypi.org (or testpypi.org for testing)\n2. Generate API token: pypi.org \u2192 Account Settings \u2192 API tokens\n3. 
Add token to `~/.pypirc` or use GitHub Actions secret\n\n### Publishing Command\n```bash\n# Build\npython -m build\n\n# Upload to TestPyPI (test first)\ntwine upload --repository testpypi dist/*\n\n# Upload to PyPI\ntwine upload dist/*\n```\n\n### Package Naming & Metadata\n- Package name: `datum-cli`\n- Import name: `datum`\n- Console script: `datum` (global command)\n- Keywords: \"dbt\", \"scheduling\", \"data\", \"analytics\"\n- Homepage: https://datumlabs.io (or GitHub)\n- Repository: https://github.com/datumlabs/datum-dbt\n- License: Apache-2.0\n\n### Post-Publishing\n- Update PyPI page with README (auto from README.md)\n- Add badges (build status, PyPI version, downloads)\n- Set up GitHub releases to auto-publish on tags\n\n---\n\n## Development Instructions for Agent\n\n### Phase 1: Foundation (2-3 hours)\n1. **Project setup**\n   - Create pyproject.toml with all dependencies\n   - Set up directory structure\n   - Create __init__.py files with version\n   - Add .gitignore for Python\n\n2. **Config system (Pydantic models)**\n   - Write all Pydantic models in config.py\n   - Implement config file I/O (load/save ~/.datum/config.yaml)\n   - Add validation for all fields\n   - Handle missing/invalid configs gracefully\n\n3. **Auth system**\n   - SSH key generation in auth.py\n   - Public key extraction\n   - Permission handling (chmod 600)\n   - Key validation\n\n### Phase 2: CLI Skeleton (1 hour)\n1. **Main app (typer)**\n   - Create main.py with Typer app\n   - Add command routing (init, run, schedule, logs, validate, config)\n   - Add global options (--debug, --project)\n   - Set up logging\n\n2. **Placeholder commands**\n   - Create each command file with docstring + typer.echo(\"TODO\")\n   - All imports correct but functions empty\n\n### Phase 3: Core Features (4-5 hours)\n1. **Executor** (executor.py)\n   - Run dbt subprocess with real-time streaming\n   - Capture stdout/stderr\n   - Handle timeouts + errors\n   - Save run record\n\n2. 
**Storage** (storage.py)\n   - Save/load run records\n   - Query runs by ID, timestamp, status\n   - Clean up old runs (optional, v2)\n\n3. **Validators** (validators.py)\n   - Pre-flight checks (all 5 checks listed above)\n   - Auto-repair suggestions\n   - Clear error messages with fixes\n\n4. **Scheduler** (scheduler.py)\n   - Validate cron expressions\n   - Add/remove crontab entries\n   - Show crontab status\n\n### Phase 4: CLI Commands (3-4 hours)\n1. **init.py** - Full implementation with user prompts\n2. **run.py** - Execute dbt with logging\n3. **schedule.py** - Cron + webhook logic\n4. **logs.py** - Display run history + details\n5. **validate.py** - Pre-flight checks\n6. **config.py** - Update configuration\n\n### Phase 5: Webhook Server (2 hours)\n1. **webhook.py**\n   - FastAPI app with POST /trigger/{project_id}/{run_id}\n   - Request validation\n   - Execute dbt run in background\n   - Return status response\n\n2. **Integration**\n   - Add `datum dbt schedule --webhook` command\n   - Start server as subprocess or daemon\n   - Show webhook URL to user\n\n### Phase 6: Testing & Polish (2 hours)\n1. **Unit tests** (tests/*.py)\n   - Config loading/saving\n   - Cron validation\n   - Executor (mock dbt calls)\n   - Storage (create/read runs)\n\n2. **Integration tests**\n   - Full CLI flow (init \u2192 run \u2192 logs)\n   - Error handling (missing files, bad config)\n\n3. 
**Edge cases**\n   - Windows paths (defer for v2, document as Linux-only)\n   - Permission errors\n   - Crontab doesn't exist (create it)\n   - Concurrent runs (same project, different run IDs)\n\n---\n\n## Testing Strategy\n\n### Unit Tests\n- Config models (validation, serialization)\n- Auth (key generation, permissions)\n- Validator (issue detection)\n- Storage (save/load runs)\n\n### Integration Tests\n- Full command flow: `init` \u2192 `validate` \u2192 `run` \u2192 `logs`\n- Error scenarios: missing files, bad cron, profile issues\n- Dry-run modes\n\n### Manual Testing Checklist\n- [ ] `datum dbt init --repo-path ./` creates config + key\n- [ ] `datum dbt validate` passes with valid project\n- [ ] `datum dbt run` executes dbt + saves logs\n- [ ] `datum dbt logs` shows recent runs\n- [ ] `datum dbt schedule --cron \"0 10 * * *\"` adds crontab entry\n- [ ] `datum dbt schedule --status` shows cron info\n- [ ] Invalid cron rejected with helpful error\n- [ ] Logs are saved to ~/.datum/runs/{run_id}/\n\n---\n\n## Error Handling\n\nEvery command should:\n1. Catch exceptions early\n2. Provide context (what were we doing?)\n3. Suggest a fix (what should user do?)\n4. Log full traceback to file (for debugging)\n5. Exit with code 0 (success) or 1 (failure)\n\n### Common Errors to Handle\n- Missing dbt_project.yml \u2192 \"No dbt project found at {path}. Initialize with: datum dbt init\"\n- Bad SSH key permissions \u2192 \"SSH key too open. Fix: chmod 600 {key}\"\n- Database connection failed \u2192 \"Could not connect to database. Check profiles.yml: {error}\"\n- Cron syntax invalid \u2192 \"Invalid cron expression. Example: '0 10 * * *' (daily at 10 AM)\"\n- Permission denied \u2192 \"Cannot write to {path}. Check file permissions.\"\n\n---\n\n## Documentation for Users\n\n### README.md Structure\n1. **Installation** - `pip install datum-cli`\n2. **Quick Start** - 5 min walkthrough\n3. **Commands Reference** - All commands with examples\n4. 
**Configuration** - ~/.datum/config.yaml explained\n5. **Debugging** - How to view logs, common issues\n6. **Future Roadmap** - K8s, cloud sync, etc\n\n### Example Usage Flow\n```bash\n# Install\npip install datum-cli\n\n# Initialize\ncd my-dbt-project\ndatum dbt init --repo-path .\n\n# Validate setup\ndatum dbt validate\n\n# Test run locally\ndatum dbt run\n\n# Schedule daily at 10 AM\ndatum dbt schedule --cron \"0 10 * * *\"\n\n# Check logs\ndatum dbt logs --last 5\ndatum dbt logs abc123\n```\n\n---\n\n## Future Roadmap (v2+)\n\n- [ ] Cloud sync (push run results to datum cloud UI)\n- [ ] K8s job support (`--executor kubernetes`)\n- [ ] Dagster/Airflow plugins\n- [ ] Web UI for local runs\n- [ ] Advanced scheduling (retries, backoff, alerts)\n- [ ] Multiple projects in one config\n- [ ] Environment variable templating in config\n- [ ] Webhook authentication (API keys)\n- [ ] Windows support\n- [ ] Docker containerization\n\n---\n\n## Success Criteria for v1\n\n\u2705 User can install via PyPI  \n\u2705 User can init, validate, run, schedule with single config file  \n\u2705 All runs are logged locally with searchable history  \n\u2705 Webhook works for basic external triggers  \n\u2705 Clear error messages for 95% of failure cases  \n\u2705 No cloud infrastructure required (cloud is optional)  \n\u2705 Cron jobs run reliably  \n\u2705 Debugging is straightforward (logs are easy to find/read)  \n\n---\n\n## Build Order Recommendation\n\n1. **Config + Auth** \u2192 Foundation everything depends on\n2. **Executor + Storage** \u2192 Core functionality\n3. **Validators** \u2192 Quality + UX\n4. **CLI Commands** \u2192 User interface\n5. **Scheduler** \u2192 Scheduling logic\n6. **Webhook** \u2192 External triggers\n7. 
**Tests + Docs** \u2192 Ship-ready\n\n**Estimated total time: 12-16 hours of focused coding**\n\n---\n\n## Code Quality Standards\n\n- All code typed with Python 3.13+ type hints\n- Pydantic v2 for all data models\n- Docstrings for all functions (Google style)\n- No magic numbers (use named constants)\n- Errors are informative (context + suggestion)\n- Tests should cover happy path + 3 error scenarios per feature\n- Use `rich` for beautiful terminal output (colors, tables, progress)\n- Follow Black formatter (line length 100)\n- Use Ruff for linting\n\n---\n\n## Ready to Code\n\nAgent should now have everything needed to:\n1. Build the project structure\n2. Implement each feature with clear scope\n3. Write tests as they go\n4. Prepare for PyPI publishing\n\n**All architectural decisions are finalized. Focus on shipping clean, tested, user-friendly code.**\n",
    "bugtrack_url": null,
    "license": "Copyright 2025 Datum Labs  Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at  http://www.apache.org/licenses/LICENSE-2.0",
    "summary": "Schedule dbt projects locally with datum cloud reporting",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/datumlabs/datum-dbt",
        "Issues": "https://github.com/datumlabs/datum-dbt/issues",
        "Repository": "https://github.com/datumlabs/datum-dbt"
    },
    "split_keywords": [
        "analytics",
        " data",
        " dbt",
        " scheduling",
        " workflow"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6bc587e5f2108c2c243ee32ea3a672f45dc2c39980f6cde10f5ae9ab2db4e5d2",
                "md5": "0a07d98c9900753f972ec9e21ed4fe6a",
                "sha256": "bbb04e93cb86995aed49ed804e76efa813404fc06dd722852e9da1728933ace5"
            },
            "downloads": -1,
            "filename": "datum_cli-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0a07d98c9900753f972ec9e21ed4fe6a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 9452,
            "upload_time": "2025-10-20T01:48:14",
            "upload_time_iso_8601": "2025-10-20T01:48:14.633567Z",
            "url": "https://files.pythonhosted.org/packages/6b/c5/87e5f2108c2c243ee32ea3a672f45dc2c39980f6cde10f5ae9ab2db4e5d2/datum_cli-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ec576fbfd2afba945bae97ea2fa68bb7b2b925e5834b25524453434f290cb562",
                "md5": "e1298e8c5202778cbd4e98e044178405",
                "sha256": "f9b31a012f52036f9f5fd764adb0d7a6c0d8cf33ceed7f2b99a11bf316eee9a1"
            },
            "downloads": -1,
            "filename": "datum_cli-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "e1298e8c5202778cbd4e98e044178405",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 10759,
            "upload_time": "2025-10-20T01:48:16",
            "upload_time_iso_8601": "2025-10-20T01:48:16.245933Z",
            "url": "https://files.pythonhosted.org/packages/ec/57/6fbfd2afba945bae97ea2fa68bb7b2b925e5834b25524453434f290cb562/datum_cli-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-20 01:48:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "datumlabs",
    "github_project": "datum-dbt",
    "github_not_found": true,
    "lcname": "datum-cli"
}
        