| Field | Value |
| --- | --- |
| Name | herrkunft |
| Version | 0.2.0 |
| Summary | Track configuration value origins and modification history through YAML parsing |
| Upload time | 2025-10-28 14:31:48 |
| Requires Python | >=3.9 |
| License | MIT |
| Keywords | provenance, configuration, yaml, tracking, metadata, scientific-computing |

# herrkunft

**From German "Herkunft" (origin, provenance)**

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pgierz/herrkunft/main?labpath=docs%2Fnotebooks)

Track the origin and modification history of configuration values loaded from YAML files, built on modern Python tooling.

## Overview

`herrkunft` is a standalone library extracted from [esm_tools](https://github.com/esm-tools/esm_tools) that provides transparent provenance tracking for configuration values loaded from YAML files. It tracks:

- **Where** each value came from (file path, line number, column)
- **When** it was set or modified
- **How** conflicts were resolved using hierarchical categories
- **What** the complete modification history is

Perfect for scientific computing, workflow configuration, and any application where configuration traceability matters.
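
To make the four tracked aspects concrete, here is a minimal standard-library sketch of what a provenance record and its history could look like. The names `ProvenanceStep` and `ProvenanceHistory` are hypothetical and chosen for illustration only; they are not taken from herrkunft's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceStep:
    """One origin record (hypothetical): where a value came from and when."""

    yaml_file: str   # where: file path
    line: int        # where: line number
    column: int      # where: column
    category: str    # how: category used for conflict resolution
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                # when: time the step was recorded


@dataclass
class ProvenanceHistory:
    """Ordered steps (what); the last entry describes the current value."""

    steps: list[ProvenanceStep] = field(default_factory=list)

    def record(self, step: ProvenanceStep) -> None:
        self.steps.append(step)

    @property
    def current(self) -> Optional[ProvenanceStep]:
        return self.steps[-1] if self.steps else None


history = ProvenanceHistory()
history.record(ProvenanceStep("defaults.yaml", 3, 8, "defaults"))
history.record(ProvenanceStep("user.yaml", 7, 8, "user"))
print(history.current.yaml_file, len(history.steps))
```

Keeping every step rather than only the latest one is what allows an audit trail to be reconstructed later.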

## Features

- 🎯 **Transparent Tracking**: Values behave like normal Python types while tracking their provenance
- 📍 **Precise Location**: Track exact file, line, and column for every configuration value
- 🏗️ **Hierarchical Resolution**: Category-based conflict resolution (e.g., defaults < user < runtime)
- 🔄 **Modification History**: Complete audit trail of all changes to configuration values
- 🎨 **Type-Safe**: Full type hints and Pydantic validation throughout
- 📝 **YAML Round-Trip**: Preserve provenance as comments when writing YAML
- 🚀 **Modern Python**: Built with Pydantic 2.0, ruamel.yaml, and loguru
- 📓 **Interactive Docs**: Try it in Binder without installing anything
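
The idea behind transparent tracking can be illustrated with a plain `str` subclass that carries an extra attribute. This is a simplified sketch under assumed names (`TrackedStr` is hypothetical), not herrkunft's implementation, which may handle many more types and operations.

```python
class TrackedStr(str):
    """A str that carries an extra `provenance` attribute (hypothetical name)."""

    def __new__(cls, value, provenance=None):
        obj = super().__new__(cls, value)
        obj.provenance = provenance
        return obj


url = TrackedStr(
    "postgresql://localhost/mydb",
    provenance={"yaml_file": "config.yaml", "line": 15, "column": 8},
)

# Behaves like a plain string...
print(url.startswith("postgresql://"))
print(url == "postgresql://localhost/mydb")

# ...while still exposing where it came from.
print(url.provenance["yaml_file"], url.provenance["line"])

# Caveat of this simple approach: built-in str methods return plain str,
# so the metadata is not carried over to derived values.
print(type(url.upper()).__name__)
```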

## Try It Now

Launch interactive notebooks in your browser (no installation required):

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pgierz/herrkunft/main?labpath=docs%2Fnotebooks)

## Installation

```bash
pip install herrkunft
```

For development:

```bash
pip install "herrkunft[dev]"  # quotes keep shells such as zsh from expanding the brackets
```

## Quick Start

```python
from provenance import load_yaml

# Load a configuration file with provenance tracking
config = load_yaml("config.yaml", category="defaults")

# Access values normally
database_url = config["database"]["url"]
print(database_url)  # postgresql://localhost/mydb

# Access provenance information
print(database_url.provenance.current.yaml_file)  # config.yaml
print(database_url.provenance.current.line)       # 15
print(database_url.provenance.current.column)     # 8
```

### Hierarchical Configuration

```python
from provenance import HierarchyManager, ProvenanceLoader

# Priority order used below: defaults < user < production
loader = ProvenanceLoader()

# Load multiple configs with different priorities
defaults = loader.load("defaults.yaml", category="defaults")
user_config = loader.load("user.yaml", category="user")
prod_config = loader.load("production.yaml", category="production")

# Merge with automatic conflict resolution
hierarchy = HierarchyManager(["defaults", "user", "production"])
final_config = hierarchy.merge(defaults, user_config, prod_config)

# Production values override user values, which override defaults
# Full history is preserved in provenance
```
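
The resolution rule itself is simple to sketch with plain dictionaries. The following illustration assumes flat (non-nested) configs and uses a hypothetical helper, `merge_by_category`; it only demonstrates the "higher category wins, history is kept" idea and says nothing about how `HierarchyManager` works internally.

```python
from typing import Any

# Categories ordered from lowest to highest priority.
ORDER = ["defaults", "user", "production"]


def merge_by_category(layers: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Merge flat config layers; higher categories win, history is kept."""
    merged: dict[str, dict[str, Any]] = {}
    for category in ORDER:
        for key, value in layers.get(category, {}).items():
            entry = merged.setdefault(key, {"value": None, "history": []})
            entry["value"] = value                       # later category overrides
            entry["history"].append((category, value))   # but nothing is forgotten
    return merged


layers = {
    "defaults": {"host": "localhost", "port": 5432},
    "user": {"port": 6543},
    "production": {"host": "db.example.org"},
}
result = merge_by_category(layers)
print(result["host"]["value"])
print(result["port"]["history"])
```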

### Save with Provenance Comments

```python
from provenance import dump_yaml

# Save configuration with provenance as inline comments
dump_yaml(config, "output.yaml", include_provenance=True)
```

Output:

```yaml
database:
  url: postgresql://localhost/mydb  # config.yaml:15:8
  port: 5432  # config.yaml:16:8
```
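
The comment format shown above (`file:line:column`) can be reproduced by hand for a single flat mapping. This is a rough sketch with a hypothetical helper, `render_with_provenance`; it does no quoting or escaping, so a real YAML emitter is still needed for anything beyond simple scalars.

```python
def render_with_provenance(section: str, entries: dict) -> str:
    """Render one YAML mapping with `# file:line:column` trailing comments."""
    lines = [f"{section}:"]
    for key, (value, (yaml_file, line, column)) in entries.items():
        lines.append(f"  {key}: {value}  # {yaml_file}:{line}:{column}")
    return "\n".join(lines)


# Each entry pairs a value with its (file, line, column) origin.
entries = {
    "url": ("postgresql://localhost/mydb", ("config.yaml", 15, 8)),
    "port": (5432, ("config.yaml", 16, 8)),
}
text = render_with_provenance("database", entries)
print(text)
```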

## Architecture

herrkunft is built with modern Python best practices:

- **Pydantic 2.0**: Type-safe data models and settings
- **ruamel.yaml**: YAML parsing with position tracking and comment preservation
- **loguru**: Simple, powerful logging
- **Type hints**: Full typing support for IDE autocomplete and type checking
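
To show what position tracking means in practice, the sketch below records the line and column of each value in a trivially simple YAML snippet. It is a naive standard-library stand-in, not a YAML parser, and `scan_positions` is a hypothetical helper; according to the list above, herrkunft relies on ruamel.yaml for this job.

```python
import re

# Matches `key: value` with optional indentation. Deliberately ignores
# quoting, flow style, multi-line scalars, and other real YAML features.
PAIR = re.compile(r"^(?P<indent>\s*)(?P<key>[\w-]+):\s+(?P<value>\S.*)$")


def scan_positions(text: str, filename: str) -> dict:
    """Return {key: (filename, line, column)} for each simple key/value pair."""
    positions = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = PAIR.match(line)
        if match:
            # 1-based line and column of the first character of the value.
            positions[match["key"]] = (filename, line_no, match.start("value") + 1)
    return positions


sample = "database:\n  url: postgresql://localhost/mydb\n  port: 5432\n"
positions = scan_positions(sample, "config.yaml")
print(positions)
```

Keys without an inline value, such as `database:` here, are skipped because only scalar values are located.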

### Core Components

```
herrkunft/
├── core/           # Provenance tracking and hierarchy management
├── types/          # Type wrappers (DictWithProvenance, etc.)
├── yaml/           # YAML loading and dumping
├── utils/          # Utilities for cleaning, validation, serialization
└── config/         # Library configuration and settings
```

## Use Cases

### Scientific Computing

Track which configuration file and parameters were used for each simulation run:

```python
from provenance import load_yaml

config = load_yaml("simulation.yaml")
run_simulation(config)  # placeholder for your own simulation entry point

# Later, audit which file provided each parameter
for key, value in config.items():
    print(f"{key}: {value.provenance.current.yaml_file}")
```

### Multi-Environment Configuration

Manage development, staging, and production configs with clear conflict resolution:

```python
from provenance import ProvenanceLoader

loader = ProvenanceLoader()
config = loader.load_multiple([
    ("defaults.yaml", "defaults"),
    ("production.yaml", "production"),
    ("secrets.yaml", "secrets"),  # Highest priority
])
```

### Configuration Auditing

Export complete provenance history for compliance or debugging:

```python
from provenance import to_json_file

# Export config with full provenance metadata
to_json_file(config, "audit.json")
```
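
An audit export boils down to serializing each value together with its history. The sketch below uses only the standard `json` module and an assumed, simplified record layout (the keys `value` and `history` are illustrative); the file written by the library may be structured differently.

```python
import json

# Assumed layout: one entry per config key, with its value and ordered history.
config_with_provenance = {
    "database.url": {
        "value": "postgresql://localhost/mydb",
        "history": [
            {"yaml_file": "defaults.yaml", "line": 4, "column": 8, "category": "defaults"},
            {"yaml_file": "config.yaml", "line": 15, "column": 8, "category": "user"},
        ],
    }
}

# sort_keys gives stable output, which makes audit files easy to diff.
audit_json = json.dumps(config_with_provenance, indent=2, sort_keys=True)
restored = json.loads(audit_json)
print(restored["database.url"]["history"][-1]["yaml_file"])
```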

## Documentation

Full documentation is available at [https://herrkunft.readthedocs.io](https://herrkunft.readthedocs.io)

- [Getting Started Guide](https://herrkunft.readthedocs.io/getting-started)
- [API Reference](https://herrkunft.readthedocs.io/api)
- [Architecture Overview](https://herrkunft.readthedocs.io/architecture)
- [Migration from esm_tools](https://herrkunft.readthedocs.io/migration)

## Development

### Setup

```bash
git clone https://github.com/pgierz/herrkunft.git
cd herrkunft
pip install -e ".[dev]"
```

### Testing

```bash
pytest                          # Run all tests
pytest --cov=provenance        # With coverage
pytest -v tests/test_core/     # Specific test directory
```

### Code Quality

```bash
black provenance tests          # Format code
ruff check provenance tests     # Lint
mypy provenance                 # Type check
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Authors

- **Paul Gierz** - [paul.gierz@awi.de](mailto:paul.gierz@awi.de)
- **Miguel Andrés-Martínez** - [miguel.andres-martinez@awi.de](mailto:miguel.andres-martinez@awi.de)

## License

MIT License - see [LICENSE](LICENSE) for details.

## Acknowledgments

Extracted from the [esm_tools](https://github.com/esm-tools/esm_tools) project, which provides workflow management for Earth System Models. The provenance tracking feature was originally developed to track configuration origins in complex HPC simulation workflows.

## Related Projects

- [esm_tools](https://github.com/esm-tools/esm_tools) - Earth System Model workflow management
- [OmegaConf](https://omegaconf.readthedocs.io/) - Hierarchical configuration (no provenance tracking)
- [Dynaconf](https://www.dynaconf.com/) - Settings management (no provenance tracking)
- [Hydra](https://hydra.cc/) - Configuration framework (no detailed provenance)

## Citation

If you use herrkunft in your research, please cite:

```bibtex
@software{herrkunft2024,
  title = {herrkunft: Configuration Provenance Tracking for Python},
  author = {Gierz, Paul and Andrés-Martínez, Miguel},
  year = {2024},
  url = {https://github.com/pgierz/herrkunft}
}
```

            
