# EZGA — Evolutionary Structure Explorer (ezga-lib)
A modular **multi-objective genetic algorithm** (GA) framework for **atomistic structure exploration**, with first-class YAML configuration, plugin-style extensibility, and a **Hierarchical Supercell Escalation (HiSE)** workflow for coarse-to-fine supercell searches.
> PyPI name: `ezga-lib`
> CLI entry point: `ezga` (via `ezga.cli.run:app`)
> License: GPL-3.0-only
---
## Features
* **Clean YAML → Runtime**: Pydantic-v2 validated configs; dotted imports & factory specs are materialized into live Python callables.
* **Multi-objective selection**: Boltzmann (default) plus alternative methods; repulsion & diversity control.
* **Rich variation operators**: Tunable mutation, crossover, and user-defined operators.
* **ASE integration**: Simple shorthand to wrap ASE calculators.
* **HiSE manager**: Orchestrates multi-stage, coarse-to-fine supercell exploration. Lifts previous results via:
  * `tile` (Partition-based `generate_supercell`),
  * `best_compatible` (find largest divisor supercell among previous stages),
  * `ase` (fallback tiling using ASE).
* **Agentic mailbox**: Stage-scoped shared directory for multi-agent workflows.
* **Pretty CLI summaries**: Rich panels with compact configuration overviews.
---
## Installation
### From source (recommended during development)
```bash
git clone <your-repo-url>
cd ezga
pip install -U pip
pip install -e .
```
This installs the `ezga` command line app.
### From PyPI (when available)
```bash
pip install ezga-lib
```
---
## Quick Start
Create a minimal `ezga.yaml`:
```yaml
max_generations: 100
output_path: demo/run

population:
  dataset_path: config.xyz
  filter_duplicates: true

evaluator:
  features_funcs:
    factory: ezga.selection.features:feature_composition_vector
    args: [["C","H"]]   # features are composition counts
  objectives_funcs:
    - ezga.selection.objective:objective_energy

multiobjective:
  size: 256
  selection_method: boltzmann
  sampling_temperature: 0.9
  objective_temperature: 0.6
  random_seed: 73

variation:
  initial_mutation_rate: 3.0
  crossover_probability: 0.1

simulator:
  mode: sampling
  calculator:
    type: ase
    class: ase.calculators.lj:LennardJones
    kwargs: { epsilon: 0.0103, sigma: 3.4 }  # ASE params
```
Run:
```bash
ezga validate -c ezga.yaml --strict
ezga once -c ezga.yaml
```
---
## CLI
```
ezga once -c <config.yaml>
ezga validate -c <config.yaml> [--strict]
```
* `once`: Runs a single GA or delegates to **HiSE** if the YAML has an `hise` block.
* `validate`: Validates and prints a rich summary; `--strict` also builds the engine to catch wiring errors.
---
## Configuration
### GAConfig (high level)
* `population`: dataset paths, constraints, duplicate filtering, …
* `evaluator`: `features_funcs`, `objectives_funcs` (dotted, factory, or list)
* `multiobjective`: selection params (size, method, temperatures, metric, …)
* `variation`: mutation & crossover knobs
* `simulator`: mode & calculator (ASE shorthand supported)
* `convergence`, `hashmap`, `agentic`: execution support
* `hise` (optional): HiSE manager block (see below)
All sections are validated by Pydantic-v2; unknown fields are forbidden.
### Dotted imports & factories
Anywhere you need a callable/object, you can write:
* **Dotted string**: `"package.module:attr"` or `"package.module.attr"`
* **Factory spec**:
```yaml
key:
  factory: "pkg.mod:build_something"
  args: [1, 2]
  kwargs: { flag: true }
```
* **ASE shorthand** (calculator only):
```yaml
simulator:
  mode: sampling
  calculator:
    type: ase
    class: ase.calculators.lj:LennardJones
    kwargs: { epsilon: 0.0103, sigma: 3.4 }
```
The loader resolves these into live Python objects before the run.
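To make the resolution step concrete, here is a minimal sketch of how a dotted string or factory spec could be turned into a live object. It is not the ezga loader itself: `resolve_dotted` and `materialize` are hypothetical names used only for illustration, and the real loader may handle more cases.

```python
import importlib
from functools import reduce
from typing import Any


def resolve_dotted(path: str) -> Any:
    """Resolve 'pkg.mod:attr.sub' (or 'pkg.mod.attr') to a Python object."""
    if ":" in path:
        # Colon form: everything before ':' is the module, the rest is attributes.
        module_name, attr_path = path.split(":", 1)
    else:
        # Dot-only form: assume the last component is the attribute.
        module_name, _, attr_path = path.rpartition(".")
    module = importlib.import_module(module_name)
    # Walk nested attributes, e.g. "OrderedDict.fromkeys".
    return reduce(getattr, attr_path.split("."), module)


def materialize(spec: Any) -> Any:
    """Hypothetical helper: turn a dotted string or factory spec into an object."""
    if isinstance(spec, str):
        return resolve_dotted(spec)
    if isinstance(spec, dict) and "factory" in spec:
        factory = resolve_dotted(spec["factory"])
        return factory(*spec.get("args", []), **spec.get("kwargs", {}))
    return spec  # plain values pass through untouched
```

Under this scheme the colon marks where the module path ends, which is why a nested attribute such as `Class.method` belongs after the colon rather than before it.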
---
## Constraints (Design of Experiments)
You can provide constraint generators as factories. Example using a custom generator:
```yaml
population:
  constraints:
    - factory: ezga.DoE.DoE:ConstraintGenerator.sum_in_range
      args: [["C", "H"], 100, 100]
```
> **Tip**
> Use `ezga.DoE.DoE:ConstraintGenerator.sum_in_range` (colon form).
> Avoid `ezga.DoE.DoE.ConstraintGenerator:sum_in_range` (that treats `ConstraintGenerator` as a module path).
If your constraint generator expects feature **names**, you can register a name→index mapping in your code (e.g., after features are known):
```python
from ezga.DoE.DoE import ConstraintGenerator
ConstraintGenerator.set_name_mapping({"C": 0, "H": 1})
```
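As an illustration of what a name-based "sum in range" constraint might do, the sketch below builds a predicate over a feature vector. The semantics (an inclusive check on the summed features), the module-level `NAME_TO_INDEX` mapping, and the function signature are assumptions for this example and may differ from the actual `ConstraintGenerator.sum_in_range`.

```python
from typing import Callable, Dict, Sequence

# Hypothetical name -> feature-index mapping (assumed, mirrors the call above).
NAME_TO_INDEX: Dict[str, int] = {"C": 0, "H": 1}


def sum_in_range(
    names: Sequence[str],
    lower: float,
    upper: float,
    mapping: Dict[str, int] = NAME_TO_INDEX,
) -> Callable[[Sequence[float]], bool]:
    """Return a predicate that checks lower <= sum(features[names]) <= upper."""
    indices = [mapping[name] for name in names]

    def constraint(features: Sequence[float]) -> bool:
        total = sum(features[i] for i in indices)
        return lower <= total <= upper

    return constraint


# With equal bounds the constraint pins the total C + H count to exactly 100.
exactly_100_atoms = sum_in_range(["C", "H"], 100, 100)
```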
---
## HiSE — Hierarchical Supercell Escalation
HiSE runs a sequence of stages over growing supercells and **replaces** the base input at each stage with a lifted dataset derived from previous results.
### Example
```yaml
hise:
  supercells:
    - [1,1,1]
    - [2,1,1]
    - [2,2,1]

  input_from: final_dataset        # or: latest_generation
  stage_dir_pattern: "supercell_{a}_{b}_{c}"
  restart: false
  carry: all
  reseed_fraction: 1.0
  lift_method: tile                # tile | best_compatible | ase

  overrides:
    multiobjective.size: [10, 20, 30]
    max_generations: [2, 3, 5]
    variation.initial_mutation_rate: [1, 2, 3]
    population.constraints:
      - factory: ezga.DoE.DoE:ConstraintGenerator.sum_in_range
        args: [['C', 'H'], 100, 100]
      - factory: ezga.DoE.DoE:ConstraintGenerator.sum_in_range
        args: [['C', 'H'], 200, 200]
      - factory: ezga.DoE.DoE:ConstraintGenerator.sum_in_range
        args: [['C', 'H'], 400, 400]
```
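The `overrides` block above lists one value per stage under each dotted key. A minimal sketch of how such overrides could be applied, assuming the i-th list entry belongs to the i-th supercell (an assumption drawn from the example, not from the manager's code; `set_dotted` and `stage_config` are hypothetical helpers):

```python
import copy
from typing import Any, Dict, List


def set_dotted(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config['a']['b'] = value for dotted_key 'a.b', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def stage_config(
    base: Dict[str, Any], overrides: Dict[str, List[Any]], stage: int
) -> Dict[str, Any]:
    """Return a copy of the base config with the overrides for one stage applied."""
    cfg = copy.deepcopy(base)  # never mutate the shared base config
    for dotted_key, per_stage_values in overrides.items():
        set_dotted(cfg, dotted_key, per_stage_values[stage])
    return cfg


base = {"max_generations": 100, "multiobjective": {"size": 256}}
overrides = {"multiobjective.size": [10, 20, 30], "max_generations": [2, 3, 5]}
second_stage = stage_config(base, overrides, stage=1)
```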
### Lift methods
* **`tile`**: Partition-based lifting using
`container.AtomPositionManager.generate_supercell(repeat=(ra, rb, rc))`
(requires `sage_lib.partition.Partition`).
* **`best_compatible`**: Scans *all* previous stages and picks the largest supercell (by volume) that divides the target coordinate-wise; lifts via Partition.
* **`ase`**: Simple tiling via `ase.Atoms.repeat`. No Partition dependency (fallback).
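The selection rule behind `best_compatible` can be sketched in plain Python. This follows the description above (largest volume among previous supercells that divide the target coordinate-wise); the function names are hypothetical and the actual manager may break ties or handle edge cases differently.

```python
from math import prod
from typing import Iterable, Optional, Tuple

Cell = Tuple[int, int, int]


def divides(source: Cell, target: Cell) -> bool:
    """True if every component of source divides the matching target component."""
    return all(t % s == 0 for s, t in zip(source, target))


def best_compatible(previous: Iterable[Cell], target: Cell) -> Optional[Cell]:
    """Pick the largest-volume previous supercell that divides the target."""
    candidates = [cell for cell in previous if divides(cell, target)]
    return max(candidates, key=prod, default=None)


def repeat_factors(source: Cell, target: Cell) -> Cell:
    """Repetitions along each axis needed to tile source up to target."""
    ra, rb, rc = (t // s for s, t in zip(source, target))
    return (ra, rb, rc)
```

For the target `[2,2,1]` with earlier stages `[1,1,1]` and `[2,1,1]`, this picks `(2, 1, 1)` and repeats it `(1, 2, 1)` times.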
### Input source
* `final_dataset`: uses `stage_root/config.xyz`
* `latest_generation`: concatenates `stage_root/generation/*/config.xyz`
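A rough sketch of how the two input sources could be gathered, assuming plain-text XYZ files that can be concatenated frame after frame. `collect_input` is a hypothetical helper, and the lexicographic ordering of generation directories is an assumption of this example.

```python
from pathlib import Path


def collect_input(stage_root: Path, input_from: str) -> str:
    """Return the XYZ text that a later stage would lift, per `input_from`."""
    if input_from == "final_dataset":
        files = [stage_root / "config.xyz"]
    elif input_from == "latest_generation":
        # One config.xyz per generation directory, in sorted (lexicographic) order.
        files = sorted(stage_root.glob("generation/*/config.xyz"))
    else:
        raise ValueError(f"unknown input_from: {input_from!r}")
    # Missing files are skipped; XYZ frames concatenate as plain text.
    return "".join(f.read_text() for f in files if f.is_file())
```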
### Stage directories
For each supercell `(a,b,c)` the HiSE manager creates:
```
<output_path>/
  supercell_{a}_{b}_{c}/
    input_lifted.xyz   # if lifting writes to disk
    config.xyz         # final dataset (engine may write this)
    generation/...
```
### Agentic shared dir
If `agentic.shared_dir` is set in the base config, each stage receives a **stage-scoped** mailbox:
```
<base_shared>/<relative_stage_dir>/
```
All agents of a given stage share this directory.
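The stage directory and its mailbox follow from the same relative path. A small sketch, assuming `stage_dir_pattern` is expanded with `str.format` and the mailbox mirrors the stage directory under the shared base (`stage_paths` is a hypothetical helper, not part of the ezga API):

```python
from pathlib import Path
from typing import Tuple


def stage_paths(
    output_path: str,
    shared_dir: str,
    supercell: Tuple[int, int, int],
    pattern: str = "supercell_{a}_{b}_{c}",
) -> Tuple[Path, Path]:
    """Return (stage directory, stage-scoped mailbox) for one supercell."""
    a, b, c = supercell
    relative = Path(pattern.format(a=a, b=b, c=c))
    return Path(output_path) / relative, Path(shared_dir) / relative


stage_dir, mailbox = stage_paths("demo/run", "shared", (2, 2, 1))
```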
---
## Directory Layout (source tree)
```
src/ezga/
  cli/
    run.py               # Typer app (ezga entry point)
    runners.py           # once / validate / hise dispatchers
  core/
    config.py            # GAConfig + submodels (Pydantic v2)
    engine.py            # GA main loop
    population.py        # population & DoE validation
  selection/
    features.py, objective.py  # feature / objective factories
  DoE/
    DoE.py               # ConstraintGenerator and DoE
  hise/
    manager.py           # HiSE orchestrator
  io/
    config_loader.py     # YAML loader & materializer
  simulator/
    ase_calculator.py    # ASE adapter (shorthand support)
```
---
## Logging & Output
* Logs and artifacts are written under `output_path` (and per-stage subdirs in HiSE).
* The CLI prints a rich summary of the configuration before running.
---
## Developing
### Tests
We use `pytest`. Example structure:
```
tests/
  test_loader.py
  test_hise_manager.py
  test_constraints.py
  conftest.py
```
Run:
```bash
pip install -e ".[test]" # if you add an extra in pyproject
pytest -q
```
Example unit test for loader materialization:
```python
# tests/test_loader.py
from ezga.io.config_loader import _materialize_factories


def test_factory_resolution():
    spec = {"factory": "math:prod", "args": [[2,3,4]]}
    fn = _materialize_factories(spec)
    assert callable(fn)
    assert fn([2,3,4]) == 24
```
### Code style
* Type hints everywhere.
* Docstrings follow **Google style**.
* Avoid side effects at import time; factories should be cheap to resolve.
---
## Troubleshooting
* **`TypeError: 'dict' object is not callable`**
You likely passed a factory **dict** (not materialized) directly into a runtime component. Ensure your keys live in the YAML under sections that the loader post-processes, or put them under `hise.overrides` if you need stage-specific values. The loader will materialize `population.constraints`, `evaluator.*`, `mutation_funcs`, `crossover_funcs`, and `simulator.calculator`.
* **`ModuleNotFoundError` or wrong dotted form**
Use colon form: `pkg.mod:attr` (preferred). For our DoE example:
`ezga.DoE.DoE:ConstraintGenerator.sum_in_range`.
* **Pydantic model errors**
Ensure `pydantic>=2` is installed. Unknown fields are rejected (`extra='forbid'`).
* **Permission error exporting `input_lifted.xyz`**
Ensure the path is writable. The exporter writes a new file; if you manage files manually, don’t open the same file elsewhere.
---
## Roadmap
* Additional selection methods & visual diagnostics.
* More HiSE lift strategies (symmetry-aware mapping).
* Native viewers for generation trajectories.
* Optional async physics backends.
---
## Citation
If this software helps your research, please cite the repository (add DOI when available).
---
## License
GPL-3.0-only. See `LICENSE`.
---
## Acknowledgments
* **ASE** for atomistic infrastructure.
* **pydantic**, **typer**, **ruamel.yaml**, **rich** for the developer experience.
* **sage\_lib** for partition and supercell lifting utilities.
Raw data
{
"_id": null,
"home_page": null,
"name": "ezga-lib",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "DFT, MLIP, genetic algorithm, materials, multi-objective",
"author": null,
"author_email": "Juan Manuel Lombardi <lombardi@fhi-berlin.mpg.de>",
"download_url": "https://files.pythonhosted.org/packages/54/94/c18eb683090f3f9a58c75d7c63181b1029b6f3ea88897ffd228a7ce61eb5/ezga_lib-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0-only",
"summary": "A modular multi-objective genetic algorithm framework for atomistic structure exploration",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://thgitlab.rz-berlin.mpg.de/lombardi/ezga",
"Issues": "https://thgitlab.rz-berlin.mpg.de/lombardi/ezga/issues"
},
"split_keywords": [
"dft",
" mlip",
" genetic algorithm",
" materials",
" multi-objective"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "a282da803dd590d6c2d706479c99c7734886663c93e642df58c4885238397336",
"md5": "1954fffaddc34cd8334c0e4ce39b79fd",
"sha256": "aa08d6c72e65ae62492c55d7f78369fcb138b68ad098291ee83fb20b23b6e359"
},
"downloads": -1,
"filename": "ezga_lib-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1954fffaddc34cd8334c0e4ce39b79fd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 238003,
"upload_time": "2025-08-17T21:47:46",
"upload_time_iso_8601": "2025-08-17T21:47:46.817714Z",
"url": "https://files.pythonhosted.org/packages/a2/82/da803dd590d6c2d706479c99c7734886663c93e642df58c4885238397336/ezga_lib-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5494c18eb683090f3f9a58c75d7c63181b1029b6f3ea88897ffd228a7ce61eb5",
"md5": "c37a2ef1ce8339bdf8a1d1a41a2e5da4",
"sha256": "a18b9f4ff17bb0102f33f28db68e4bfdd802695b6ff94c723bb5bcf57b27d6ba"
},
"downloads": -1,
"filename": "ezga_lib-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "c37a2ef1ce8339bdf8a1d1a41a2e5da4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 211091,
"upload_time": "2025-08-17T21:47:49",
"upload_time_iso_8601": "2025-08-17T21:47:49.236430Z",
"url": "https://files.pythonhosted.org/packages/54/94/c18eb683090f3f9a58c75d7c63181b1029b6f3ea88897ffd228a7ce61eb5/ezga_lib-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-17 21:47:49",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "ezga-lib"
}