pydantic-ai-optimizers

Name: pydantic-ai-optimizers
Version: 0.0.1
Summary: A library for optimizing PydanticAI agent prompts through iterative improvement and evaluation, built on top of PydanticAI + Pydantic Evals.
Author: Jan Siml
License: MIT
Requires Python: >=3.11
Keywords: pydantic, ai, optimizers, llm
Upload time: 2025-08-17 20:46:16
Repository: https://github.com/svilupp/pydantic-ai-optimizers
# PydanticAI Optimizers

> ⚠️ **Super Opinionated**: This library is specifically built on top of PydanticAI + Pydantic Evals. If you don't use both together, this is useless to you.

A Python library for systematically improving PydanticAI agent prompts through iterative optimization. **Heavily inspired by the [GEPA paper](https://arxiv.org/abs/2507.19457)** with practical extensions for prompt optimization when switching model classes or providers.

## Acknowledgments

This work builds upon the excellent research in **"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning"** by Agrawal et al. We're grateful for their foundational work on reflective prompt evolution and have adapted (some of) their methodology with several practical tweaks for the PydanticAI ecosystem.

**Why this exists**: Every time you switch model classes (GPT-4.1 → GPT-5 → Claude Sonnet 4) or providers, your prompting needs change. Instead of manually tweaking prompts each time, this automates the optimization process for your existing PydanticAI agents with minimal effort.

## What It Does

This library optimizes prompts by:

1. **Mini-batch Testing**: Each candidate prompt is tested against a small subset of cases to see if it beats its parent before full evaluation
2. **Individual Case Tracking**: Performance on each test case is tracked, enabling weighted sampling that favors prompts that win on more individual cases  
3. **Memory for Failed Attempts**: When optimization gets stuck (children keep failing mini-batch tests), the system provides previous failed attempts to the reflection agent with the message: "You've tried these approaches and they didn't work - think outside the box!"

The core insight is that you don't lose learning between iterations, and the weighted sampling based on individual case win rates helps explore more diverse and effective prompt variations.

## Quick Start

### Installation

From a checkout of the repository:

```bash
uv pip install -e .
```

Or run the examples directly with uv:
```bash
uv run python examples/chef/optimize.py
```

### Run the Chef Example

```bash
cd examples/chef
uv run python optimize.py
```

This will optimize a chef assistant prompt that helps users find recipes while avoiding allergens. You'll see the optimization process with real-time feedback and the final best prompt.

### Basic Usage in Your Project

```python
import asyncio

from pydantic_ai_optimizers import Optimizer
from your_domain import make_run_case, make_reflection_agent, build_dataset

# Set up your domain-specific components
dataset = build_dataset("your_cases.json")
run_case = make_run_case()  # Function that runs your agent with a prompt
reflection_agent = make_reflection_agent()  # Agent that improves prompts

# Optimize
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,
    reflection_agent=reflection_agent,
)

# optimize() is a coroutine, so run it inside an event loop
best = asyncio.run(
    optimizer.optimize(
        seed_prompt_file="seed.txt",
        full_validation_budget=20,
    )
)
```

## How It Works

### 1. Start with a Seed Prompt
The optimizer begins with your initial prompt and evaluates it on all test cases.

### 2. Mini-batch Gating (Key Innovation #1)
- Select a parent prompt using weighted sampling (prompts that win more individual cases are more likely to be selected)
- Generate a new candidate through reflection on failed cases
- Test the candidate on a small mini-batch of cases
- Only if it beats the parent on the mini-batch does it get added to the candidate pool
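
A minimal sketch of this gate, assuming an `evaluate(prompt, cases)` callable that returns a mean score over a set of cases (all names here are illustrative, not the library's API):

```python
import random
from typing import Callable, Sequence

def passes_gate(
    candidate: str,
    parent: str,
    cases: Sequence,
    evaluate: Callable[[str, Sequence], float],
    batch_size: int = 4,
) -> bool:
    # Score both prompts on the same small random subset of cases;
    # only a candidate that beats its parent earns a full evaluation
    mini_batch = random.sample(list(cases), min(batch_size, len(cases)))
    return evaluate(candidate, mini_batch) > evaluate(parent, mini_batch)
```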

### 3. Individual Case Performance Tracking (Key Innovation #2)  
- Track which prompt wins each individual test case
- Use this for Pareto-efficient weighted sampling of parents
- This ensures diverse exploration and prevents getting stuck in local optima
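
A hypothetical sketch of that weighted sampling, where each prompt's weight is its current count of individual case wins (names are illustrative):

```python
import random

def sample_parent(case_wins: dict[str, int]) -> str:
    prompts = list(case_wins)
    # +1 smoothing keeps zero-win prompts sampleable, preserving diversity
    weights = [case_wins[p] + 1 for p in prompts]
    return random.choices(prompts, weights=weights, k=1)[0]
```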

### 4. Memory for Failed Attempts (Our Addition)
- When candidates keep failing mini-batch tests, record the failed attempts
- Provide these to the reflection agent as context: "Here's what you've tried that didn't work"
- This increases pressure over time to try more creative approaches when stuck
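
One way such a memory could be rendered into the reflection agent's context (the quoted message comes from the library's description above; this helper is illustrative):

```python
def build_reflection_context(failed_attempts: list[str]) -> str:
    # No extra pressure while optimization is still making progress
    if not failed_attempts:
        return ""
    tried = "\n".join(f"- {attempt}" for attempt in failed_attempts)
    return (
        "You've tried these approaches and they didn't work - "
        f"think outside the box!\n\nPrevious failed attempts:\n{tried}"
    )
```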

## Creating Your Own Optimization

### 1. Set Up Your Domain

Copy the `examples/chef/` structure:

```
your_domain/
├── agent.py             # Your complete agent (tools, setup, everything)
├── optimize.py          # Your evaluation logic + optimization loop
├── data/                # Your domain data
└── prompts/             # Seed prompt and reflection instructions
```

### 2. Implement Required Functions

**Agent** (`agent.py`):
```python
def make_run_case():
    async def run_case(prompt_file: str, user_input: str):
        # Load prompt, run your agent, return results
        pass
    return run_case

def make_reflection_agent():
    # Return agent that improves prompts based on feedback
    pass
```
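
For reference, one hypothetical way to flesh out `make_run_case` with a plain PydanticAI agent; the model name and string return type are assumptions, and on older PydanticAI releases the result attribute is `.data` rather than `.output`:

```python
from pathlib import Path

from pydantic_ai import Agent

def make_run_case():
    async def run_case(prompt_file: str, user_input: str) -> str:
        # Rebuild the agent from the prompt file on each call so the
        # optimizer can swap candidate prompts between runs
        system_prompt = Path(prompt_file).read_text()
        agent = Agent("openai:gpt-4o-mini", system_prompt=system_prompt)
        result = await agent.run(user_input)
        return result.output
    return run_case
```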

**Optimization** (`optimize.py`):
```python
def build_dataset(cases_file):
    # Load test cases and evaluators
    # Return dataset that can evaluate your agent's outputs
    pass

def main():
    # Set up dataset, run_case, reflection_agent
    # Create optimizer and run optimization loop
    pass
```
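
As a starting point, `build_dataset` might look like this sketch built on Pydantic Evals, assuming `cases_file` is a JSON list of records with `name`, `inputs`, and `expected_output` keys; swap in whatever evaluators fit your domain:

```python
import json
from pathlib import Path

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

def build_dataset(cases_file: str) -> Dataset:
    records = json.loads(Path(cases_file).read_text())
    cases = [
        Case(
            name=record["name"],
            inputs=record["inputs"],
            expected_output=record["expected_output"],
        )
        for record in records
    ]
    # EqualsExpected scores each output by exact match against expected_output
    return Dataset(cases=cases, evaluators=[EqualsExpected()])
```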

### 3. Run Optimization

```bash
uv run python optimize.py
```

## Key Integrations

This library is designed to work seamlessly with:

### [textprompts](https://github.com/svilupp/textprompts)
Makes it easy to use standard text files with placeholders for prompt evolution. Perfect for diffing prompts and version control:

```python
import textprompts

# my_prompt.txt contains: "You are a {role}. Your task is to {task}..."

# textprompts handles loading and placeholder substitution
prompt = textprompts.load_prompt("my_prompt.txt", role="chef", task="find recipes")
```

### [pydantic-ai-helpers](https://github.com/svilupp/pydantic-ai-helpers)  
Provides utilities that make PydanticAI much more convenient:
- Quick tool parsing and setup
- Simple evaluation comparisons between outputs and expected results
- Streamlined agent configuration

These integrations save significant development time when building optimization pipelines.

## Configuration

Set up through environment variables or configuration files:

```bash
export OPENAI_API_KEY="your-key"
export REFLECTION_MODEL="openai:gpt-4o"  
export AGENT_MODEL="openai:gpt-4o-mini"
export VALIDATION_BUDGET=20
export MAX_POOL_SIZE=16
```
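
In your own `optimize.py`, these might be consumed along these lines (a sketch of reading the variables above; how the library itself picks up configuration is not shown here):

```python
import os

# Fallback values mirror the example exports above
reflection_model = os.environ.get("REFLECTION_MODEL", "openai:gpt-4o")
agent_model = os.environ.get("AGENT_MODEL", "openai:gpt-4o-mini")
validation_budget = int(os.environ.get("VALIDATION_BUDGET", "20"))
max_pool_size = int(os.environ.get("MAX_POOL_SIZE", "16"))
```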

## Development

```bash
# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests  
make test

# Format and lint
make format && make lint

# Type check
make type-check
```

## Why This Approach Works

The combination of mini-batch gating and individual case tracking prevents two common optimization problems:

1. **Expensive Evaluation**: Mini-batches mean you only do full evaluation on promising candidates
2. **Premature Convergence**: Weighted sampling based on individual case wins maintains diversity

The memory system addresses a key weakness in memoryless optimization: when you get stuck, the system learns from its failures and tries more creative approaches.

## License

MIT License
            
