# RLBidder
<p align="center">
<img src="assets/rlbidder.jpg" alt="rlbidder" width="800" style="max-width: 65%;"/>
</p>
<p align="center"><strong>Reinforcement learning auto-bidding library
for research and production.</strong></p>
<p align="center">
<a href="https://pypi.org/project/rlbidder/"><img alt="PyPI" src="https://img.shields.io/pypi/v/rlbidder.svg"></a>
<img alt="Python" src="https://img.shields.io/badge/Python-3.11%2B-blue.svg">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-2.6%2B-EE4C2C.svg">
<img alt="Lightning" src="https://img.shields.io/badge/Lightning-2.4%2B-792EE5.svg">
<a href="#quickstart"><img alt="Quickstart" src="https://img.shields.io/badge/Quickstart-ready-brightgreen.svg"></a>
</p>
<p align="center">
<a href="#-overview">Overview</a> •
<a href="#-who-should-use-rlbidder">Who Should Use This</a> •
<a href="#installation">Installation</a> •
<a href="#quickstart">Quickstart</a> •
<a href="#benchmarking-results">Benchmarks</a> •
<a href="#-module-guide">API</a> •
<a href="#citation">Citation</a>
</p>
---
## 📖 Overview
`rlbidder` is a comprehensive toolkit for training and deploying reinforcement learning agents in online advertising auctions. Built for both **industrial scale** and **research agility**, it provides:
- **Complete offline RL pipeline**: Rust-powered data processing (Polars) → SOTA algorithms (IQL, CQL, DT, GAVE) → parallel evaluation
- **Modern ML infrastructure**: PyTorch Lightning multi-GPU training, experiment tracking, automated reproducibility
- **Production insights**: Interactive dashboards for campaign monitoring, market analytics, and agent behavior analysis
- **Research rigor**: Statistically robust benchmarking with RLiable metrics, tuned control baselines, and round-robin evaluation
Whether you're deploying bidding systems at scale or researching novel RL methods, `rlbidder` bridges the gap between academic innovation and production readiness.
---
## 🎯 Who Should Use rlbidder?
**Researchers** looking to experiment with SOTA offline RL algorithms (IQL, CQL, DT, GAVE, GAS) on realistic auction data with rigorous benchmarking.
**AdTech Practitioners** comparing RL agents against classic baselines (PID, BudgetPacer) before production deployment.
---
## 🚀 Key Features & What Makes rlbidder Different
`rlbidder` pushes beyond conventional RL libraries by integrating cutting-edge techniques from both RL research and modern LLM/transformer architectures. Here's what sets it apart:
### **Rust-Powered Data Pipeline**
- **Standardized workflow**: Scan Parquet → RL Dataset → Feature Engineering → DT Dataset with reproducible artifacts at every stage
- **Polars Lazy API**: Streaming data processing with a blazingly fast Rust engine that handles massive datasets without loading them fully into memory
- **Scalable workflows**: Process 100GB+ auction logs efficiently with lazy evaluation and zero-copy operations
- **Feature engineering**: Drop-in scikit-learn-style transformers (Symlog, Winsorizer, ReturnScaledReward) for states, actions, and rewards
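
The shipped `SymlogTransformer` follows the scikit-learn transformer protocol; for intuition, here is a minimal sketch of the symlog idea (the library's actual implementation may differ in details):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class Symlog(BaseEstimator, TransformerMixin):
    """Minimal sketch of a symlog feature transform (DreamerV3-style).

    symlog(x) = sign(x) * log(1 + |x|) compresses heavy-tailed features
    (spend, rewards) while staying identity-like near zero.
    """

    def fit(self, X, y=None):
        return self  # stateless transform, nothing to fit

    def transform(self, X):
        X = np.asarray(X, dtype=np.float64)
        return np.sign(X) * np.log1p(np.abs(X))

    def inverse_transform(self, X):
        # symexp, the exact inverse, so scaled predictions can be reported
        # back in the original units.
        X = np.asarray(X, dtype=np.float64)
        return np.sign(X) * np.expm1(np.abs(X))
```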
### **State-of-the-Art RL Algorithms**
- **Comprehensive baselines**: Classic control (Heuristic, BudgetPacer, PID) and learning-based methods (BC, CQL, IQL, DT, GAVE, GAS)
- **HL-Gauss Distributional RL**: Smooth Gaussian-based distributional Q-learning for improved uncertainty quantification, advancing beyond standard categorical approaches
- **Efficient ensemble critics**: Leverage `torch.vmap` for vectorized ensemble operations—much faster than traditional loop-based implementations
- **Numerically stable stochastic policies**: DreamerV3-style `SigmoidRangeStd` and TorchRL-style `BiasedSoftplus` to avoid numerical instabilities from exp/log operations
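
For intuition, here is a minimal sketch of the two standard-deviation parameterizations named above; ranges and defaults are illustrative assumptions, not the library's exact `SigmoidRangeStd`/`BiasedSoftplus` settings:

```python
import torch
import torch.nn.functional as F

def sigmoid_range_std(raw, min_std=0.1, max_std=1.0):
    # DreamerV3-style: squash the raw network output into [min_std, max_std]
    # with a sigmoid, avoiding exp()/log() blow-ups entirely.
    return min_std + (max_std - min_std) * torch.sigmoid(raw)

def biased_softplus(raw, min_std=0.01, default_std=1.0):
    # TorchRL-style: softplus with a bias chosen so that raw == 0 maps
    # exactly to default_std; output is bounded below by min_std.
    bias = torch.log(torch.expm1(torch.as_tensor(default_std - min_std)))
    return F.softplus(raw + bias) + min_std
```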
### **Modern Transformer Stack (LLM-Grade)**
- **FlashAttention (SDPA)**: Uses PyTorch's scaled dot-product attention API, which dispatches to FlashAttention kernels when available, for accelerated training
- **RoPE positional encoding**: Rotary positional embeddings for improved sequence length generalization, adopted from modern LLMs
- **QK-Norm**: Query-key normalization for enhanced training stability at scale
- **SwiGLU**: Gated feed-forward layers, as used in modern LLMs, for improved expressiveness
- **Efficient inference**: `DTInferenceBuffer` with deque-based temporal buffering for online Decision Transformer deployment
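
As a sketch of the deque-based buffering idea behind `DTInferenceBuffer` (names and shapes here are illustrative assumptions, not the library's exact API):

```python
from collections import deque
import torch

class RollingContext:
    """Keep only the last K timesteps for online Decision Transformer
    inference; deques with maxlen evict old steps automatically."""

    def __init__(self, context_len: int):
        self.states = deque(maxlen=context_len)
        self.actions = deque(maxlen=context_len)
        self.rtgs = deque(maxlen=context_len)

    def append(self, state, action, rtg):
        self.states.append(state)
        self.actions.append(action)
        self.rtgs.append(rtg)

    def as_batch(self):
        # Stack the buffered steps into (1, K, dim) tensors for the model.
        return tuple(
            torch.stack(list(xs)).unsqueeze(0)
            for xs in (self.states, self.actions, self.rtgs)
        )
```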
### **Simulated Online Evaluation & Visualization**
- **Parallel evaluation**: Multi-process evaluators with pre-loaded data per worker—much faster than sequential benchmarking
- **Robust testing**: Round-robin agent rotation with multi-seed evaluation for statistically reliable comparisons
- **Tuned competitors**: Classic control methods (BudgetPacer, PID) with optimized hyperparameters as baselines
- **Interactive dashboards**: Production-ready Plotly visualizations with market structure metrics (HHI, Gini, volatility) and RLiable metrics
- **Industrial analytics**: Campaign health monitoring, budget pacing diagnostics, auction dynamics, and score distribution analysis
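
As an example of the market-structure metrics involved, a Herfindahl-Hirschman Index (HHI) can be computed from spend shares in a few lines of Polars; the data and column names below are hypothetical:

```python
import polars as pl

# Hypothetical win log: one row per (delivery_period, agent) with total spend.
wins = pl.DataFrame({
    "delivery_period": [7, 7, 7, 8, 8, 8],
    "agent": ["iql", "cql", "pid", "iql", "cql", "pid"],
    "spend": [120.0, 60.0, 20.0, 90.0, 90.0, 20.0],
})

# HHI: sum of squared market shares per period (1.0 = monopoly).
hhi = (
    wins.group_by("delivery_period")
    .agg(((pl.col("spend") / pl.col("spend").sum()) ** 2).sum().alias("hhi"))
    .sort("delivery_period")
)
print(hhi)
```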
### **Modern ML Engineering Stack**
- **Modular design**: Enables both production readiness and rapid prototyping
- **PyTorch Lightning**: Reduced boilerplate, automatic mixed precision, and gradient accumulation
- **Draccus configuration**: Type-safe dataclass-to-CLI with hierarchical configs, dot-notation overrides, and zero boilerplate
- **Local experiment tracking**: Aim for experiment management without external cloud dependencies
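
A minimal sketch of the draccus pattern used by the training scripts (field names here are illustrative, not the library's actual configs):

```python
from dataclasses import dataclass, field
import draccus

@dataclass
class ModelCfg:
    lr: float = 1e-4          # hypothetical field, mirrors --model_cfg.lr
    num_layers: int = 6

@dataclass
class TrainCfg:
    model_cfg: ModelCfg = field(default_factory=ModelCfg)
    seed: int = 42

if __name__ == "__main__":
    # e.g. `python train.py --model_cfg.lr 3e-4 --seed 7`
    cfg = draccus.parse(config_class=TrainCfg)
    print(cfg.model_cfg.lr, cfg.seed)
```

Nested fields become dot-notation CLI flags automatically, which is exactly how the `--model_cfg.*` overrides in the Quickstart work.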
### **Comparison with AuctionNet**
| Feature | AuctionNet | rlbidder |
|---------|-----------|----------|
| **Data Engine** | Pandas | **Polars Lazy (Rust)** ✨ |
| **Configuration** | argparse | **Draccus (dataclass-to-CLI)** ✨ |
| **Distributional RL** | ❌ | **HL-Gauss** ✨ |
| **Ensemble Method** | ❌ | **torch.vmap** ✨ |
| **Transformer Attention** | Standard | **SDPA/FlashAttn** ✨ |
| **Positional Encoding** | Learned | **RoPE** ✨ |
| **Policy Stability** | exp(log_std) | **SigmoidRangeStd/BiasedSoftplus** ✨ |
| **Parallel Evaluation** | ❌ | **ProcessPool + Round-robin** ✨ |
| **Visualization** | ❌ | **Production Dashboards** ✨ |
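
As a concrete illustration of the `torch.vmap` ensemble row above, here is a minimal sketch using `torch.func` (dimensions and architecture are illustrative, not the library's actual critic):

```python
import copy
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

state_dim, action_dim, num_q = 16, 1, 5

def make_q() -> nn.Module:
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, 256),
        nn.ReLU(),
        nn.Linear(256, 1),
    )

critics = [make_q() for _ in range(num_q)]
params, buffers = stack_module_state(critics)  # stack weights along dim 0
base = copy.deepcopy(critics[0]).to("meta")    # stateless template module

def q_forward(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(32, state_dim + action_dim)    # one shared batch
# vmap over the ensemble dimension only -> (num_q, 32, 1) in one pass,
# instead of a Python loop over num_q separate forward calls.
q_values = torch.vmap(q_forward, in_dims=(0, 0, None))(params, buffers, x)
print(q_values.shape)  # torch.Size([5, 32, 1])
```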
---
## 📊 Benchmarking Results
We evaluate all agents using rigorous statistical methods across multiple delivery periods with round-robin testing and multi-seed evaluation. The evaluation protocol follows RLiable best practices for statistically reliable algorithm comparison.
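
For reference, a minimal sketch of the RLiable aggregation underlying these plots, with hypothetical scores (`rlbidder.viz` exposes `plot_rliable_metrics` for the plotted version):

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Hypothetical normalized scores: algorithm -> (num_seeds, num_periods).
rng = np.random.default_rng(0)
scores = {
    "IQL": rng.uniform(0.4, 0.9, size=(5, 4)),
    "DT": rng.uniform(0.3, 0.8, size=(5, 4)),
}

# Interquartile mean (IQM) with stratified-bootstrap confidence intervals.
iqm = lambda s: np.array([metrics.aggregate_iqm(s)])
point_estimates, interval_estimates = rly.get_interval_estimates(scores, iqm, reps=2000)
print(point_estimates, interval_estimates)
```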
<table>
<tr>
<td width="50%" align="center">
<img src="assets/benchmark_violin.png" alt="Benchmark violin plots" width="100%" />
<p><strong>Score Distribution Analysis</strong><br/>Violin plots showing performance distributions across agents and seeds.</p>
</td>
<td width="50%" align="center">
<img src="assets/benchmark_bar.png" alt="Benchmark bar charts" width="100%" />
<p><strong>Mean Performance Comparison</strong><br/>Aggregated performance metrics with confidence intervals.</p>
</td>
</tr>
<tr>
<td colspan="2" align="center">
<img src="assets/rliable.png" alt="RLiable metrics" width="65%" />
<p><strong>RLiable Statistical Metrics</strong><br/>Performance profiles and aggregate metrics following RLiable best practices.</p>
</td>
</tr>
</table>
---
## 📈 Interactive Dashboards & Gallery
Beyond raw performance metrics, `rlbidder` helps you understand *why* agents behave the way they do. Production-grade interactive dashboards summarize policy behavior, campaign health, and auction dynamics for both research insights and production monitoring.
<table>
<tr>
<td width="50%" align="center">
<img src="assets/market.png" alt="Auction market analysis" width="100%" />
<p><strong>Auction market analysis</strong><br/>Market concentration, volatility, and competitiveness.</p>
</td>
<td width="50%" align="center">
<img src="assets/campaign_cql.png" alt="Campaign analysis for CQL" width="100%" />
<p><strong>Campaign analysis (CQL)</strong><br/>Segment-level delivery quality and conversion outcomes.</p>
</td>
</tr>
<tr>
<td width="50%" align="center">
<img src="assets/budgetpace_cql.png" alt="Budget pacing for CQL" width="100%" />
<p><strong>Budget pacing (CQL)</strong><br/>Daily spend pacing and CPA stabilization diagnostics.</p>
</td>
<td width="50%" align="center">
<img src="assets/scatter.png" alt="Auction metrics scatterplots" width="100%" />
<p><strong>Auction metrics scatterplots</strong><br/>Spend, conversion, ROI, and win-rate trade-offs.</p>
</td>
</tr>
</table>
---
## 🚀 Getting Started
### Installation
#### Prerequisites
- Python 3.11 or newer
- PyTorch 2.6 or newer (follow [PyTorch install guide](https://pytorch.org/get-started/locally/))
- GPU with 8 GB+ VRAM recommended for training
#### Install from PyPI
```bash
pip install rlbidder
```
#### Local Development
```bash
git clone https://github.com/zuoxingdong/rlbidder.git
cd rlbidder
pip install -e .
```
---
### Quickstart
Follow the steps below to reproduce the full offline RL workflow on processed campaign data.
#### Step 1: Data Preparation
```bash
# Download sample competition data (periods 7-8 and trajectory 1)
bash scripts/download_raw_data.sh -p 7-8,traj1 -d data/raw
# Convert raw CSV to Parquet (faster I/O with Polars)
python scripts/convert_csv_to_parquet.py --raw_data_dir=data/raw
# Build evaluation-period parquet files
python scripts/build_eval_dataset.py --data_dir=data
# Create training transitions (trajectory format for offline RL)
python scripts/build_transition_dataset.py --data_dir=data --mode=trajectory
# Fit scalers for state, action, and reward normalization
python scripts/scale_transitions.py --data_dir=data --output_dir=scaled_transitions
# Generate Decision Transformer trajectories with return-to-go
python scripts/build_dt_dataset.py \
--build.data_dir=data \
--build.reward_type=reward_dense \
--build.use_scaled_reward=true
```
**What you'll have:** Preprocessed datasets in `data/processed/` and fitted scalers in `data/scaled_transitions/` ready for training.
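To sanity-check the artifacts, you can peek at the processed data with Polars (the exact file layout inside `data/processed/` is an assumption here):

```python
import polars as pl

# Lazily scan the processed parquet files and pull a small sample.
sample = pl.scan_parquet("data/processed/*.parquet").limit(5).collect()
print(sample)
print(sample.schema)  # column names and dtypes
```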
#### Step 2: Train Agents
```bash
# Train IQL (Implicit Q-Learning) - value-based offline RL
python examples/train_iql.py \
--model_cfg.lr_actor 3e-4 \
--model_cfg.lr_critic 3e-4 \
--model_cfg.num_q_models 5 \
--model_cfg.bc_alpha 0.01
# Train DT (Decision Transformer) - sequence modeling for RL
python examples/train_dt.py \
--model_cfg.embedding_dim 512 \
--model_cfg.num_layers 6 \
--model_cfg.lr 1e-4 \
--model_cfg.rtg_scale 98 \
--model_cfg.target_rtg 2.0
```
**What you'll have:** Trained model checkpoints in `examples/checkpoints/` with scalers and hyperparameters.
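To sanity-check a run, checkpoints can be reloaded with the standard Lightning classmethod (path follows the layout above; the exact filename, and whether extra constructor arguments are needed, may differ):

```python
from rlbidder.agents import IQLModel  # a LightningModule, per the module guide

model = IQLModel.load_from_checkpoint("examples/checkpoints/iql/best.ckpt")
model.eval()  # inference mode for downstream evaluation
```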
**💡 Configuration powered by draccus:** All training scripts use type-safe dataclass configs with automatic CLI generation. Override any nested config with dot-notation (e.g., `--model_cfg.lr 1e-4`) or pass config files directly.
**💡 Track experiments with Aim:** All training scripts automatically log metrics, hyperparameters, and model artifacts to Aim (a local experiment tracker). Launch the Aim UI to visualize training progress:
```bash
aim up --port 43800
```
Then open `http://localhost:43800` in your browser to explore training curves, compare runs, and analyze hyperparameter configurations.
#### Step 3: Evaluate in Simulated Auctions
```bash
# Evaluate IQL agent with parallel multi-seed evaluation
python examples/evaluate_agents.py \
--evaluation.data_dir=data \
--evaluation.evaluator_type=OnlineCampaignEvaluator \
--evaluation.delivery_period_indices=[7,8] \
--evaluation.num_seeds=5 \
--evaluation.num_workers=8 \
--evaluation.output_dir=examples/eval \
--agent.agent_class=IQLBiddingAgent \
--agent.model_dir=examples/checkpoints/iql \
--agent.checkpoint_file=best.ckpt
```
**What you'll have:** Evaluation reports, campaign summaries, and auction histories in `examples/eval/` ready for visualization.
**Next steps:** Generate dashboards with `examples/performance_visualization.ipynb` or explore the evaluation results with Polars DataFrames.
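For example, with hypothetical file and column names (adjust to what the evaluator actually wrote):

```python
import polars as pl

# Filenames and columns inside examples/eval/ are assumptions.
reports = pl.read_parquet("examples/eval/*.parquet")
print(reports.head())

# e.g., rank agents by mean score across seeds
print(
    reports.group_by("agent")
    .agg(pl.col("score").mean().alias("mean_score"))
    .sort("mean_score", descending=True)
)
```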
---
## 📦 Module Guide
Each module handles a specific aspect of the RL bidding pipeline:
| Module | Description | Key Classes/Functions |
| --- | --- | --- |
| 📚 `rlbidder.agents` | Offline RL agents and control baselines | `IQLModel`, `CQLModel`, `DTModel`, `GAVEModel`, `BudgetPacerBiddingAgent` |
| 🔧 `rlbidder.data` | Data processing, scalers, and datasets | `OfflineDataModule`, `TrajDataset`, `SymlogTransformer`, `WinsorizerTransformer` |
| 🏪 `rlbidder.envs` | Auction simulation and value sampling | `OnlineAuctionEnv`, `ValueSampler`, `sample_conversions` |
| 🎯 `rlbidder.evaluation` | Multi-agent evaluation and metrics | `ParallelOnlineCampaignEvaluator`, `OnlineCampaignEvaluator` |
| 🧠 `rlbidder.models` | Neural network building blocks | `StochasticActor`, `EnsembledQNetwork`, `NormalHead`, `HLGaussLoss` |
| 📊 `rlbidder.viz` | Interactive dashboards and analytics | `create_campaign_dashboard`, `create_market_dashboard`, `plot_rliable_metrics` |
| 🛠️ `rlbidder.utils` | Utilities and helpers | `set_seed`, `log_distribution`, `regression_report` |
---
## 🏗️ Architecture
The library follows a modular design with clear separation of concerns. Data flows from raw logs through preprocessing, training, and evaluation to final visualization:
```mermaid
flowchart TD
subgraph Data["📦 Data Pipeline"]
direction TB
raw["Raw Campaign Data<br/><i>CSV/Parquet logs</i>"]
scripts["Build Scripts<br/>convert • build_eval<br/>build_transition • scale"]
artifacts["📁 Preprocessed Artifacts<br/>processed/ • scaled_transitions/<br/><i>Parquet + Scalers</i>"]
raw -->|transform| scripts
scripts -->|generate| artifacts
end
subgraph Core["⚙️ Core Library Modules"]
direction TB
data_mod["<b>rlbidder.data</b><br/>OfflineDataModule<br/>TrajDataset • ReplayBuffer<br/>🔧 <i>Handles batching & scaling</i>"]
models["<b>rlbidder.models</b><br/>StochasticActor • EnsembledQNetwork<br/>ValueNetwork • Losses • Optimizers<br/>🧠 <i>Agent building blocks</i>"]
agents["<b>rlbidder.agents</b><br/>IQLModel • CQLModel • DTModel<br/>📚 <i>LightningModule implementations</i>"]
agents -->|composes| models
end
subgraph Training["🔥 Training Pipeline"]
direction TB
train["<b>examples/train_iql.py</b><br/>🎛️ Config + CLI<br/><i>Orchestration script</i>"]
trainer["⚡ Lightning Trainer<br/>fit() • validate()<br/><i>Multi-GPU support</i>"]
ckpt["💾 Model Checkpoints<br/>best.ckpt • last.ckpt<br/><i>+ scalers + hparams</i>"]
train -->|instantiates| data_mod
train -->|instantiates| agents
train -->|launches| trainer
trainer -->|saves| ckpt
end
subgraph Eval["🎯 Online Evaluation"]
direction TB
evaluator["<b>rlbidder.evaluation</b><br/>OnlineCampaignEvaluator<br/>ParallelEvaluator<br/>🔄 <i>Multi-seed, round-robin</i>"]
env["<b>rlbidder.envs</b><br/>Auction Simulator<br/>🏪 <i>Multi-agent market</i>"]
results["📈 Evaluation Results<br/>Campaign Reports • Agent Summaries<br/>Auction Histories<br/><i>Polars DataFrames</i>"]
evaluator -->|simulates| env
env -->|produces| results
end
subgraph Viz["📊 Visualization & Analysis"]
direction TB
viz["<b>rlbidder.viz</b><br/>Plotly Dashboards<br/>Market Metrics<br/>🎨 <i>Interactive HTML</i>"]
plots["📉 Production Dashboards<br/>Campaign Health • Market Structure<br/>Budget Pacing • Scatter Analysis"]
viz -->|renders| plots
end
artifacts ==>|loads| data_mod
artifacts -.->|eval data| evaluator
ckpt ==>|load_from_checkpoint| evaluator
results ==>|consumes| viz
classDef dataStyle fill:#1565c0,stroke:#0d47a1,stroke-width:3px,color:#fff,font-weight:bold
classDef coreStyle fill:#ef6c00,stroke:#e65100,stroke-width:3px,color:#fff,font-weight:bold
classDef trainStyle fill:#6a1b9a,stroke:#4a148c,stroke-width:3px,color:#fff,font-weight:bold
classDef evalStyle fill:#2e7d32,stroke:#1b5e20,stroke-width:3px,color:#fff,font-weight:bold
classDef vizStyle fill:#c2185b,stroke:#880e4f,stroke-width:3px,color:#fff,font-weight:bold
class Data,raw,scripts,artifacts dataStyle
class Core,data_mod,models,agents coreStyle
class Training,train,trainer,ckpt trainStyle
class Eval,evaluator,env,results evalStyle
class Viz,viz,plots vizStyle
```
**Design Principles:**
- 🔌 **Modular** - Each component is independently usable and testable
- ⚡ **Scalable** - Polars + Lightning enable massive datasets and efficient training
- 🔄 **Reproducible** - Deterministic seeding, configuration management, and evaluation
- 🚀 **Production-ready** - Type hints, error handling, logging, and monitoring built-in
---
## 🤝 Contributing
- 🌟 Star the repo if you find it useful
- 🔀 Fork and submit PRs for bug fixes or new features
- 📝 Improve documentation and add examples
- 🧪 Add tests for new functionality
---
## 🌟 Acknowledgments
`rlbidder` builds upon ideas from:
- **[AuctionNet](https://github.com/alimama-tech/AuctionNet)**, the original pioneer, for its auction environment and benchmark design
- **[PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning)** for training infrastructure
- **[Draccus](https://github.com/dlwh/draccus)** for elegant dataclass-to-CLI configuration management
- **[TRL](https://github.com/huggingface/trl)** & **[Transformers](https://github.com/huggingface/transformers)** for modern transformer implementations
- **[Polars](https://github.com/pola-rs/polars)** for high-performance data processing
- **[PyTorch RL](https://github.com/pytorch/rl)** for RL algorithm implementations
---
## 📝 Citation
If you use `rlbidder` in your work, please cite it using the BibTeX entry below.
```bibtex
@misc{zuo2025rlbidder,
author = {Zuo, Xingdong},
title = {RLBidder: Reinforcement learning auto-bidding library for research and production},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zuoxingdong/rlbidder}}
}
```
---
## License
MIT License. See `LICENSE`.