chatroutes-autobranch

Name	chatroutes-autobranch
Version	1.0.1
home_page	None
Summary	Intelligent branch exploration for LLM-powered applications
upload_time	2025-10-22 23:44:35
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT
keywords	llm beam-search tree-of-thought branching ai machine-learning

# chatroutes-autobranch

**Controlled branching generation for LLM applications**

[![PyPI version](https://badge.fury.io/py/chatroutes-autobranch.svg)](https://badge.fury.io/py/chatroutes-autobranch)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/creative_writing_colab.ipynb)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Modern LLM applications often need to explore multiple reasoning paths (tree-of-thought, beam search, multi-agent systems) while staying **usable** and **affordable**. `chatroutes-autobranch` provides clean, standalone primitives for:

- 🎯 **Beam Search** – Pick the *best K* candidates by configurable scoring
- 🌈 **Diversity Control** – Ensure variety via novelty pruning (cosine similarity, MMR)
- 🛑 **Smart Stopping** – Know when to stop via entropy/information-gain metrics
- 💰 **Budget Management** – Keep costs predictable with token/time/node caps
- 🔌 **Pluggable Design** – Swap any component (scorer, embeddings, stopping criteria)

**Key Features:**
- ✅ Deterministic & reproducible (fixed tie-breaking, seeded clustering)
- ✅ Embedding-agnostic (OpenAI, HuggingFace, or custom)
- ✅ Production-ready (thread-safe, observable, checkpoint/resume)
- ✅ Framework-friendly (works with LangChain, LlamaIndex, or raw LLM APIs)
- ✅ Zero vendor lock-in (MIT License, no cloud dependencies)

---

## 🚀 Interactive Demos (Try it Now!)

### Getting Started Demo (Recommended)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/getting_started_demo.ipynb)

**Perfect for first-time users!** Learn the fundamentals in 5 minutes:
- ✅ Installation and setup
- ✅ Basic beam search examples
- ✅ Multi-strategy scoring
- ✅ Novelty filtering
- ✅ Complete pipeline with budget control

**No setup required** - runs entirely in your browser!

### Creative Writing Scenario (Advanced)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/creative_writing_colab.ipynb)

**See it in action with a real LLM!** Complete creative writing assistant:
- ✅ Full Ollama integration (free, local inference)
- ✅ Multi-turn branching (tree exploration)
- ✅ GPU/CPU performance comparison
- ✅ 4 complete story scenarios

[**📚 View all notebooks →**](notebooks/README.md)

---

## Quick Start

**Install:**
```bash
pip install chatroutes-autobranch
```

**Basic Usage:**
```python
from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config

# Load config (or use dict/env vars)
selector = BranchSelector.from_config(load_config("config.yaml"))

# Define parent and candidate branches
parent = Candidate(id="root", text="Explain photosynthesis simply")
candidates = [
    Candidate(id="c1", text="Start with sunlight absorption"),
    Candidate(id="c2", text="Begin with glucose production"),
    Candidate(id="c3", text="Explain chlorophyll's role"),
]

# Select best branches (applies beam → novelty → entropy pipeline)
result = selector.step(parent, candidates)

print(f"Kept: {[c.id for c in result.kept]}")
print(f"Entropy: {result.metrics['entropy']['value']:.2f}")
print(f"Should continue: {result.metrics['entropy']['continue']}")
```

**Config (`config.yaml`):**
```yaml
beam:
  k: 3  # Keep top 3 by score
  weights: {confidence: 0.4, relevance: 0.3, novelty_parent: 0.2}

novelty:
  method: cosine  # or 'mmr' for Maximal Marginal Relevance
  threshold: 0.85

entropy:
  min_entropy: 0.6  # Stop if diversity drops below 60%

embeddings:
  provider: openai
  model: text-embedding-3-large
```

---

## Why Use This?

**Problem:** Exploring multiple LLM reasoning paths (e.g., tree-of-thought) quickly becomes:
- **Expensive** – Exponential growth of branches drains API budgets
- **Redundant** – Models generate similar outputs (mode collapse)
- **Uncontrolled** – No clear stopping criteria (when is "enough" exploration?)

**Solution:** `chatroutes-autobranch` gives you:
1. **Beam Search** to keep only the top-K candidates (quality filtering)
2. **Novelty Pruning** to remove similar outputs (diversity enforcement)
3. **Entropy Stopping** to detect when you've explored enough (convergence detection)
4. **Budget Limits** to cap costs before runaway spending

**Result:** Controlled, efficient tree exploration with predictable costs.
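
To make the cost argument concrete, here is a back-of-the-envelope sketch (not part of the library) that counts generated candidates with and without pruning. It assumes every surviving node is expanded into `n` children and that pruning keeps at most `k` children per parent; the function names are illustrative only.

```python
def unpruned_nodes(n: int, depth: int) -> int:
    """Candidates generated when every node spawns n children and all are kept."""
    return sum(n ** level for level in range(1, depth + 1))


def pruned_nodes(n: int, k: int, depth: int) -> int:
    """Candidates generated when each parent keeps at most k of its n children."""
    total, frontier = 0, 1
    for _ in range(depth):
        total += frontier * n      # every surviving node is expanded
        frontier *= min(k, n)      # only k children per parent survive
    return total


print(unpruned_nodes(5, 4))   # 5 + 25 + 125 + 625 = 780
print(pruned_nodes(5, 2, 4))  # 5 + 10 + 20 + 40 = 75
```

Growth is still exponential in depth (base `k` instead of `n`), which is why the budget caps and entropy stopping matter in addition to beam width.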

---

## Use Cases

| Scenario | Configuration | Benefit |
|----------|--------------|---------|
| **Tree-of-Thought Reasoning** | K=5, cosine novelty, entropy stopping | Explore diverse reasoning paths without explosion |
| **Multi-Agent Debate** | K=3, MMR novelty (λ=0.3) | Select diverse agent perspectives, avoid redundancy |
| **Code Generation** | K=4, high relevance weight | Generate varied solutions, prune duplicates |
| **Creative Writing** | K=8, low novelty threshold | High diversity, explore creative space |
| **Factual Q&A** | K=2, strict budget | Focus on accuracy, minimal branching |

---

## Architecture

**Pipeline (fixed order):**
```
Raw Candidates (N)
    ↓
1. Scoring (composite: confidence + relevance + novelty + intent + reward)
    ↓
2. Beam Selection (top K by score, deterministic tie-breaking)
    ↓
3. Novelty Filtering (prune similar via cosine/MMR)
    ↓
4. Entropy Check (compute diversity, decide if should continue)
    ↓
5. Result (kept + pruned + metrics)
```

**Pluggable Components:**
- **Scorer**: Composite (built-in) or custom
- **EmbeddingProvider**: OpenAI, HuggingFace, or custom
- **NoveltyFilter**: Cosine threshold or MMR
- **EntropyStopper**: Shannon entropy or custom
- **BudgetManager**: Token/time/node caps

All components are defined as **Protocol** interfaces (structural typing), so you can swap any part without touching the others.
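
As an illustration of what a Protocol-based interface means in practice, the sketch below uses only the standard library. `Item` and `ScorerLike` are hypothetical stand-ins, not the library's actual `Candidate` or `Scorer` definitions; the point is that any object with a matching `score` method is accepted, with no inheritance required.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a candidate (id plus text)."""
    id: str
    text: str


@runtime_checkable
class ScorerLike(Protocol):
    """Hypothetical protocol: anything with a compatible score() method."""
    def score(self, parent: Item, candidates: list[Item]) -> list[float]: ...


class LengthScorer:
    """Satisfies ScorerLike structurally, without subclassing it."""
    def score(self, parent: Item, candidates: list[Item]) -> list[float]:
        return [min(len(c.text) / 100, 1.0) for c in candidates]


def rank(scorer: ScorerLike, parent: Item, candidates: list[Item]) -> list[str]:
    """Return candidate ids ordered by descending score, ties broken by id."""
    scores = scorer.score(parent, candidates)
    order = sorted(range(len(candidates)), key=lambda i: (-scores[i], candidates[i].id))
    return [candidates[i].id for i in order]
```

Swapping in a different scorer then only requires passing another object with the same method shape to `rank`.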

---

## Installation

**Minimal:**
```bash
pip install chatroutes-autobranch
```

**With extras:**
```bash
# FastAPI service (for TypeScript/other languages)
pip install chatroutes-autobranch[service]

# HuggingFace local embeddings
pip install chatroutes-autobranch[hf]

# FAISS for large-scale similarity (1000+ candidates)
pip install chatroutes-autobranch[faiss]

# All features
pip install chatroutes-autobranch[all]
```

---

## Documentation

📘 **[Full Specification](./chatroutes_autobranch_v1.0.md)** – Complete API reference, algorithms, examples, and troubleshooting

**Key Sections:**
- [Philosophy & Design](./chatroutes_autobranch_v1.0.md#0-philosophy) – Core principles
- [Pluggable Interfaces](./chatroutes_autobranch_v1.0.md#4-variability-pluggable-interfaces) – Protocols & implementations
- [Configuration](./chatroutes_autobranch_v1.0.md#5-configuration) – YAML/JSON/env setup
- [Examples](./chatroutes_autobranch_v1.0.md#6-example-usage) – Single-step & multi-generation
- [Tuning Guide](./chatroutes_autobranch_v1.0.md#19-tuning-guide-choosing-beam-width-k) – How to choose K
- [Common Failures](./chatroutes_autobranch_v1.0.md#20-common-failure-patterns) – Troubleshooting

---

## Examples

### Multi-Generation Tree Exploration

```python
from collections import deque

# Import location of Budget and BudgetManager is assumed; adjust to your installed version
from chatroutes_autobranch import BranchSelector, Budget, BudgetManager, Candidate
from chatroutes_autobranch.config import load_config

# User provides LLM generation function
def my_llm_generate(parent: Candidate, n: int) -> list[Candidate]:
    # Your LLM call here (OpenAI, Anthropic, etc.); llm_api is a placeholder for your client
    responses = llm_api.generate(parent.text, n=n)
    return [Candidate(id=f"{parent.id}_{i}", text=r) for i, r in enumerate(responses)]

# Setup
selector = BranchSelector.from_config(load_config("config.yaml"))
budget_manager = BudgetManager(Budget(max_nodes=50, max_tokens=20000))
root_candidate = Candidate(id="root", text="Explain photosynthesis simply")

# Tree exploration
queue = deque([root_candidate])
while queue:
    current = queue.popleft()

    # Check budget before generation (pre-admit), so no LLM call is wasted
    if not budget_manager.admit(n_new=5, est_tokens=1000, est_ms=2000):
        break

    children = my_llm_generate(current, n=5)

    # Select best branches
    result = selector.step(current, children)

    # Record actual usage (constants here; use measured values in practice)
    budget_manager.update(actual_tokens=1200, actual_ms=1800)

    # Continue with kept candidates
    queue.extend(result.kept)

    # Stop if entropy is low (converged)
    if not result.metrics["entropy"]["continue"]:
        break
```

### Custom Scorer

```python
# Import location of BeamSelector is assumed; adjust to your installed version
from chatroutes_autobranch import BeamSelector, BranchSelector, Candidate, ScoredCandidate, Scorer

class DomainScorer(Scorer):
    def score(self, parent: Candidate, candidates: list[Candidate]) -> list[ScoredCandidate]:
        scored = []
        for c in candidates:
            # Custom logic: prefer longer, detailed responses (score capped at 1.0)
            detail_score = min(len(c.text) / 1000, 1.0)
            scored.append(ScoredCandidate(id=c.id, text=c.text, score=detail_score))
        return scored

# Use in pipeline (novelty, entropy, and budget are components you have already built)
beam = BeamSelector(k=3, scorer=DomainScorer())
selector = BranchSelector(beam, novelty, entropy, budget)
```

### FastAPI Service (for TypeScript/other languages)

```python
# server.py
from fastapi import FastAPI
from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config_from_file

app = FastAPI()
_config = load_config_from_file("config.yaml")

@app.post("/select")
async def select(parent: dict, candidates: list[dict]):
    # Create fresh selector per request (thread-safe)
    selector = BranchSelector.from_config(_config)
    result = selector.step(
        Candidate(**parent),
        [Candidate(**c) for c in candidates]
    )
    return {
        "kept": [{"id": c.id, "score": c.score} for c in result.kept],
        "metrics": result.metrics
    }

# Run: uvicorn server:app
```

**TypeScript client:**
```typescript
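const response = await fetch('http://localhost:8000/select', {
  method: 'POST',
  // FastAPI parses the body as JSON only when the content type says so
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ parent, candidates })
});
const { kept, metrics } = await response.json();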
```

---

## Features

### Beam Search
- Top-K selection by composite scoring
- Deterministic tie-breaking (lexicographic ID ordering)
- Configurable weights: confidence, relevance, novelty, intent alignment, historical reward
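
The sketch below illustrates the two ideas in this list under stated assumptions: the composite score is taken to be a plain weighted sum of per-candidate features (the library's exact formula is not shown here), and ties are broken by ascending ID. The function names are hypothetical.

```python
def composite_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum; a feature missing from `features` contributes 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def top_k(scored: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Keep the k highest scores; equal scores are ordered by candidate id."""
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


weights = {"confidence": 0.4, "relevance": 0.3, "novelty_parent": 0.2}
scored = [
    ("c2", composite_score({"confidence": 1.0, "relevance": 0.5}, weights)),
    ("c1", composite_score({"confidence": 1.0, "relevance": 0.5}, weights)),
    ("c3", composite_score({"confidence": 1.0, "relevance": 1.0}, weights)),
]
print([cid for cid, _ in top_k(scored, k=2)])  # ['c3', 'c1']
```

Sorting on the pair `(-score, id)` is what makes the result independent of input order, which is the property the deterministic tie-breaking bullet describes.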

### Novelty Pruning
- **Cosine similarity:** Remove candidates above threshold (e.g., 0.85)
- **MMR (Maximal Marginal Relevance):** Balance relevance vs diversity with λ parameter
- Preserves score ordering (best candidates kept first)
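
For readers unfamiliar with the two strategies, here is a minimal pure-Python sketch. It assumes candidates arrive sorted best-first, uses the standard MMR objective `λ · relevance − (1 − λ) · max_similarity`, and the function names are illustrative rather than the library's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_filter(ranked, threshold=0.85):
    """ranked: [(id, embedding)] sorted best-first. Greedily keep a candidate
    unless its similarity to something already kept exceeds `threshold`."""
    kept = []
    for cid, emb in ranked:
        if all(cosine(emb, other) <= threshold for _, other in kept):
            kept.append((cid, emb))
    return [cid for cid, _ in kept]


def mmr_select(items, k, lam=0.3):
    """items: [(id, relevance, embedding)]. Each round picks the item maximizing
    lam * relevance - (1 - lam) * (max similarity to the items already selected)."""
    remaining, selected = list(items), []

    def mmr_score(item):
        _, relevance, emb = item
        redundancy = max((cosine(emb, s[2]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = min(remaining, key=lambda it: (-mmr_score(it), it[0]))  # ties broken by id
        selected.append(best)
        remaining.remove(best)
    return [cid for cid, _, _ in selected]
```

A low `lam` (such as 0.3) weights diversity more heavily, while `lam=1.0` reduces MMR to plain relevance ranking.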

### Entropy-Based Stopping
- Shannon entropy on K-means clusters of embeddings
- Delta-entropy tracking (stop if change < epsilon)
- Handles edge cases (0, 1, 2 candidates)
- Normalized to [0,1] scale
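
The following sketch shows one way the entropy signal could be computed once cluster labels are available. It skips the K-means step and takes labels as input, and it assumes normalization by `log(n_clusters)`; the `epsilon` default and both function names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels, n_clusters):
    """Shannon entropy of the cluster-label distribution, scaled to [0, 1]
    by log(n_clusters). Returns 0.0 for fewer than 2 candidates or clusters."""
    if len(labels) < 2 or n_clusters < 2:
        return 0.0
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return max(0.0, entropy / math.log(n_clusters))


def should_continue(history, min_entropy=0.6, epsilon=0.01):
    """Stop when the latest entropy is below min_entropy, or when it changed
    by less than epsilon since the previous step."""
    if not history or history[-1] < min_entropy:
        return False
    if len(history) >= 2 and abs(history[-1] - history[-2]) < epsilon:
        return False
    return True
```

With two clusters, an even split such as `[0, 0, 1, 1]` yields the maximum value of 1.0, and a single occupied cluster yields 0.0.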

### Budget Management
- **Caps:** max_nodes, max_tokens, max_ms
- **Modes:** strict (raise when a cap is exceeded) or soft (return False, allow fallback)
- **Pre-admit:** Check budget before generation
- **Post-update:** Record actual usage for rolling averages
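
The admit/update cycle and the two modes can be sketched as follows. This is a simplified stand-in that tracks only nodes and tokens; `SimpleBudget` and `BudgetExceeded` are hypothetical names, not the library's `BudgetManager`.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised in strict mode when a request would exceed a cap."""


@dataclass
class SimpleBudget:
    max_nodes: int
    max_tokens: int
    strict: bool = False
    used_nodes: int = 0
    used_tokens: int = 0

    def admit(self, n_new: int, est_tokens: int) -> bool:
        """Pre-admit: would the estimated usage still fit under both caps?"""
        fits = (self.used_nodes + n_new <= self.max_nodes
                and self.used_tokens + est_tokens <= self.max_tokens)
        if not fits and self.strict:
            raise BudgetExceeded("request would exceed the configured caps")
        return fits

    def update(self, n_new: int, actual_tokens: int) -> None:
        """Post-update: record what was actually consumed."""
        self.used_nodes += n_new
        self.used_tokens += actual_tokens
```

In soft mode the caller decides what to do with a `False` result (for example, generate fewer children); in strict mode the exception stops the exploration immediately.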

### Observability
- Structured JSON logging (PII-safe by default)
- OpenTelemetry spans (optional)
- Rich metrics per step (kept/pruned counts, scores, entropy, budget usage)

### Checkpointing
- Serialize selector state (entropy history, budget snapshot)
- Resume from checkpoint (pause/resume tree exploration)
- Schema versioning for backward compatibility
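
A checkpoint of this kind can be as simple as a versioned JSON document. The sketch below is an assumption about the general shape, not the library's serialization format; the field names and `SCHEMA_VERSION` value are made up for illustration.

```python
import json

SCHEMA_VERSION = 1  # hypothetical version tag


def save_checkpoint(entropy_history, budget_used) -> str:
    """Serialize resumable state to a JSON string."""
    state = {
        "schema_version": SCHEMA_VERSION,
        "entropy_history": list(entropy_history),
        "budget": dict(budget_used),
    }
    return json.dumps(state, sort_keys=True)


def load_checkpoint(blob: str) -> dict:
    """Parse a checkpoint, rejecting versions this code does not understand."""
    state = json.loads(blob)
    if state.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {state.get('schema_version')!r}")
    return state
```

Checking the version on load is what lets a later release migrate or reject old checkpoints explicitly instead of failing on a missing field.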

---

## Integrations

**LangChain:**
```python
from langchain.chains import LLMChain
from chatroutes_autobranch import Candidate, BranchSelector

def generate_and_select(query: str, chain: LLMChain, selector: BranchSelector):
    # Generate N candidates via LangChain (generate() returns one generation list per input)
    responses = chain.generate([{"query": query}] * 5)
    candidates = [Candidate(id=f"c{i}", text=gens[0].text) for i, gens in enumerate(responses.generations)]

    # Select best
    parent = Candidate(id="root", text=query)
    result = selector.step(parent, candidates)
    return result.kept
```

**LlamaIndex:** Similar pattern using `QueryEngine.query()` for generation

**Raw APIs (OpenAI, Anthropic):** See [multi-generation example](#multi-generation-tree-exploration)

---

## Performance

**Benchmarks** (M1 Max, OpenAI embeddings):

| Candidates | Beam K | Latency (p50) | Bottleneck |
|-----------|--------|---------------|------------|
| 10 | 3 | 240ms | Embedding API |
| 50 | 5 | 520ms | Embedding API |
| 100 | 10 | 1.1s | Novelty O(N²) |
| 500 | 10 | 4.2s | Similarity search (use FAISS) |

**Optimization tips:**
- Use local embeddings (HuggingFace) for <100ms latency
- Enable FAISS for 100+ candidates
- Batch embedding calls (`batch_size: 64` in config)
- Global embedding cache for repeated candidates
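
The batching and caching tips can be combined in a small wrapper around whatever embedding function you use. This is a sketch, not the library's implementation: `CachedEmbedder` is a hypothetical name, and `embed_batch` is assumed to be any callable that maps a list of texts to a list of vectors in the same order.

```python
class CachedEmbedder:
    """Wraps an embedding callable with batching and an in-memory cache."""

    def __init__(self, embed_batch, batch_size=64):
        self._embed_batch = embed_batch
        self._batch_size = batch_size
        self._cache = {}

    def embed(self, texts):
        # Unique texts that still need an embedding, in first-seen order
        missing = list(dict.fromkeys(t for t in texts if t not in self._cache))
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            for text, vector in zip(batch, self._embed_batch(batch)):
                self._cache[text] = vector
        return [self._cache[t] for t in texts]
```

Repeated candidates, which are common when sibling branches share text, then cost one lookup instead of another API call.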

---

## Development

**Setup:**
```bash
git clone https://github.com/chatroutes/chatroutes-autobranch
cd chatroutes-autobranch
pip install -e .[dev]
```

**Run tests:**
```bash
pytest tests/
pytest tests/ -v --cov=chatroutes_autobranch  # With coverage
```

**Type checking:**
```bash
mypy src/
```

**Formatting:**
```bash
black src/ tests/
ruff check src/ tests/
```

**Benchmarks:**
```bash
pytest bench/ --benchmark-only
```

---

## Contributing

We welcome contributions! Please see our [contributing guidelines](./CONTRIBUTING.md).

**Areas we'd love help with:**
- Additional novelty algorithms (DPP, k-DPP)
- More embedding providers (Cohere, Voyage AI)
- Adaptive K scheduling (auto-tune beam width)
- Tree visualization tools
- More examples (specific domains)

**How to contribute:**
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run tests and type checking
5. Submit a Pull Request

---

## Roadmap

- **v1.0.0** ✅ **RELEASED** (January 2025): Core components, beam search, MMR novelty, cosine filtering, entropy stopping, budget management, full test suite
- **v1.1.0** (Q2 2025): FAISS support for large-scale similarity, adaptive K scheduling
- **v1.2.0** (Q3 2025): Tree visualization tools, FastAPI service for multi-language support
- **v1.3.0** (Q4 2025): Async/await support, cluster-aware pruning
- **v2.0.0** (Q1 2026): gRPC service, TypeScript SDK, breaking API improvements

---

## FAQ

**Q: Do I need ChatRoutes cloud to use this?**
A: No. This library is standalone and has zero cloud dependencies. Use it with any LLM provider.

**Q: Can I use this with TypeScript/JavaScript?**
A: Yes. Run the FastAPI service and call it over HTTP. A native TypeScript SDK is planned for v2.0.0.

**Q: How do I choose beam width K?**
A: Start with K=3-5. Use budget formula: `K ≈ (budget/tokens_per_branch)^(1/depth)`. See [tuning guide](./chatroutes_autobranch_v1.0.md#19-tuning-guide-choosing-beam-width-k).
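
As a worked example of the budget formula, the sketch below finds the largest integer K whose full tree of `K^depth` branches fits the budget. It uses an integer search rather than a floating-point root to avoid rounding surprises on exact powers; the function name and the sample numbers are illustrative.

```python
def suggest_beam_width(budget_tokens: int, tokens_per_branch: int, depth: int) -> int:
    """Largest K (at least 1) such that K ** depth branches fit within the budget."""
    affordable_branches = budget_tokens // tokens_per_branch
    k = 1
    while (k + 1) ** depth <= affordable_branches:
        k += 1
    return k


# 20,000 tokens at ~250 tokens per branch allows 80 branches; 4^3 = 64 fits, 5^3 = 125 does not
print(suggest_beam_width(20_000, 250, depth=3))  # 4
```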

**Q: What if all candidates get pruned by novelty?**
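A: Candidates are removed when their similarity exceeds the threshold, so raise the threshold (e.g., 0.95) to prune less aggressively, or switch to MMR. See [troubleshooting](./chatroutes_autobranch_v1.0.md#20-common-failure-patterns).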

**Q: Is this deterministic?**
A: Yes, with fixed random seeds and deterministic tie-breaking. See [tests](./chatroutes_autobranch_v1.0.md#7-tests-pytest).

---

## License

MIT License - see [LICENSE](./LICENSE) file for details.

---

## Acknowledgements

Inspired by research in beam search, diverse selection (MMR, DPP), and LLM orchestration patterns. Built to be practical, swappable, and friendly for contributors.

Special thanks to the open-source community for tools and inspiration: LangChain, LlamaIndex, HuggingFace Transformers, FAISS, and the broader LLM ecosystem.

---

## Links

- **Documentation:** [Full Specification](./chatroutes_autobranch_v1.0.md)
- **Issues:** [GitHub Issues](https://github.com/chatroutes/chatroutes-autobranch/issues)
- **Discussions:** [GitHub Discussions](https://github.com/chatroutes/chatroutes-autobranch/discussions)
- **Changelog:** [CHANGELOG.md](./CHANGELOG.md)
- **PyPI:** [pypi.org/project/chatroutes-autobranch](https://pypi.org/project/chatroutes-autobranch)

---

**Built with ❤️ by the ChatRoutes team. Open to the community.**

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "chatroutes-autobranch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "llm, beam-search, tree-of-thought, branching, ai, machine-learning",
    "author": null,
    "author_email": "ChatRoutes Team <hello@chatroutes.com>",
    "download_url": "https://files.pythonhosted.org/packages/65/04/190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c/chatroutes_autobranch-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Intelligent branch exploration for LLM-powered applications",
    "version": "1.0.1",
    "project_urls": {
        "Documentation": "https://github.com/chatroutes/chatroutes-autobranch/blob/main/chatroutes_autobranch_v1.0.md",
        "Homepage": "https://github.com/chatroutes/chatroutes-autobranch",
        "Issues": "https://github.com/chatroutes/chatroutes-autobranch/issues",
        "Repository": "https://github.com/chatroutes/chatroutes-autobranch"
    },
    "split_keywords": [
        "llm",
        " beam-search",
        " tree-of-thought",
        " branching",
        " ai",
        " machine-learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0dcbea22494361adf9c92a1f765eb951d7a30e533907bf39aedc3f59d91dcffb",
                "md5": "793dc6b2d4bc2a9a34fa1e59241fea4b",
                "sha256": "98fff865ecbe105fd5424800db051ddef66a8f68eea2aba8e9212f920ee68b15"
            },
            "downloads": -1,
            "filename": "chatroutes_autobranch-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "793dc6b2d4bc2a9a34fa1e59241fea4b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 33481,
            "upload_time": "2025-10-22T23:44:34",
            "upload_time_iso_8601": "2025-10-22T23:44:34.689376Z",
            "url": "https://files.pythonhosted.org/packages/0d/cb/ea22494361adf9c92a1f765eb951d7a30e533907bf39aedc3f59d91dcffb/chatroutes_autobranch-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6504190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c",
                "md5": "756f3f8b26e3bd4249ccf7bf82d86cf9",
                "sha256": "ac807d4a4729d3f4ac368a18c5a98d624695a42ef7559e3c59f947c70403b0ba"
            },
            "downloads": -1,
            "filename": "chatroutes_autobranch-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "756f3f8b26e3bd4249ccf7bf82d86cf9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 40309,
            "upload_time": "2025-10-22T23:44:35",
            "upload_time_iso_8601": "2025-10-22T23:44:35.912518Z",
            "url": "https://files.pythonhosted.org/packages/65/04/190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c/chatroutes_autobranch-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-22 23:44:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chatroutes",
    "github_project": "chatroutes-autobranch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "chatroutes-autobranch"
}
