# chatroutes-autobranch
**Controlled branching generation for LLM applications**
[PyPI version](https://badge.fury.io/py/chatroutes-autobranch)
[License: MIT](https://opensource.org/licenses/MIT)
[Python 3.9+](https://www.python.org/downloads/)
[Open in Colab](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/creative_writing_colab.ipynb)
[Code style: black](https://github.com/psf/black)
Modern LLM applications often need to explore multiple reasoning paths (tree-of-thought, beam search, multi-agent systems) while staying **usable** and **affordable**. `chatroutes-autobranch` provides clean, standalone primitives for:
- 🎯 **Beam Search** – Pick the *best K* candidates by configurable scoring
- 🌈 **Diversity Control** – Ensure variety via novelty pruning (cosine similarity, MMR)
- 🛑 **Smart Stopping** – Know when to stop via entropy/information-gain metrics
- 💰 **Budget Management** – Keep costs predictable with token/time/node caps
- 🔌 **Pluggable Design** – Swap any component (scorer, embeddings, stopping criteria)
**Key Features:**
- ✅ Deterministic & reproducible (fixed tie-breaking, seeded clustering)
- ✅ Embedding-agnostic (OpenAI, HuggingFace, or custom)
- ✅ Production-ready (thread-safe, observable, checkpoint/resume)
- ✅ Framework-friendly (works with LangChain, LlamaIndex, or raw LLM APIs)
- ✅ Zero vendor lock-in (MIT License, no cloud dependencies)
---
## 🚀 Interactive Demos (Try it Now!)
### Getting Started Demo (Recommended)
[Open in Colab](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/getting_started_demo.ipynb)
**Perfect for first-time users!** Learn the fundamentals in 5 minutes:
- ✅ Installation and setup
- ✅ Basic beam search examples
- ✅ Multi-strategy scoring
- ✅ Novelty filtering
- ✅ Complete pipeline with budget control
**No setup required** - runs entirely in your browser!
### Creative Writing Scenario (Advanced)
[Open in Colab](https://colab.research.google.com/github/chatroutes/chatroutes-autobranch/blob/master/notebooks/creative_writing_colab.ipynb)
**See it in action with a real LLM!** Complete creative writing assistant:
- ✅ Full Ollama integration (free, local inference)
- ✅ Multi-turn branching (tree exploration)
- ✅ GPU/CPU performance comparison
- ✅ 4 complete story scenarios
[**📚 View all notebooks →**](notebooks/README.md)
---
## Quick Start
**Install:**
```bash
pip install chatroutes-autobranch
```
**Basic Usage:**
```python
from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config
# Load config (or use dict/env vars)
selector = BranchSelector.from_config(load_config("config.yaml"))
# Define parent and candidate branches
parent = Candidate(id="root", text="Explain photosynthesis simply")
candidates = [
    Candidate(id="c1", text="Start with sunlight absorption"),
    Candidate(id="c2", text="Begin with glucose production"),
    Candidate(id="c3", text="Explain chlorophyll's role"),
]
# Select best branches (applies beam → novelty → entropy pipeline)
result = selector.step(parent, candidates)
print(f"Kept: {[c.id for c in result.kept]}")
print(f"Entropy: {result.metrics['entropy']['value']:.2f}")
print(f"Should continue: {result.metrics['entropy']['continue']}")
```
**Config (`config.yaml`):**
```yaml
beam:
  k: 3  # Keep top 3 by score
  weights: {confidence: 0.4, relevance: 0.3, novelty_parent: 0.2}

novelty:
  method: cosine  # or 'mmr' for Maximal Marginal Relevance
  threshold: 0.85  # Prune candidates whose cosine similarity exceeds this

entropy:
  min_entropy: 0.6  # Stop if normalized entropy (0 to 1) drops below 0.6

embeddings:
  provider: openai
  model: text-embedding-3-large
```
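The `beam.weights` map above suggests that candidates are ranked by a weighted sum of per-candidate signals. The sketch below is a plain-Python illustration under two assumptions: each signal is already normalized to [0, 1], and a signal missing from a candidate contributes 0. `composite_score` is a hypothetical helper, not part of the package, and the library's built-in scorer may combine signals differently.

```python
WEIGHTS = {"confidence": 0.4, "relevance": 0.3, "novelty_parent": 0.2}


def composite_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over the signals named in `weights`; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# Hypothetical signal values for one candidate, each in [0, 1]
score = composite_score({"confidence": 0.9, "relevance": 0.5, "novelty_parent": 1.0}, WEIGHTS)
print(f"{score:.2f}")  # 0.4*0.9 + 0.3*0.5 + 0.2*1.0 = 0.71
```

Note that the example weights sum to 0.9, so under this reading the maximum attainable score is 0.9; whether the library renormalizes weights is not stated here.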
---
## Why Use This?
**Problem:** Exploring multiple LLM reasoning paths (e.g., tree-of-thought) quickly becomes:
- **Expensive** – Exponential growth of branches drains API budgets
- **Redundant** – Models generate similar outputs (mode collapse)
- **Uncontrolled** – No clear stopping criteria (when is "enough" exploration?)
**Solution:** `chatroutes-autobranch` gives you:
1. **Beam Search** to keep only the top-K candidates (quality filtering)
2. **Novelty Pruning** to remove similar outputs (diversity enforcement)
3. **Entropy Stopping** to detect when you've explored enough (convergence detection)
4. **Budget Limits** to cap costs before runaway spending
**Result:** Controlled, efficient tree exploration with predictable costs.
---
## Use Cases
| Scenario | Configuration | Benefit |
|----------|--------------|---------|
| **Tree-of-Thought Reasoning** | K=5, cosine novelty, entropy stopping | Explore diverse reasoning paths without explosion |
| **Multi-Agent Debate** | K=3, MMR novelty (λ=0.3) | Select diverse agent perspectives, avoid redundancy |
| **Code Generation** | K=4, high relevance weight | Generate varied solutions, prune duplicates |
| **Creative Writing** | K=8, low novelty threshold | High diversity, explore creative space |
| **Factual Q&A** | K=2, strict budget | Focus on accuracy, minimal branching |
---
## Architecture
**Pipeline (fixed order):**
```
Raw Candidates (N)
   ↓
1. Scoring (composite: confidence + relevance + novelty + intent + reward)
   ↓
2. Beam Selection (top K by score, deterministic tie-breaking)
   ↓
3. Novelty Filtering (prune similar via cosine/MMR)
   ↓
4. Entropy Check (compute diversity, decide if should continue)
   ↓
5. Result (kept + pruned + metrics)
```
**Pluggable Components:**
- **Scorer**: Composite (built-in) or custom
- **EmbeddingProvider**: OpenAI, HuggingFace, or custom
- **NoveltyFilter**: Cosine threshold or MMR
- **EntropyStopper**: Shannon entropy or custom
- **BudgetManager**: Token/time/node caps
All components are defined as **Protocol**s (structural/duck typing), so you can swap any part without touching the others.
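Because components are matched by structure, any object with the right methods can be plugged in without inheriting from a library class. The sketch shows the idea with `typing.Protocol`; the protocol name `EmbeddingProviderLike` and the `embed(texts) -> vectors` signature are assumptions for illustration, so check the spec's Pluggable Interfaces section for the real method names.

```python
from typing import Protocol


class EmbeddingProviderLike(Protocol):
    """Hypothetical interface: a list of texts in, one vector per text out."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class CharCountEmbedder:
    """Toy provider: it never subclasses the Protocol, it just has a matching `embed`."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def embed_all(provider: EmbeddingProviderLike, texts: list[str]) -> list[list[float]]:
    # A type checker accepts any object whose `embed` signature matches the Protocol
    return provider.embed(texts)


print(embed_all(CharCountEmbedder(), ["a b", "abc"]))
```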
---
## Installation
**Minimal:**
```bash
pip install chatroutes-autobranch
```
**With extras:**
```bash
# Quotes keep shells such as zsh from treating the brackets as glob patterns
# FastAPI service (for TypeScript/other languages)
pip install "chatroutes-autobranch[service]"
# HuggingFace local embeddings
pip install "chatroutes-autobranch[hf]"
# FAISS for large-scale similarity (1000+ candidates)
pip install "chatroutes-autobranch[faiss]"
# All features
pip install "chatroutes-autobranch[all]"
```
---
## Documentation
📘 **[Full Specification](./chatroutes_autobranch_v1.0.md)** – Complete API reference, algorithms, examples, and troubleshooting
**Key Sections:**
- [Philosophy & Design](./chatroutes_autobranch_v1.0.md#0-philosophy) – Core principles
- [Pluggable Interfaces](./chatroutes_autobranch_v1.0.md#4-variability-pluggable-interfaces) – Protocols & implementations
- [Configuration](./chatroutes_autobranch_v1.0.md#5-configuration) – YAML/JSON/env setup
- [Examples](./chatroutes_autobranch_v1.0.md#6-example-usage) – Single-step & multi-generation
- [Tuning Guide](./chatroutes_autobranch_v1.0.md#19-tuning-guide-choosing-beam-width-k) – How to choose K
- [Common Failures](./chatroutes_autobranch_v1.0.md#20-common-failure-patterns) – Troubleshooting
---
## Examples
### Multi-Generation Tree Exploration
```python
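import time
from collections import deque

from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config
# Import path for Budget/BudgetManager assumed; check the spec for the exact module
from chatroutes_autobranch import Budget, BudgetManager

# User provides the LLM generation function
def my_llm_generate(parent: Candidate, n: int) -> list[Candidate]:
    # Your LLM call here (OpenAI, Anthropic, etc.); `llm_api` is a placeholder client
    responses = llm_api.generate(parent.text, n=n)
    return [Candidate(id=f"{parent.id}_{i}", text=r) for i, r in enumerate(responses)]

# Setup
selector = BranchSelector.from_config(load_config("config.yaml"))
budget_manager = BudgetManager(Budget(max_nodes=50, max_tokens=20000))
root_candidate = Candidate(id="root", text="Explain photosynthesis simply")

# Breadth-first tree exploration
queue = deque([root_candidate])
while queue:
    current = queue.popleft()

    # Check the budget *before* paying for generation
    if not budget_manager.admit(n_new=5, est_tokens=1000, est_ms=2000):
        break

    start = time.perf_counter()
    children = my_llm_generate(current, n=5)
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Select best branches
    result = selector.step(current, children)
    # Record actual usage; replace 1200 with the token count your LLM API reports
    budget_manager.update(actual_tokens=1200, actual_ms=elapsed_ms)

    # Continue with kept candidates
    queue.extend(result.kept)

    # Stop if entropy is low (converged)
    if not result.metrics["entropy"]["continue"]:
        break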
```
### Custom Scorer
```python
from chatroutes_autobranch import BranchSelector, Candidate, ScoredCandidate, Scorer

class DomainScorer(Scorer):
    def score(self, parent: Candidate, candidates: list[Candidate]) -> list[ScoredCandidate]:
        scored = []
        for c in candidates:
            # Custom logic: prefer longer, more detailed responses (capped at 1.0)
            detail_score = min(len(c.text) / 1000, 1.0)
            scored.append(ScoredCandidate(id=c.id, text=c.text, score=detail_score))
        return scored

# Use in pipeline. `BeamSelector` and the `novelty`, `entropy` and `budget`
# components are assumed to be constructed as described in the spec.
beam = BeamSelector(k=3, scorer=DomainScorer())
selector = BranchSelector(beam, novelty, entropy, budget)
```
### FastAPI Service (for TypeScript/other languages)
```python
# server.py
from fastapi import FastAPI
from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config_from_file

app = FastAPI()
_config = load_config_from_file("config.yaml")

@app.post("/select")
def select(parent: dict, candidates: list[dict]):
    # Plain `def` (not `async def`): FastAPI runs it in a worker thread,
    # so the blocking selector.step() call does not stall the event loop.
    # Create a fresh selector per request so no state is shared across threads.
    selector = BranchSelector.from_config(_config)
    result = selector.step(
        Candidate(**parent),
        [Candidate(**c) for c in candidates],
    )
    return {
        "kept": [{"id": c.id, "score": c.score} for c in result.kept],
        "metrics": result.metrics,
    }

# Run: uvicorn server:app
```
**TypeScript client:**
```typescript
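const response = await fetch('http://localhost:8000/select', {
  method: 'POST',
  // Without this header fetch sends text/plain and FastAPI will not parse the JSON body
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ parent, candidates }),
});
const { kept, metrics } = await response.json();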
```
---
## Features
### Beam Search
- Top-K selection by composite scoring
- Deterministic tie-breaking (lexicographic ID ordering)
- Configurable weights: confidence, relevance, novelty, intent alignment, historical reward
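Deterministic tie-breaking can be expressed as a single sort key: score descending, then ID ascending. A minimal sketch, using a hypothetical `Scored` record in place of the library's `ScoredCandidate`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    """Hypothetical stand-in for a scored candidate."""

    id: str
    score: float


def top_k(items: list[Scored], k: int) -> list[Scored]:
    # Highest score first; equal scores fall back to ascending ID,
    # so the result never depends on the input order
    return sorted(items, key=lambda c: (-c.score, c.id))[:k]


pool = [Scored("c2", 0.8), Scored("c1", 0.8), Scored("c3", 0.9), Scored("c4", 0.1)]
print([c.id for c in top_k(pool, k=3)])  # ['c3', 'c1', 'c2']
```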
### Novelty Pruning
- **Cosine similarity:** Remove candidates above threshold (e.g., 0.85)
- **MMR (Maximal Marginal Relevance):** Balance relevance vs diversity with λ parameter
- Preserves score ordering (best candidates kept first)
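Both pruning strategies fit in a few lines of plain Python. This is an illustration, not the library's implementation: it assumes candidates arrive already sorted by score, that the cosine filter compares each candidate against those already kept, and that λ follows the usual MMR convention (λ = 1 ranks purely by relevance; smaller λ favors diversity).

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_filter(ids: list[str], vecs: list[Sequence[float]], threshold: float) -> list[str]:
    """Walk candidates in score order; drop any too similar to one already kept."""
    kept: list[int] = []
    for i, vec in enumerate(vecs):
        if all(cosine(vec, vecs[j]) <= threshold for j in kept):
            kept.append(i)
    return [ids[i] for i in kept]


def mmr_select(ids: list[str], vecs: list[Sequence[float]], relevance: list[float],
               k: int, lam: float) -> list[str]:
    """Greedy MMR: repeatedly pick the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to anything already selected)."""
    selected: list[int] = []
    remaining = list(range(len(ids)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)  # ties go to the earlier (higher-scored) candidate
        selected.append(best)
        remaining.remove(best)
    return [ids[i] for i in selected]


ids = ["a", "b", "c"]
vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # "b" is a near-duplicate of "a"
print(cosine_filter(ids, vecs, threshold=0.85))  # ['a', 'c']
print(mmr_select(ids, vecs, relevance=[0.9, 0.85, 0.5], k=2, lam=0.3))  # ['a', 'c']
```

With `lam=1.0` the same call returns `['a', 'b']`, because redundancy no longer carries any weight.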
### Entropy-Based Stopping
- Shannon entropy on K-means clusters of embeddings
- Delta-entropy tracking (stop if change < epsilon)
- Handles edge cases (0, 1, 2 candidates)
- Normalized to [0,1] scale
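A sketch of the entropy calculation, assuming cluster labels have already been produced (for example by K-means) and that normalization divides by the log of the number of clusters requested, so the result lands in [0, 1]. The stop rule combines the `min_entropy` floor from the Quick Start config with the delta check; `epsilon` and both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def normalized_entropy(labels: list[int], n_clusters: int) -> float:
    """Shannon entropy of cluster assignments divided by log(n_clusters)."""
    n = len(labels)
    if n < 2 or n_clusters < 2:
        return 0.0  # 0 or 1 candidates, or a single cluster: nothing to measure
    counts = Counter(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Clamp guards against -0.0 and against more distinct labels than n_clusters
    return min(1.0, max(0.0, h / math.log(n_clusters)))


def should_continue(current: float, previous: Optional[float],
                    min_entropy: float = 0.6, epsilon: float = 0.01) -> bool:
    """Stop when diversity is below the floor or has stopped changing between steps."""
    if current < min_entropy:
        return False
    if previous is not None and abs(current - previous) < epsilon:
        return False
    return True
```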
### Budget Management
- **Caps:** max_nodes, max_tokens, max_ms
- **Modes:** strict (raise when a cap is exceeded) or soft (return `False` so the caller can fall back)
- **Pre-admit:** Check budget before generation
- **Post-update:** Record actual usage for rolling averages
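The admit/update cycle can be illustrated with a small, self-contained class. `SimpleBudget` and `BudgetExceeded` are hypothetical names, not the library's `Budget`/`BudgetManager`; for brevity this sketch accumulates totals rather than rolling averages and reserves node slots at admit time.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised by a strict budget when a request would exceed a cap."""


@dataclass
class SimpleBudget:
    max_nodes: int
    max_tokens: int
    max_ms: float
    strict: bool = False  # strict: raise; soft: return False and let the caller fall back
    nodes: int = 0
    tokens: int = 0
    ms: float = 0.0

    def admit(self, n_new: int, est_tokens: int, est_ms: float) -> bool:
        """Pre-admit: check whether an upcoming generation step fits under every cap."""
        fits = (
            self.nodes + n_new <= self.max_nodes
            and self.tokens + est_tokens <= self.max_tokens
            and self.ms + est_ms <= self.max_ms
        )
        if not fits and self.strict:
            raise BudgetExceeded(f"step of {n_new} nodes / ~{est_tokens} tokens exceeds the budget")
        if fits:
            self.nodes += n_new  # reserve node slots now; tokens and time arrive via update()
        return fits

    def update(self, actual_tokens: int, actual_ms: float) -> None:
        """Post-update: record what the step actually consumed."""
        self.tokens += actual_tokens
        self.ms += actual_ms


budget = SimpleBudget(max_nodes=50, max_tokens=2000, max_ms=10_000)
budget.admit(n_new=5, est_tokens=1000, est_ms=2000)  # fits
budget.update(actual_tokens=1200, actual_ms=1800)
budget.admit(n_new=5, est_tokens=1000, est_ms=2000)  # rejected: 1200 + 1000 > 2000 tokens
```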
### Observability
- Structured JSON logging (PII-safe by default)
- OpenTelemetry spans (optional)
- Rich metrics per step (kept/pruned counts, scores, entropy, budget usage)
### Checkpointing
- Serialize selector state (entropy history, budget snapshot)
- Resume from checkpoint (pause/resume tree exploration)
- Schema versioning for backward compatibility
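One way to make checkpoints safe across versions is to store a schema version next to the state and refuse payloads written by a newer schema. A JSON sketch under that assumption; the field names and `SCHEMA_VERSION` are hypothetical, not the library's checkpoint format.

```python
import json
from typing import Any

SCHEMA_VERSION = 1  # hypothetical; bump whenever the checkpoint layout changes


def save_checkpoint(entropy_history: list[float], budget_snapshot: dict[str, Any]) -> str:
    """Serialize selector state to a JSON string (sorted keys for stable diffs)."""
    return json.dumps(
        {"schema_version": SCHEMA_VERSION, "entropy_history": entropy_history,
         "budget": budget_snapshot},
        sort_keys=True,
    )


def load_checkpoint(payload: str) -> dict[str, Any]:
    """Restore state; older schemas pass through for the caller to migrate, newer ones fail."""
    state = json.loads(payload)
    version = state.get("schema_version", 0)
    if version > SCHEMA_VERSION:
        raise ValueError(f"checkpoint schema {version} is newer than supported {SCHEMA_VERSION}")
    return state
```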
---
## Integrations
**LangChain:**
```python
from langchain.chains import LLMChain
from chatroutes_autobranch import BranchSelector, Candidate

def generate_and_select(query: str, chain: LLMChain, selector: BranchSelector):
    # Generate N candidates via LangChain: one input per candidate
    responses = chain.generate([{"query": query}] * 5)
    # `generations` holds one list per input; take the first generation of each
    candidates = [
        Candidate(id=f"c{i}", text=gens[0].text)
        for i, gens in enumerate(responses.generations)
    ]

    # Select best
    parent = Candidate(id="root", text=query)
    result = selector.step(parent, candidates)
    return result.kept
```
**LlamaIndex:** Similar pattern using `QueryEngine.query()` for generation
**Raw APIs (OpenAI, Anthropic):** See [multi-generation example](#multi-generation-tree-exploration)
---
## Performance
**Benchmarks** (M1 Max, OpenAI embeddings):
| Candidates | Beam K | Latency (p50) | Bottleneck |
|-----------|--------|---------------|------------|
| 10 | 3 | 240ms | Embedding API |
| 50 | 5 | 520ms | Embedding API |
| 100 | 10 | 1.1s | Novelty O(N²) |
| 500 | 10 | 4.2s | Use FAISS |
**Optimization tips:**
- Use local embeddings (HuggingFace) for <100ms latency
- Enable FAISS for 100+ candidates
- Batch embedding calls (`batch_size: 64` in config)
- Cache embeddings globally so repeated candidates are not re-embedded
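The batching and caching tips combine naturally into one wrapper. A sketch assuming your provider exposes a batch function that maps a list of texts to a list of vectors in the same order; `CachedEmbedder` and `fake_provider` are hypothetical, and the plain dict cache is not thread-safe.

```python
from typing import Callable

Vector = list[float]


class CachedEmbedder:
    """Wraps a batch embedding function with a text -> vector cache and fixed-size batching."""

    def __init__(self, embed_batch: Callable[[list[str]], list[Vector]], batch_size: int = 64) -> None:
        self._embed_batch = embed_batch
        self._batch_size = batch_size
        self._cache: dict[str, Vector] = {}
        self.calls = 0  # number of batch requests actually sent to the provider

    def embed(self, texts: list[str]) -> list[Vector]:
        # Unique texts not yet cached, in first-seen order
        missing = list(dict.fromkeys(t for t in texts if t not in self._cache))
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            self.calls += 1
            for text, vec in zip(batch, self._embed_batch(batch)):
                self._cache[text] = vec
        return [self._cache[t] for t in texts]


def fake_provider(batch: list[str]) -> list[Vector]:
    # Stand-in for a real embedding API call
    return [[float(len(t))] for t in batch]


embedder = CachedEmbedder(fake_provider, batch_size=2)
embedder.embed(["a", "bb", "a", "ccc"])  # 3 unique texts -> 2 batch calls
embedder.embed(["bb"])  # served from the cache, no new call
```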
---
## Development
**Setup:**
```bash
git clone https://github.com/chatroutes/chatroutes-autobranch
cd chatroutes-autobranch
pip install -e ".[dev]"
```
**Run tests:**
```bash
pytest tests/
pytest tests/ -v --cov=chatroutes_autobranch # With coverage
```
**Type checking:**
```bash
mypy src/
```
**Formatting:**
```bash
black src/ tests/
ruff check src/ tests/
```
**Benchmarks:**
```bash
pytest bench/ --benchmark-only
```
---
## Contributing
We welcome contributions! Please see our [contributing guidelines](./CONTRIBUTING.md).
**Areas we'd love help with:**
- Additional novelty algorithms (DPP, k-DPP)
- More embedding providers (Cohere, Voyage AI)
- Adaptive K scheduling (auto-tune beam width)
- Tree visualization tools
- More examples (specific domains)
**How to contribute:**
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run tests and type checking
5. Submit a Pull Request
---
## Roadmap
- **v1.0.0** ✅ **RELEASED** (January 2025): Core components, beam search, MMR novelty, cosine filtering, entropy stopping, budget management, full test suite
- **v1.1.0** (Q2 2025): FAISS support for large-scale similarity, adaptive K scheduling
- **v1.2.0** (Q3 2025): Tree visualization tools, FastAPI service for multi-language support
- **v1.3.0** (Q4 2025): Async/await support, cluster-aware pruning
- **v2.0.0** (Q1 2026): gRPC service, TypeScript SDK, breaking API improvements
---
## FAQ
**Q: Do I need ChatRoutes cloud to use this?**
A: No. This library is standalone and has zero cloud dependencies. Use it with any LLM provider.
**Q: Can I use this with TypeScript/JavaScript?**
A: Yes. Run the FastAPI service and call via HTTP. Native TS SDK planned for v2.0.0.
**Q: How do I choose beam width K?**
A: Start with K=3-5. Use the budget formula `K ≈ (budget/tokens_per_branch)^(1/depth)`. See [tuning guide](./chatroutes_autobranch_v1.0.md#19-tuning-guide-choosing-beam-width-k).
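Read literally, the formula assumes each kept branch expands into K children per level, so a tree of the given depth has about K^depth branches, and K^depth × tokens_per_branch should fit the budget. A quick way to evaluate it (the helper name is hypothetical):

```python
import math


def beam_width(budget_tokens: float, tokens_per_branch: float, depth: int) -> float:
    """K ≈ (budget / tokens_per_branch) ** (1 / depth), the FAQ heuristic."""
    return (budget_tokens / tokens_per_branch) ** (1 / depth)


k = math.floor(beam_width(20_000, 200, depth=3))  # 100 ** (1/3) ≈ 4.64, rounded down
print(k, k ** 3 * 200)  # chosen K and the tokens needed for K**3 branches
```

With a 20,000-token budget, 200 tokens per branch and depth 3, K=4 fits (4³ × 200 = 12,800 tokens) while K=5 would not (5³ × 200 = 25,000).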
**Q: What if all candidates get pruned by novelty?**
A: Raise the similarity threshold (e.g., from 0.85 toward 0.95) so only near-duplicates are pruned, or switch to MMR. See [troubleshooting](./chatroutes_autobranch_v1.0.md#20-common-failure-patterns).
**Q: Is this deterministic?**
A: Yes, with fixed random seeds and deterministic tie-breaking. See [tests](./chatroutes_autobranch_v1.0.md#7-tests-pytest).
---
## License
MIT License - see [LICENSE](./LICENSE) file for details.
---
## Acknowledgements
Inspired by research in beam search, diverse selection (MMR, DPP), and LLM orchestration patterns. Built to be practical, swappable, and friendly for contributors.
Special thanks to the open-source community for tools and inspiration: LangChain, LlamaIndex, HuggingFace Transformers, FAISS, and the broader LLM ecosystem.
---
## Links
- **Documentation:** [Full Specification](./chatroutes_autobranch_v1.0.md)
- **Issues:** [GitHub Issues](https://github.com/chatroutes/chatroutes-autobranch/issues)
- **Discussions:** [GitHub Discussions](https://github.com/chatroutes/chatroutes-autobranch/discussions)
- **Changelog:** [CHANGELOG.md](./CHANGELOG.md)
- **PyPI:** [pypi.org/project/chatroutes-autobranch](https://pypi.org/project/chatroutes-autobranch)
---
**Built with ❤️ by the ChatRoutes team. Open to the community.**
Raw data
{
"_id": null,
"home_page": null,
"name": "chatroutes-autobranch",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "llm, beam-search, tree-of-thought, branching, ai, machine-learning",
"author": null,
"author_email": "ChatRoutes Team <hello@chatroutes.com>",
"download_url": "https://files.pythonhosted.org/packages/65/04/190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c/chatroutes_autobranch-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Intelligent branch exploration for LLM-powered applications",
"version": "1.0.1",
"project_urls": {
"Documentation": "https://github.com/chatroutes/chatroutes-autobranch/blob/main/chatroutes_autobranch_v1.0.md",
"Homepage": "https://github.com/chatroutes/chatroutes-autobranch",
"Issues": "https://github.com/chatroutes/chatroutes-autobranch/issues",
"Repository": "https://github.com/chatroutes/chatroutes-autobranch"
},
"split_keywords": [
"llm",
" beam-search",
" tree-of-thought",
" branching",
" ai",
" machine-learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0dcbea22494361adf9c92a1f765eb951d7a30e533907bf39aedc3f59d91dcffb",
"md5": "793dc6b2d4bc2a9a34fa1e59241fea4b",
"sha256": "98fff865ecbe105fd5424800db051ddef66a8f68eea2aba8e9212f920ee68b15"
},
"downloads": -1,
"filename": "chatroutes_autobranch-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "793dc6b2d4bc2a9a34fa1e59241fea4b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 33481,
"upload_time": "2025-10-22T23:44:34",
"upload_time_iso_8601": "2025-10-22T23:44:34.689376Z",
"url": "https://files.pythonhosted.org/packages/0d/cb/ea22494361adf9c92a1f765eb951d7a30e533907bf39aedc3f59d91dcffb/chatroutes_autobranch-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6504190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c",
"md5": "756f3f8b26e3bd4249ccf7bf82d86cf9",
"sha256": "ac807d4a4729d3f4ac368a18c5a98d624695a42ef7559e3c59f947c70403b0ba"
},
"downloads": -1,
"filename": "chatroutes_autobranch-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "756f3f8b26e3bd4249ccf7bf82d86cf9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 40309,
"upload_time": "2025-10-22T23:44:35",
"upload_time_iso_8601": "2025-10-22T23:44:35.912518Z",
"url": "https://files.pythonhosted.org/packages/65/04/190f5720650e15525c87140eea5020647276735aeae6ae6dace38fe9045c/chatroutes_autobranch-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-22 23:44:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "chatroutes",
"github_project": "chatroutes-autobranch",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "chatroutes-autobranch"
}