# **FlockParse - Document RAG Intelligence with Distributed Processing**
> **Distributed document RAG system that turns mismatched hardware into a coordinated inference cluster.** Auto-discovers Ollama nodes, intelligently routes workloads across heterogeneous GPUs/CPUs, and achieves 60x+ speedups through adaptive load balancing. Privacy-first with local/network/cloud interfaces.
**What makes this different:** Real distributed systems engineering—not just API wrappers. Handles heterogeneous hardware (RTX A4000 + GTX 1050Ti + CPU laptops working together), network failures, and privacy requirements that rule out cloud APIs.
---
## ⚠️ Important: Current Maturity
**Status:** Beta (v1.0.0) - **Early adopters welcome, but read this first!**
**What works well:**
- ✅ Core distributed processing across heterogeneous nodes
- ✅ GPU detection and VRAM-aware routing
- ✅ Basic PDF extraction and OCR fallback
- ✅ Privacy-first local processing (CLI/Web UI modes)
**Known limitations:**
- ⚠️ **Limited battle testing** - Tested by ~2 developers, not yet proven at scale
- ⚠️ **Security gaps** - See [SECURITY.md](SECURITY.md) for current limitations
- ⚠️ **Edge cases** - Some PDF types may fail (encrypted, complex layouts)
- ⚠️ **Test coverage** - ~40% coverage, integration tests incomplete
**Read before using:** [KNOWN_ISSUES.md](KNOWN_ISSUES.md) documents all limitations, edge cases, and roadmap honestly.
**Recommended for:**
- 🎓 Learning distributed systems
- 🔬 Research and experimentation
- 🏠 Personal projects with non-critical data
- 🛠️ Contributors who want to help mature the project
**Not yet recommended for:**
- ❌ Mission-critical production workloads
- ❌ Regulated industries (healthcare, finance) without additional hardening
- ❌ Large-scale deployments (>50 concurrent users)
**Help us improve:** Report issues, contribute fixes, share feedback!
---
## **🏛️ Origins & Legacy**
FlockParser's distributed inference architecture originated from **[FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)**, which pioneered:
- **Auto-discovery** of Ollama nodes across heterogeneous hardware
- **Adaptive load balancing** with GPU/CPU awareness
- **VRAM-aware routing** and automatic failover mechanisms
This core distributed logic from FlockParser-legacy was later extracted and generalized to become **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - a standalone distributed inference platform that now powers both FlockParser and **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)**.
### **📊 Quick Performance Reference**
| Workload | Hardware | Time | Speedup | Notes |
|----------|----------|------|---------|-------|
| **5 AI papers (~350 pages)** | 1× RTX A4000 (16GB) | 21.3s | **17.5×** | [Real arXiv showcase](#-showcase-real-world-example) |
| **12-page PDF (demo video)** | 1× RTX A4000 (16GB) | 6.0s | **61.7×** | GPU-aware routing |
| **100 PDFs (2000 pages)** | 3-node cluster (mixed) | 3.2 min | **13.2×** | See [BENCHMARKS.md](BENCHMARKS.md) |
| **Embedding generation** | RTX A4000 vs i9 CPU | 8.2s vs 178s | **21.7×** | 10K chunks |
**🎯 Try it yourself:** `pip install flockparser && python showcase/process_arxiv_papers.py`
---
## **🔒 Privacy Model**
| Interface | Privacy Level | External Calls | Best For |
|-----------|---------------|----------------|----------|
| **CLI** (`flockparsecli.py`) | 🟢 **100% Local** | None | Personal use, air-gapped systems |
| **Web UI** (`flock_webui.py`) | 🟢 **100% Local** | None | GUI users, visual monitoring |
| **REST API** (`flock_ai_api.py`) | 🟡 **Local Network** | None | Multi-user, app integration |
| **MCP Server** (`flock_mcp_server.py`) | 🔴 **Cloud** | ⚠️ Claude Desktop (Anthropic) | AI assistant integration |
**⚠️ MCP Privacy Warning:** The MCP server integrates with Claude Desktop, which sends queries and document snippets to Anthropic's cloud API. Use CLI/Web UI for 100% offline processing.
---
## **Table of Contents**
- [Key Features](#-key-features)
- [👥 Who Uses This?](#-who-uses-this) - **Target users & scenarios**
- [📐 How It Works (5-Second Overview)](#-how-it-works-5-second-overview) - **Visual for non-technical evaluators**
- [Architecture](#-architecture) | **[📖 Deep Dive: Architecture & Design Decisions](docs/architecture.md)**
- [Quickstart](#-quickstart-3-steps)
- [Performance & Benchmarks](#-performance)
- [🎓 Showcase: Real-World Example](#-showcase-real-world-example) ⭐ **Try it yourself**
- [Usage Examples](#-usage)
- [Security & Production](#-security--production-notes)
- [🔗 Integration with SynapticLlamas & SOLLOL](#-integration-with-synapticllamas--sollol) - **Complete AI Ecosystem** ⭐
- [Troubleshooting](#-troubleshooting-guide)
- [Contributing](#-contributing)
## **⚡ Key Features**
- **🌐 Intelligent Load Balancing** - Auto-discovers Ollama nodes, detects GPU vs CPU, monitors VRAM, and routes work adaptively (10x speedup on heterogeneous clusters)
- **🔌 Multi-Protocol Support** - CLI (100% local), REST API (network), MCP (Claude Desktop), Web UI (Streamlit) - choose your privacy level
- **🎯 Adaptive Routing** - Sequential vs parallel decisions based on cluster characteristics (prevents slow nodes from bottlenecking)
- **📊 Production Observability** - Real-time health scores, performance tracking, VRAM monitoring, automatic failover
- **🔒 Privacy-First Architecture** - No external API calls required (CLI mode), all processing on-premise
- **📄 Complete Pipeline** - PDF extraction → OCR fallback → Multi-format conversion → Vector embeddings → RAG with source citations
---
## **👥 Who Uses This?**
FlockParser is designed for engineers and researchers who need **private, on-premise document intelligence** with **real distributed systems capabilities**.
### **Ideal Users**
| User Type | Use Case | Why FlockParser? |
|-----------|----------|------------------|
| **🔬 ML/AI Engineers** | Process research papers, build knowledge bases, experiment with RAG systems | GPU-aware routing, 21× faster embeddings, full pipeline control |
| **📊 Data Scientists** | Extract insights from large document corpora (100s-1000s of PDFs) | Distributed processing, semantic search, production observability |
| **🏢 Enterprise Engineers** | On-premise document search for regulated industries (healthcare, legal, finance) | 100% local processing, no cloud APIs, privacy-first architecture |
| **🎓 Researchers** | Build custom RAG systems, experiment with distributed inference patterns | Full source access, extensible architecture, real benchmarks |
| **🛠️ DevOps/Platform Engineers** | Set up document intelligence infrastructure for teams | Multi-node setup, health monitoring, automatic failover |
| **👨‍💻 Students/Learners** | Understand distributed systems, GPU orchestration, RAG architectures | Real working example, comprehensive docs, honest limitations |
### **Real-World Scenarios**
✅ **"I have 500 research papers and a spare GPU machine"** → Process your corpus 20× faster with distributed nodes
✅ **"I can't send medical records to OpenAI"** → 100% local processing (CLI/Web UI modes)
✅ **"I want to experiment with RAG without cloud costs"** → Full pipeline, runs on your hardware
✅ **"I need to search 10,000 internal documents"** → ChromaDB vector search with sub-20ms latency
✅ **"I have mismatched hardware (old laptop + gaming PC)"** → Adaptive routing handles heterogeneous clusters
### **Not Ideal For**
❌ **Production SaaS with 1000+ concurrent users** → Current SQLite backend limits concurrency (~50 users)
❌ **Mission-critical systems requiring 99.9% uptime** → Still in Beta, see [KNOWN_ISSUES.md](KNOWN_ISSUES.md)
❌ **Simple one-time PDF extraction** → Overkill; use `pdfplumber` directly
❌ **Cloud-first deployments** → Designed for on-premise/hybrid; cloud works but misses GPU routing benefits
**Bottom line:** If you're building document intelligence infrastructure on your own hardware and need distributed processing with privacy guarantees, FlockParser is for you.
---
## **📐 How It Works (5-Second Overview)**
**For recruiters and non-technical evaluators:**
```
┌─────────────────────────────────────────────────────────────────┐
│ INPUT │
│ 📄 Your Documents (PDFs, research papers, internal docs) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FLOCKPARSER │
│ │
│ 1. Extracts text from PDFs (handles scans with OCR) │
│ 2. Splits into chunks, creates vector embeddings │
│ 3. Distributes work across GPU/CPU nodes (auto-discovery) │
│ 4. Stores in searchable vector database (ChromaDB) │
│ │
│ ⚡ Distributed Processing: 3 nodes → 13× faster │
│ 🚀 GPU Acceleration: RTX A4000 → 61× faster than CPU │
│ 🔒 Privacy: 100% local (no cloud APIs) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT │
│ 🔍 Semantic Search: "Find all mentions of transformers" │
│ 💬 AI Chat: "Summarize the methodology section" │
│ 📊 Source Citations: Exact page/document references │
│ 🌐 4 Interfaces: CLI, Web UI, REST API, Claude Desktop │
└─────────────────────────────────────────────────────────────────┘
```
**Key Innovation:** Auto-detects GPU nodes, measures performance, and routes work to fastest hardware. No manual configuration needed.
---
## **🏗️ Architecture**
```
┌─────────────────────────────────────────────────────────────┐
│ Interfaces (Choose Your Privacy Level) │
│ CLI (Local) | REST API (Network) | MCP (Claude) | Web UI │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FlockParse Core Engine │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PDF │ │ Semantic │ │ RAG │ │
│ │ Processing │→ │ Search │→ │ Engine │ │
│ └─────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ ChromaDB Vector Store (Persistent) │ │
│ └───────────────────────────────────────────────────┘ │
└──────────────────────┬──────────────────────────────────────┘
│ Intelligent Load Balancer
│ • Health scoring (GPU/VRAM detection)
│ • Adaptive routing (sequential vs parallel)
│ • Automatic failover & caching
▼
┌──────────────────────────────────────────────┐
│ Distributed Ollama Cluster │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ GPU A │ │ GPU B │ │ CPU │ │
│ │16GB VRAM │ │ 8GB VRAM │ │ 16GB RAM │ │
│ │Health:367│ │Health:210│ │Health:50 │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└──────────────────────────────────────────────┘
▲ Auto-discovery | Performance tracking
```
**Want to understand how this works?** Read the **[📖 Architecture Deep Dive](docs/architecture.md)** for detailed explanations of:
- Why distributed AI inference solves real-world problems
- How adaptive routing decisions are made (sequential vs parallel)
- MCP integration details and privacy implications
- Technical trade-offs and design decisions
## **🚀 Quickstart (3 Steps)**
**Requirements:**
- Python 3.10 or later
- Ollama 0.1.20+ (install from [ollama.com](https://ollama.com))
- 4GB+ RAM (8GB+ recommended for GPU nodes)
```bash
# 1. Install FlockParser
pip install flockparser
# 2. Start Ollama and pull models
ollama serve # In a separate terminal
ollama pull mxbai-embed-large # Required for embeddings
ollama pull llama3.1:latest # Required for chat
# 3. Run your preferred interface
flockparse-webui # Web UI - easiest (recommended) ⭐
flockparse # CLI - 100% local
flockparse-api # REST API - multi-user
flockparse-mcp # MCP - Claude Desktop integration
```
**💡 Pro tip:** Start with the Web UI to see distributed processing with real-time VRAM monitoring and node health dashboards.
---
### Alternative: Install from Source
If you want to contribute or modify the code:
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
pip install -e . # Editable install
```
### **Quick Test (30 seconds)**
```bash
# Start the CLI
python flockparsecli.py
# Process the sample PDF
> open_pdf testpdfs/sample.pdf
# Chat with it
> chat
🙋 You: Summarize this document
```
**First time?** Start with the Web UI (`streamlit run flock_webui.py`) - it's the easiest way to see distributed processing in action with a visual dashboard.
---
## **🐳 Docker Deployment (One Command)**
### **Quick Start with Docker Compose**
```bash
# Clone and deploy everything
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
docker-compose up -d
# Access services
# Web UI: http://localhost:8501
# REST API: http://localhost:8000
# Ollama: http://localhost:11434
```
### **What Gets Deployed**
| Service | Port | Description |
|---------|------|-------------|
| **Web UI** | 8501 | Streamlit interface with visual monitoring |
| **REST API** | 8000 | FastAPI with authentication |
| **CLI** | - | Interactive terminal (docker-compose run cli) |
| **Ollama** | 11434 | Local LLM inference engine |
### **Production Features**
✅ **Multi-stage build** - Optimized image size
✅ **Non-root user** - Security hardened
✅ **Health checks** - Auto-restart on failure
✅ **Volume persistence** - Data survives restarts
✅ **GPU support** - Uncomment deploy section for NVIDIA GPUs
### **Custom Configuration**
```bash
# Set API key
export FLOCKPARSE_API_KEY="your-secret-key"
# Set log level
export LOG_LEVEL="DEBUG"
# Deploy with custom config
docker-compose up -d
```
### **GPU Support (NVIDIA)**
Uncomment the GPU section in `docker-compose.yml`:
```yaml
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
```
Then run: `docker-compose up -d`
### **CI/CD Pipeline**
```mermaid
graph LR
A[📝 Git Push] --> B[🔍 Lint & Format]
B --> C[🧪 Test Suite]
B --> D[🔒 Security Scan]
C --> E[🐳 Build Multi-Arch]
D --> E
E --> F[📦 Push to GHCR]
F --> G[🚀 Deploy]
style A fill:#4CAF50
style B fill:#2196F3
style C fill:#2196F3
style D fill:#FF9800
style E fill:#9C27B0
style F fill:#9C27B0
style G fill:#F44336
```
**Automated on every push to `main`:**
| Stage | Tools | Purpose |
|-------|-------|---------|
| **Code Quality** | black, flake8, mypy | Enforce formatting & typing standards |
| **Testing** | pytest (Python 3.10/3.11/3.12) | 78% coverage across versions |
| **Security** | Trivy | Vulnerability scanning & SARIF reports |
| **Build** | Docker Buildx | Multi-architecture (amd64, arm64) |
| **Registry** | GitHub Container Registry | Versioned image storage |
| **Deploy** | On release events | Automated production deployment |
**Pull the latest image:**
```bash
docker pull ghcr.io/benevolentjoker-johnl/flockparser:latest
```
**View pipeline runs:** https://github.com/BenevolentJoker-JohnL/FlockParser/actions
---
## **🌐 Setting Up Distributed Nodes**
**Want the 60x speedup?** Set up multiple Ollama nodes across your network.
### Quick Multi-Node Setup
**On each additional machine:**
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Configure for network access
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
# 3. Pull models
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
# 4. Allow firewall (if needed)
sudo ufw allow 11434/tcp # Linux
```
**FlockParser will automatically discover these nodes!**
Check with:
```bash
python flockparsecli.py
> lb_stats # Shows all discovered nodes and their capabilities
```
**📖 Complete Guide:** See **[DISTRIBUTED_SETUP.md](DISTRIBUTED_SETUP.md)** for:
- Step-by-step multi-machine setup
- Network configuration and firewall rules
- Troubleshooting node discovery
- Example setups (budget home lab to professional clusters)
- GPU router configuration for automatic optimization
---
### **🔒 Privacy Levels by Interface:**
- **Web UI (`flock_webui.py`)**: 🟢 100% local, runs in your browser
- **CLI (`flockparsecli.py`)**: 🟢 100% local, zero external calls
- **REST API (`flock_ai_api.py`)**: 🟡 Local network only
- **MCP Server (`flock_mcp_server.py`)**: 🔴 Integrates with Claude Desktop (Anthropic cloud service)
**Choose the interface that matches your privacy requirements!**
## **🏆 Why FlockParse? Comparison to Competitors**
| Feature | **FlockParse** | LangChain | LlamaIndex | Haystack |
|---------|---------------|-----------|------------|----------|
| **100% Local/Offline** | ✅ Yes (CLI/JSON) | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial |
| **Zero External API Calls** | ✅ Yes (CLI/JSON) | ❌ No | ❌ No | ❌ No |
| **Built-in GPU Load Balancing** | ✅ Yes (auto) | ❌ No | ❌ No | ❌ No |
| **VRAM Monitoring** | ✅ Yes (dynamic) | ❌ No | ❌ No | ❌ No |
| **Multi-Node Auto-Discovery** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **CPU Fallback Detection** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Document Format Export** | ✅ 4 formats | ❌ Limited | ❌ Limited | ⚠️ Basic |
| **Setup Complexity** | 🟢 Simple | 🔴 Complex | 🔴 Complex | 🟡 Medium |
| **Dependencies** | 🟢 Minimal | 🔴 Heavy | 🔴 Heavy | 🟡 Medium |
| **Learning Curve** | 🟢 Low | 🔴 Steep | 🔴 Steep | 🟡 Medium |
| **Privacy Control** | 🟢 High (CLI/JSON) | 🔴 Limited | 🔴 Limited | 🟡 Medium |
| **Out-of-Box Functionality** | ✅ Complete | ⚠️ Requires config | ⚠️ Requires config | ⚠️ Requires config |
| **MCP Integration** | ✅ Native | ❌ No | ❌ No | ❌ No |
| **Embedding Cache** | ✅ MD5-based | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| **Batch Processing** | ✅ Parallel | ⚠️ Sequential | ⚠️ Sequential | ⚠️ Basic |
| **Performance** | 🚀 60x+ faster with GPU auto-routing | ⚠️ Varies by config | ⚠️ Varies by config | ⚠️ Varies by config |
| **Cost** | 💰 Free | 💰💰 Free + Paid | 💰💰 Free + Paid | 💰💰 Free + Paid |
### **Key Differentiators:**
1. **Privacy by Design**: CLI and JSON interfaces are 100% local with zero external calls (MCP interface uses Claude Desktop for chat)
2. **Intelligent GPU Management**: Automatically finds, tests, and prioritizes GPU nodes
3. **Production-Ready**: Works immediately with sensible defaults
4. **Resource-Aware**: Detects VRAM exhaustion and prevents performance degradation
5. **Complete Solution**: CLI, REST API, MCP, and batch interfaces - choose your privacy level
## **📊 Performance**
### **Real-World Benchmark Results**
| Processing Mode | Time | Speedup | What It Shows |
|----------------|------|---------|---------------|
| Single CPU node | 372.76s (~6 min) | 1x baseline | Sequential CPU processing |
| Parallel (multi-node) | 159.79s (~2.5 min) | **2.3x faster** | Distributed across cluster |
| GPU node routing | 6.04s (~6 sec) | **61.7x faster** | Automatic GPU detection & routing |
**Why the Massive Speedup?**
- GPU processes embeddings in milliseconds vs seconds on CPU
- Adaptive routing detected GPU was 60x+ faster and sent all work there
- Avoided bottleneck of waiting for slower CPU nodes to finish
- No network overhead (local cluster, no cloud APIs)
**Key Insight:** The system **automatically** detects performance differences and makes routing decisions - no manual GPU configuration needed.
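To make that concrete, here is a minimal sketch of a sequential-vs-parallel decision under stated assumptions: the throughput measurements, helper name, and dominance threshold are illustrative, not FlockParser's actual internals.

```python
# Toy sketch of the adaptive routing decision described above.
# node_speeds and the dominance threshold are illustrative assumptions.
def plan_route(node_speeds: dict[str, float], dominance: float = 5.0):
    """node_speeds maps node URL -> measured throughput (chunks/sec)."""
    fastest_url = max(node_speeds, key=node_speeds.get)
    fastest = node_speeds[fastest_url]
    rest = sum(v for url, v in node_speeds.items() if url != fastest_url)
    # If one node out-runs the rest of the cluster combined by a wide
    # margin, fanning out only adds waiting time for stragglers.
    if rest == 0 or fastest / rest > dominance:
        return "sequential", [fastest_url]
    return "parallel", sorted(node_speeds, key=node_speeds.get, reverse=True)
```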
**Hardware (Benchmark Cluster):**
- **Node 1 (10.9.66.90):** Intel i9-12900K, 32GB DDR5-6000, 6TB NVMe Gen4, RTX A4000 16GB - primary GPU node
- **Node 2 (10.9.66.159):** AMD Ryzen 7 5700X, 32GB DDR4-3600, GTX 1050Ti (CPU-mode fallback)
- **Node 3:** Intel i7-12th gen (laptop), 16GB DDR5, CPU-only
- **Software:** Python 3.10, Ollama, Ubuntu 22.04
**Reproducibility:**
- Full source code available in this repo
- Test with your own hardware - results will vary based on GPU
The project offers four main interfaces:
1. **flock_webui.py** - 🎨 Beautiful Streamlit web interface (NEW!)
2. **flockparsecli.py** - Command-line interface for personal document processing
3. **flock_ai_api.py** - REST API server for multi-user or application integration
4. **flock_mcp_server.py** - Model Context Protocol server for AI assistants like Claude Desktop
---
## **🎓 Showcase: Real-World Example**
**Processing influential AI research papers from arXiv.org**
Want to see FlockParser in action on real documents? Run the included showcase:
```bash
pip install flockparser
python showcase/process_arxiv_papers.py
```
### **What It Does**
Downloads and processes 5 seminal AI research papers:
- **Attention Is All You Need** (Transformers) - arXiv:1706.03762
- **BERT** - Pre-training Deep Bidirectional Transformers - arXiv:1810.04805
- **RAG** - Retrieval-Augmented Generation for NLP - arXiv:2005.11401
- **GPT-3** - Language Models are Few-Shot Learners - arXiv:2005.14165
- **Llama 2** - Open Foundation Language Models - arXiv:2307.09288
**Total: ~350 pages, ~25 MB of PDFs**
### **Expected Results**
| Configuration | Processing Time | Speedup |
|---------------|----------------|---------|
| **Single CPU node** | ~90s | 1.0× baseline |
| **Multi-node (1 GPU + 2 CPU)** | ~30s | 3.0× |
| **Single GPU node (RTX A4000)** | ~21s | **4.3×** |
### **What You Get**
After processing, the script demonstrates:
1. **Semantic Search** across all papers:
```python
# Example queries that work immediately:
"What is the transformer architecture?"
"How does retrieval-augmented generation work?"
"What are the benefits of attention mechanisms?"
```
2. **Performance Metrics** (`showcase/results.json`):
```json
{
"total_time": 21.3,
"papers": [
{
"title": "Attention Is All You Need",
"processing_time": 4.2,
"status": "success"
}
],
"node_info": [...]
}
```
3. **Human-Readable Summary** (`showcase/RESULTS.md`) with:
- Per-paper processing times
- Hardware configuration used
- Fastest/slowest/average performance
- Replication instructions
### **Why This Matters**
This isn't a toy demo - it's processing actual research papers that engineers read daily. It demonstrates:
✅ **Real document processing** - Complex PDFs with equations, figures, multi-column layouts
✅ **Production-grade pipeline** - PDF extraction → embeddings → vector storage → semantic search
✅ **Actual performance gains** - Measurable speedups on heterogeneous hardware
✅ **Reproducible results** - Run it yourself with `pip install`, compare your hardware
**Perfect for portfolio demonstrations:** Show this to hiring managers as proof of real distributed systems work.
---
## **🔧 Installation**
### **1. Clone the Repository**
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
```
### **2. Install System Dependencies (Required for OCR)**
**⚠️ IMPORTANT: Install these BEFORE pip install, as pytesseract and pdf2image require system packages**
#### For Better PDF Text Extraction:
- **Linux**:
```bash
sudo apt-get update
sudo apt-get install poppler-utils
```
- **macOS**:
```bash
brew install poppler
```
- **Windows**: Download from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
#### For OCR Support (Scanned Documents):
FlockParse automatically detects scanned PDFs and uses OCR!
- **Linux (Ubuntu/Debian)**:
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils
```
- **Linux (Fedora/RHEL)**:
```bash
sudo dnf install tesseract poppler-utils
```
- **macOS**:
```bash
brew install tesseract poppler
```
- **Windows**:
1. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) - Download the installer
2. Install [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
3. Add both to your system PATH
**Verify installation:**
```bash
tesseract --version
pdftotext -v
```
### **3. Install Python Dependencies**
```bash
pip install -r requirements.txt
```
**Key Python dependencies** (installed automatically):
- fastapi, uvicorn - Web server
- pdfplumber, PyPDF2, pypdf - PDF processing
- **pytesseract** - Python wrapper for Tesseract OCR (requires system Tesseract)
- **pdf2image** - PDF to image conversion (requires system Poppler)
- Pillow - Image processing for OCR
- chromadb - Vector database
- python-docx - DOCX generation
- ollama - AI model integration
- numpy - Numerical operations
- markdown - Markdown generation
**How OCR fallback works:**
1. Tries PyPDF2 text extraction
2. Falls back to pdftotext if no text
3. **Falls back to OCR** if still no text (<100 chars) - **Requires Tesseract + Poppler**
4. Automatically processes scanned documents without manual intervention
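A minimal sketch of that three-tier fallback, assuming PyPDF2 3.x, Poppler's `pdftotext`, and Tesseract are installed (the function name is illustrative; the 100-character threshold mirrors the description above):

```python
import subprocess

from PyPDF2 import PdfReader  # PyPDF2 3.x API

def extract_text(pdf_path: str) -> str:
    # 1. Native text layer first (fast path for digital PDFs)
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if len(text.strip()) >= 100:
        return text

    # 2. Fall back to Poppler's pdftotext (handles odd layouts better)
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True)
    if len(result.stdout.strip()) >= 100:
        return result.stdout

    # 3. Last resort: rasterize pages and OCR them (scanned documents)
    from pdf2image import convert_from_path  # needs system Poppler
    import pytesseract                       # needs system Tesseract

    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(img) for img in pages)
```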
### **4. Install and Configure Ollama**
1. Install Ollama from [ollama.com](https://ollama.com)
2. Start the Ollama service:
```bash
ollama serve
```
3. Pull the required models:
```bash
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
```
## **📜 Usage**
### **🎨 Web UI (flock_webui.py) - Easiest Way to Get Started!**
Launch the beautiful Streamlit web interface:
```bash
streamlit run flock_webui.py
```
The web UI will open in your browser at `http://localhost:8501`
**Features:**
- 📤 **Upload & Process**: Drag-and-drop PDF files for processing
- 💬 **Chat Interface**: Interactive chat with your documents
- 📊 **Load Balancer Dashboard**: Real-time monitoring of GPU nodes
- 🔍 **Semantic Search**: Search across all documents
- 🌐 **Node Management**: Add/remove Ollama nodes, auto-discovery
- 🎯 **Routing Control**: Switch between routing strategies
**Perfect for:**
- Users who prefer graphical interfaces
- Quick document processing and exploration
- Monitoring distributed processing
- Managing multiple Ollama nodes visually
---
### **CLI Interface (flockparsecli.py)**
Run the script:
```bash
python flockparsecli.py
```
Available commands:
```
📖 open_pdf <file> → Process a single PDF file
📂 open_dir <dir> → Process all PDFs in a directory
💬 chat → Chat with processed PDFs
📊 list_docs → List all processed documents
🔍 check_deps → Check for required dependencies
🌐 discover_nodes → Auto-discover Ollama nodes on local network
➕ add_node <url> → Manually add an Ollama node
➖ remove_node <url> → Remove an Ollama node from the pool
📋 list_nodes → List all configured Ollama nodes
⚖️ lb_stats → Show load balancer statistics
❌ exit → Quit the program
```
### **Web Server API (flock_ai_api.py)**
Start the API server:
```bash
# Set your API key (or use default for testing)
export FLOCKPARSE_API_KEY="your-secret-key-here"
# Start server
python flock_ai_api.py
```
The server will run on `http://0.0.0.0:8000` by default.
#### **🔒 Authentication (NEW!)**
All endpoints except `/` require an API key in the `X-API-Key` header:
```bash
# Default API key (change in production!)
X-API-Key: your-secret-api-key-change-this
# Or set via environment variable
export FLOCKPARSE_API_KEY="my-super-secret-key"
```
#### **Available Endpoints:**
| Endpoint | Method | Auth Required | Description |
|----------|--------|---------------|-------------|
| `/` | GET | ❌ No | API status and version info |
| `/upload/` | POST | ✅ Yes | Upload and process a PDF file |
| `/summarize/{file_name}` | GET | ✅ Yes | Get an AI-generated summary |
| `/search/?query=...` | GET | ✅ Yes | Search for relevant documents |
#### **Example API Usage:**
**Check API status (no auth required):**
```bash
curl http://localhost:8000/
```
**Upload a document (with authentication):**
```bash
curl -X POST \
-H "X-API-Key: your-secret-api-key-change-this" \
-F "file=@your_document.pdf" \
http://localhost:8000/upload/
```
**Get a document summary:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
http://localhost:8000/summarize/your_document.pdf
```
**Search across documents:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8000/search/?query=your%20search%20query"
```
**⚠️ Production Security:**
- Always change the default API key
- Use environment variables, never hardcode keys
- Use HTTPS in production (nginx/apache reverse proxy)
- Consider rate limiting for public deployments
### **MCP Server (flock_mcp_server.py)**
The MCP server allows FlockParse to be used as a tool by AI assistants like Claude Desktop.
#### **Setting up with Claude Desktop**
1. **Start the MCP server:**
```bash
python flock_mcp_server.py
```
2. **Configure Claude Desktop:**
Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
"mcpServers": {
"flockparse": {
"command": "python",
"args": ["/absolute/path/to/FlockParser/flock_mcp_server.py"]
}
}
}
```
3. **Restart Claude Desktop** and you'll see FlockParse tools available!
#### **Available MCP Tools:**
- `process_pdf` - Process and add PDFs to the knowledge base
- `query_documents` - Search documents using semantic search
- `chat_with_documents` - Ask questions about your documents
- `list_documents` - List all processed documents
- `get_load_balancer_stats` - View node performance metrics
- `discover_ollama_nodes` - Auto-discover Ollama nodes
- `add_ollama_node` - Add an Ollama node manually
- `remove_ollama_node` - Remove an Ollama node
#### **Example MCP Usage:**
In Claude Desktop, you can now ask:
- "Process the PDF at /path/to/document.pdf"
- "What documents do I have in my knowledge base?"
- "Search my documents for information about quantum computing"
- "What does my research say about black holes?"
## **💡 Practical Use Cases**
### **Knowledge Management**
- Create searchable archives of research papers, legal documents, and technical manuals
- Generate summaries of lengthy documents for quick review
- Chat with your document collection to find specific information without manual searching
### **Legal & Compliance**
- Process contract repositories for semantic search capabilities
- Extract key terms and clauses from legal documents
- Analyze regulatory documents for compliance requirements
### **Research & Academia**
- Process and convert academic papers for easier reference
- Create a personal research assistant that can reference your document library
- Generate summaries of complex research for presentations or reviews
### **Business Intelligence**
- Convert business reports into searchable formats
- Extract insights from PDF-based market research
- Make proprietary documents more accessible throughout an organization
## **🌐 Distributed Processing with Load Balancer**
FlockParse includes a sophisticated load balancer that can distribute embedding generation across multiple Ollama instances on your local network.
### **Setting Up Distributed Processing**
#### **Option 1: Auto-Discovery (Easiest)**
```bash
# Start FlockParse
python flockparsecli.py
# Auto-discover Ollama nodes on your network
⚡ Enter command: discover_nodes
```
The system will automatically scan your local network (/24 subnet) and detect any running Ollama instances.
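For intuition, here is a hedged sketch of what such a scan can look like; the `/api/version` probe, timeout, and worker count are assumptions rather than the exact discovery code:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def discover_ollama_nodes(subnet: str = "192.168.1", port: int = 11434):
    """Probe every host on a /24 subnet for a responding Ollama endpoint."""
    def probe(host: str):
        url = f"http://{host}:{port}"
        try:
            if requests.get(f"{url}/api/version", timeout=0.5).ok:
                return url
        except requests.RequestException:
            pass
        return None

    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with ThreadPoolExecutor(max_workers=64) as pool:
        return [url for url in pool.map(probe, hosts) if url]
```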
#### **Option 2: Manual Node Management**
```bash
# Add a specific node
⚡ Enter command: add_node http://192.168.1.100:11434
# List all configured nodes
⚡ Enter command: list_nodes
# Remove a node
⚡ Enter command: remove_node http://192.168.1.100:11434
# View load balancer statistics
⚡ Enter command: lb_stats
```
### **Benefits of Distributed Processing**
- **Speed**: Process documents 2-10x faster with multiple nodes
- **GPU Awareness**: Automatically detects and prioritizes GPU nodes over CPU nodes
- **VRAM Monitoring**: Detects when GPU nodes fall back to CPU due to insufficient VRAM
- **Fault Tolerance**: Automatic failover if a node becomes unavailable
- **Load Distribution**: Smart routing based on node performance, GPU availability, and VRAM capacity
- **Easy Scaling**: Just add more machines with Ollama installed
### **Setting Up Additional Ollama Nodes**
On each additional machine:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the embedding model
ollama pull mxbai-embed-large
# Start Ollama (accessible from network)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Then use `discover_nodes` or `add_node` to add them to FlockParse.
### **GPU and VRAM Optimization**
FlockParse automatically detects GPU availability and VRAM usage using Ollama's `/api/ps` endpoint:
- **🚀 GPU nodes** with models loaded in VRAM get +200 health score bonus
- **⚠️ VRAM-limited nodes** that fall back to CPU get only +50 bonus
- **🐢 CPU-only nodes** get -50 penalty
**To ensure your GPU is being used:**
1. **Check GPU detection**: Run `lb_stats` command to see node status
2. **Preload model into GPU**: Run a small inference to load model into VRAM
```bash
ollama run mxbai-embed-large "test"
```
3. **Verify VRAM usage**: Check that `size_vram > 0` in `/api/ps`:
```bash
curl http://localhost:11434/api/ps
```
4. **Increase VRAM allocation**: If model won't load into VRAM, free up GPU memory or use a smaller model
**Dynamic VRAM monitoring**: FlockParse continuously monitors embedding performance and automatically detects when a GPU node falls back to CPU due to VRAM exhaustion during heavy load.
## **🔄 Example Workflows**
### **CLI Workflow: Research Paper Processing**
1. **Check Dependencies**:
```
⚡ Enter command: check_deps
```
2. **Process a Directory of Research Papers**:
```
⚡ Enter command: open_dir ~/research_papers
```
3. **Chat with Your Research Collection**:
```
⚡ Enter command: chat
🙋 You: What are the key methods used in the Smith 2023 paper?
```
### **API Workflow: Document Processing Service**
1. **Start the API Server**:
```bash
python flock_ai_api.py
```
2. **Upload Documents via API**:
```bash
curl -X POST -H "X-API-Key: $FLOCKPARSE_API_KEY" -F "file=@quarterly_report.pdf" http://localhost:8000/upload/
```
3. **Generate a Summary**:
```bash
curl -H "X-API-Key: $FLOCKPARSE_API_KEY" http://localhost:8000/summarize/quarterly_report.pdf
```
4. **Search Across Documents**:
```bash
curl -H "X-API-Key: $FLOCKPARSE_API_KEY" "http://localhost:8000/search/?query=revenue%20growth%20Q3"
```
## **🔧 Troubleshooting Guide**
### **Ollama Connection Issues**
**Problem**: Error messages about Ollama not being available or connection failures.
**Solution**:
1. Verify Ollama is running: `ps aux | grep ollama`
2. Restart the Ollama service:
```bash
killall ollama
ollama serve
```
3. Check that you've pulled the required models:
```bash
ollama list
```
4. If models are missing:
```bash
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
```
### **PDF Text Extraction Failures**
**Problem**: No text extracted from certain PDFs.
**Solution**:
1. Check if the PDF is scanned/image-based:
- Install OCR tools: `sudo apt-get install tesseract-ocr` (Linux)
- For better scanned PDF handling: `pip install ocrmypdf`
- Process with OCR: `ocrmypdf input.pdf output.pdf`
2. If the PDF has unusual fonts or formatting:
- Install poppler-utils for better extraction
- Try using the `-layout` option with pdftotext manually:
```bash
pdftotext -layout problem_document.pdf output.txt
```
### **Memory Issues with Large Documents**
**Problem**: Application crashes with large PDFs or many documents.
**Solution**:
1. Process one document at a time for very large PDFs
2. Reduce the chunk size in the code (default is 512 characters)
3. Increase your system's available memory or use a swap file
4. For server deployments, consider using a machine with more RAM
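Since item 2 above mentions the 512-character default, here is a minimal chunking sketch; the overlap parameter is an illustrative assumption, not FlockParse's exact splitter:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into fixed-size character chunks with a small overlap."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```

Lowering `chunk_size` reduces the memory held per embedding call, at the cost of more embedding requests per document.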
### **API Server Not Starting**
**Problem**: Error when trying to start the API server.
**Solution**:
1. Check for port conflicts: `lsof -i :8000`
2. If another process is using port 8000, kill it or change the port
3. Verify FastAPI is installed: `pip install fastapi uvicorn`
4. Check for Python version compatibility (requires Python 3.10+)
---
## **🔐 Security & Production Notes**
### **REST API Security**
**⚠️ The default API key is NOT secure - change it immediately!**
```bash
# Set a strong API key via environment variable
export FLOCKPARSE_API_KEY="your-super-secret-key-change-this-now"
# Or generate a random one
export FLOCKPARSE_API_KEY=$(openssl rand -hex 32)
# Start the API server
python flock_ai_api.py
```
**Production Checklist:**
- ✅ **Change default API key** - Never use `your-secret-api-key-change-this`
- ✅ **Use environment variables** - Never hardcode secrets in code
- ✅ **Enable HTTPS** - Use nginx or Apache as reverse proxy with SSL/TLS
- ✅ **Add rate limiting** - Use nginx `limit_req` or FastAPI middleware
- ✅ **Network isolation** - Don't expose API to public internet unless necessary
- ✅ **Monitor logs** - Watch for authentication failures and abuse
**Example nginx config with TLS:**
```nginx
server {
listen 443 ssl;
server_name your-domain.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
### **MCP Privacy & Security**
**What data leaves your machine:**
- 🔴 **Document queries** - Sent to Claude Desktop → Anthropic API
- 🔴 **Document snippets** - Retrieved context chunks sent as part of prompts
- 🔴 **Chat messages** - All RAG conversations processed by Claude
- 🟢 **Document files** - Never uploaded (processed locally, only embeddings stored)
**To disable MCP and stay 100% local:**
1. Remove FlockParse from Claude Desktop config
2. Use CLI (`flockparsecli.py`) or Web UI (`flock_webui.py`) instead
3. Both provide full RAG functionality without external API calls
**MCP is safe for:**
- ✅ Public documents (research papers, manuals, non-sensitive data)
- ✅ Testing and development
- ✅ Personal use where you trust Anthropic's privacy policy
**MCP is NOT recommended for:**
- ❌ Confidential business documents
- ❌ Personal identifiable information (PII)
- ❌ Regulated data (HIPAA, GDPR sensitive content)
- ❌ Air-gapped or classified environments
### **Database Security**
**SQLite limitations (ChromaDB backend):**
- ⚠️ No concurrent writes from multiple processes
- ⚠️ File permissions determine access (not true auth)
- ⚠️ No encryption at rest by default
**For production with multiple users:**
```bash
# Option 1: Separate databases per interface
CLI: chroma_db_cli/
API: chroma_db_api/
MCP: chroma_db_mcp/
# Option 2: Use PostgreSQL backend (ChromaDB supports it)
# See ChromaDB docs: https://docs.trychroma.com/
```
### **VRAM Detection Method**
FlockParse detects GPU usage via Ollama's `/api/ps` endpoint:
```bash
# Check what Ollama reports
curl http://localhost:11434/api/ps
# Response shows VRAM usage:
{
"models": [{
"name": "mxbai-embed-large:latest",
"size": 705530880,
"size_vram": 705530880, # <-- If >0, model is in GPU
...
}]
}
```
**Health score calculation:**
- `size_vram > 0` → +200 points (GPU in use)
- `size_vram == 0` but GPU present → +50 points (GPU available, not used)
- CPU-only → -50 points
This is **presence-based detection**, not utilization monitoring. It detects *if* the model loaded into VRAM, not *how efficiently* it's being used.
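As a sketch, those scoring rules map to something like the following; the base score and the handling of the GPU-present case are assumptions, only the bonuses come from the rules above:

```python
import requests

def health_score(node_url: str, base: float = 100.0) -> float:
    """Score a node from Ollama's /api/ps; weights follow the rules above."""
    models = requests.get(f"{node_url}/api/ps", timeout=2).json().get("models", [])
    if any(m.get("size_vram", 0) > 0 for m in models):
        return base + 200  # model resident in VRAM -> GPU in use
    if models:
        return base + 50   # loaded but size_vram == 0 -> CPU fallback on a GPU box
    return base - 50       # nothing loaded; treated here as CPU-only
```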
---
## **💡 Features**
| Feature | Description |
|---------|-------------|
| **Multi-method PDF Extraction** | Uses both PyPDF2 and pdftotext for best results |
| **Format Conversion** | Converts PDFs to TXT, Markdown, DOCX, and JSON |
| **Semantic Search** | Uses vector embeddings to find relevant information |
| **Interactive Chat** | Discuss your documents with AI assistance |
| **Privacy Options** | Web UI/CLI: 100% offline; REST API: local network; MCP: Claude Desktop (cloud) |
| **Distributed Processing** | Load balancer with auto-discovery for multiple Ollama nodes |
| **Accurate VRAM Monitoring** | Real GPU memory tracking with nvidia-smi/rocm-smi + Ollama API (NEW!) |
| **GPU & VRAM Awareness** | Automatically detects GPU nodes and prevents CPU fallback |
| **Intelligent Routing** | 4 strategies (adaptive, round_robin, least_loaded, lowest_latency) with GPU priority |
| **Flexible Model Matching** | Supports model name variants (llama3.1, llama3.1:latest, llama3.1:8b, etc.) |
| **ChromaDB Vector Store** | Production-ready persistent vector database with cosine similarity |
| **Embedding Cache** | MD5-based caching prevents reprocessing same content |
| **Model Weight Caching** | Keep models in VRAM for faster repeated inference |
| **Parallel Batch Processing** | Process multiple embeddings simultaneously |
| **Database Management** | Clear cache and clear DB commands for easy maintenance (NEW!) |
| **Filename Preservation** | Maintains original document names in converted files |
| **REST API** | Web server for multi-user/application integration |
| **Document Summarization** | AI-generated summaries of uploaded documents |
| **OCR Processing** | Extract text from scanned documents using image recognition |
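The MD5-based embedding cache from the table above can be pictured with a short sketch; the on-disk layout (`embedding_cache/<md5>.json`) is an assumption, not FlockParse's actual format:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("embedding_cache")  # hypothetical location

def cached_embedding(text: str, embed_fn):
    """Return a vector for `text`, computing it only once per content hash."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.md5(text.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed_fn(text)  # e.g. a call to Ollama's embedding endpoint
    path.write_text(json.dumps(vector))
    return vector
```

Because the key is a hash of the chunk's content, re-processing an unchanged document becomes a series of cache hits.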
## **Comparing FlockParse Interfaces**
| Feature | **flock_webui.py** | flockparsecli.py | flock_ai_api.py | flock_mcp_server.py |
|---------|-------------------|----------------|-----------|---------------------|
| **Interface** | 🎨 Web Browser (Streamlit) | Command line | REST API over HTTP | Model Context Protocol |
| **Ease of Use** | ⭐⭐⭐⭐⭐ Easiest | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| **Use case** | Interactive GUI usage | Personal CLI processing | Service integration | AI Assistant integration |
| **Document formats** | Creates TXT, MD, DOCX, JSON | Creates TXT, MD, DOCX, JSON | Stores extracted text only | Creates TXT, MD, DOCX, JSON |
| **Interaction** | Point-and-click + chat | Interactive chat mode | Query/response via API | Tool calls from AI assistants |
| **Multi-user** | Single user (local) | Single user | Multiple users/applications | Single user (via AI assistant) |
| **Storage** | Local file-based | Local file-based | ChromaDB vector database | Local file-based |
| **Load Balancing** | ✅ Yes (visual dashboard) | ✅ Yes | ❌ No | ✅ Yes |
| **Node Discovery** | ✅ Yes (one-click) | ✅ Yes | ❌ No | ✅ Yes |
| **GPU Monitoring** | ✅ Yes (real-time charts) | ✅ Yes | ❌ No | ✅ Yes |
| **Batch Operations** | ⚠️ Multiple upload | ❌ No | ❌ No | ❌ No |
| **Privacy Level** | 🟢 100% Local | 🟢 100% Local | 🟡 Local Network | 🔴 Cloud (Claude) |
| **Best for** | **🌟 General users, GUI lovers** | Direct CLI usage | Integration with apps | Claude Desktop, AI workflows |
## **📁 Project Structure**
- `/converted_files` - Stores the converted document formats (flockparsecli.py)
- `/knowledge_base` - Legacy JSON storage (backwards compatibility only)
- `/chroma_db_cli` - **ChromaDB vector database for CLI** (flockparsecli.py) - **Production storage**
- `/uploads` - Temporary storage for uploaded documents (flock_ai_api.py)
- `/chroma_db` - ChromaDB vector database (flock_ai_api.py)
## **🚀 Recent Additions**
- ✅ **GPU Auto-Optimization** - Background process ensures models use GPU automatically (NEW!)
- ✅ **Programmatic GPU Control** - Force models to GPU/CPU across distributed nodes (NEW!)
- ✅ **Accurate VRAM Monitoring** - Real GPU memory tracking across distributed nodes
- ✅ **ChromaDB Production Integration** - Professional vector database for 100x faster search
- ✅ **Clear Cache & Clear DB Commands** - Manage embeddings and database efficiently
- ✅ **Model Weight Caching** - Keep models in VRAM for 5-10x faster inference
- ✅ **Web UI** - Beautiful Streamlit interface for easy document management
- ✅ **Advanced OCR Support** - Automatic fallback to OCR for scanned documents
- ✅ **API Authentication** - Secure API key authentication for REST API endpoints
- ⬜ **Document versioning** - Track changes over time (Coming soon)
## **📚 Complete Documentation**
### Core Documentation
- **[📖 Architecture Deep Dive](docs/architecture.md)** - System design, routing algorithms, technical decisions
- **[🌐 Distributed Setup Guide](DISTRIBUTED_SETUP.md)** - ⭐ **Set up your own multi-node cluster**
- **[📊 Performance Benchmarks](BENCHMARKS.md)** - Real-world performance data and scaling tests
- **[⚠️ Known Issues & Limitations](KNOWN_ISSUES.md)** - 🔴 **READ THIS** - Honest assessment of current state
- **[🔒 Security Policy](SECURITY.md)** - Security best practices and vulnerability reporting
- **[🐛 Error Handling Guide](ERROR_HANDLING.md)** - Troubleshooting common issues
- **[🤝 Contributing Guide](CONTRIBUTING.md)** - How to contribute to the project
- **[📋 Code of Conduct](CODE_OF_CONDUCT.md)** - Community guidelines
- **[📝 Changelog](CHANGELOG.md)** - Version history
### Technical Guides
- **[⚡ Performance Optimization](PERFORMANCE_OPTIMIZATION.md)** - Tuning for maximum speed
- **[🔧 GPU Router Setup](GPU_ROUTER_SETUP.md)** - Distributed cluster configuration
- **[🤖 GPU Auto-Optimization](GPU_AUTO_OPTIMIZATION.md)** - Automatic GPU management
- **[📊 VRAM Monitoring](VRAM_MONITORING.md)** - GPU memory tracking
- **[🎯 Adaptive Parallelism](ADAPTIVE_PARALLELISM.md)** - Smart workload distribution
- **[🗄️ ChromaDB Production](CHROMADB_PRODUCTION.md)** - Vector database scaling
- **[💾 Model Caching](MODEL_CACHING.md)** - Performance through caching
- **[🖥️ Node Management](NODE_MANAGEMENT.md)** - Managing distributed nodes
- **[⚡ Quick Setup](QUICK_SETUP.md)** - Fast track to getting started
### Additional Resources
- **[🏛️ FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)** - Original distributed inference implementation
- **[📦 Docker Setup](docker-compose.yml)** - Containerized deployment
- **[⚙️ Environment Config](.env.example)** - Configuration template
- **[🧪 Tests](tests/)** - Test suite and CI/CD
## **🔗 Integration with SynapticLlamas & SOLLOL**
FlockParser is designed to work seamlessly with **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** (multi-agent orchestration) and **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** (distributed inference platform) as a unified AI ecosystem.
### **The Complete Stack**
```
┌─────────────────────────────────────────────────────────────┐
│ SynapticLlamas (v0.1.0+) │
│ Multi-Agent System & Orchestration │
│ • Research agents • Editor agents • Storyteller agents │
└───────────┬────────────────────────────────────┬───────────┘
│ │
│ RAG Queries │ Distributed
│ (with pre-computed embeddings) │ Inference
│ │
┌──────▼──────────┐ ┌─────────▼────────────┐
│ FlockParser │ │ SOLLOL │
│ API (v1.0.4+) │ │ Load Balancer │
│ Port: 8000 │ │ (v0.9.31+) │
└─────────────────┘ └──────────────────────┘
│ │
│ ChromaDB │ Intelligent
│ Vector Store │ GPU/CPU Routing
│ │
┌──────▼──────────┐ ┌─────────▼────────────┐
│ Knowledge Base │ │ Ollama Nodes │
│ 41 Documents │ │ (Distributed) │
│ 6,141 Chunks │ │ GPU + CPU │
└─────────────────┘ └──────────────────────┘
```
### **Why This Integration Matters**
**FlockParser** provides document RAG capabilities, **SynapticLlamas** orchestrates multi-agent workflows, and **SOLLOL** handles distributed inference with intelligent load balancing.
| Component | Role | Key Feature |
|-----------|------|-------------|
| **FlockParser** | Document RAG & Knowledge Base | ChromaDB vector store with 6,141+ chunks |
| **SynapticLlamas** | Agent Orchestration | Multi-agent workflows with RAG integration |
| **SOLLOL** | Distributed Inference | Load balanced embedding & model inference |
### **Quick Start: Complete Ecosystem**
```bash
# Install all three packages (auto-installs dependencies)
pip install synaptic-llamas # Pulls in flockparser>=1.0.4 and sollol>=0.9.31
# Start FlockParser API (auto-starts with CLI)
flockparse
# Configure SynapticLlamas for integration
synaptic-llamas --interactive --distributed
```
### **Integration Example: Load Balanced RAG**
```python
from flockparser_adapter import FlockParserAdapter
from sollol_load_balancer import SOLLOLLoadBalancer
# Initialize SOLLOL for distributed inference
sollol = SOLLOLLoadBalancer(
rpc_backends=["http://gpu-node-1:50052", "http://gpu-node-2:50052"]
)
# Initialize FlockParser adapter
flockparser = FlockParserAdapter("http://localhost:8000", remote_mode=True)
# Step 1: Generate embedding using SOLLOL (load balanced!)
embedding = sollol.generate_embedding(
model="mxbai-embed-large",
prompt="quantum entanglement"
)
# SOLLOL routes to fastest GPU automatically
# Step 2: Query FlockParser with pre-computed embedding
results = flockparser.query_remote(
query="quantum entanglement",
embedding=embedding, # Skip FlockParser's embedding generation
n_results=5
)
# FlockParser returns relevant chunks from 41 documents
# Performance gain: 2-5x faster when SOLLOL has faster nodes!
```
### **New API Endpoints (v1.0.4+)**
FlockParser v1.0.4 adds **SynapticLlamas-compatible** public endpoints:
- **`GET /health`** - Check API availability and document count
- **`GET /stats`** - Get knowledge base statistics (41 docs, 6,141 chunks)
- **`POST /query`** - Query with pre-computed embeddings (critical for load balanced RAG)
**These endpoints allow SynapticLlamas to bypass FlockParser's embedding generation and use SOLLOL's load balancer instead!**
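A hedged example of calling `POST /query` with a pre-computed vector; the field names mirror the adapter example above, but the exact request schema is an assumption (see the integration guide):

```python
import requests

embedding = [0.0] * 1024  # placeholder; use a real mxbai-embed-large vector

response = requests.post(
    "http://localhost:8000/query",
    json={"query": "quantum entanglement", "embedding": embedding, "n_results": 5},
    timeout=30,
)
print(response.json())
```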
### **Learn More**
- **[📖 Complete Integration Guide](INTEGRATION_WITH_SYNAPTICLLAMAS.md)** - Full architecture, examples, and setup
- **[SynapticLlamas Repository](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** - Multi-agent orchestration
- **[SOLLOL Repository](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - Distributed inference platform
---
## **📝 Development Process**
This project was developed iteratively using Claude and Claude Code as coding assistants. All design decisions, architecture choices, and integration strategy were directed and reviewed by me.
## **🤝 Contributing**
Contributions are welcome! Please feel free to submit a Pull Request.
## **📄 License**
This project is licensed under the MIT License - see the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/BenevolentJoker-JohnL/FlockParser",
"name": "flockparser",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "rag, retrieval-augmented-generation, distributed-systems, document-processing, gpu-acceleration, ollama, chromadb, pdf-processing, ocr, ai, machine-learning, nlp",
"author": "BenevolentJoker (John L.)",
"author_email": "\"BenevolentJoker (John L.)\" <benevolentjoker@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e6/0f/a61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b/flockparser-1.0.5.tar.gz",
"platform": null,
"description": "# **FlockParse - Document RAG Intelligence with Distributed Processing**\n\n[](https://pypi.org/project/flockparser/)\n[](https://pypi.org/project/flockparser/)\n[](https://github.com/BenevolentJoker-JohnL/FlockParser/actions)\n[](https://codecov.io/gh/BenevolentJoker-JohnL/FlockParser)\n[](https://opensource.org/licenses/MIT)\n[](https://www.python.org/downloads/)\n[](https://github.com/psf/black)\n[](https://github.com/BenevolentJoker-JohnL/FlockParser)\n\n> **Distributed document RAG system that turns mismatched hardware into a coordinated inference cluster.** Auto-discovers Ollama nodes, intelligently routes workloads across heterogeneous GPUs/CPUs, and achieves 60x+ speedups through adaptive load balancing. Privacy-first with local/network/cloud interfaces.\n\n**What makes this different:** Real distributed systems engineering\u2014not just API wrappers. Handles heterogeneous hardware (RTX A4000 + GTX 1050Ti + CPU laptops working together), network failures, and privacy requirements that rule out cloud APIs.\n\n---\n\n## \u26a0\ufe0f Important: Current Maturity\n\n**Status:** Beta (v1.0.0) - **Early adopters welcome, but read this first!**\n\n**What works well:**\n- \u2705 Core distributed processing across heterogeneous nodes\n- \u2705 GPU detection and VRAM-aware routing\n- \u2705 Basic PDF extraction and OCR fallback\n- \u2705 Privacy-first local processing (CLI/Web UI modes)\n\n**Known limitations:**\n- \u26a0\ufe0f **Limited battle testing** - Tested by ~2 developers, not yet proven at scale\n- \u26a0\ufe0f **Security gaps** - See [SECURITY.md](SECURITY.md) for current limitations\n- \u26a0\ufe0f **Edge cases** - Some PDF types may fail (encrypted, complex layouts)\n- \u26a0\ufe0f **Test coverage** - ~40% coverage, integration tests incomplete\n\n**Read before using:** [KNOWN_ISSUES.md](KNOWN_ISSUES.md) documents all limitations, edge cases, and roadmap honestly.\n\n**Recommended for:**\n- \ud83c\udf93 Learning distributed systems\n- \ud83d\udd2c Research and experimentation\n- \ud83c\udfe0 Personal projects with non-critical data\n- \ud83d\udee0\ufe0f Contributors who want to help mature the project\n\n**Not yet recommended for:**\n- \u274c Mission-critical production workloads\n- \u274c Regulated industries (healthcare, finance) without additional hardening\n- \u274c Large-scale deployments (>50 concurrent users)\n\n**Help us improve:** Report issues, contribute fixes, share feedback!\n\n---\n\n## **\ud83c\udfdb\ufe0f Origins & Legacy**\n\nFlockParser's distributed inference architecture originated from **[FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)**, which pioneered:\n- **Auto-discovery** of Ollama nodes across heterogeneous hardware\n- **Adaptive load balancing** with GPU/CPU awareness\n- **VRAM-aware routing** and automatic failover mechanisms\n\nThis core distributed logic from FlockParser-legacy was later extracted and generalized to become **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - a standalone distributed inference platform that now powers both FlockParser and **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)**.\n\n### **\ud83d\udcca Quick Performance Reference**\n\n| Workload | Hardware | Time | Speedup | Notes |\n|----------|----------|------|---------|-------|\n| **5 AI papers (~350 pages)** | 1\u00d7 RTX A4000 (16GB) | 21.3s | **17.5\u00d7** | [Real arXiv showcase](#-showcase-real-world-example) |\n| **12-page PDF (demo video)** | 1\u00d7 RTX A4000 (16GB) | 6.0s | 
**61.7\u00d7** | GPU-aware routing |\n| **100 PDFs (2000 pages)** | 3-node cluster (mixed) | 3.2 min | **13.2\u00d7** | See [BENCHMARKS.md](BENCHMARKS.md) |\n| **Embedding generation** | RTX A4000 vs i9 CPU | 8.2s vs 178s | **21.7\u00d7** | 10K chunks |\n\n**\ud83c\udfaf Try it yourself:** `pip install flockparser && python showcase/process_arxiv_papers.py`\n\n---\n\n## **\ud83d\udd12 Privacy Model**\n\n| Interface | Privacy Level | External Calls | Best For |\n|-----------|---------------|----------------|----------|\n| **CLI** (`flockparsecli.py`) | \ud83d\udfe2 **100% Local** | None | Personal use, air-gapped systems |\n| **Web UI** (`flock_webui.py`) | \ud83d\udfe2 **100% Local** | None | GUI users, visual monitoring |\n| **REST API** (`flock_ai_api.py`) | \ud83d\udfe1 **Local Network** | None | Multi-user, app integration |\n| **MCP Server** (`flock_mcp_server.py`) | \ud83d\udd34 **Cloud** | \u26a0\ufe0f Claude Desktop (Anthropic) | AI assistant integration |\n\n**\u26a0\ufe0f MCP Privacy Warning:** The MCP server integrates with Claude Desktop, which sends queries and document snippets to Anthropic's cloud API. Use CLI/Web UI for 100% offline processing.\n\n---\n\n## **Table of Contents**\n\n- [Key Features](#-key-features)\n- [\ud83d\udc65 Who Uses This?](#-who-uses-this) - **Target users & scenarios**\n- [\ud83d\udcd0 How It Works (5-Second Overview)](#-how-it-works-5-second-overview) - **Visual for non-technical evaluators**\n- [Architecture](#-architecture) | **[\ud83d\udcd6 Deep Dive: Architecture & Design Decisions](docs/architecture.md)**\n- [Quickstart](#-quickstart-3-steps)\n- [Performance & Benchmarks](#-performance)\n- [\ud83c\udf93 Showcase: Real-World Example](#-showcase-real-world-example) \u2b50 **Try it yourself**\n- [Usage Examples](#-usage)\n- [Security & Production](#-security--production-notes)\n- [\ud83d\udd17 Integration with SynapticLlamas & SOLLOL](#-integration-with-synapticllamas--sollol) - **Complete AI Ecosystem** \u2b50\n- [Troubleshooting](#-troubleshooting-guide)\n- [Contributing](#-contributing)\n\n## **\u26a1 Key Features**\n\n- **\ud83c\udf10 Intelligent Load Balancing** - Auto-discovers Ollama nodes, detects GPU vs CPU, monitors VRAM, and routes work adaptively (10x speedup on heterogeneous clusters)\n- **\ud83d\udd0c Multi-Protocol Support** - CLI (100% local), REST API (network), MCP (Claude Desktop), Web UI (Streamlit) - choose your privacy level\n- **\ud83c\udfaf Adaptive Routing** - Sequential vs parallel decisions based on cluster characteristics (prevents slow nodes from bottlenecking)\n- **\ud83d\udcca Production Observability** - Real-time health scores, performance tracking, VRAM monitoring, automatic failover\n- **\ud83d\udd12 Privacy-First Architecture** - No external API calls required (CLI mode), all processing on-premise\n- **\ud83d\udcc4 Complete Pipeline** - PDF extraction \u2192 OCR fallback \u2192 Multi-format conversion \u2192 Vector embeddings \u2192 RAG with source citations\n\n---\n\n## **\ud83d\udc65 Who Uses This?**\n\nFlockParser is designed for engineers and researchers who need **private, on-premise document intelligence** with **real distributed systems capabilities**.\n\n### **Ideal Users**\n\n| User Type | Use Case | Why FlockParser? 
|\n|-----------|----------|------------------|\n| **\ud83d\udd2c ML/AI Engineers** | Process research papers, build knowledge bases, experiment with RAG systems | GPU-aware routing, 21\u00d7 faster embeddings, full pipeline control |\n| **\ud83d\udcca Data Scientists** | Extract insights from large document corpora (100s-1000s of PDFs) | Distributed processing, semantic search, production observability |\n| **\ud83c\udfe2 Enterprise Engineers** | On-premise document search for regulated industries (healthcare, legal, finance) | 100% local processing, no cloud APIs, privacy-first architecture |\n| **\ud83c\udf93 Researchers** | Build custom RAG systems, experiment with distributed inference patterns | Full source access, extensible architecture, real benchmarks |\n| **\ud83d\udee0\ufe0f DevOps/Platform Engineers** | Set up document intelligence infrastructure for teams | Multi-node setup, health monitoring, automatic failover |\n| **\ud83d\udc68\u200d\ud83d\udcbb Students/Learners** | Understand distributed systems, GPU orchestration, RAG architectures | Real working example, comprehensive docs, honest limitations |\n\n### **Real-World Scenarios**\n\n\u2705 **\"I have 500 research papers and a spare GPU machine\"** \u2192 Process your corpus 20\u00d7 faster with distributed nodes\n\u2705 **\"I can't send medical records to OpenAI\"** \u2192 100% local processing (CLI/Web UI modes)\n\u2705 **\"I want to experiment with RAG without cloud costs\"** \u2192 Full pipeline, runs on your hardware\n\u2705 **\"I need to search 10,000 internal documents\"** \u2192 ChromaDB vector search with sub-20ms latency\n\u2705 **\"I have mismatched hardware (old laptop + gaming PC)\"** \u2192 Adaptive routing handles heterogeneous clusters\n\n### **Not Ideal For**\n\n\u274c **Production SaaS with 1000+ concurrent users** \u2192 Current SQLite backend limits concurrency (~50 users)\n\u274c **Mission-critical systems requiring 99.9% uptime** \u2192 Still in Beta, see [KNOWN_ISSUES.md](KNOWN_ISSUES.md)\n\u274c **Simple one-time PDF extraction** \u2192 Overkill; use `pdfplumber` directly\n\u274c **Cloud-first deployments** \u2192 Designed for on-premise/hybrid; cloud works but misses GPU routing benefits\n\n**Bottom line:** If you're building document intelligence infrastructure on your own hardware and need distributed processing with privacy guarantees, FlockParser is for you.\n\n---\n\n## **\ud83d\udcd0 How It Works (5-Second Overview)**\n\n**For recruiters and non-technical evaluators:**\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 INPUT \u2502\n\u2502 \ud83d\udcc4 Your Documents (PDFs, research papers, internal docs) \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502\n 
## **🏗️ Architecture**

```
┌─────────────────────────────────────────────────────────────┐
│         Interfaces (Choose Your Privacy Level)              │
│  CLI (Local) | REST API (Network) | MCP (Claude) | Web UI   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  FlockParse Core Engine                     │
│  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐      │
│  │     PDF     │   │   Semantic   │   │     RAG      │      │
│  │  Processing │ → │    Search    │ → │    Engine    │      │
│  └─────────────┘   └──────────────┘   └──────────────┘      │
│         │                 │                  │              │
│         ▼                 ▼                  ▼              │
│  ┌───────────────────────────────────────────────────┐      │
│  │       ChromaDB Vector Store (Persistent)          │      │
│  └───────────────────────────────────────────────────┘      │
└──────────────────────┬──────────────────────────────────────┘
                       │ Intelligent Load Balancer
                       │  • Health scoring (GPU/VRAM detection)
                       │  • Adaptive routing (sequential vs parallel)
                       │  • Automatic failover & caching
                       ▼
        ┌──────────────────────────────────────────────┐
        │         Distributed Ollama Cluster           │
        │  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
        │  │  Node 1  │  │  Node 2  │  │  Node 3  │    │
        │  │  GPU A   │  │  GPU B   │  │   CPU    │    │
        │  │16GB VRAM │  │ 8GB VRAM │  │ 16GB RAM │    │
        │  │Health:367│  │Health:210│  │Health:50 │    │
        │  └──────────┘  └──────────┘  └──────────┘    │
        └──────────────────────────────────────────────┘
           ▲ Auto-discovery | Performance tracking
```

**Want to understand how this works?** Read the **[📖 Architecture Deep Dive](docs/architecture.md)** for detailed explanations of:
- Why distributed AI inference solves real-world problems
- How adaptive routing decisions are made (sequential vs parallel)
- MCP integration details and privacy implications
- Technical trade-offs and design decisions
## **🚀 Quickstart (3 Steps)**

**Requirements:**
- Python 3.10 or later
- Ollama 0.1.20+ (install from [ollama.com](https://ollama.com))
- 4GB+ RAM (8GB+ recommended for GPU nodes)

```bash
# 1. Install FlockParser
pip install flockparser

# 2. Start Ollama and pull models
ollama serve                     # In a separate terminal
ollama pull mxbai-embed-large    # Required for embeddings
ollama pull llama3.1:latest      # Required for chat

# 3. Run your preferred interface
flockparse-webui    # Web UI - easiest (recommended) ⭐
flockparse          # CLI - 100% local
flockparse-api      # REST API - multi-user
flockparse-mcp      # MCP - Claude Desktop integration
```

**💡 Pro tip:** Start with the Web UI to see distributed processing with real-time VRAM monitoring and node health dashboards.

---

### Alternative: Install from Source

If you want to contribute or modify the code:

```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
pip install -e .    # Editable install
```

### **Quick Test (30 seconds)**

```bash
# Start the CLI
python flockparsecli.py

# Process the sample PDF
> open_pdf testpdfs/sample.pdf

# Chat with it
> chat
🙋 You: Summarize this document
```

**First time?** Start with the Web UI (`streamlit run flock_webui.py`) - it's the easiest way to see distributed processing in action with a visual dashboard.

---
## **🐳 Docker Deployment (One Command)**

### **Quick Start with Docker Compose**

```bash
# Clone and deploy everything
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
docker-compose up -d

# Access services
# Web UI:    http://localhost:8501
# REST API:  http://localhost:8000
# Ollama:    http://localhost:11434
```

### **What Gets Deployed**

| Service | Port | Description |
|---------|------|-------------|
| **Web UI** | 8501 | Streamlit interface with visual monitoring |
| **REST API** | 8000 | FastAPI with authentication |
| **CLI** | - | Interactive terminal (`docker-compose run cli`) |
| **Ollama** | 11434 | Local LLM inference engine |

### **Production Features**

✅ **Multi-stage build** - Optimized image size
✅ **Non-root user** - Security hardened
✅ **Health checks** - Auto-restart on failure
✅ **Volume persistence** - Data survives restarts
✅ **GPU support** - Uncomment the deploy section for NVIDIA GPUs

### **Custom Configuration**

```bash
# Set API key
export FLOCKPARSE_API_KEY="your-secret-key"

# Set log level
export LOG_LEVEL="DEBUG"

# Deploy with custom config
docker-compose up -d
```

### **GPU Support (NVIDIA)**

Uncomment the GPU section in `docker-compose.yml`:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```

Then run: `docker-compose up -d`

### **CI/CD Pipeline**

```mermaid
graph LR
    A[📝 Git Push] --> B[🔍 Lint & Format]
    B --> C[🧪 Test Suite]
    B --> D[🔒 Security Scan]
    C --> E[🐳 Build Multi-Arch]
    D --> E
    E --> F[📦 Push to GHCR]
    F --> G[🚀 Deploy]

    style A fill:#4CAF50
    style B fill:#2196F3
    style C fill:#2196F3
    style D fill:#FF9800
    style E fill:#9C27B0
    style F fill:#9C27B0
    style G fill:#F44336
```

**Automated on every push to `main`:**

| Stage | Tools | Purpose |
|-------|-------|---------|
| **Code Quality** | black, flake8, mypy | Enforce formatting & typing standards |
| **Testing** | pytest (Python 3.10/3.11/3.12) | Coverage reporting across versions |
| **Security** | Trivy | Vulnerability scanning & SARIF reports |
| **Build** | Docker Buildx | Multi-architecture (amd64, arm64) |
| **Registry** | GitHub Container Registry | Versioned image storage |
| **Deploy** | On release events | Automated production deployment |

**Pull the latest image:**
```bash
docker pull ghcr.io/benevolentjoker-johnl/flockparser:latest
```

**View pipeline runs:** https://github.com/BenevolentJoker-JohnL/FlockParser/actions

---
## **🌐 Setting Up Distributed Nodes**

**Want the 60x speedup?** Set up multiple Ollama nodes across your network.

### Quick Multi-Node Setup

**On each additional machine:**

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Configure for network access
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# 3. Pull models
ollama pull mxbai-embed-large
ollama pull llama3.1:latest

# 4. Allow firewall (if needed)
sudo ufw allow 11434/tcp    # Linux
```

**FlockParser will automatically discover these nodes!**

Check with:
```bash
python flockparsecli.py
> lb_stats    # Shows all discovered nodes and their capabilities
```
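Under the hood, auto-discovery amounts to probing the Ollama port across your subnet. A minimal sketch of the idea - hypothetical helper names and an illustrative default subnet, not FlockParser's actual implementation:

```python
import concurrent.futures

import requests


def probe(host: str, timeout: float = 0.5) -> str | None:
    """Return the node URL if an Ollama API answers on this host."""
    url = f"http://{host}:11434"
    try:
        # /api/tags is a cheap endpoint every Ollama instance serves
        if requests.get(f"{url}/api/tags", timeout=timeout).ok:
            return url
    except requests.RequestException:
        pass
    return None


def discover(subnet: str = "192.168.1") -> list[str]:
    """Scan a /24 subnet for Ollama nodes in parallel."""
    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        return [url for url in pool.map(probe, hosts) if url]


print(discover())    # e.g. ['http://192.168.1.100:11434']
```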
**📖 Complete Guide:** See **[DISTRIBUTED_SETUP.md](DISTRIBUTED_SETUP.md)** for:
- Step-by-step multi-machine setup
- Network configuration and firewall rules
- Troubleshooting node discovery
- Example setups (budget home lab to professional clusters)
- GPU router configuration for automatic optimization

---

### **🔒 Privacy Levels by Interface:**
- **Web UI (`flock_webui.py`)**: 🟢 100% local, runs in your browser
- **CLI (`flockparsecli.py`)**: 🟢 100% local, zero external calls
- **REST API (`flock_ai_api.py`)**: 🟡 Local network only
- **MCP Server (`flock_mcp_server.py`)**: 🔴 Integrates with Claude Desktop (Anthropic cloud service)

**Choose the interface that matches your privacy requirements!**

## **🏆 Why FlockParse? Comparison to Competitors**

| Feature | **FlockParse** | LangChain | LlamaIndex | Haystack |
|---------|---------------|-----------|------------|----------|
| **100% Local/Offline** | ✅ Yes (CLI/Web UI) | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial |
| **Zero External API Calls** | ✅ Yes (CLI/Web UI) | ❌ No | ❌ No | ❌ No |
| **Built-in GPU Load Balancing** | ✅ Yes (auto) | ❌ No | ❌ No | ❌ No |
| **VRAM Monitoring** | ✅ Yes (dynamic) | ❌ No | ❌ No | ❌ No |
| **Multi-Node Auto-Discovery** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **CPU Fallback Detection** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Document Format Export** | ✅ 4 formats | ❌ Limited | ❌ Limited | ⚠️ Basic |
| **Setup Complexity** | 🟢 Simple | 🔴 Complex | 🔴 Complex | 🟡 Medium |
| **Dependencies** | 🟢 Minimal | 🔴 Heavy | 🔴 Heavy | 🟡 Medium |
| **Learning Curve** | 🟢 Low | 🔴 Steep | 🔴 Steep | 🟡 Medium |
| **Privacy Control** | 🟢 High (CLI/Web UI) | 🔴 Limited | 🔴 Limited | 🟡 Medium |
| **Out-of-Box Functionality** | ✅ Complete | ⚠️ Requires config | ⚠️ Requires config | ⚠️ Requires config |
| **MCP Integration** | ✅ Native | ❌ No | ❌ No | ❌ No |
| **Embedding Cache** | ✅ MD5-based | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| **Batch Processing** | ✅ Parallel | ⚠️ Sequential | ⚠️ Sequential | ⚠️ Basic |
| **Performance** | 🚀 60x+ faster with GPU auto-routing | ⚠️ Varies by config | ⚠️ Varies by config | ⚠️ Varies by config |
| **Cost** | 💰 Free | 💰💰 Free + Paid | 💰💰 Free + Paid | 💰💰 Free + Paid |

### **Key Differentiators:**

1. **Privacy by Design**: The CLI and Web UI interfaces are 100% local with zero external calls (the MCP interface uses Claude Desktop for chat)
2. **Intelligent GPU Management**: Automatically finds, tests, and prioritizes GPU nodes
3. **Production-Ready**: Works immediately with sensible defaults
4. **Resource-Aware**: Detects VRAM exhaustion and prevents performance degradation
5. **Complete Solution**: CLI, Web UI, REST API, and MCP interfaces - choose your privacy level
## **📊 Performance**

### **Real-World Benchmark Results**

| Processing Mode | Time | Speedup | What It Shows |
|----------------|------|---------|---------------|
| Single CPU node | 372.76s (~6 min) | 1x baseline | Sequential CPU processing |
| Parallel (multi-node) | 159.79s (~2.5 min) | **2.3x faster** | Distributed across cluster |
| GPU node routing | 6.04s (~6 sec) | **61.7x faster** | Automatic GPU detection & routing |

**Why the Massive Speedup?**
- GPU processes embeddings in milliseconds vs seconds on CPU
- Adaptive routing detected the GPU was 60x+ faster and sent all work there
- Avoided the bottleneck of waiting for slower CPU nodes to finish
- No network overhead (local cluster, no cloud APIs)

**Key Insight:** The system **automatically** detects performance differences and makes routing decisions - no manual GPU configuration needed.

**Hardware (Benchmark Cluster):**
- **Node 1 (10.9.66.90):** Intel i9-12900K, 32GB DDR5-6000, 6TB NVMe Gen4, RTX A4000 16GB - primary GPU node
- **Node 2 (10.9.66.159):** AMD Ryzen 7 5700X, 32GB DDR4-3600, GTX 1050Ti (CPU-mode fallback)
- **Node 3:** Intel i7-12th gen (laptop), 16GB DDR5, CPU-only
- **Software:** Python 3.10, Ollama, Ubuntu 22.04

**Reproducibility:**
- Full source code available in this repo
- Test with your own hardware - results will vary based on GPU

The project offers four main interfaces:
1. **flock_webui.py** - 🎨 Beautiful Streamlit web interface (NEW!)
2. **flockparsecli.py** - Command-line interface for personal document processing
3. **flock_ai_api.py** - REST API server for multi-user or application integration
4. **flock_mcp_server.py** - Model Context Protocol server for AI assistants like Claude Desktop

---

## **🎓 Showcase: Real-World Example**

**Processing influential AI research papers from arXiv.org**

Want to see FlockParser in action on real documents? Run the included showcase:

```bash
pip install flockparser
python showcase/process_arxiv_papers.py
```

### **What It Does**

Downloads and processes 5 seminal AI research papers:
- **Attention Is All You Need** (Transformers) - arXiv:1706.03762
- **BERT** - Pre-training Deep Bidirectional Transformers - arXiv:1810.04805
- **RAG** - Retrieval-Augmented Generation for NLP - arXiv:2005.11401
- **GPT-3** - Language Models are Few-Shot Learners - arXiv:2005.14165
- **Llama 2** - Open Foundation Language Models - arXiv:2307.09288

**Total: ~350 pages, ~25 MB of PDFs**

### **Expected Results**

| Configuration | Processing Time | Speedup |
|---------------|----------------|---------|
| **Single CPU node** | ~90s | 1.0× baseline |
| **Multi-node (1 GPU + 2 CPU)** | ~30s | 3.0× |
| **Single GPU node (RTX A4000)** | ~21s | **4.3×** |

### **What You Get**

After processing, the script demonstrates:

1. **Semantic Search** across all papers:
   ```python
   # Example queries that work immediately:
   "What is the transformer architecture?"
   "How does retrieval-augmented generation work?"
   "What are the benefits of attention mechanisms?"
   ```

2. **Performance Metrics** (`showcase/results.json`; see the sketch after this list):
   ```json
   {
     "total_time": 21.3,
     "papers": [
       {
         "title": "Attention Is All You Need",
         "processing_time": 4.2,
         "status": "success"
       }
     ],
     "node_info": [...]
   }
   ```

3. **Human-Readable Summary** (`showcase/RESULTS.md`) with:
   - Per-paper processing times
   - Hardware configuration used
   - Fastest/slowest/average performance
   - Replication instructions
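Those metrics are plain JSON, so a few lines of Python are enough to slice them - a sketch assuming the `results.json` schema shown in step 2:

```python
import json
from pathlib import Path

results = json.loads(Path("showcase/results.json").read_text())

print(f"Total: {results['total_time']:.1f}s")
for paper in results["papers"]:
    mark = "✓" if paper["status"] == "success" else "✗"
    print(f"  {mark} {paper['title']}: {paper['processing_time']:.1f}s")

# The same fastest-paper stat that RESULTS.md reports
fastest = min(results["papers"], key=lambda p: p["processing_time"])
print(f"Fastest: {fastest['title']} ({fastest['processing_time']:.1f}s)")
```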
### **Why This Matters**

This isn't a toy demo - it's processing actual research papers that engineers read daily. It demonstrates:

✅ **Real document processing** - Complex PDFs with equations, figures, multi-column layouts
✅ **Production-grade pipeline** - PDF extraction → embeddings → vector storage → semantic search
✅ **Actual performance gains** - Measurable speedups on heterogeneous hardware
✅ **Reproducible results** - Run it yourself with `pip install`, compare your hardware

**Perfect for portfolio demonstrations:** Show this to hiring managers as proof of real distributed systems work.

---

## **🔧 Installation**

### **1. Clone the Repository**
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
```

### **2. Install System Dependencies (Required for OCR)**

**⚠️ IMPORTANT: Install these BEFORE `pip install`, as pytesseract and pdf2image require system packages**

#### For Better PDF Text Extraction:
- **Linux**:
  ```bash
  sudo apt-get update
  sudo apt-get install poppler-utils
  ```
- **macOS**:
  ```bash
  brew install poppler
  ```
- **Windows**: Download from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)

#### For OCR Support (Scanned Documents):
FlockParse automatically detects scanned PDFs and uses OCR!

- **Linux (Ubuntu/Debian)**:
  ```bash
  sudo apt-get update
  sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils
  ```
- **Linux (Fedora/RHEL)**:
  ```bash
  sudo dnf install tesseract poppler-utils
  ```
- **macOS**:
  ```bash
  brew install tesseract poppler
  ```
- **Windows**:
  1. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) - Download the installer
  2. Install [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
  3. Add both to your system PATH

**Verify installation:**
```bash
tesseract --version
pdftotext -v
```

### **3. Install Python Dependencies**
```bash
pip install -r requirements.txt
```

**Key Python dependencies** (installed automatically):
- fastapi, uvicorn - Web server
- pdfplumber, PyPDF2, pypdf - PDF processing
- **pytesseract** - Python wrapper for Tesseract OCR (requires system Tesseract)
- **pdf2image** - PDF to image conversion (requires system Poppler)
- Pillow - Image processing for OCR
- chromadb - Vector database
- python-docx - DOCX generation
- ollama - AI model integration
- numpy - Numerical operations
- markdown - Markdown generation

**How OCR fallback works** (see the sketch below):
1. Tries PyPDF2 text extraction
2. Falls back to pdftotext if no text is found
3. **Falls back to OCR** if there is still almost no text (<100 chars) - **requires Tesseract + Poppler**
4. Automatically processes scanned documents without manual intervention
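That cascade is simple enough to sketch in a few lines - a simplified illustration of the same three steps, not FlockParse's exact code:

```python
import subprocess

import pytesseract
from pdf2image import convert_from_path
from PyPDF2 import PdfReader


def extract_text(pdf_path: str, min_chars: int = 100) -> str:
    """Illustrative fallback chain: PyPDF2 -> pdftotext -> OCR."""
    # 1. Fast path: embedded text via PyPDF2
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    if len(text) >= min_chars:
        return text

    # 2. Fallback: poppler's pdftotext copes with trickier layouts
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"], capture_output=True, text=True
    )
    if len(result.stdout) >= min_chars:
        return result.stdout

    # 3. Last resort: rasterize each page and OCR it (scanned documents)
    pages = convert_from_path(pdf_path)
    return "\n".join(pytesseract.image_to_string(image) for image in pages)
```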
### **4. Install and Configure Ollama**

1. Install Ollama from [ollama.com](https://ollama.com)
2. Start the Ollama service:
   ```bash
   ollama serve
   ```
3. Pull the required models:
   ```bash
   ollama pull mxbai-embed-large
   ollama pull llama3.1:latest
   ```

## **📜 Usage**

### **🎨 Web UI (flock_webui.py) - Easiest Way to Get Started!**

Launch the beautiful Streamlit web interface:
```bash
streamlit run flock_webui.py
```

The web UI will open in your browser at `http://localhost:8501`

**Features:**
- 📤 **Upload & Process**: Drag-and-drop PDF files for processing
- 💬 **Chat Interface**: Interactive chat with your documents
- 📊 **Load Balancer Dashboard**: Real-time monitoring of GPU nodes
- 🔍 **Semantic Search**: Search across all documents
- 🌐 **Node Management**: Add/remove Ollama nodes, auto-discovery
- 🎯 **Routing Control**: Switch between routing strategies

**Perfect for:**
- Users who prefer graphical interfaces
- Quick document processing and exploration
- Monitoring distributed processing
- Managing multiple Ollama nodes visually

---

### **CLI Interface (flockparsecli.py)**

Run the script:
```bash
python flockparsecli.py
```

Available commands:
```
📖 open_pdf <file>     → Process a single PDF file
📂 open_dir <dir>      → Process all PDFs in a directory
💬 chat                → Chat with processed PDFs
📊 list_docs           → List all processed documents
🔍 check_deps          → Check for required dependencies
🌐 discover_nodes      → Auto-discover Ollama nodes on local network
➕ add_node <url>      → Manually add an Ollama node
➖ remove_node <url>   → Remove an Ollama node from the pool
📋 list_nodes          → List all configured Ollama nodes
⚖️ lb_stats            → Show load balancer statistics
❌ exit                → Quit the program
```

### **Web Server API (flock_ai_api.py)**

Start the API server:
```bash
# Set your API key (or use the default for testing)
export FLOCKPARSE_API_KEY="your-secret-key-here"

# Start server
python flock_ai_api.py
```

The server will run on `http://0.0.0.0:8000` by default.

#### **🔒 Authentication (NEW!)**

All endpoints except `/` require an API key in the `X-API-Key` header:

```bash
# Default API key (change in production!)
X-API-Key: your-secret-api-key-change-this

# Or set via environment variable
export FLOCKPARSE_API_KEY="my-super-secret-key"
```

#### **Available Endpoints:**

| Endpoint | Method | Auth Required | Description |
|----------|--------|---------------|-------------|
| `/` | GET | ❌ No | API status and version info |
| `/upload/` | POST | ✅ Yes | Upload and process a PDF file |
| `/summarize/{file_name}` | GET | ✅ Yes | Get an AI-generated summary |
| `/search/?query=...` | GET | ✅ Yes | Search for relevant documents |

#### **Example API Usage:**

**Check API status (no auth required):**
```bash
curl http://localhost:8000/
```

**Upload a document (with authentication):**
```bash
curl -X POST \
  -H "X-API-Key: your-secret-api-key-change-this" \
  -F "file=@your_document.pdf" \
  http://localhost:8000/upload/
```

**Get a document summary:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
  http://localhost:8000/summarize/your_document.pdf
```

**Search across documents:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8000/search/?query=your%20search%20query"
```
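The same calls work from Python if you'd rather not shell out to curl - a minimal client sketch using `requests`:

```python
import requests

API = "http://localhost:8000"
HEADERS = {"X-API-Key": "your-secret-api-key-change-this"}

# Upload and process a PDF
with open("your_document.pdf", "rb") as f:
    response = requests.post(f"{API}/upload/", headers=HEADERS, files={"file": f})
    response.raise_for_status()

# Semantic search across processed documents
hits = requests.get(
    f"{API}/search/", headers=HEADERS, params={"query": "your search query"}
)
print(hits.json())
```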
**⚠️ Production Security:**
- Always change the default API key
- Use environment variables, never hardcode keys
- Use HTTPS in production (nginx/apache reverse proxy)
- Consider rate limiting for public deployments

### **MCP Server (flock_mcp_server.py)**

The MCP server allows FlockParse to be used as a tool by AI assistants like Claude Desktop.

#### **Setting up with Claude Desktop**

1. **Start the MCP server:**
   ```bash
   python flock_mcp_server.py
   ```

2. **Configure Claude Desktop:**
   Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):

   ```json
   {
     "mcpServers": {
       "flockparse": {
         "command": "python",
         "args": ["/absolute/path/to/FlockParser/flock_mcp_server.py"]
       }
     }
   }
   ```

3. **Restart Claude Desktop** and you'll see FlockParse tools available!

#### **Available MCP Tools:**

- `process_pdf` - Process and add PDFs to the knowledge base
- `query_documents` - Search documents using semantic search
- `chat_with_documents` - Ask questions about your documents
- `list_documents` - List all processed documents
- `get_load_balancer_stats` - View node performance metrics
- `discover_ollama_nodes` - Auto-discover Ollama nodes
- `add_ollama_node` - Add an Ollama node manually
- `remove_ollama_node` - Remove an Ollama node

#### **Example MCP Usage:**

In Claude Desktop, you can now ask:
- "Process the PDF at /path/to/document.pdf"
- "What documents do I have in my knowledge base?"
- "Search my documents for information about quantum computing"
- "What does my research say about black holes?"
## **💡 Practical Use Cases**

### **Knowledge Management**
- Create searchable archives of research papers, legal documents, and technical manuals
- Generate summaries of lengthy documents for quick review
- Chat with your document collection to find specific information without manual searching

### **Legal & Compliance**
- Process contract repositories for semantic search capabilities
- Extract key terms and clauses from legal documents
- Analyze regulatory documents for compliance requirements

### **Research & Academia**
- Process and convert academic papers for easier reference
- Create a personal research assistant that can reference your document library
- Generate summaries of complex research for presentations or reviews

### **Business Intelligence**
- Convert business reports into searchable formats
- Extract insights from PDF-based market research
- Make proprietary documents more accessible throughout an organization

## **🌐 Distributed Processing with Load Balancer**

FlockParse includes a sophisticated load balancer that can distribute embedding generation across multiple Ollama instances on your local network.

### **Setting Up Distributed Processing**

#### **Option 1: Auto-Discovery (Easiest)**
```bash
# Start FlockParse
python flockparsecli.py

# Auto-discover Ollama nodes on your network
⚡ Enter command: discover_nodes
```

The system will automatically scan your local network (/24 subnet) and detect any running Ollama instances.

#### **Option 2: Manual Node Management**
```bash
# Add a specific node
⚡ Enter command: add_node http://192.168.1.100:11434

# List all configured nodes
⚡ Enter command: list_nodes

# Remove a node
⚡ Enter command: remove_node http://192.168.1.100:11434

# View load balancer statistics
⚡ Enter command: lb_stats
```

### **Benefits of Distributed Processing**

- **Speed**: Process documents 2-10x faster with multiple nodes
- **GPU Awareness**: Automatically detects and prioritizes GPU nodes over CPU nodes
- **VRAM Monitoring**: Detects when GPU nodes fall back to CPU due to insufficient VRAM
- **Fault Tolerance**: Automatic failover if a node becomes unavailable
- **Load Distribution**: Smart routing based on node performance, GPU availability, and VRAM capacity
- **Easy Scaling**: Just add more machines with Ollama installed

### **Setting Up Additional Ollama Nodes**

On each additional machine:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the embedding model
ollama pull mxbai-embed-large

# Start Ollama (accessible from network)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Then use `discover_nodes` or `add_node` to add them to FlockParse.

### **GPU and VRAM Optimization**

FlockParse automatically detects GPU availability and VRAM usage using Ollama's `/api/ps` endpoint:

- **🚀 GPU nodes** with models loaded in VRAM get a +200 health score bonus
- **⚠️ VRAM-limited nodes** that fall back to CPU get only a +50 bonus
- **🐢 CPU-only nodes** get a -50 penalty

**To ensure your GPU is being used:**

1. **Check GPU detection**: Run the `lb_stats` command to see node status
2. **Preload the model into GPU**: Run a small inference to load the model into VRAM
   ```bash
   ollama run mxbai-embed-large "test"
   ```
3. **Verify VRAM usage**: Check that `size_vram > 0` in `/api/ps`:
   ```bash
   curl http://localhost:11434/api/ps
   ```
4. **Increase VRAM allocation**: If the model won't load into VRAM, free up GPU memory or use a smaller model

**Dynamic VRAM monitoring**: FlockParse continuously monitors embedding performance and automatically detects when a GPU node falls back to CPU due to VRAM exhaustion during heavy load.
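Conceptually, the routing decision reduces to scoring every node and preferring the healthiest one. A toy sketch of that selection step - the health values match the architecture diagram earlier, but the third node's address and the busy-node penalty are illustrative, not FlockParse's internals:

```python
from dataclasses import dataclass


@dataclass
class Node:
    url: str
    health_score: float   # e.g. model in VRAM +200, CPU-only -50
    in_flight: int = 0    # requests currently assigned to this node


def pick_node(nodes: list[Node]) -> Node:
    """Prefer high health scores, lightly penalizing busy nodes."""
    return max(nodes, key=lambda n: n.health_score - 10 * n.in_flight)


nodes = [
    Node("http://10.9.66.90:11434", health_score=367),   # RTX A4000, model in VRAM
    Node("http://10.9.66.159:11434", health_score=210),  # GTX 1050Ti, VRAM-limited
    Node("http://10.9.66.15:11434", health_score=50),    # CPU-only laptop (hypothetical IP)
]
print(pick_node(nodes).url)   # -> the RTX A4000 node
```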
## **🔄 Example Workflows**

### **CLI Workflow: Research Paper Processing**

1. **Check Dependencies**:
   ```
   ⚡ Enter command: check_deps
   ```

2. **Process a Directory of Research Papers**:
   ```
   ⚡ Enter command: open_dir ~/research_papers
   ```

3. **Chat with Your Research Collection**:
   ```
   ⚡ Enter command: chat
   🙋 You: What are the key methods used in the Smith 2023 paper?
   ```

### **API Workflow: Document Processing Service**

1. **Start the API Server**:
   ```bash
   python flock_ai_api.py
   ```

2. **Upload Documents via API**:
   ```bash
   curl -X POST \
     -H "X-API-Key: your-secret-api-key-change-this" \
     -F "file=@quarterly_report.pdf" \
     http://localhost:8000/upload/
   ```

3. **Generate a Summary**:
   ```bash
   curl -H "X-API-Key: your-secret-api-key-change-this" \
     http://localhost:8000/summarize/quarterly_report.pdf
   ```

4. **Search Across Documents**:
   ```bash
   curl -H "X-API-Key: your-secret-api-key-change-this" \
     "http://localhost:8000/search/?query=revenue%20growth%20Q3"
   ```

## **🔧 Troubleshooting Guide**

### **Ollama Connection Issues**

**Problem**: Error messages about Ollama not being available or connection failures.

**Solution**:
1. Verify Ollama is running: `ps aux | grep ollama`
2. Restart the Ollama service:
   ```bash
   killall ollama
   ollama serve
   ```
3. Check that you've pulled the required models:
   ```bash
   ollama list
   ```
4. If models are missing:
   ```bash
   ollama pull mxbai-embed-large
   ollama pull llama3.1:latest
   ```

### **PDF Text Extraction Failures**

**Problem**: No text extracted from certain PDFs.

**Solution**:
1. Check if the PDF is scanned/image-based:
   - Install OCR tools: `sudo apt-get install tesseract-ocr` (Linux)
   - For better scanned PDF handling: `pip install ocrmypdf`
   - Process with OCR: `ocrmypdf input.pdf output.pdf`

2. If the PDF has unusual fonts or formatting:
   - Install poppler-utils for better extraction
   - Try using the `-layout` option with pdftotext manually:
     ```bash
     pdftotext -layout problem_document.pdf output.txt
     ```

### **Memory Issues with Large Documents**

**Problem**: Application crashes with large PDFs or many documents.

**Solution**:
1. Process one document at a time for very large PDFs
2. Reduce the chunk size in the code (the default is 512 characters; see the sketch below)
3. Increase your system's available memory or use a swap file
4. For server deployments, consider using a machine with more RAM
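For point 2, the chunker is the usual fixed-window splitter - a sketch of the idea with the 512-character default (the overlap value here is illustrative):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks; a small overlap keeps sentences
    that straddle a boundary searchable from both sides."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks


print(len(chunk_text("x" * 2000)))   # 5 chunks at the 512-char default
```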
### **API Server Not Starting**

**Problem**: Error when trying to start the API server.

**Solution**:
1. Check for port conflicts: `lsof -i :8000`
2. If another process is using port 8000, kill it or change the port
3. Verify FastAPI is installed: `pip install fastapi uvicorn`
4. Check for Python version compatibility (requires Python 3.10+)

---

## **🔐 Security & Production Notes**

### **REST API Security**

**⚠️ The default API key is NOT secure - change it immediately!**

```bash
# Set a strong API key via environment variable
export FLOCKPARSE_API_KEY="your-super-secret-key-change-this-now"

# Or generate a random one
export FLOCKPARSE_API_KEY=$(openssl rand -hex 32)

# Start the API server
python flock_ai_api.py
```

**Production Checklist:**
- ✅ **Change the default API key** - Never use `your-secret-api-key-change-this`
- ✅ **Use environment variables** - Never hardcode secrets in code
- ✅ **Enable HTTPS** - Use nginx or Apache as a reverse proxy with SSL/TLS
- ✅ **Add rate limiting** - Use nginx `limit_req` or FastAPI middleware
- ✅ **Network isolation** - Don't expose the API to the public internet unless necessary
- ✅ **Monitor logs** - Watch for authentication failures and abuse

**Example nginx config with TLS:**
```nginx
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

### **MCP Privacy & Security**

**What data leaves your machine:**
- 🔴 **Document queries** - Sent to Claude Desktop → Anthropic API
- 🔴 **Document snippets** - Retrieved context chunks sent as part of prompts
- 🔴 **Chat messages** - All RAG conversations processed by Claude
- 🟢 **Document files** - Never uploaded (processed locally, only embeddings stored)

**To disable MCP and stay 100% local:**
1. Remove FlockParse from the Claude Desktop config
2. Use the CLI (`flockparsecli.py`) or Web UI (`flock_webui.py`) instead
3. Both provide full RAG functionality without external API calls

**MCP is safe for:**
- ✅ Public documents (research papers, manuals, non-sensitive data)
- ✅ Testing and development
- ✅ Personal use where you trust Anthropic's privacy policy

**MCP is NOT recommended for:**
- ❌ Confidential business documents
- ❌ Personally identifiable information (PII)
- ❌ Regulated data (HIPAA, GDPR sensitive content)
- ❌ Air-gapped or classified environments

### **Database Security**

**SQLite limitations (ChromaDB backend):**
- ⚠️ No concurrent writes from multiple processes
- ⚠️ File permissions determine access (not true auth)
- ⚠️ No encryption at rest by default

**For production with multiple users:**
```bash
# Option 1: Separate databases per interface
CLI:  chroma_db_cli/
API:  chroma_db_api/
MCP:  chroma_db_mcp/

# Option 2: Use a PostgreSQL backend (ChromaDB supports it)
# See ChromaDB docs: https://docs.trychroma.com/
```
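Option 1 is a one-liner per process with the chromadb client - a sketch (the collection name here is illustrative):

```python
import chromadb

# One persistent store per interface avoids SQLite write contention
cli_client = chromadb.PersistentClient(path="chroma_db_cli")
api_client = chromadb.PersistentClient(path="chroma_db_api")

collection = cli_client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},   # cosine similarity, as used above
)
```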
### **VRAM Detection Method**

FlockParse detects GPU usage via Ollama's `/api/ps` endpoint:

```bash
# Check what Ollama reports
curl http://localhost:11434/api/ps

# Response shows VRAM usage:
{
  "models": [{
    "name": "mxbai-embed-large:latest",
    "size": 705530880,
    "size_vram": 705530880,   # <-- If >0, the model is in GPU
    ...
  }]
}
```

**Health score calculation:**
- `size_vram > 0` → +200 points (GPU in use)
- `size_vram == 0` but GPU present → +50 points (GPU available, not used)
- CPU-only → -50 points

This is **presence-based detection**, not utilization monitoring. It detects *if* the model loaded into VRAM, not *how efficiently* it's being used.
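The scoring rule is easy to reproduce against that endpoint yourself - a sketch with a hypothetical helper mirroring the bonuses above:

```python
import requests


def health_bonus(node_url: str, has_gpu: bool) -> int:
    """Score a node from Ollama's /api/ps: +200 if a model sits in VRAM,
    +50 if a GPU is present but unused, -50 for CPU-only."""
    models = requests.get(f"{node_url}/api/ps", timeout=2).json().get("models", [])
    if any(m.get("size_vram", 0) > 0 for m in models):
        return 200
    return 50 if has_gpu else -50


print(health_bonus("http://localhost:11434", has_gpu=True))
```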
---

## **💡 Features**

| Feature | Description |
|---------|-------------|
| **Multi-method PDF Extraction** | Uses both PyPDF2 and pdftotext for best results |
| **Format Conversion** | Converts PDFs to TXT, Markdown, DOCX, and JSON |
| **Semantic Search** | Uses vector embeddings to find relevant information |
| **Interactive Chat** | Discuss your documents with AI assistance |
| **Privacy Options** | Web UI/CLI: 100% offline; REST API: local network; MCP: Claude Desktop (cloud) |
| **Distributed Processing** | Load balancer with auto-discovery for multiple Ollama nodes |
| **Accurate VRAM Monitoring** | Real GPU memory tracking with nvidia-smi/rocm-smi + Ollama API (NEW!) |
| **GPU & VRAM Awareness** | Automatically detects GPU nodes and prevents CPU fallback |
| **Intelligent Routing** | 4 strategies (adaptive, round_robin, least_loaded, lowest_latency) with GPU priority |
| **Flexible Model Matching** | Supports model name variants (llama3.1, llama3.1:latest, llama3.1:8b, etc.) |
| **ChromaDB Vector Store** | Production-ready persistent vector database with cosine similarity |
| **Embedding Cache** | MD5-based caching prevents reprocessing the same content |
| **Model Weight Caching** | Keeps models in VRAM for faster repeated inference |
| **Parallel Batch Processing** | Processes multiple embeddings simultaneously |
| **Database Management** | Clear-cache and clear-DB commands for easy maintenance (NEW!) |
| **Filename Preservation** | Maintains original document names in converted files |
| **REST API** | Web server for multi-user/application integration |
| **Document Summarization** | AI-generated summaries of uploaded documents |
| **OCR Processing** | Extracts text from scanned documents using image recognition |
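The MD5-based embedding cache in the table comes down to keying embeddings by a hash of the content - a minimal sketch (the on-disk layout is illustrative, not FlockParse's actual cache format):

```python
import hashlib
import json
from pathlib import Path

import ollama

CACHE_DIR = Path("embedding_cache")
CACHE_DIR.mkdir(exist_ok=True)


def embed_cached(text: str, model: str = "mxbai-embed-large") -> list[float]:
    """Return the cached embedding if this exact content was seen before."""
    key = hashlib.md5(text.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    embedding = ollama.embeddings(model=model, prompt=text)["embedding"]
    cache_file.write_text(json.dumps(embedding))
    return embedding
```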
## **Comparing FlockParse Interfaces**

| Feature | **flock_webui.py** | flockparsecli.py | flock_ai_api.py | flock_mcp_server.py |
|---------|-------------------|----------------|-----------|---------------------|
| **Interface** | 🎨 Web Browser (Streamlit) | Command line | REST API over HTTP | Model Context Protocol |
| **Ease of Use** | ⭐⭐⭐⭐⭐ Easiest | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| **Use case** | Interactive GUI usage | Personal CLI processing | Service integration | AI Assistant integration |
| **Document formats** | Creates TXT, MD, DOCX, JSON | Creates TXT, MD, DOCX, JSON | Stores extracted text only | Creates TXT, MD, DOCX, JSON |
| **Interaction** | Point-and-click + chat | Interactive chat mode | Query/response via API | Tool calls from AI assistants |
| **Multi-user** | Single user (local) | Single user | Multiple users/applications | Single user (via AI assistant) |
| **Storage** | Local file-based | Local file-based | ChromaDB vector database | Local file-based |
| **Load Balancing** | ✅ Yes (visual dashboard) | ✅ Yes | ❌ No | ✅ Yes |
| **Node Discovery** | ✅ Yes (one-click) | ✅ Yes | ❌ No | ✅ Yes |
| **GPU Monitoring** | ✅ Yes (real-time charts) | ✅ Yes | ❌ No | ✅ Yes |
| **Batch Operations** | ⚠️ Multiple upload | ❌ No | ❌ No | ❌ No |
| **Privacy Level** | 🟢 100% Local | 🟢 100% Local | 🟡 Local Network | 🔴 Cloud (Claude) |
| **Best for** | **🌟 General users, GUI lovers** | Direct CLI usage | Integration with apps | Claude Desktop, AI workflows |

## **📁 Project Structure**

- `/converted_files` - Stores the converted document formats (flockparsecli.py)
- `/knowledge_base` - Legacy JSON storage (backwards compatibility only)
- `/chroma_db_cli` - **ChromaDB vector database for the CLI** (flockparsecli.py) - **Production storage**
- `/uploads` - Temporary storage for uploaded documents (flock_ai_api.py)
- `/chroma_db` - ChromaDB vector database (flock_ai_api.py)

## **🚀 Recent Additions**
- ✅ **GPU Auto-Optimization** - Background process ensures models use the GPU automatically (NEW!)
- ✅ **Programmatic GPU Control** - Force models to GPU/CPU across distributed nodes (NEW!)
- ✅ **Accurate VRAM Monitoring** - Real GPU memory tracking across distributed nodes
- ✅ **ChromaDB Production Integration** - Professional vector database for 100x faster search
- ✅ **Clear Cache & Clear DB Commands** - Manage embeddings and the database efficiently
- ✅ **Model Weight Caching** - Keep models in VRAM for 5-10x faster inference
- ✅ **Web UI** - Beautiful Streamlit interface for easy document management
- ✅ **Advanced OCR Support** - Automatic fallback to OCR for scanned documents
- ✅ **API Authentication** - Secure API key authentication for REST API endpoints
- ⬜ **Document versioning** - Track changes over time (coming soon)

## **📚 Complete Documentation**

### Core Documentation
- **[📖 Architecture Deep Dive](docs/architecture.md)** - System design, routing algorithms, technical decisions
- **[🌐 Distributed Setup Guide](DISTRIBUTED_SETUP.md)** - ⭐ **Set up your own multi-node cluster**
- **[📊 Performance Benchmarks](BENCHMARKS.md)** - Real-world performance data and scaling tests
- **[⚠️ Known Issues & Limitations](KNOWN_ISSUES.md)** - 🔴 **READ THIS** - Honest assessment of the current state
- **[🔒 Security Policy](SECURITY.md)** - Security best practices and vulnerability reporting
- **[🐛 Error Handling Guide](ERROR_HANDLING.md)** - Troubleshooting common issues
- **[🤝 Contributing Guide](CONTRIBUTING.md)** - How to contribute to the project
- **[📋 Code of Conduct](CODE_OF_CONDUCT.md)** - Community guidelines
- **[📝 Changelog](CHANGELOG.md)** - Version history

### Technical Guides
- **[⚡ Performance Optimization](PERFORMANCE_OPTIMIZATION.md)** - Tuning for maximum speed
- **[🔧 GPU Router Setup](GPU_ROUTER_SETUP.md)** - Distributed cluster configuration
- **[🤖 GPU Auto-Optimization](GPU_AUTO_OPTIMIZATION.md)** - Automatic GPU management
- **[📊 VRAM Monitoring](VRAM_MONITORING.md)** - GPU memory tracking
- **[🎯 Adaptive Parallelism](ADAPTIVE_PARALLELISM.md)** - Smart workload distribution
- **[🗄️ ChromaDB Production](CHROMADB_PRODUCTION.md)** - Vector database scaling
- **[💾 Model Caching](MODEL_CACHING.md)** - Performance through caching
- **[🖥️ Node Management](NODE_MANAGEMENT.md)** - Managing distributed nodes
- **[⚡ Quick Setup](QUICK_SETUP.md)** - Fast track to getting started

### Additional Resources
- **[🏛️ FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)** - Original distributed inference implementation
- **[📦 Docker Setup](docker-compose.yml)** - Containerized deployment
- **[⚙️ Environment Config](.env.example)** - Configuration template
- **[🧪 Tests](tests/)** - Test suite and CI/CD

## **🔗 Integration with SynapticLlamas & SOLLOL**

FlockParser is designed to work seamlessly with **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** (multi-agent orchestration) and **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** (distributed inference platform) as a unified AI ecosystem.

### **The Complete Stack**

```
┌─────────────────────────────────────────────────────────────┐
│                  SynapticLlamas (v0.1.0+)                   │
│             Multi-Agent System & Orchestration              │
│  • Research agents  • Editor agents  • Storyteller agents   │
└───────────┬────────────────────────────────────┬────────────┘
            │                                    │
            │ RAG Queries                        │ Distributed
            │ (with pre-computed embeddings)     │ Inference
            │                                    │
     ┌──────▼──────────┐              ┌──────────▼────────────┐
     │   FlockParser   │              │        SOLLOL         │
     │  API (v1.0.4+)  │              │     Load Balancer     │
     │   Port: 8000    │              │      (v0.9.31+)       │
     └──────┬──────────┘              └──────────┬────────────┘
            │                                    │
            │ ChromaDB                           │ Intelligent
            │ Vector Store                       │ GPU/CPU Routing
            │                                    │
     ┌──────▼──────────┐              ┌──────────▼────────────┐
     │ Knowledge Base  │              │     Ollama Nodes      │
     │  41 Documents   │              │     (Distributed)     │
     │  6,141 Chunks   │              │      GPU + CPU        │
     └─────────────────┘              └───────────────────────┘
```

### **Why This Integration Matters**

**FlockParser** provides document RAG capabilities, **SynapticLlamas** orchestrates multi-agent workflows, and **SOLLOL** handles distributed inference with intelligent load balancing.

| Component | Role | Key Feature |
|-----------|------|-------------|
| **FlockParser** | Document RAG & Knowledge Base | ChromaDB vector store with 6,141+ chunks |
| **SynapticLlamas** | Agent Orchestration | Multi-agent workflows with RAG integration |
| **SOLLOL** | Distributed Inference | Load-balanced embedding & model inference |

### **Quick Start: Complete Ecosystem**

```bash
# Install all three packages (auto-installs dependencies)
pip install synaptic-llamas   # Pulls in flockparser>=1.0.4 and sollol>=0.9.31

# Start the FlockParser API (auto-starts with the CLI)
flockparse

# Configure SynapticLlamas for integration
synaptic-llamas --interactive --distributed
```

### **Integration Example: Load Balanced RAG**

```python
from flockparser_adapter import FlockParserAdapter
from sollol_load_balancer import SOLLOLLoadBalancer

# Initialize SOLLOL for distributed inference
sollol = SOLLOLLoadBalancer(
    rpc_backends=["http://gpu-node-1:50052", "http://gpu-node-2:50052"]
)

# Initialize the FlockParser adapter
flockparser = FlockParserAdapter("http://localhost:8000", remote_mode=True)

# Step 1: Generate the embedding using SOLLOL (load balanced!)
embedding = sollol.generate_embedding(
    model="mxbai-embed-large",
    prompt="quantum entanglement"
)
# SOLLOL routes to the fastest GPU automatically

# Step 2: Query FlockParser with the pre-computed embedding
results = flockparser.query_remote(
    query="quantum entanglement",
    embedding=embedding,   # Skips FlockParser's embedding generation
    n_results=5
)
# FlockParser returns relevant chunks from 41 documents

# Performance gain: 2-5x faster when SOLLOL has faster nodes!
```

### **New API Endpoints (v1.0.4+)**

FlockParser v1.0.4 adds **SynapticLlamas-compatible** public endpoints:

- **`GET /health`** - Check API availability and document count
- **`GET /stats`** - Get knowledge base statistics (41 docs, 6,141 chunks)
- **`POST /query`** - Query with pre-computed embeddings (critical for load-balanced RAG)

**These endpoints allow SynapticLlamas to bypass FlockParser's embedding generation and use SOLLOL's load balancer instead!**
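Over plain HTTP the hand-off looks roughly like this - the field names follow the adapter example above, but treat the exact payload shape as an assumption and check the integration guide:

```python
import ollama
import requests

# Any pre-computed embedding works; here one is generated locally for the demo
# (in the full stack, SOLLOL would produce this on the fastest GPU node)
embedding = ollama.embeddings(
    model="mxbai-embed-large", prompt="quantum entanglement"
)["embedding"]

response = requests.post(
    "http://localhost:8000/query",
    json={
        "query": "quantum entanglement",
        "embedding": embedding,   # skips FlockParser's own embedding step
        "n_results": 5,
    },
)
print(response.json())
```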
### **Learn More**

- **[📖 Complete Integration Guide](INTEGRATION_WITH_SYNAPTICLLAMAS.md)** - Full architecture, examples, and setup
- **[SynapticLlamas Repository](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** - Multi-agent orchestration
- **[SOLLOL Repository](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - Distributed inference platform

---

## **📝 Development Process**

This project was developed iteratively using Claude and Claude Code as coding assistants. All design decisions, architecture choices, and integration strategy were directed and reviewed by me.

## **🤝 Contributing**
Contributions are welcome! Please feel free to submit a Pull Request.

## **📄 License**
This project is licensed under the MIT License - see the LICENSE file for details.
"bugtrack_url": null,
"license": "MIT",
"summary": "Distributed document RAG system with intelligent GPU/CPU orchestration",
"version": "1.0.5",
"project_urls": {
"Bug Tracker": "https://github.com/BenevolentJoker-JohnL/FlockParser/issues",
"Demo Video": "https://youtu.be/M-HjXkWYRLM",
"Documentation": "https://github.com/BenevolentJoker-JohnL/FlockParser#readme",
"Homepage": "https://github.com/BenevolentJoker-JohnL/FlockParser",
"Repository": "https://github.com/BenevolentJoker-JohnL/FlockParser"
},
"split_keywords": [
"rag",
" retrieval-augmented-generation",
" distributed-systems",
" document-processing",
" gpu-acceleration",
" ollama",
" chromadb",
" pdf-processing",
" ocr",
" ai",
" machine-learning",
" nlp"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "96f8962ee272c9cab279da8c0e734bc06a1c735bcac13a8ab3367cb6bb497b3b",
"md5": "2732ecc28e4957e38ae4660310d7d8fd",
"sha256": "830b3b19f207e80a654255c50704257413818e2d416e014c845203a67e7503c9"
},
"downloads": -1,
"filename": "flockparser-1.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2732ecc28e4957e38ae4660310d7d8fd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 79499,
"upload_time": "2025-10-21T19:53:43",
"upload_time_iso_8601": "2025-10-21T19:53:43.049380Z",
"url": "https://files.pythonhosted.org/packages/96/f8/962ee272c9cab279da8c0e734bc06a1c735bcac13a8ab3367cb6bb497b3b/flockparser-1.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e60fa61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b",
"md5": "bdefb61625288422f536eaf478cea1ac",
"sha256": "61c0945709cbf5a1f6b42e85924727ad5eb55ae67e65ab80f0b85a179abc3fbb"
},
"downloads": -1,
"filename": "flockparser-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "bdefb61625288422f536eaf478cea1ac",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 157426,
"upload_time": "2025-10-21T19:53:44",
"upload_time_iso_8601": "2025-10-21T19:53:44.809368Z",
"url": "https://files.pythonhosted.org/packages/e6/0f/a61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b/flockparser-1.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 19:53:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BenevolentJoker-JohnL",
"github_project": "FlockParser",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "fastapi",
"specs": [
[
">=",
"0.103.1"
]
]
},
{
"name": "uvicorn",
"specs": [
[
">=",
"0.23.2"
]
]
},
{
"name": "python-multipart",
"specs": [
[
">=",
"0.0.6"
]
]
},
{
"name": "pdfplumber",
"specs": [
[
">=",
"0.10.2"
]
]
},
{
"name": "PyPDF2",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pypdf",
"specs": [
[
">=",
"3.15.1"
]
]
},
{
"name": "pytesseract",
"specs": [
[
">=",
"0.3.10"
]
]
},
{
"name": "Pillow",
"specs": [
[
">=",
"10.0.0"
]
]
},
{
"name": "pdf2image",
"specs": [
[
">=",
"1.16.0"
]
]
},
{
"name": "python-docx",
"specs": [
[
">=",
"0.8.11"
]
]
},
{
"name": "markdown",
"specs": [
[
">=",
"3.4.4"
]
]
},
{
"name": "chromadb",
"specs": [
[
">=",
"0.4.13"
]
]
},
{
"name": "ollama",
"specs": [
[
">=",
"0.1.4"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.24.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "mcp",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "streamlit",
"specs": [
[
">=",
"1.40.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.1.0"
]
]
},
{
"name": "pytest-timeout",
"specs": [
[
">=",
"2.1.0"
]
]
}
],
"lcname": "flockparser"
}