# **FlockParse - Document RAG Intelligence with Distributed Processing**
> **Distributed document RAG system that turns mismatched hardware into a coordinated inference cluster.** Auto-discovers Ollama nodes, intelligently routes workloads across heterogeneous GPUs/CPUs, and achieves 60x+ speedups through adaptive load balancing. Privacy-first with local/network/cloud interfaces.
**What makes this different:** Real distributed systems engineering—not just API wrappers. Handles heterogeneous hardware (RTX A4000 + GTX 1050Ti + CPU laptops working together), network failures, and privacy requirements that rule out cloud APIs.
---
## ⚠️ Important: Current Maturity
**Status:** Beta (v1.0.0) - **Early adopters welcome, but read this first!**
**What works well:**
- ✅ Core distributed processing across heterogeneous nodes
- ✅ GPU detection and VRAM-aware routing
- ✅ Basic PDF extraction and OCR fallback
- ✅ Privacy-first local processing (CLI/Web UI modes)
**Known limitations:**
- ⚠️ **Limited battle testing** - Tested by ~2 developers, not yet proven at scale
- ⚠️ **Security gaps** - See [SECURITY.md](SECURITY.md) for current limitations
- ⚠️ **Edge cases** - Some PDF types may fail (encrypted, complex layouts)
- ⚠️ **Test coverage** - ~40% coverage, integration tests incomplete
**Read before using:** [KNOWN_ISSUES.md](KNOWN_ISSUES.md) documents all limitations, edge cases, and roadmap honestly.
**Recommended for:**
- 🎓 Learning distributed systems
- 🔬 Research and experimentation
- 🏠 Personal projects with non-critical data
- 🛠️ Contributors who want to help mature the project
**Not yet recommended for:**
- ❌ Mission-critical production workloads
- ❌ Regulated industries (healthcare, finance) without additional hardening
- ❌ Large-scale deployments (>50 concurrent users)
**Help us improve:** Report issues, contribute fixes, share feedback!
---
## **🏛️ Origins & Legacy**
FlockParser's distributed inference architecture originated from **[FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)**, which pioneered:
- **Auto-discovery** of Ollama nodes across heterogeneous hardware
- **Adaptive load balancing** with GPU/CPU awareness
- **VRAM-aware routing** and automatic failover mechanisms
This core distributed logic from FlockParser-legacy was later extracted and generalized to become **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - a standalone distributed inference platform that now powers both FlockParser and **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)**.
### **📊 Quick Performance Reference**
| Workload | Hardware | Time | Speedup | Notes |
|----------|----------|------|---------|-------|
| **5 AI papers (~350 pages)** | 1× RTX A4000 (16GB) | 21.3s | **17.5×** | [Real arXiv showcase](#-showcase-real-world-example) |
| **12-page PDF (demo video)** | 1× RTX A4000 (16GB) | 6.0s | **61.7×** | GPU-aware routing |
| **100 PDFs (2000 pages)** | 3-node cluster (mixed) | 3.2 min | **13.2×** | See [BENCHMARKS.md](BENCHMARKS.md) |
| **Embedding generation** | RTX A4000 vs i9 CPU | 8.2s vs 178s | **21.7×** | 10K chunks |
**🎯 Try it yourself:** `pip install flockparser && python showcase/process_arxiv_papers.py`
---
## **🔒 Privacy Model**
| Interface | Privacy Level | External Calls | Best For |
|-----------|---------------|----------------|----------|
| **CLI** (`flockparsecli.py`) | 🟢 **100% Local** | None | Personal use, air-gapped systems |
| **Web UI** (`flock_webui.py`) | 🟢 **100% Local** | None | GUI users, visual monitoring |
| **REST API** (`flock_ai_api.py`) | 🟡 **Local Network** | None | Multi-user, app integration |
| **MCP Server** (`flock_mcp_server.py`) | 🔴 **Cloud** | ⚠️ Claude Desktop (Anthropic) | AI assistant integration |
**⚠️ MCP Privacy Warning:** The MCP server integrates with Claude Desktop, which sends queries and document snippets to Anthropic's cloud API. Use CLI/Web UI for 100% offline processing.
---
## **Table of Contents**
- [Key Features](#-key-features)
- [👥 Who Uses This?](#-who-uses-this) - **Target users & scenarios**
- [📐 How It Works (5-Second Overview)](#-how-it-works-5-second-overview) - **Visual for non-technical evaluators**
- [Architecture](#-architecture) | **[📖 Deep Dive: Architecture & Design Decisions](docs/architecture.md)**
- [Quickstart](#-quickstart-3-steps)
- [Performance & Benchmarks](#-performance)
- [🎓 Showcase: Real-World Example](#-showcase-real-world-example) ⭐ **Try it yourself**
- [Usage Examples](#-usage)
- [Security & Production](#-security--production-notes)
- [🔗 Integration with SynapticLlamas & SOLLOL](#-integration-with-synapticllamas--sollol) - **Complete AI Ecosystem** ⭐
- [Troubleshooting](#-troubleshooting-guide)
- [Contributing](#-contributing)
## **⚡ Key Features**
- **🌐 Intelligent Load Balancing** - Auto-discovers Ollama nodes, detects GPU vs CPU, monitors VRAM, and routes work adaptively (10x speedup on heterogeneous clusters)
- **🔌 Multi-Protocol Support** - CLI (100% local), REST API (network), MCP (Claude Desktop), Web UI (Streamlit) - choose your privacy level
- **🎯 Adaptive Routing** - Sequential vs parallel decisions based on cluster characteristics (prevents slow nodes from bottlenecking)
- **📊 Production Observability** - Real-time health scores, performance tracking, VRAM monitoring, automatic failover
- **🔒 Privacy-First Architecture** - No external API calls required (CLI mode), all processing on-premise
- **📄 Complete Pipeline** - PDF extraction → OCR fallback → Multi-format conversion → Vector embeddings → RAG with source citations
---
## **👥 Who Uses This?**
FlockParser is designed for engineers and researchers who need **private, on-premise document intelligence** with **real distributed systems capabilities**.
### **Ideal Users**
| User Type | Use Case | Why FlockParser? |
|-----------|----------|------------------|
| **🔬 ML/AI Engineers** | Process research papers, build knowledge bases, experiment with RAG systems | GPU-aware routing, 21× faster embeddings, full pipeline control |
| **📊 Data Scientists** | Extract insights from large document corpora (100s-1000s of PDFs) | Distributed processing, semantic search, production observability |
| **🏢 Enterprise Engineers** | On-premise document search for regulated industries (healthcare, legal, finance) | 100% local processing, no cloud APIs, privacy-first architecture |
| **🎓 Researchers** | Build custom RAG systems, experiment with distributed inference patterns | Full source access, extensible architecture, real benchmarks |
| **🛠️ DevOps/Platform Engineers** | Set up document intelligence infrastructure for teams | Multi-node setup, health monitoring, automatic failover |
| **👨‍💻 Students/Learners** | Understand distributed systems, GPU orchestration, RAG architectures | Real working example, comprehensive docs, honest limitations |
### **Real-World Scenarios**
✅ **"I have 500 research papers and a spare GPU machine"** → Process your corpus 20× faster with distributed nodes
✅ **"I can't send medical records to OpenAI"** → 100% local processing (CLI/Web UI modes)
✅ **"I want to experiment with RAG without cloud costs"** → Full pipeline, runs on your hardware
✅ **"I need to search 10,000 internal documents"** → ChromaDB vector search with sub-20ms latency
✅ **"I have mismatched hardware (old laptop + gaming PC)"** → Adaptive routing handles heterogeneous clusters
### **Not Ideal For**
❌ **Production SaaS with 1000+ concurrent users** → Current SQLite backend limits concurrency (~50 users)
❌ **Mission-critical systems requiring 99.9% uptime** → Still in Beta, see [KNOWN_ISSUES.md](KNOWN_ISSUES.md)
❌ **Simple one-time PDF extraction** → Overkill; use `pdfplumber` directly
❌ **Cloud-first deployments** → Designed for on-premise/hybrid; cloud works but misses GPU routing benefits
**Bottom line:** If you're building document intelligence infrastructure on your own hardware and need distributed processing with privacy guarantees, FlockParser is for you.
---
## **📐 How It Works (5-Second Overview)**
**For recruiters and non-technical evaluators:**
```
┌─────────────────────────────────────────────────────────────────┐
│ INPUT │
│ 📄 Your Documents (PDFs, research papers, internal docs) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FLOCKPARSER │
│ │
│ 1. Extracts text from PDFs (handles scans with OCR) │
│ 2. Splits into chunks, creates vector embeddings │
│ 3. Distributes work across GPU/CPU nodes (auto-discovery) │
│ 4. Stores in searchable vector database (ChromaDB) │
│ │
│ ⚡ Distributed Processing: 3 nodes → 13× faster │
│ 🚀 GPU Acceleration: RTX A4000 → 61× faster than CPU │
│ 🔒 Privacy: 100% local (no cloud APIs) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT │
│ 🔍 Semantic Search: "Find all mentions of transformers" │
│ 💬 AI Chat: "Summarize the methodology section" │
│ 📊 Source Citations: Exact page/document references │
│ 🌐 4 Interfaces: CLI, Web UI, REST API, Claude Desktop │
└─────────────────────────────────────────────────────────────────┘
```
**Key Innovation:** Auto-detects GPU nodes, measures performance, and routes work to fastest hardware. No manual configuration needed.
---
## **🏗️ Architecture**
```
┌─────────────────────────────────────────────────────────────┐
│ Interfaces (Choose Your Privacy Level) │
│ CLI (Local) | REST API (Network) | MCP (Claude) | Web UI │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FlockParse Core Engine │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PDF │ │ Semantic │ │ RAG │ │
│ │ Processing │→ │ Search │→ │ Engine │ │
│ └─────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ ChromaDB Vector Store (Persistent) │ │
│ └───────────────────────────────────────────────────┘ │
└──────────────────────┬──────────────────────────────────────┘
│ Intelligent Load Balancer
│ • Health scoring (GPU/VRAM detection)
│ • Adaptive routing (sequential vs parallel)
│ • Automatic failover & caching
▼
┌──────────────────────────────────────────────┐
│ Distributed Ollama Cluster │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ GPU A │ │ GPU B │ │ CPU │ │
│ │16GB VRAM │ │ 8GB VRAM │ │ 16GB RAM │ │
│ │Health:367│ │Health:210│ │Health:50 │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└──────────────────────────────────────────────┘
▲ Auto-discovery | Performance tracking
```
**Want to understand how this works?** Read the **[📖 Architecture Deep Dive](docs/architecture.md)** for detailed explanations of:
- Why distributed AI inference solves real-world problems
- How adaptive routing decisions are made (sequential vs parallel)
- MCP integration details and privacy implications
- Technical trade-offs and design decisions
## **🚀 Quickstart (3 Steps)**
**Requirements:**
- Python 3.10 or later
- Ollama 0.1.20+ (install from [ollama.com](https://ollama.com))
- 4GB+ RAM (8GB+ recommended for GPU nodes)
```bash
# 1. Install FlockParser
pip install flockparser
# 2. Start Ollama and pull models
ollama serve # In a separate terminal
ollama pull mxbai-embed-large # Required for embeddings
ollama pull llama3.1:latest # Required for chat
# 3. Run your preferred interface
flockparse-webui # Web UI - easiest (recommended) ⭐
flockparse # CLI - 100% local
flockparse-api # REST API - multi-user
flockparse-mcp # MCP - Claude Desktop integration
```
**💡 Pro tip:** Start with the Web UI to see distributed processing with real-time VRAM monitoring and node health dashboards.
---
### Alternative: Install from Source
If you want to contribute or modify the code:
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
pip install -e . # Editable install
```
### **Quick Test (30 seconds)**
```bash
# Start the CLI
python flockparsecli.py
# Process the sample PDF
> open_pdf testpdfs/sample.pdf
# Chat with it
> chat
🙋 You: Summarize this document
```
**First time?** Start with the Web UI (`streamlit run flock_webui.py`) - it's the easiest way to see distributed processing in action with a visual dashboard.
---
## **🐳 Docker Deployment (One Command)**
### **Quick Start with Docker Compose**
```bash
# Clone and deploy everything
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
docker-compose up -d
# Access services
# Web UI: http://localhost:8501
# REST API: http://localhost:8000
# Ollama: http://localhost:11434
```
### **What Gets Deployed**
| Service | Port | Description |
|---------|------|-------------|
| **Web UI** | 8501 | Streamlit interface with visual monitoring |
| **REST API** | 8000 | FastAPI with authentication |
| **CLI** | - | Interactive terminal (docker-compose run cli) |
| **Ollama** | 11434 | Local LLM inference engine |
### **Production Features**
✅ **Multi-stage build** - Optimized image size
✅ **Non-root user** - Security hardened
✅ **Health checks** - Auto-restart on failure
✅ **Volume persistence** - Data survives restarts
✅ **GPU support** - Uncomment deploy section for NVIDIA GPUs
### **Custom Configuration**
```bash
# Set API key
export FLOCKPARSE_API_KEY="your-secret-key"
# Set log level
export LOG_LEVEL="DEBUG"
# Deploy with custom config
docker-compose up -d
```
### **GPU Support (NVIDIA)**
Uncomment the GPU section in `docker-compose.yml`:
```yaml
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
```
Then run: `docker-compose up -d`
### **CI/CD Pipeline**
```mermaid
graph LR
A[📝 Git Push] --> B[🔍 Lint & Format]
B --> C[🧪 Test Suite]
B --> D[🔒 Security Scan]
C --> E[🐳 Build Multi-Arch]
D --> E
E --> F[📦 Push to GHCR]
F --> G[🚀 Deploy]
style A fill:#4CAF50
style B fill:#2196F3
style C fill:#2196F3
style D fill:#FF9800
style E fill:#9C27B0
style F fill:#9C27B0
style G fill:#F44336
```
**Automated on every push to `main`:**
| Stage | Tools | Purpose |
|-------|-------|---------|
| **Code Quality** | black, flake8, mypy | Enforce formatting & typing standards |
| **Testing** | pytest (Python 3.10/3.11/3.12) | 78% coverage across versions |
| **Security** | Trivy | Vulnerability scanning & SARIF reports |
| **Build** | Docker Buildx | Multi-architecture (amd64, arm64) |
| **Registry** | GitHub Container Registry | Versioned image storage |
| **Deploy** | On release events | Automated production deployment |
**Pull the latest image:**
```bash
docker pull ghcr.io/benevolentjoker-johnl/flockparser:latest
```
**View pipeline runs:** https://github.com/BenevolentJoker-JohnL/FlockParser/actions
---
## **🌐 Setting Up Distributed Nodes**
**Want the 60x speedup?** Set up multiple Ollama nodes across your network.
### Quick Multi-Node Setup
**On each additional machine:**
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Configure for network access
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
# 3. Pull models
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
# 4. Allow firewall (if needed)
sudo ufw allow 11434/tcp # Linux
```
**FlockParser will automatically discover these nodes!**
Check with:
```bash
python flockparsecli.py
> lb_stats # Shows all discovered nodes and their capabilities
```
**📖 Complete Guide:** See **[DISTRIBUTED_SETUP.md](DISTRIBUTED_SETUP.md)** for:
- Step-by-step multi-machine setup
- Network configuration and firewall rules
- Troubleshooting node discovery
- Example setups (budget home lab to professional clusters)
- GPU router configuration for automatic optimization
---
### **🔒 Privacy Levels by Interface:**
- **Web UI (`flock_webui.py`)**: 🟢 100% local, runs in your browser
- **CLI (`flockparsecli.py`)**: 🟢 100% local, zero external calls
- **REST API (`flock_ai_api.py`)**: 🟡 Local network only
- **MCP Server (`flock_mcp_server.py`)**: 🔴 Integrates with Claude Desktop (Anthropic cloud service)
**Choose the interface that matches your privacy requirements!**
## **🏆 Why FlockParse? Comparison to Competitors**
| Feature | **FlockParse** | LangChain | LlamaIndex | Haystack |
|---------|---------------|-----------|------------|----------|
| **100% Local/Offline** | ✅ Yes (CLI/JSON) | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial |
| **Zero External API Calls** | ✅ Yes (CLI/JSON) | ❌ No | ❌ No | ❌ No |
| **Built-in GPU Load Balancing** | ✅ Yes (auto) | ❌ No | ❌ No | ❌ No |
| **VRAM Monitoring** | ✅ Yes (dynamic) | ❌ No | ❌ No | ❌ No |
| **Multi-Node Auto-Discovery** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **CPU Fallback Detection** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Document Format Export** | ✅ 4 formats | ❌ Limited | ❌ Limited | ⚠️ Basic |
| **Setup Complexity** | 🟢 Simple | 🔴 Complex | 🔴 Complex | 🟡 Medium |
| **Dependencies** | 🟢 Minimal | 🔴 Heavy | 🔴 Heavy | 🟡 Medium |
| **Learning Curve** | 🟢 Low | 🔴 Steep | 🔴 Steep | 🟡 Medium |
| **Privacy Control** | 🟢 High (CLI/JSON) | 🔴 Limited | 🔴 Limited | 🟡 Medium |
| **Out-of-Box Functionality** | ✅ Complete | ⚠️ Requires config | ⚠️ Requires config | ⚠️ Requires config |
| **MCP Integration** | ✅ Native | ❌ No | ❌ No | ❌ No |
| **Embedding Cache** | ✅ MD5-based | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| **Batch Processing** | ✅ Parallel | ⚠️ Sequential | ⚠️ Sequential | ⚠️ Basic |
| **Performance** | 🚀 60x+ faster with GPU auto-routing | ⚠️ Varies by config | ⚠️ Varies by config | ⚠️ Varies by config |
| **Cost** | 💰 Free | 💰💰 Free + Paid | 💰💰 Free + Paid | 💰💰 Free + Paid |
### **Key Differentiators:**
1. **Privacy by Design**: CLI and JSON interfaces are 100% local with zero external calls (MCP interface uses Claude Desktop for chat)
2. **Intelligent GPU Management**: Automatically finds, tests, and prioritizes GPU nodes
3. **Production-Ready**: Works immediately with sensible defaults
4. **Resource-Aware**: Detects VRAM exhaustion and prevents performance degradation
5. **Complete Solution**: CLI, REST API, MCP, and batch interfaces - choose your privacy level
## **📊 Performance**
### **Real-World Benchmark Results**
| Processing Mode | Time | Speedup | What It Shows |
|----------------|------|---------|---------------|
| Single CPU node | 372.76s (~6 min) | 1x baseline | Sequential CPU processing |
| Parallel (multi-node) | 159.79s (~2.5 min) | **2.3x faster** | Distributed across cluster |
| GPU node routing | 6.04s (~6 sec) | **61.7x faster** | Automatic GPU detection & routing |
**Why the Massive Speedup?**
- GPU processes embeddings in milliseconds vs seconds on CPU
- Adaptive routing detected GPU was 60x+ faster and sent all work there
- Avoided bottleneck of waiting for slower CPU nodes to finish
- No network overhead (local cluster, no cloud APIs)
**Key Insight:** The system **automatically** detects performance differences and makes routing decisions - no manual GPU configuration needed.
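To make that concrete, here is a minimal sketch of a sequential-vs-parallel decision under stated assumptions: the throughput measurements, helper name, and dominance threshold are illustrative, not FlockParser's actual internals.

```python
# Toy sketch of the adaptive routing decision described above.
# node_speeds and the dominance threshold are illustrative assumptions.
def plan_route(node_speeds: dict[str, float], dominance: float = 5.0):
    """node_speeds maps node URL -> measured throughput (chunks/sec)."""
    fastest_url = max(node_speeds, key=node_speeds.get)
    fastest = node_speeds[fastest_url]
    rest = sum(v for url, v in node_speeds.items() if url != fastest_url)
    # If one node out-runs the rest of the cluster combined by a wide
    # margin, fanning out only adds waiting time for stragglers.
    if rest == 0 or fastest / rest > dominance:
        return "sequential", [fastest_url]
    return "parallel", sorted(node_speeds, key=node_speeds.get, reverse=True)
```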
**Hardware (Benchmark Cluster):**
- **Node 1 (10.9.66.90):** Intel i9-12900K, 32GB DDR5-6000, 6TB NVMe Gen4, RTX A4000 16GB - primary GPU node
- **Node 2 (10.9.66.159):** AMD Ryzen 7 5700X, 32GB DDR4-3600, GTX 1050Ti (CPU-mode fallback)
- **Node 3:** Intel i7-12th gen (laptop), 16GB DDR5, CPU-only
- **Software:** Python 3.10, Ollama, Ubuntu 22.04
**Reproducibility:**
- Full source code available in this repo
- Test with your own hardware - results will vary based on GPU
The project offers four main interfaces:
1. **flock_webui.py** - 🎨 Beautiful Streamlit web interface (NEW!)
2. **flockparsecli.py** - Command-line interface for personal document processing
3. **flock_ai_api.py** - REST API server for multi-user or application integration
4. **flock_mcp_server.py** - Model Context Protocol server for AI assistants like Claude Desktop
---
## **🎓 Showcase: Real-World Example**
**Processing influential AI research papers from arXiv.org**
Want to see FlockParser in action on real documents? Run the included showcase:
```bash
pip install flockparser
python showcase/process_arxiv_papers.py
```
### **What It Does**
Downloads and processes 5 seminal AI research papers:
- **Attention Is All You Need** (Transformers) - arXiv:1706.03762
- **BERT** - Pre-training Deep Bidirectional Transformers - arXiv:1810.04805
- **RAG** - Retrieval-Augmented Generation for NLP - arXiv:2005.11401
- **GPT-3** - Language Models are Few-Shot Learners - arXiv:2005.14165
- **Llama 2** - Open Foundation Language Models - arXiv:2307.09288
**Total: ~350 pages, ~25 MB of PDFs**
### **Expected Results**
| Configuration | Processing Time | Speedup |
|---------------|----------------|---------|
| **Single CPU node** | ~90s | 1.0× baseline |
| **Multi-node (1 GPU + 2 CPU)** | ~30s | 3.0× |
| **Single GPU node (RTX A4000)** | ~21s | **4.3×** |
### **What You Get**
After processing, the script demonstrates:
1. **Semantic Search** across all papers:
```python
# Example queries that work immediately:
"What is the transformer architecture?"
"How does retrieval-augmented generation work?"
"What are the benefits of attention mechanisms?"
```
2. **Performance Metrics** (`showcase/results.json`):
```json
{
"total_time": 21.3,
"papers": [
{
"title": "Attention Is All You Need",
"processing_time": 4.2,
"status": "success"
}
],
"node_info": [...]
}
```
3. **Human-Readable Summary** (`showcase/RESULTS.md`) with:
- Per-paper processing times
- Hardware configuration used
- Fastest/slowest/average performance
- Replication instructions
### **Why This Matters**
This isn't a toy demo - it's processing actual research papers that engineers read daily. It demonstrates:
✅ **Real document processing** - Complex PDFs with equations, figures, multi-column layouts
✅ **Production-grade pipeline** - PDF extraction → embeddings → vector storage → semantic search
✅ **Actual performance gains** - Measurable speedups on heterogeneous hardware
✅ **Reproducible results** - Run it yourself with `pip install`, compare your hardware
**Perfect for portfolio demonstrations:** Show this to hiring managers as proof of real distributed systems work.
---
## **🔧 Installation**
### **1. Clone the Repository**
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
```
### **2. Install System Dependencies (Required for OCR)**
**⚠️ IMPORTANT: Install these BEFORE pip install, as pytesseract and pdf2image require system packages**
#### For Better PDF Text Extraction:
- **Linux**:
```bash
sudo apt-get update
sudo apt-get install poppler-utils
```
- **macOS**:
```bash
brew install poppler
```
- **Windows**: Download from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
#### For OCR Support (Scanned Documents):
FlockParse automatically detects scanned PDFs and uses OCR!
- **Linux (Ubuntu/Debian)**:
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils
```
- **Linux (Fedora/RHEL)**:
```bash
sudo dnf install tesseract poppler-utils
```
- **macOS**:
```bash
brew install tesseract poppler
```
- **Windows**:
1. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) - Download the installer
2. Install [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
3. Add both to your system PATH
**Verify installation:**
```bash
tesseract --version
pdftotext -v
```
### **3. Install Python Dependencies**
```bash
pip install -r requirements.txt
```
**Key Python dependencies** (installed automatically):
- fastapi, uvicorn - Web server
- pdfplumber, PyPDF2, pypdf - PDF processing
- **pytesseract** - Python wrapper for Tesseract OCR (requires system Tesseract)
- **pdf2image** - PDF to image conversion (requires system Poppler)
- Pillow - Image processing for OCR
- chromadb - Vector database
- python-docx - DOCX generation
- ollama - AI model integration
- numpy - Numerical operations
- markdown - Markdown generation
**How OCR fallback works:**
1. Tries PyPDF2 text extraction
2. Falls back to pdftotext if no text
3. **Falls back to OCR** if still no text (<100 chars) - **Requires Tesseract + Poppler**
4. Automatically processes scanned documents without manual intervention
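A minimal sketch of that three-tier fallback, assuming PyPDF2 3.x, Poppler's `pdftotext`, and Tesseract are installed (the function name is illustrative; the 100-character threshold mirrors the description above):

```python
import subprocess

from PyPDF2 import PdfReader  # PyPDF2 3.x API

def extract_text(pdf_path: str) -> str:
    # 1. Native text layer first (fast path for digital PDFs)
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if len(text.strip()) >= 100:
        return text

    # 2. Fall back to Poppler's pdftotext (handles odd layouts better)
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True)
    if len(result.stdout.strip()) >= 100:
        return result.stdout

    # 3. Last resort: rasterize pages and OCR them (scanned documents)
    from pdf2image import convert_from_path  # needs system Poppler
    import pytesseract                       # needs system Tesseract

    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(img) for img in pages)
```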
### **4. Install and Configure Ollama**
1. Install Ollama from [ollama.com](https://ollama.com)
2. Start the Ollama service:
```bash
ollama serve
```
3. Pull the required models:
```bash
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
```
## **📜 Usage**
### **🎨 Web UI (flock_webui.py) - Easiest Way to Get Started!**
Launch the beautiful Streamlit web interface:
```bash
streamlit run flock_webui.py
```
The web UI will open in your browser at `http://localhost:8501`
**Features:**
- 📤 **Upload & Process**: Drag-and-drop PDF files for processing
- 💬 **Chat Interface**: Interactive chat with your documents
- 📊 **Load Balancer Dashboard**: Real-time monitoring of GPU nodes
- 🔍 **Semantic Search**: Search across all documents
- 🌐 **Node Management**: Add/remove Ollama nodes, auto-discovery
- 🎯 **Routing Control**: Switch between routing strategies
**Perfect for:**
- Users who prefer graphical interfaces
- Quick document processing and exploration
- Monitoring distributed processing
- Managing multiple Ollama nodes visually
---
### **CLI Interface (flockparsecli.py)**
Run the script:
```bash
python flockparsecli.py
```
Available commands:
```
📖 open_pdf <file> → Process a single PDF file
📂 open_dir <dir> → Process all PDFs in a directory
💬 chat → Chat with processed PDFs
📊 list_docs → List all processed documents
🔍 check_deps → Check for required dependencies
🌐 discover_nodes → Auto-discover Ollama nodes on local network
➕ add_node <url> → Manually add an Ollama node
➖ remove_node <url> → Remove an Ollama node from the pool
📋 list_nodes → List all configured Ollama nodes
⚖️ lb_stats → Show load balancer statistics
❌ exit → Quit the program
```
### **Web Server API (flock_ai_api.py)**
Start the API server:
```bash
# Set your API key (or use default for testing)
export FLOCKPARSE_API_KEY="your-secret-key-here"
# Start server
python flock_ai_api.py
```
The server will run on `http://0.0.0.0:8000` by default.
#### **🔒 Authentication (NEW!)**
All endpoints except `/` require an API key in the `X-API-Key` header:
```bash
# Default API key (change in production!)
X-API-Key: your-secret-api-key-change-this
# Or set via environment variable
export FLOCKPARSE_API_KEY="my-super-secret-key"
```
#### **Available Endpoints:**
| Endpoint | Method | Auth Required | Description |
|----------|--------|---------------|-------------|
| `/` | GET | ❌ No | API status and version info |
| `/upload/` | POST | ✅ Yes | Upload and process a PDF file |
| `/summarize/{file_name}` | GET | ✅ Yes | Get an AI-generated summary |
| `/search/?query=...` | GET | ✅ Yes | Search for relevant documents |
#### **Example API Usage:**
**Check API status (no auth required):**
```bash
curl http://localhost:8000/
```
**Upload a document (with authentication):**
```bash
curl -X POST \
-H "X-API-Key: your-secret-api-key-change-this" \
-F "file=@your_document.pdf" \
http://localhost:8000/upload/
```
**Get a document summary:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
http://localhost:8000/summarize/your_document.pdf
```
**Search across documents:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8000/search/?query=your%20search%20query"
```
**⚠️ Production Security:**
- Always change the default API key
- Use environment variables, never hardcode keys
- Use HTTPS in production (nginx/apache reverse proxy)
- Consider rate limiting for public deployments
### **MCP Server (flock_mcp_server.py)**
The MCP server allows FlockParse to be used as a tool by AI assistants like Claude Desktop.
#### **Setting up with Claude Desktop**
1. **Start the MCP server:**
```bash
python flock_mcp_server.py
```
2. **Configure Claude Desktop:**
Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
"mcpServers": {
"flockparse": {
"command": "python",
"args": ["/absolute/path/to/FlockParser/flock_mcp_server.py"]
}
}
}
```
3. **Restart Claude Desktop** and you'll see FlockParse tools available!
#### **Available MCP Tools:**
- `process_pdf` - Process and add PDFs to the knowledge base
- `query_documents` - Search documents using semantic search
- `chat_with_documents` - Ask questions about your documents
- `list_documents` - List all processed documents
- `get_load_balancer_stats` - View node performance metrics
- `discover_ollama_nodes` - Auto-discover Ollama nodes
- `add_ollama_node` - Add an Ollama node manually
- `remove_ollama_node` - Remove an Ollama node
#### **Example MCP Usage:**
In Claude Desktop, you can now ask:
- "Process the PDF at /path/to/document.pdf"
- "What documents do I have in my knowledge base?"
- "Search my documents for information about quantum computing"
- "What does my research say about black holes?"
## **💡 Practical Use Cases**
### **Knowledge Management**
- Create searchable archives of research papers, legal documents, and technical manuals
- Generate summaries of lengthy documents for quick review
- Chat with your document collection to find specific information without manual searching
### **Legal & Compliance**
- Process contract repositories for semantic search capabilities
- Extract key terms and clauses from legal documents
- Analyze regulatory documents for compliance requirements
### **Research & Academia**
- Process and convert academic papers for easier reference
- Create a personal research assistant that can reference your document library
- Generate summaries of complex research for presentations or reviews
### **Business Intelligence**
- Convert business reports into searchable formats
- Extract insights from PDF-based market research
- Make proprietary documents more accessible throughout an organization
## **🌐 Distributed Processing with Load Balancer**
FlockParse includes a sophisticated load balancer that can distribute embedding generation across multiple Ollama instances on your local network.
### **Setting Up Distributed Processing**
#### **Option 1: Auto-Discovery (Easiest)**
```bash
# Start FlockParse
python flockparsecli.py
# Auto-discover Ollama nodes on your network
⚡ Enter command: discover_nodes
```
The system will automatically scan your local network (/24 subnet) and detect any running Ollama instances.
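For intuition, here is a hedged sketch of what such a scan can look like; the `/api/version` probe, timeout, and worker count are assumptions rather than the exact discovery code:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def discover_ollama_nodes(subnet: str = "192.168.1", port: int = 11434):
    """Probe every host on a /24 subnet for a responding Ollama endpoint."""
    def probe(host: str):
        url = f"http://{host}:{port}"
        try:
            if requests.get(f"{url}/api/version", timeout=0.5).ok:
                return url
        except requests.RequestException:
            pass
        return None

    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with ThreadPoolExecutor(max_workers=64) as pool:
        return [url for url in pool.map(probe, hosts) if url]
```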
#### **Option 2: Manual Node Management**
```bash
# Add a specific node
⚡ Enter command: add_node http://192.168.1.100:11434
# List all configured nodes
⚡ Enter command: list_nodes
# Remove a node
⚡ Enter command: remove_node http://192.168.1.100:11434
# View load balancer statistics
⚡ Enter command: lb_stats
```
### **Benefits of Distributed Processing**
- **Speed**: Process documents 2-10x faster with multiple nodes
- **GPU Awareness**: Automatically detects and prioritizes GPU nodes over CPU nodes
- **VRAM Monitoring**: Detects when GPU nodes fall back to CPU due to insufficient VRAM
- **Fault Tolerance**: Automatic failover if a node becomes unavailable
- **Load Distribution**: Smart routing based on node performance, GPU availability, and VRAM capacity
- **Easy Scaling**: Just add more machines with Ollama installed
### **Setting Up Additional Ollama Nodes**
On each additional machine:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the embedding model
ollama pull mxbai-embed-large
# Start Ollama (accessible from network)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Then use `discover_nodes` or `add_node` to add them to FlockParse.
### **GPU and VRAM Optimization**
FlockParse automatically detects GPU availability and VRAM usage using Ollama's `/api/ps` endpoint:
- **🚀 GPU nodes** with models loaded in VRAM get +200 health score bonus
- **⚠️ VRAM-limited nodes** that fall back to CPU get only +50 bonus
- **🐢 CPU-only nodes** get -50 penalty
**To ensure your GPU is being used:**
1. **Check GPU detection**: Run `lb_stats` command to see node status
2. **Preload model into GPU**: Run a small inference to load model into VRAM
```bash
ollama run mxbai-embed-large "test"
```
3. **Verify VRAM usage**: Check that `size_vram > 0` in `/api/ps`:
```bash
curl http://localhost:11434/api/ps
```
4. **Increase VRAM allocation**: If model won't load into VRAM, free up GPU memory or use a smaller model
**Dynamic VRAM monitoring**: FlockParse continuously monitors embedding performance and automatically detects when a GPU node falls back to CPU due to VRAM exhaustion during heavy load.
## **🔄 Example Workflows**
### **CLI Workflow: Research Paper Processing**
1. **Check Dependencies**:
```
⚡ Enter command: check_deps
```
2. **Process a Directory of Research Papers**:
```
⚡ Enter command: open_dir ~/research_papers
```
3. **Chat with Your Research Collection**:
```
⚡ Enter command: chat
🙋 You: What are the key methods used in the Smith 2023 paper?
```
### **API Workflow: Document Processing Service**
1. **Start the API Server**:
```bash
python flock_ai_api.py
```
2. **Upload Documents via API**:
```bash
curl -X POST -H "X-API-Key: $FLOCKPARSE_API_KEY" -F "file=@quarterly_report.pdf" http://localhost:8000/upload/
```
3. **Generate a Summary**:
```bash
curl -H "X-API-Key: $FLOCKPARSE_API_KEY" http://localhost:8000/summarize/quarterly_report.pdf
```
4. **Search Across Documents**:
```bash
curl -H "X-API-Key: $FLOCKPARSE_API_KEY" "http://localhost:8000/search/?query=revenue%20growth%20Q3"
```
## **🔧 Troubleshooting Guide**
### **Ollama Connection Issues**
**Problem**: Error messages about Ollama not being available or connection failures.
**Solution**:
1. Verify Ollama is running: `ps aux | grep ollama`
2. Restart the Ollama service:
```bash
killall ollama
ollama serve
```
3. Check that you've pulled the required models:
```bash
ollama list
```
4. If models are missing:
```bash
ollama pull mxbai-embed-large
ollama pull llama3.1:latest
```
### **PDF Text Extraction Failures**
**Problem**: No text extracted from certain PDFs.
**Solution**:
1. Check if the PDF is scanned/image-based:
- Install OCR tools: `sudo apt-get install tesseract-ocr` (Linux)
- For better scanned PDF handling: `pip install ocrmypdf`
- Process with OCR: `ocrmypdf input.pdf output.pdf`
2. If the PDF has unusual fonts or formatting:
- Install poppler-utils for better extraction
- Try using the `-layout` option with pdftotext manually:
```bash
pdftotext -layout problem_document.pdf output.txt
```
### **Memory Issues with Large Documents**
**Problem**: Application crashes with large PDFs or many documents.
**Solution**:
1. Process one document at a time for very large PDFs
2. Reduce the chunk size in the code (default is 512 characters)
3. Increase your system's available memory or use a swap file
4. For server deployments, consider using a machine with more RAM
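Since item 2 above mentions the 512-character default, here is a minimal chunking sketch; the overlap parameter is an illustrative assumption, not FlockParse's exact splitter:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into fixed-size character chunks with a small overlap."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```

Lowering `chunk_size` reduces the memory held per embedding call, at the cost of more embedding requests per document.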
### **API Server Not Starting**
**Problem**: Error when trying to start the API server.
**Solution**:
1. Check for port conflicts: `lsof -i :8000`
2. If another process is using port 8000, kill it or change the port
3. Verify FastAPI is installed: `pip install fastapi uvicorn`
4. Check for Python version compatibility (requires Python 3.10+)
---
## **🔐 Security & Production Notes**
### **REST API Security**
**⚠️ The default API key is NOT secure - change it immediately!**
```bash
# Set a strong API key via environment variable
export FLOCKPARSE_API_KEY="your-super-secret-key-change-this-now"
# Or generate a random one
export FLOCKPARSE_API_KEY=$(openssl rand -hex 32)
# Start the API server
python flock_ai_api.py
```
**Production Checklist:**
- ✅ **Change default API key** - Never use `your-secret-api-key-change-this`
- ✅ **Use environment variables** - Never hardcode secrets in code
- ✅ **Enable HTTPS** - Use nginx or Apache as reverse proxy with SSL/TLS
- ✅ **Add rate limiting** - Use nginx `limit_req` or FastAPI middleware
- ✅ **Network isolation** - Don't expose API to public internet unless necessary
- ✅ **Monitor logs** - Watch for authentication failures and abuse
**Example nginx config with TLS:**
```nginx
server {
listen 443 ssl;
server_name your-domain.com;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
### **MCP Privacy & Security**
**What data leaves your machine:**
- 🔴 **Document queries** - Sent to Claude Desktop → Anthropic API
- 🔴 **Document snippets** - Retrieved context chunks sent as part of prompts
- 🔴 **Chat messages** - All RAG conversations processed by Claude
- 🟢 **Document files** - Never uploaded (processed locally, only embeddings stored)
**To disable MCP and stay 100% local:**
1. Remove FlockParse from Claude Desktop config
2. Use CLI (`flockparsecli.py`) or Web UI (`flock_webui.py`) instead
3. Both provide full RAG functionality without external API calls
**MCP is safe for:**
- ✅ Public documents (research papers, manuals, non-sensitive data)
- ✅ Testing and development
- ✅ Personal use where you trust Anthropic's privacy policy
**MCP is NOT recommended for:**
- ❌ Confidential business documents
- ❌ Personal identifiable information (PII)
- ❌ Regulated data (HIPAA, GDPR sensitive content)
- ❌ Air-gapped or classified environments
### **Database Security**
**SQLite limitations (ChromaDB backend):**
- ⚠️ No concurrent writes from multiple processes
- ⚠️ File permissions determine access (not true auth)
- ⚠️ No encryption at rest by default
**For production with multiple users:**
```bash
# Option 1: Separate databases per interface
CLI: chroma_db_cli/
API: chroma_db_api/
MCP: chroma_db_mcp/
# Option 2: Use PostgreSQL backend (ChromaDB supports it)
# See ChromaDB docs: https://docs.trychroma.com/
```
### **VRAM Detection Method**
FlockParse detects GPU usage via Ollama's `/api/ps` endpoint:
```bash
# Check what Ollama reports
curl http://localhost:11434/api/ps
# Response shows VRAM usage:
{
"models": [{
"name": "mxbai-embed-large:latest",
"size": 705530880,
"size_vram": 705530880, # <-- If >0, model is in GPU
...
}]
}
```
**Health score calculation:**
- `size_vram > 0` → +200 points (GPU in use)
- `size_vram == 0` but GPU present → +50 points (GPU available, not used)
- CPU-only → -50 points
This is **presence-based detection**, not utilization monitoring. It detects *if* the model loaded into VRAM, not *how efficiently* it's being used.
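As a sketch, those scoring rules map to something like the following; the base score and the handling of the GPU-present case are assumptions, only the bonuses come from the rules above:

```python
import requests

def health_score(node_url: str, base: float = 100.0) -> float:
    """Score a node from Ollama's /api/ps; weights follow the rules above."""
    models = requests.get(f"{node_url}/api/ps", timeout=2).json().get("models", [])
    if any(m.get("size_vram", 0) > 0 for m in models):
        return base + 200  # model resident in VRAM -> GPU in use
    if models:
        return base + 50   # loaded but size_vram == 0 -> CPU fallback on a GPU box
    return base - 50       # nothing loaded; treated here as CPU-only
```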
---
## **💡 Features**
| Feature | Description |
|---------|-------------|
| **Multi-method PDF Extraction** | Uses both PyPDF2 and pdftotext for best results |
| **Format Conversion** | Converts PDFs to TXT, Markdown, DOCX, and JSON |
| **Semantic Search** | Uses vector embeddings to find relevant information |
| **Interactive Chat** | Discuss your documents with AI assistance |
| **Privacy Options** | Web UI/CLI: 100% offline; REST API: local network; MCP: Claude Desktop (cloud) |
| **Distributed Processing** | Load balancer with auto-discovery for multiple Ollama nodes |
| **Accurate VRAM Monitoring** | Real GPU memory tracking with nvidia-smi/rocm-smi + Ollama API (NEW!) |
| **GPU & VRAM Awareness** | Automatically detects GPU nodes and prevents CPU fallback |
| **Intelligent Routing** | 4 strategies (adaptive, round_robin, least_loaded, lowest_latency) with GPU priority |
| **Flexible Model Matching** | Supports model name variants (llama3.1, llama3.1:latest, llama3.1:8b, etc.) |
| **ChromaDB Vector Store** | Production-ready persistent vector database with cosine similarity |
| **Embedding Cache** | MD5-based caching prevents reprocessing same content |
| **Model Weight Caching** | Keep models in VRAM for faster repeated inference |
| **Parallel Batch Processing** | Process multiple embeddings simultaneously |
| **Database Management** | Clear cache and clear DB commands for easy maintenance (NEW!) |
| **Filename Preservation** | Maintains original document names in converted files |
| **REST API** | Web server for multi-user/application integration |
| **Document Summarization** | AI-generated summaries of uploaded documents |
| **OCR Processing** | Extract text from scanned documents using image recognition |
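The MD5-based embedding cache from the table above can be pictured with a short sketch; the on-disk layout (`embedding_cache/<md5>.json`) is an assumption, not FlockParse's actual format:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("embedding_cache")  # hypothetical location

def cached_embedding(text: str, embed_fn):
    """Return a vector for `text`, computing it only once per content hash."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.md5(text.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed_fn(text)  # e.g. a call to Ollama's embedding endpoint
    path.write_text(json.dumps(vector))
    return vector
```

Because the key is a hash of the chunk's content, re-processing an unchanged document becomes a series of cache hits.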
## **Comparing FlockParse Interfaces**
| Feature | **flock_webui.py** | flockparsecli.py | flock_ai_api.py | flock_mcp_server.py |
|---------|-------------------|----------------|-----------|---------------------|
| **Interface** | 🎨 Web Browser (Streamlit) | Command line | REST API over HTTP | Model Context Protocol |
| **Ease of Use** | ⭐⭐⭐⭐⭐ Easiest | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| **Use case** | Interactive GUI usage | Personal CLI processing | Service integration | AI Assistant integration |
| **Document formats** | Creates TXT, MD, DOCX, JSON | Creates TXT, MD, DOCX, JSON | Stores extracted text only | Creates TXT, MD, DOCX, JSON |
| **Interaction** | Point-and-click + chat | Interactive chat mode | Query/response via API | Tool calls from AI assistants |
| **Multi-user** | Single user (local) | Single user | Multiple users/applications | Single user (via AI assistant) |
| **Storage** | Local file-based | Local file-based | ChromaDB vector database | Local file-based |
| **Load Balancing** | ✅ Yes (visual dashboard) | ✅ Yes | ❌ No | ✅ Yes |
| **Node Discovery** | ✅ Yes (one-click) | ✅ Yes | ❌ No | ✅ Yes |
| **GPU Monitoring** | ✅ Yes (real-time charts) | ✅ Yes | ❌ No | ✅ Yes |
| **Batch Operations** | ⚠️ Multiple upload | ❌ No | ❌ No | ❌ No |
| **Privacy Level** | 🟢 100% Local | 🟢 100% Local | 🟡 Local Network | 🔴 Cloud (Claude) |
| **Best for** | **🌟 General users, GUI lovers** | Direct CLI usage | Integration with apps | Claude Desktop, AI workflows |
## **📁 Project Structure**
- `/converted_files` - Stores the converted document formats (flockparsecli.py)
- `/knowledge_base` - Legacy JSON storage (backwards compatibility only)
- `/chroma_db_cli` - **ChromaDB vector database for CLI** (flockparsecli.py) - **Production storage**
- `/uploads` - Temporary storage for uploaded documents (flock_ai_api.py)
- `/chroma_db` - ChromaDB vector database (flock_ai_api.py)
## **🚀 Recent Additions**
- ✅ **GPU Auto-Optimization** - Background process ensures models use GPU automatically (NEW!)
- ✅ **Programmatic GPU Control** - Force models to GPU/CPU across distributed nodes (NEW!)
- ✅ **Accurate VRAM Monitoring** - Real GPU memory tracking across distributed nodes
- ✅ **ChromaDB Production Integration** - Professional vector database for 100x faster search
- ✅ **Clear Cache & Clear DB Commands** - Manage embeddings and database efficiently
- ✅ **Model Weight Caching** - Keep models in VRAM for 5-10x faster inference
- ✅ **Web UI** - Beautiful Streamlit interface for easy document management
- ✅ **Advanced OCR Support** - Automatic fallback to OCR for scanned documents
- ✅ **API Authentication** - Secure API key authentication for REST API endpoints
- ⬜ **Document versioning** - Track changes over time (Coming soon)
## **📚 Complete Documentation**
### Core Documentation
- **[📖 Architecture Deep Dive](docs/architecture.md)** - System design, routing algorithms, technical decisions
- **[🌐 Distributed Setup Guide](DISTRIBUTED_SETUP.md)** - ⭐ **Set up your own multi-node cluster**
- **[📊 Performance Benchmarks](BENCHMARKS.md)** - Real-world performance data and scaling tests
- **[⚠️ Known Issues & Limitations](KNOWN_ISSUES.md)** - 🔴 **READ THIS** - Honest assessment of current state
- **[🔒 Security Policy](SECURITY.md)** - Security best practices and vulnerability reporting
- **[🐛 Error Handling Guide](ERROR_HANDLING.md)** - Troubleshooting common issues
- **[🤝 Contributing Guide](CONTRIBUTING.md)** - How to contribute to the project
- **[📋 Code of Conduct](CODE_OF_CONDUCT.md)** - Community guidelines
- **[📝 Changelog](CHANGELOG.md)** - Version history
### Technical Guides
- **[⚡ Performance Optimization](PERFORMANCE_OPTIMIZATION.md)** - Tuning for maximum speed
- **[🔧 GPU Router Setup](GPU_ROUTER_SETUP.md)** - Distributed cluster configuration
- **[🤖 GPU Auto-Optimization](GPU_AUTO_OPTIMIZATION.md)** - Automatic GPU management
- **[📊 VRAM Monitoring](VRAM_MONITORING.md)** - GPU memory tracking
- **[🎯 Adaptive Parallelism](ADAPTIVE_PARALLELISM.md)** - Smart workload distribution
- **[🗄️ ChromaDB Production](CHROMADB_PRODUCTION.md)** - Vector database scaling
- **[💾 Model Caching](MODEL_CACHING.md)** - Performance through caching
- **[🖥️ Node Management](NODE_MANAGEMENT.md)** - Managing distributed nodes
- **[⚡ Quick Setup](QUICK_SETUP.md)** - Fast track to getting started
### Additional Resources
- **[🏛️ FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)** - Original distributed inference implementation
- **[📦 Docker Setup](docker-compose.yml)** - Containerized deployment
- **[⚙️ Environment Config](.env.example)** - Configuration template
- **[🧪 Tests](tests/)** - Test suite and CI/CD
## **🔗 Integration with SynapticLlamas & SOLLOL**
FlockParser is designed to work seamlessly with **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** (multi-agent orchestration) and **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** (distributed inference platform) as a unified AI ecosystem.
### **The Complete Stack**
```
┌─────────────────────────────────────────────────────────────┐
│ SynapticLlamas (v0.1.0+) │
│ Multi-Agent System & Orchestration │
│ • Research agents • Editor agents • Storyteller agents │
└───────────┬────────────────────────────────────┬───────────┘
│ │
│ RAG Queries │ Distributed
│ (with pre-computed embeddings) │ Inference
│ │
┌──────▼──────────┐ ┌─────────▼────────────┐
│ FlockParser │ │ SOLLOL │
│ API (v1.0.4+) │ │ Load Balancer │
│ Port: 8000 │ │ (v0.9.31+) │
└─────────────────┘ └──────────────────────┘
│ │
│ ChromaDB │ Intelligent
│ Vector Store │ GPU/CPU Routing
│ │
┌──────▼──────────┐ ┌─────────▼────────────┐
│ Knowledge Base │ │ Ollama Nodes │
│ 41 Documents │ │ (Distributed) │
│ 6,141 Chunks │ │ GPU + CPU │
└─────────────────┘ └──────────────────────┘
```
### **Why This Integration Matters**
**FlockParser** provides document RAG capabilities, **SynapticLlamas** orchestrates multi-agent workflows, and **SOLLOL** handles distributed inference with intelligent load balancing.
| Component | Role | Key Feature |
|-----------|------|-------------|
| **FlockParser** | Document RAG & Knowledge Base | ChromaDB vector store with 6,141+ chunks |
| **SynapticLlamas** | Agent Orchestration | Multi-agent workflows with RAG integration |
| **SOLLOL** | Distributed Inference | Load balanced embedding & model inference |
### **Quick Start: Complete Ecosystem**
```bash
# Install all three packages (auto-installs dependencies)
pip install synaptic-llamas # Pulls in flockparser>=1.0.4 and sollol>=0.9.31
# Start FlockParser API (auto-starts with CLI)
flockparse
# Configure SynapticLlamas for integration
synaptic-llamas --interactive --distributed
```
### **Integration Example: Load Balanced RAG**
```python
from flockparser_adapter import FlockParserAdapter
from sollol_load_balancer import SOLLOLLoadBalancer
# Initialize SOLLOL for distributed inference
sollol = SOLLOLLoadBalancer(
rpc_backends=["http://gpu-node-1:50052", "http://gpu-node-2:50052"]
)
# Initialize FlockParser adapter
flockparser = FlockParserAdapter("http://localhost:8000", remote_mode=True)
# Step 1: Generate embedding using SOLLOL (load balanced!)
embedding = sollol.generate_embedding(
model="mxbai-embed-large",
prompt="quantum entanglement"
)
# SOLLOL routes to fastest GPU automatically
# Step 2: Query FlockParser with pre-computed embedding
results = flockparser.query_remote(
query="quantum entanglement",
embedding=embedding, # Skip FlockParser's embedding generation
n_results=5
)
# FlockParser returns relevant chunks from 41 documents
# Performance gain: 2-5x faster when SOLLOL has faster nodes!
```
### **New API Endpoints (v1.0.4+)**
FlockParser v1.0.4 adds **SynapticLlamas-compatible** public endpoints:
- **`GET /health`** - Check API availability and document count
- **`GET /stats`** - Get knowledge base statistics (41 docs, 6,141 chunks)
- **`POST /query`** - Query with pre-computed embeddings (critical for load balanced RAG)
**These endpoints allow SynapticLlamas to bypass FlockParser's embedding generation and use SOLLOL's load balancer instead!**
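A hedged example of calling `POST /query` with a pre-computed vector; the field names mirror the adapter example above, but the exact request schema is an assumption (see the integration guide):

```python
import requests

embedding = [0.0] * 1024  # placeholder; use a real mxbai-embed-large vector

response = requests.post(
    "http://localhost:8000/query",
    json={"query": "quantum entanglement", "embedding": embedding, "n_results": 5},
    timeout=30,
)
print(response.json())
```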
### **Learn More**
- **[📖 Complete Integration Guide](INTEGRATION_WITH_SYNAPTICLLAMAS.md)** - Full architecture, examples, and setup
- **[SynapticLlamas Repository](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** - Multi-agent orchestration
- **[SOLLOL Repository](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - Distributed inference platform
---
## **📝 Development Process**
This project was developed iteratively using Claude and Claude Code as coding assistants. All design decisions, architecture choices, and integration strategy were directed and reviewed by me.
## **🤝 Contributing**
Contributions are welcome! Please feel free to submit a Pull Request.
## **📄 License**
This project is licensed under the MIT License - see the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/BenevolentJoker-JohnL/FlockParser",
"name": "flockparser",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "rag, retrieval-augmented-generation, distributed-systems, document-processing, gpu-acceleration, ollama, chromadb, pdf-processing, ocr, ai, machine-learning, nlp",
"author": "BenevolentJoker (John L.)",
"author_email": "\"BenevolentJoker (John L.)\" <benevolentjoker@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e6/0f/a61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b/flockparser-1.0.5.tar.gz",
"platform": null,
"description": "# **FlockParse - Document RAG Intelligence with Distributed Processing**\n\n[](https://pypi.org/project/flockparser/)\n[](https://pypi.org/project/flockparser/)\n[](https://github.com/BenevolentJoker-JohnL/FlockParser/actions)\n[](https://codecov.io/gh/BenevolentJoker-JohnL/FlockParser)\n[](https://opensource.org/licenses/MIT)\n[](https://www.python.org/downloads/)\n[](https://github.com/psf/black)\n[](https://github.com/BenevolentJoker-JohnL/FlockParser)\n\n> **Distributed document RAG system that turns mismatched hardware into a coordinated inference cluster.** Auto-discovers Ollama nodes, intelligently routes workloads across heterogeneous GPUs/CPUs, and achieves 60x+ speedups through adaptive load balancing. Privacy-first with local/network/cloud interfaces.\n\n**What makes this different:** Real distributed systems engineering\u2014not just API wrappers. Handles heterogeneous hardware (RTX A4000 + GTX 1050Ti + CPU laptops working together), network failures, and privacy requirements that rule out cloud APIs.\n\n---\n\n## \u26a0\ufe0f Important: Current Maturity\n\n**Status:** Beta (v1.0.0) - **Early adopters welcome, but read this first!**\n\n**What works well:**\n- \u2705 Core distributed processing across heterogeneous nodes\n- \u2705 GPU detection and VRAM-aware routing\n- \u2705 Basic PDF extraction and OCR fallback\n- \u2705 Privacy-first local processing (CLI/Web UI modes)\n\n**Known limitations:**\n- \u26a0\ufe0f **Limited battle testing** - Tested by ~2 developers, not yet proven at scale\n- \u26a0\ufe0f **Security gaps** - See [SECURITY.md](SECURITY.md) for current limitations\n- \u26a0\ufe0f **Edge cases** - Some PDF types may fail (encrypted, complex layouts)\n- \u26a0\ufe0f **Test coverage** - ~40% coverage, integration tests incomplete\n\n**Read before using:** [KNOWN_ISSUES.md](KNOWN_ISSUES.md) documents all limitations, edge cases, and roadmap honestly.\n\n**Recommended for:**\n- \ud83c\udf93 Learning distributed systems\n- \ud83d\udd2c Research and experimentation\n- \ud83c\udfe0 Personal projects with non-critical data\n- \ud83d\udee0\ufe0f Contributors who want to help mature the project\n\n**Not yet recommended for:**\n- \u274c Mission-critical production workloads\n- \u274c Regulated industries (healthcare, finance) without additional hardening\n- \u274c Large-scale deployments (>50 concurrent users)\n\n**Help us improve:** Report issues, contribute fixes, share feedback!\n\n---\n\n## **\ud83c\udfdb\ufe0f Origins & Legacy**\n\nFlockParser's distributed inference architecture originated from **[FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)**, which pioneered:\n- **Auto-discovery** of Ollama nodes across heterogeneous hardware\n- **Adaptive load balancing** with GPU/CPU awareness\n- **VRAM-aware routing** and automatic failover mechanisms\n\nThis core distributed logic from FlockParser-legacy was later extracted and generalized to become **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - a standalone distributed inference platform that now powers both FlockParser and **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)**.\n\n### **\ud83d\udcca Quick Performance Reference**\n\n| Workload | Hardware | Time | Speedup | Notes |\n|----------|----------|------|---------|-------|\n| **5 AI papers (~350 pages)** | 1\u00d7 RTX A4000 (16GB) | 21.3s | **17.5\u00d7** | [Real arXiv showcase](#-showcase-real-world-example) |\n| **12-page PDF (demo video)** | 1\u00d7 RTX A4000 (16GB) | 6.0s | 
**61.7\u00d7** | GPU-aware routing |\n| **100 PDFs (2000 pages)** | 3-node cluster (mixed) | 3.2 min | **13.2\u00d7** | See [BENCHMARKS.md](BENCHMARKS.md) |\n| **Embedding generation** | RTX A4000 vs i9 CPU | 8.2s vs 178s | **21.7\u00d7** | 10K chunks |\n\n**\ud83c\udfaf Try it yourself:** `pip install flockparser && python showcase/process_arxiv_papers.py`\n\n---\n\n## **\ud83d\udd12 Privacy Model**\n\n| Interface | Privacy Level | External Calls | Best For |\n|-----------|---------------|----------------|----------|\n| **CLI** (`flockparsecli.py`) | \ud83d\udfe2 **100% Local** | None | Personal use, air-gapped systems |\n| **Web UI** (`flock_webui.py`) | \ud83d\udfe2 **100% Local** | None | GUI users, visual monitoring |\n| **REST API** (`flock_ai_api.py`) | \ud83d\udfe1 **Local Network** | None | Multi-user, app integration |\n| **MCP Server** (`flock_mcp_server.py`) | \ud83d\udd34 **Cloud** | \u26a0\ufe0f Claude Desktop (Anthropic) | AI assistant integration |\n\n**\u26a0\ufe0f MCP Privacy Warning:** The MCP server integrates with Claude Desktop, which sends queries and document snippets to Anthropic's cloud API. Use CLI/Web UI for 100% offline processing.\n\n---\n\n## **Table of Contents**\n\n- [Key Features](#-key-features)\n- [\ud83d\udc65 Who Uses This?](#-who-uses-this) - **Target users & scenarios**\n- [\ud83d\udcd0 How It Works (5-Second Overview)](#-how-it-works-5-second-overview) - **Visual for non-technical evaluators**\n- [Architecture](#-architecture) | **[\ud83d\udcd6 Deep Dive: Architecture & Design Decisions](docs/architecture.md)**\n- [Quickstart](#-quickstart-3-steps)\n- [Performance & Benchmarks](#-performance)\n- [\ud83c\udf93 Showcase: Real-World Example](#-showcase-real-world-example) \u2b50 **Try it yourself**\n- [Usage Examples](#-usage)\n- [Security & Production](#-security--production-notes)\n- [\ud83d\udd17 Integration with SynapticLlamas & SOLLOL](#-integration-with-synapticllamas--sollol) - **Complete AI Ecosystem** \u2b50\n- [Troubleshooting](#-troubleshooting-guide)\n- [Contributing](#-contributing)\n\n## **\u26a1 Key Features**\n\n- **\ud83c\udf10 Intelligent Load Balancing** - Auto-discovers Ollama nodes, detects GPU vs CPU, monitors VRAM, and routes work adaptively (10x speedup on heterogeneous clusters)\n- **\ud83d\udd0c Multi-Protocol Support** - CLI (100% local), REST API (network), MCP (Claude Desktop), Web UI (Streamlit) - choose your privacy level\n- **\ud83c\udfaf Adaptive Routing** - Sequential vs parallel decisions based on cluster characteristics (prevents slow nodes from bottlenecking)\n- **\ud83d\udcca Production Observability** - Real-time health scores, performance tracking, VRAM monitoring, automatic failover\n- **\ud83d\udd12 Privacy-First Architecture** - No external API calls required (CLI mode), all processing on-premise\n- **\ud83d\udcc4 Complete Pipeline** - PDF extraction \u2192 OCR fallback \u2192 Multi-format conversion \u2192 Vector embeddings \u2192 RAG with source citations\n\n---\n\n## **\ud83d\udc65 Who Uses This?**\n\nFlockParser is designed for engineers and researchers who need **private, on-premise document intelligence** with **real distributed systems capabilities**.\n\n### **Ideal Users**\n\n| User Type | Use Case | Why FlockParser? 
|\n|-----------|----------|------------------|\n| **\ud83d\udd2c ML/AI Engineers** | Process research papers, build knowledge bases, experiment with RAG systems | GPU-aware routing, 21\u00d7 faster embeddings, full pipeline control |\n| **\ud83d\udcca Data Scientists** | Extract insights from large document corpora (100s-1000s of PDFs) | Distributed processing, semantic search, production observability |\n| **\ud83c\udfe2 Enterprise Engineers** | On-premise document search for regulated industries (healthcare, legal, finance) | 100% local processing, no cloud APIs, privacy-first architecture |\n| **\ud83c\udf93 Researchers** | Build custom RAG systems, experiment with distributed inference patterns | Full source access, extensible architecture, real benchmarks |\n| **\ud83d\udee0\ufe0f DevOps/Platform Engineers** | Set up document intelligence infrastructure for teams | Multi-node setup, health monitoring, automatic failover |\n| **\ud83d\udc68\u200d\ud83d\udcbb Students/Learners** | Understand distributed systems, GPU orchestration, RAG architectures | Real working example, comprehensive docs, honest limitations |\n\n### **Real-World Scenarios**\n\n\u2705 **\"I have 500 research papers and a spare GPU machine\"** \u2192 Process your corpus 20\u00d7 faster with distributed nodes\n\u2705 **\"I can't send medical records to OpenAI\"** \u2192 100% local processing (CLI/Web UI modes)\n\u2705 **\"I want to experiment with RAG without cloud costs\"** \u2192 Full pipeline, runs on your hardware\n\u2705 **\"I need to search 10,000 internal documents\"** \u2192 ChromaDB vector search with sub-20ms latency\n\u2705 **\"I have mismatched hardware (old laptop + gaming PC)\"** \u2192 Adaptive routing handles heterogeneous clusters\n\n### **Not Ideal For**\n\n\u274c **Production SaaS with 1000+ concurrent users** \u2192 Current SQLite backend limits concurrency (~50 users)\n\u274c **Mission-critical systems requiring 99.9% uptime** \u2192 Still in Beta, see [KNOWN_ISSUES.md](KNOWN_ISSUES.md)\n\u274c **Simple one-time PDF extraction** \u2192 Overkill; use `pdfplumber` directly\n\u274c **Cloud-first deployments** \u2192 Designed for on-premise/hybrid; cloud works but misses GPU routing benefits\n\n**Bottom line:** If you're building document intelligence infrastructure on your own hardware and need distributed processing with privacy guarantees, FlockParser is for you.\n\n---\n\n## **\ud83d\udcd0 How It Works (5-Second Overview)**\n\n**For recruiters and non-technical evaluators:**\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 INPUT \u2502\n\u2502 \ud83d\udcc4 Your Documents (PDFs, research papers, internal docs) \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n \u2502\n 
## **🏗️ Architecture**

```
┌─────────────────────────────────────────────────────────────┐
│         Interfaces (Choose Your Privacy Level)              │
│  CLI (Local) | REST API (Network) | MCP (Claude) | Web UI   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  FlockParse Core Engine                     │
│  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐      │
│  │     PDF     │   │   Semantic   │   │     RAG      │      │
│  │  Processing │ → │    Search    │ → │    Engine    │      │
│  └─────────────┘   └──────────────┘   └──────────────┘      │
│         │                 │                  │              │
│         ▼                 ▼                  ▼              │
│  ┌───────────────────────────────────────────────────┐      │
│  │       ChromaDB Vector Store (Persistent)          │      │
│  └───────────────────────────────────────────────────┘      │
└──────────────────────┬──────────────────────────────────────┘
                       │ Intelligent Load Balancer
                       │  • Health scoring (GPU/VRAM detection)
                       │  • Adaptive routing (sequential vs parallel)
                       │  • Automatic failover & caching
                       ▼
        ┌──────────────────────────────────────────────┐
        │         Distributed Ollama Cluster           │
        │  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
        │  │  Node 1  │  │  Node 2  │  │  Node 3  │    │
        │  │  GPU A   │  │  GPU B   │  │   CPU    │    │
        │  │16GB VRAM │  │ 8GB VRAM │  │ 16GB RAM │    │
        │  │Health:367│  │Health:210│  │Health:50 │    │
        │  └──────────┘  └──────────┘  └──────────┘    │
        └──────────────────────────────────────────────┘
           ▲ Auto-discovery | Performance tracking
```

**Want to understand how this works?** Read the **[📖 Architecture Deep Dive](docs/architecture.md)** for detailed explanations of:
- Why distributed AI inference solves real-world problems
- How adaptive routing decisions are made (sequential vs parallel)
- MCP integration details and privacy implications
- Technical trade-offs and design decisions
## **🚀 Quickstart (3 Steps)**

**Requirements:**
- Python 3.10 or later
- Ollama 0.1.20+ (install from [ollama.com](https://ollama.com))
- 4GB+ RAM (8GB+ recommended for GPU nodes)

```bash
# 1. Install FlockParser
pip install flockparser

# 2. Start Ollama and pull models
ollama serve                     # In a separate terminal
ollama pull mxbai-embed-large    # Required for embeddings
ollama pull llama3.1:latest      # Required for chat

# 3. Run your preferred interface
flockparse-webui    # Web UI - easiest (recommended) ⭐
flockparse          # CLI - 100% local
flockparse-api      # REST API - multi-user
flockparse-mcp      # MCP - Claude Desktop integration
```

**💡 Pro tip:** Start with the Web UI to see distributed processing with real-time VRAM monitoring and node health dashboards.

---

### Alternative: Install from Source

If you want to contribute or modify the code:

```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
pip install -e .    # Editable install
```

### **Quick Test (30 seconds)**

```bash
# Start the CLI
python flockparsecli.py

# Process the sample PDF
> open_pdf testpdfs/sample.pdf

# Chat with it
> chat
🙋 You: Summarize this document
```

**First time?** Start with the Web UI (`streamlit run flock_webui.py`) - it's the easiest way to see distributed processing in action with a visual dashboard.

---
## **🐳 Docker Deployment (One Command)**

### **Quick Start with Docker Compose**

```bash
# Clone and deploy everything
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
docker-compose up -d

# Access services
# Web UI:    http://localhost:8501
# REST API:  http://localhost:8000
# Ollama:    http://localhost:11434
```

### **What Gets Deployed**

| Service | Port | Description |
|---------|------|-------------|
| **Web UI** | 8501 | Streamlit interface with visual monitoring |
| **REST API** | 8000 | FastAPI with authentication |
| **CLI** | - | Interactive terminal (`docker-compose run cli`) |
| **Ollama** | 11434 | Local LLM inference engine |

### **Production Features**

✅ **Multi-stage build** - Optimized image size
✅ **Non-root user** - Security hardened
✅ **Health checks** - Auto-restart on failure
✅ **Volume persistence** - Data survives restarts
✅ **GPU support** - Uncomment the deploy section for NVIDIA GPUs

### **Custom Configuration**

```bash
# Set API key
export FLOCKPARSE_API_KEY="your-secret-key"

# Set log level
export LOG_LEVEL="DEBUG"

# Deploy with custom config
docker-compose up -d
```

### **GPU Support (NVIDIA)**

Uncomment the GPU section in `docker-compose.yml`:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```

Then run: `docker-compose up -d`

### **CI/CD Pipeline**

```mermaid
graph LR
    A[📝 Git Push] --> B[🔍 Lint & Format]
    B --> C[🧪 Test Suite]
    B --> D[🔒 Security Scan]
    C --> E[🐳 Build Multi-Arch]
    D --> E
    E --> F[📦 Push to GHCR]
    F --> G[🚀 Deploy]

    style A fill:#4CAF50
    style B fill:#2196F3
    style C fill:#2196F3
    style D fill:#FF9800
    style E fill:#9C27B0
    style F fill:#9C27B0
    style G fill:#F44336
```

**Automated on every push to `main`:**

| Stage | Tools | Purpose |
|-------|-------|---------|
| **Code Quality** | black, flake8, mypy | Enforce formatting & typing standards |
| **Testing** | pytest (Python 3.10/3.11/3.12) | Coverage reporting across versions |
| **Security** | Trivy | Vulnerability scanning & SARIF reports |
| **Build** | Docker Buildx | Multi-architecture (amd64, arm64) |
| **Registry** | GitHub Container Registry | Versioned image storage |
| **Deploy** | On release events | Automated production deployment |

**Pull the latest image:**
```bash
docker pull ghcr.io/benevolentjoker-johnl/flockparser:latest
```

**View pipeline runs:** https://github.com/BenevolentJoker-JohnL/FlockParser/actions

---
## **🌐 Setting Up Distributed Nodes**

**Want the 60x speedup?** Set up multiple Ollama nodes across your network.

### Quick Multi-Node Setup

**On each additional machine:**

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Configure for network access
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# 3. Pull models
ollama pull mxbai-embed-large
ollama pull llama3.1:latest

# 4. Allow firewall (if needed)
sudo ufw allow 11434/tcp    # Linux
```

**FlockParser will automatically discover these nodes!**

Check with:
```bash
python flockparsecli.py
> lb_stats    # Shows all discovered nodes and their capabilities
```
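Under the hood, auto-discovery amounts to probing the Ollama port across your subnet. A minimal sketch of the idea - hypothetical helper names and an illustrative default subnet, not FlockParser's actual implementation:

```python
import concurrent.futures

import requests


def probe(host: str, timeout: float = 0.5) -> str | None:
    """Return the node URL if an Ollama API answers on this host."""
    url = f"http://{host}:11434"
    try:
        # /api/tags is a cheap endpoint every Ollama instance serves
        if requests.get(f"{url}/api/tags", timeout=timeout).ok:
            return url
    except requests.RequestException:
        pass
    return None


def discover(subnet: str = "192.168.1") -> list[str]:
    """Scan a /24 subnet for Ollama nodes in parallel."""
    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        return [url for url in pool.map(probe, hosts) if url]


print(discover())    # e.g. ['http://192.168.1.100:11434']
```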
**📖 Complete Guide:** See **[DISTRIBUTED_SETUP.md](DISTRIBUTED_SETUP.md)** for:
- Step-by-step multi-machine setup
- Network configuration and firewall rules
- Troubleshooting node discovery
- Example setups (budget home lab to professional clusters)
- GPU router configuration for automatic optimization

---

### **🔒 Privacy Levels by Interface:**
- **Web UI (`flock_webui.py`)**: 🟢 100% local, runs in your browser
- **CLI (`flockparsecli.py`)**: 🟢 100% local, zero external calls
- **REST API (`flock_ai_api.py`)**: 🟡 Local network only
- **MCP Server (`flock_mcp_server.py`)**: 🔴 Integrates with Claude Desktop (Anthropic cloud service)

**Choose the interface that matches your privacy requirements!**

## **🏆 Why FlockParse? Comparison to Competitors**

| Feature | **FlockParse** | LangChain | LlamaIndex | Haystack |
|---------|---------------|-----------|------------|----------|
| **100% Local/Offline** | ✅ Yes (CLI/Web UI) | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial |
| **Zero External API Calls** | ✅ Yes (CLI/Web UI) | ❌ No | ❌ No | ❌ No |
| **Built-in GPU Load Balancing** | ✅ Yes (auto) | ❌ No | ❌ No | ❌ No |
| **VRAM Monitoring** | ✅ Yes (dynamic) | ❌ No | ❌ No | ❌ No |
| **Multi-Node Auto-Discovery** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **CPU Fallback Detection** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Document Format Export** | ✅ 4 formats | ❌ Limited | ❌ Limited | ⚠️ Basic |
| **Setup Complexity** | 🟢 Simple | 🔴 Complex | 🔴 Complex | 🟡 Medium |
| **Dependencies** | 🟢 Minimal | 🔴 Heavy | 🔴 Heavy | 🟡 Medium |
| **Learning Curve** | 🟢 Low | 🔴 Steep | 🔴 Steep | 🟡 Medium |
| **Privacy Control** | 🟢 High (CLI/Web UI) | 🔴 Limited | 🔴 Limited | 🟡 Medium |
| **Out-of-Box Functionality** | ✅ Complete | ⚠️ Requires config | ⚠️ Requires config | ⚠️ Requires config |
| **MCP Integration** | ✅ Native | ❌ No | ❌ No | ❌ No |
| **Embedding Cache** | ✅ MD5-based | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| **Batch Processing** | ✅ Parallel | ⚠️ Sequential | ⚠️ Sequential | ⚠️ Basic |
| **Performance** | 🚀 60x+ faster with GPU auto-routing | ⚠️ Varies by config | ⚠️ Varies by config | ⚠️ Varies by config |
| **Cost** | 💰 Free | 💰💰 Free + Paid | 💰💰 Free + Paid | 💰💰 Free + Paid |

### **Key Differentiators:**

1. **Privacy by Design**: The CLI and Web UI interfaces are 100% local with zero external calls (the MCP interface uses Claude Desktop for chat)
2. **Intelligent GPU Management**: Automatically finds, tests, and prioritizes GPU nodes
3. **Production-Ready**: Works immediately with sensible defaults
4. **Resource-Aware**: Detects VRAM exhaustion and prevents performance degradation
5. **Complete Solution**: CLI, Web UI, REST API, and MCP interfaces - choose your privacy level
## **📊 Performance**

### **Real-World Benchmark Results**

| Processing Mode | Time | Speedup | What It Shows |
|----------------|------|---------|---------------|
| Single CPU node | 372.76s (~6 min) | 1x baseline | Sequential CPU processing |
| Parallel (multi-node) | 159.79s (~2.5 min) | **2.3x faster** | Distributed across cluster |
| GPU node routing | 6.04s (~6 sec) | **61.7x faster** | Automatic GPU detection & routing |

**Why the Massive Speedup?**
- GPU processes embeddings in milliseconds vs seconds on CPU
- Adaptive routing detected the GPU was 60x+ faster and sent all work there
- Avoided the bottleneck of waiting for slower CPU nodes to finish
- No network overhead (local cluster, no cloud APIs)

**Key Insight:** The system **automatically** detects performance differences and makes routing decisions - no manual GPU configuration needed.

**Hardware (Benchmark Cluster):**
- **Node 1 (10.9.66.90):** Intel i9-12900K, 32GB DDR5-6000, 6TB NVMe Gen4, RTX A4000 16GB - primary GPU node
- **Node 2 (10.9.66.159):** AMD Ryzen 7 5700X, 32GB DDR4-3600, GTX 1050Ti (CPU-mode fallback)
- **Node 3:** Intel i7-12th gen (laptop), 16GB DDR5, CPU-only
- **Software:** Python 3.10, Ollama, Ubuntu 22.04

**Reproducibility:**
- Full source code available in this repo
- Test with your own hardware - results will vary based on GPU

The project offers four main interfaces:
1. **flock_webui.py** - 🎨 Beautiful Streamlit web interface (NEW!)
2. **flockparsecli.py** - Command-line interface for personal document processing
3. **flock_ai_api.py** - REST API server for multi-user or application integration
4. **flock_mcp_server.py** - Model Context Protocol server for AI assistants like Claude Desktop

---

## **🎓 Showcase: Real-World Example**

**Processing influential AI research papers from arXiv.org**

Want to see FlockParser in action on real documents? Run the included showcase:

```bash
pip install flockparser
python showcase/process_arxiv_papers.py
```

### **What It Does**

Downloads and processes 5 seminal AI research papers:
- **Attention Is All You Need** (Transformers) - arXiv:1706.03762
- **BERT** - Pre-training Deep Bidirectional Transformers - arXiv:1810.04805
- **RAG** - Retrieval-Augmented Generation for NLP - arXiv:2005.11401
- **GPT-3** - Language Models are Few-Shot Learners - arXiv:2005.14165
- **Llama 2** - Open Foundation Language Models - arXiv:2307.09288

**Total: ~350 pages, ~25 MB of PDFs**

### **Expected Results**

| Configuration | Processing Time | Speedup |
|---------------|----------------|---------|
| **Single CPU node** | ~90s | 1.0× baseline |
| **Multi-node (1 GPU + 2 CPU)** | ~30s | 3.0× |
| **Single GPU node (RTX A4000)** | ~21s | **4.3×** |

### **What You Get**

After processing, the script demonstrates:

1. **Semantic Search** across all papers:
   ```python
   # Example queries that work immediately:
   "What is the transformer architecture?"
   "How does retrieval-augmented generation work?"
   "What are the benefits of attention mechanisms?"
   ```

2. **Performance Metrics** (`showcase/results.json`; see the sketch after this list):
   ```json
   {
     "total_time": 21.3,
     "papers": [
       {
         "title": "Attention Is All You Need",
         "processing_time": 4.2,
         "status": "success"
       }
     ],
     "node_info": [...]
   }
   ```

3. **Human-Readable Summary** (`showcase/RESULTS.md`) with:
   - Per-paper processing times
   - Hardware configuration used
   - Fastest/slowest/average performance
   - Replication instructions
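Those metrics are plain JSON, so a few lines of Python are enough to slice them - a sketch assuming the `results.json` schema shown in step 2:

```python
import json
from pathlib import Path

results = json.loads(Path("showcase/results.json").read_text())

print(f"Total: {results['total_time']:.1f}s")
for paper in results["papers"]:
    mark = "✓" if paper["status"] == "success" else "✗"
    print(f"  {mark} {paper['title']}: {paper['processing_time']:.1f}s")

# The same fastest-paper stat that RESULTS.md reports
fastest = min(results["papers"], key=lambda p: p["processing_time"])
print(f"Fastest: {fastest['title']} ({fastest['processing_time']:.1f}s)")
```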
### **Why This Matters**

This isn't a toy demo - it's processing actual research papers that engineers read daily. It demonstrates:

✅ **Real document processing** - Complex PDFs with equations, figures, multi-column layouts
✅ **Production-grade pipeline** - PDF extraction → embeddings → vector storage → semantic search
✅ **Actual performance gains** - Measurable speedups on heterogeneous hardware
✅ **Reproducible results** - Run it yourself with `pip install`, compare your hardware

**Perfect for portfolio demonstrations:** Show this to hiring managers as proof of real distributed systems work.

---

## **🔧 Installation**

### **1. Clone the Repository**
```bash
git clone https://github.com/BenevolentJoker-JohnL/FlockParser.git
cd FlockParser
```

### **2. Install System Dependencies (Required for OCR)**

**⚠️ IMPORTANT: Install these BEFORE `pip install`, as pytesseract and pdf2image require system packages**

#### For Better PDF Text Extraction:
- **Linux**:
  ```bash
  sudo apt-get update
  sudo apt-get install poppler-utils
  ```
- **macOS**:
  ```bash
  brew install poppler
  ```
- **Windows**: Download from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)

#### For OCR Support (Scanned Documents):
FlockParse automatically detects scanned PDFs and uses OCR!

- **Linux (Ubuntu/Debian)**:
  ```bash
  sudo apt-get update
  sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils
  ```
- **Linux (Fedora/RHEL)**:
  ```bash
  sudo dnf install tesseract poppler-utils
  ```
- **macOS**:
  ```bash
  brew install tesseract poppler
  ```
- **Windows**:
  1. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) - Download the installer
  2. Install [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
  3. Add both to your system PATH

**Verify installation:**
```bash
tesseract --version
pdftotext -v
```

### **3. Install Python Dependencies**
```bash
pip install -r requirements.txt
```

**Key Python dependencies** (installed automatically):
- fastapi, uvicorn - Web server
- pdfplumber, PyPDF2, pypdf - PDF processing
- **pytesseract** - Python wrapper for Tesseract OCR (requires system Tesseract)
- **pdf2image** - PDF to image conversion (requires system Poppler)
- Pillow - Image processing for OCR
- chromadb - Vector database
- python-docx - DOCX generation
- ollama - AI model integration
- numpy - Numerical operations
- markdown - Markdown generation

**How OCR fallback works** (see the sketch below):
1. Tries PyPDF2 text extraction
2. Falls back to pdftotext if no text is found
3. **Falls back to OCR** if there is still almost no text (<100 chars) - **requires Tesseract + Poppler**
4. Automatically processes scanned documents without manual intervention
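That cascade is simple enough to sketch in a few lines - a simplified illustration of the same three steps, not FlockParse's exact code:

```python
import subprocess

import pytesseract
from pdf2image import convert_from_path
from PyPDF2 import PdfReader


def extract_text(pdf_path: str, min_chars: int = 100) -> str:
    """Illustrative fallback chain: PyPDF2 -> pdftotext -> OCR."""
    # 1. Fast path: embedded text via PyPDF2
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    if len(text) >= min_chars:
        return text

    # 2. Fallback: poppler's pdftotext copes with trickier layouts
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"], capture_output=True, text=True
    )
    if len(result.stdout) >= min_chars:
        return result.stdout

    # 3. Last resort: rasterize each page and OCR it (scanned documents)
    pages = convert_from_path(pdf_path)
    return "\n".join(pytesseract.image_to_string(image) for image in pages)
```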
### **4. Install and Configure Ollama**

1. Install Ollama from [ollama.com](https://ollama.com)
2. Start the Ollama service:
   ```bash
   ollama serve
   ```
3. Pull the required models:
   ```bash
   ollama pull mxbai-embed-large
   ollama pull llama3.1:latest
   ```

## **📜 Usage**

### **🎨 Web UI (flock_webui.py) - Easiest Way to Get Started!**

Launch the beautiful Streamlit web interface:
```bash
streamlit run flock_webui.py
```

The web UI will open in your browser at `http://localhost:8501`

**Features:**
- 📤 **Upload & Process**: Drag-and-drop PDF files for processing
- 💬 **Chat Interface**: Interactive chat with your documents
- 📊 **Load Balancer Dashboard**: Real-time monitoring of GPU nodes
- 🔍 **Semantic Search**: Search across all documents
- 🌐 **Node Management**: Add/remove Ollama nodes, auto-discovery
- 🎯 **Routing Control**: Switch between routing strategies

**Perfect for:**
- Users who prefer graphical interfaces
- Quick document processing and exploration
- Monitoring distributed processing
- Managing multiple Ollama nodes visually

---

### **CLI Interface (flockparsecli.py)**

Run the script:
```bash
python flockparsecli.py
```

Available commands:
```
📖 open_pdf <file>     → Process a single PDF file
📂 open_dir <dir>      → Process all PDFs in a directory
💬 chat                → Chat with processed PDFs
📊 list_docs           → List all processed documents
🔍 check_deps          → Check for required dependencies
🌐 discover_nodes      → Auto-discover Ollama nodes on local network
➕ add_node <url>      → Manually add an Ollama node
➖ remove_node <url>   → Remove an Ollama node from the pool
📋 list_nodes          → List all configured Ollama nodes
⚖️ lb_stats            → Show load balancer statistics
❌ exit                → Quit the program
```

### **Web Server API (flock_ai_api.py)**

Start the API server:
```bash
# Set your API key (or use the default for testing)
export FLOCKPARSE_API_KEY="your-secret-key-here"

# Start server
python flock_ai_api.py
```

The server will run on `http://0.0.0.0:8000` by default.

#### **🔒 Authentication (NEW!)**

All endpoints except `/` require an API key in the `X-API-Key` header:

```bash
# Default API key (change in production!)
X-API-Key: your-secret-api-key-change-this

# Or set via environment variable
export FLOCKPARSE_API_KEY="my-super-secret-key"
```

#### **Available Endpoints:**

| Endpoint | Method | Auth Required | Description |
|----------|--------|---------------|-------------|
| `/` | GET | ❌ No | API status and version info |
| `/upload/` | POST | ✅ Yes | Upload and process a PDF file |
| `/summarize/{file_name}` | GET | ✅ Yes | Get an AI-generated summary |
| `/search/?query=...` | GET | ✅ Yes | Search for relevant documents |

#### **Example API Usage:**

**Check API status (no auth required):**
```bash
curl http://localhost:8000/
```

**Upload a document (with authentication):**
```bash
curl -X POST \
  -H "X-API-Key: your-secret-api-key-change-this" \
  -F "file=@your_document.pdf" \
  http://localhost:8000/upload/
```

**Get a document summary:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
  http://localhost:8000/summarize/your_document.pdf
```

**Search across documents:**
```bash
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8000/search/?query=your%20search%20query"
```
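The same calls work from Python if you'd rather not shell out to curl - a minimal client sketch using `requests`:

```python
import requests

API = "http://localhost:8000"
HEADERS = {"X-API-Key": "your-secret-api-key-change-this"}

# Upload and process a PDF
with open("your_document.pdf", "rb") as f:
    response = requests.post(f"{API}/upload/", headers=HEADERS, files={"file": f})
    response.raise_for_status()

# Semantic search across processed documents
hits = requests.get(
    f"{API}/search/", headers=HEADERS, params={"query": "your search query"}
)
print(hits.json())
```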
**⚠️ Production Security:**
- Always change the default API key
- Use environment variables, never hardcode keys
- Use HTTPS in production (nginx/apache reverse proxy)
- Consider rate limiting for public deployments

### **MCP Server (flock_mcp_server.py)**

The MCP server allows FlockParse to be used as a tool by AI assistants like Claude Desktop.

#### **Setting up with Claude Desktop**

1. **Start the MCP server:**
   ```bash
   python flock_mcp_server.py
   ```

2. **Configure Claude Desktop:**
   Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):

   ```json
   {
     "mcpServers": {
       "flockparse": {
         "command": "python",
         "args": ["/absolute/path/to/FlockParser/flock_mcp_server.py"]
       }
     }
   }
   ```

3. **Restart Claude Desktop** and you'll see FlockParse tools available!

#### **Available MCP Tools:**

- `process_pdf` - Process and add PDFs to the knowledge base
- `query_documents` - Search documents using semantic search
- `chat_with_documents` - Ask questions about your documents
- `list_documents` - List all processed documents
- `get_load_balancer_stats` - View node performance metrics
- `discover_ollama_nodes` - Auto-discover Ollama nodes
- `add_ollama_node` - Add an Ollama node manually
- `remove_ollama_node` - Remove an Ollama node

#### **Example MCP Usage:**

In Claude Desktop, you can now ask:
- "Process the PDF at /path/to/document.pdf"
- "What documents do I have in my knowledge base?"
- "Search my documents for information about quantum computing"
- "What does my research say about black holes?"
## **💡 Practical Use Cases**

### **Knowledge Management**
- Create searchable archives of research papers, legal documents, and technical manuals
- Generate summaries of lengthy documents for quick review
- Chat with your document collection to find specific information without manual searching

### **Legal & Compliance**
- Process contract repositories for semantic search capabilities
- Extract key terms and clauses from legal documents
- Analyze regulatory documents for compliance requirements

### **Research & Academia**
- Process and convert academic papers for easier reference
- Create a personal research assistant that can reference your document library
- Generate summaries of complex research for presentations or reviews

### **Business Intelligence**
- Convert business reports into searchable formats
- Extract insights from PDF-based market research
- Make proprietary documents more accessible throughout an organization

## **🌐 Distributed Processing with Load Balancer**

FlockParse includes a sophisticated load balancer that can distribute embedding generation across multiple Ollama instances on your local network.

### **Setting Up Distributed Processing**

#### **Option 1: Auto-Discovery (Easiest)**
```bash
# Start FlockParse
python flockparsecli.py

# Auto-discover Ollama nodes on your network
⚡ Enter command: discover_nodes
```

The system will automatically scan your local network (/24 subnet) and detect any running Ollama instances.

#### **Option 2: Manual Node Management**
```bash
# Add a specific node
⚡ Enter command: add_node http://192.168.1.100:11434

# List all configured nodes
⚡ Enter command: list_nodes

# Remove a node
⚡ Enter command: remove_node http://192.168.1.100:11434

# View load balancer statistics
⚡ Enter command: lb_stats
```

### **Benefits of Distributed Processing**

- **Speed**: Process documents 2-10x faster with multiple nodes
- **GPU Awareness**: Automatically detects and prioritizes GPU nodes over CPU nodes
- **VRAM Monitoring**: Detects when GPU nodes fall back to CPU due to insufficient VRAM
- **Fault Tolerance**: Automatic failover if a node becomes unavailable
- **Load Distribution**: Smart routing based on node performance, GPU availability, and VRAM capacity
- **Easy Scaling**: Just add more machines with Ollama installed

### **Setting Up Additional Ollama Nodes**

On each additional machine:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the embedding model
ollama pull mxbai-embed-large

# Start Ollama (accessible from network)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Then use `discover_nodes` or `add_node` to add them to FlockParse.

### **GPU and VRAM Optimization**

FlockParse automatically detects GPU availability and VRAM usage using Ollama's `/api/ps` endpoint:

- **🚀 GPU nodes** with models loaded in VRAM get a +200 health score bonus
- **⚠️ VRAM-limited nodes** that fall back to CPU get only a +50 bonus
- **🐢 CPU-only nodes** get a -50 penalty

**To ensure your GPU is being used:**

1. **Check GPU detection**: Run the `lb_stats` command to see node status
2. **Preload the model into GPU**: Run a small inference to load the model into VRAM
   ```bash
   ollama run mxbai-embed-large "test"
   ```
3. **Verify VRAM usage**: Check that `size_vram > 0` in `/api/ps`:
   ```bash
   curl http://localhost:11434/api/ps
   ```
4. **Increase VRAM allocation**: If the model won't load into VRAM, free up GPU memory or use a smaller model

**Dynamic VRAM monitoring**: FlockParse continuously monitors embedding performance and automatically detects when a GPU node falls back to CPU due to VRAM exhaustion during heavy load.
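Conceptually, the routing decision reduces to scoring every node and preferring the healthiest one. A toy sketch of that selection step - the health values match the architecture diagram earlier, but the third node's address and the busy-node penalty are illustrative, not FlockParse's internals:

```python
from dataclasses import dataclass


@dataclass
class Node:
    url: str
    health_score: float   # e.g. model in VRAM +200, CPU-only -50
    in_flight: int = 0    # requests currently assigned to this node


def pick_node(nodes: list[Node]) -> Node:
    """Prefer high health scores, lightly penalizing busy nodes."""
    return max(nodes, key=lambda n: n.health_score - 10 * n.in_flight)


nodes = [
    Node("http://10.9.66.90:11434", health_score=367),   # RTX A4000, model in VRAM
    Node("http://10.9.66.159:11434", health_score=210),  # GTX 1050Ti, VRAM-limited
    Node("http://10.9.66.15:11434", health_score=50),    # CPU-only laptop (hypothetical IP)
]
print(pick_node(nodes).url)   # -> the RTX A4000 node
```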
## **🔄 Example Workflows**

### **CLI Workflow: Research Paper Processing**

1. **Check Dependencies**:
   ```
   ⚡ Enter command: check_deps
   ```

2. **Process a Directory of Research Papers**:
   ```
   ⚡ Enter command: open_dir ~/research_papers
   ```

3. **Chat with Your Research Collection**:
   ```
   ⚡ Enter command: chat
   🙋 You: What are the key methods used in the Smith 2023 paper?
   ```

### **API Workflow: Document Processing Service**

1. **Start the API Server**:
   ```bash
   python flock_ai_api.py
   ```

2. **Upload Documents via API**:
   ```bash
   curl -X POST \
     -H "X-API-Key: your-secret-api-key-change-this" \
     -F "file=@quarterly_report.pdf" \
     http://localhost:8000/upload/
   ```

3. **Generate a Summary**:
   ```bash
   curl -H "X-API-Key: your-secret-api-key-change-this" \
     http://localhost:8000/summarize/quarterly_report.pdf
   ```

4. **Search Across Documents**:
   ```bash
   curl -H "X-API-Key: your-secret-api-key-change-this" \
     "http://localhost:8000/search/?query=revenue%20growth%20Q3"
   ```

## **🔧 Troubleshooting Guide**

### **Ollama Connection Issues**

**Problem**: Error messages about Ollama not being available or connection failures.

**Solution**:
1. Verify Ollama is running: `ps aux | grep ollama`
2. Restart the Ollama service:
   ```bash
   killall ollama
   ollama serve
   ```
3. Check that you've pulled the required models:
   ```bash
   ollama list
   ```
4. If models are missing:
   ```bash
   ollama pull mxbai-embed-large
   ollama pull llama3.1:latest
   ```

### **PDF Text Extraction Failures**

**Problem**: No text extracted from certain PDFs.

**Solution**:
1. Check if the PDF is scanned/image-based:
   - Install OCR tools: `sudo apt-get install tesseract-ocr` (Linux)
   - For better scanned PDF handling: `pip install ocrmypdf`
   - Process with OCR: `ocrmypdf input.pdf output.pdf`

2. If the PDF has unusual fonts or formatting:
   - Install poppler-utils for better extraction
   - Try using the `-layout` option with pdftotext manually:
     ```bash
     pdftotext -layout problem_document.pdf output.txt
     ```

### **Memory Issues with Large Documents**

**Problem**: Application crashes with large PDFs or many documents.

**Solution**:
1. Process one document at a time for very large PDFs
2. Reduce the chunk size in the code (the default is 512 characters; see the sketch below)
3. Increase your system's available memory or use a swap file
4. For server deployments, consider using a machine with more RAM
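For point 2, the chunker is the usual fixed-window splitter - a sketch of the idea with the 512-character default (the overlap value here is illustrative):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks; a small overlap keeps sentences
    that straddle a boundary searchable from both sides."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks


print(len(chunk_text("x" * 2000)))   # 5 chunks at the 512-char default
```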
### **API Server Not Starting**

**Problem**: Error when trying to start the API server.

**Solution**:
1. Check for port conflicts: `lsof -i :8000`
2. If another process is using port 8000, kill it or change the port
3. Verify FastAPI is installed: `pip install fastapi uvicorn`
4. Check for Python version compatibility (requires Python 3.10+)

---

## **🔐 Security & Production Notes**

### **REST API Security**

**⚠️ The default API key is NOT secure - change it immediately!**

```bash
# Set a strong API key via environment variable
export FLOCKPARSE_API_KEY="your-super-secret-key-change-this-now"

# Or generate a random one
export FLOCKPARSE_API_KEY=$(openssl rand -hex 32)

# Start the API server
python flock_ai_api.py
```

**Production Checklist:**
- ✅ **Change the default API key** - Never use `your-secret-api-key-change-this`
- ✅ **Use environment variables** - Never hardcode secrets in code
- ✅ **Enable HTTPS** - Use nginx or Apache as a reverse proxy with SSL/TLS
- ✅ **Add rate limiting** - Use nginx `limit_req` or FastAPI middleware
- ✅ **Network isolation** - Don't expose the API to the public internet unless necessary
- ✅ **Monitor logs** - Watch for authentication failures and abuse

**Example nginx config with TLS:**
```nginx
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

### **MCP Privacy & Security**

**What data leaves your machine:**
- 🔴 **Document queries** - Sent to Claude Desktop → Anthropic API
- 🔴 **Document snippets** - Retrieved context chunks sent as part of prompts
- 🔴 **Chat messages** - All RAG conversations processed by Claude
- 🟢 **Document files** - Never uploaded (processed locally, only embeddings stored)

**To disable MCP and stay 100% local:**
1. Remove FlockParse from the Claude Desktop config
2. Use the CLI (`flockparsecli.py`) or Web UI (`flock_webui.py`) instead
3. Both provide full RAG functionality without external API calls

**MCP is safe for:**
- ✅ Public documents (research papers, manuals, non-sensitive data)
- ✅ Testing and development
- ✅ Personal use where you trust Anthropic's privacy policy

**MCP is NOT recommended for:**
- ❌ Confidential business documents
- ❌ Personally identifiable information (PII)
- ❌ Regulated data (HIPAA, GDPR sensitive content)
- ❌ Air-gapped or classified environments

### **Database Security**

**SQLite limitations (ChromaDB backend):**
- ⚠️ No concurrent writes from multiple processes
- ⚠️ File permissions determine access (not true auth)
- ⚠️ No encryption at rest by default

**For production with multiple users:**
```bash
# Option 1: Separate databases per interface
CLI:  chroma_db_cli/
API:  chroma_db_api/
MCP:  chroma_db_mcp/

# Option 2: Use a PostgreSQL backend (ChromaDB supports it)
# See ChromaDB docs: https://docs.trychroma.com/
```
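Option 1 is a one-liner per process with the chromadb client - a sketch (the collection name here is illustrative):

```python
import chromadb

# One persistent store per interface avoids SQLite write contention
cli_client = chromadb.PersistentClient(path="chroma_db_cli")
api_client = chromadb.PersistentClient(path="chroma_db_api")

collection = cli_client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},   # cosine similarity, as used above
)
```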
### **VRAM Detection Method**

FlockParse detects GPU usage via Ollama's `/api/ps` endpoint:

```bash
# Check what Ollama reports
curl http://localhost:11434/api/ps

# Response shows VRAM usage:
{
  "models": [{
    "name": "mxbai-embed-large:latest",
    "size": 705530880,
    "size_vram": 705530880,   # <-- If >0, the model is in GPU
    ...
  }]
}
```

**Health score calculation:**
- `size_vram > 0` → +200 points (GPU in use)
- `size_vram == 0` but GPU present → +50 points (GPU available, not used)
- CPU-only → -50 points

This is **presence-based detection**, not utilization monitoring. It detects *if* the model loaded into VRAM, not *how efficiently* it's being used.
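The scoring rule is easy to reproduce against that endpoint yourself - a sketch with a hypothetical helper mirroring the bonuses above:

```python
import requests


def health_bonus(node_url: str, has_gpu: bool) -> int:
    """Score a node from Ollama's /api/ps: +200 if a model sits in VRAM,
    +50 if a GPU is present but unused, -50 for CPU-only."""
    models = requests.get(f"{node_url}/api/ps", timeout=2).json().get("models", [])
    if any(m.get("size_vram", 0) > 0 for m in models):
        return 200
    return 50 if has_gpu else -50


print(health_bonus("http://localhost:11434", has_gpu=True))
```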
---

## **💡 Features**

| Feature | Description |
|---------|-------------|
| **Multi-method PDF Extraction** | Uses both PyPDF2 and pdftotext for best results |
| **Format Conversion** | Converts PDFs to TXT, Markdown, DOCX, and JSON |
| **Semantic Search** | Uses vector embeddings to find relevant information |
| **Interactive Chat** | Discuss your documents with AI assistance |
| **Privacy Options** | Web UI/CLI: 100% offline; REST API: local network; MCP: Claude Desktop (cloud) |
| **Distributed Processing** | Load balancer with auto-discovery for multiple Ollama nodes |
| **Accurate VRAM Monitoring** | Real GPU memory tracking with nvidia-smi/rocm-smi + Ollama API (NEW!) |
| **GPU & VRAM Awareness** | Automatically detects GPU nodes and prevents CPU fallback |
| **Intelligent Routing** | 4 strategies (adaptive, round_robin, least_loaded, lowest_latency) with GPU priority |
| **Flexible Model Matching** | Supports model name variants (llama3.1, llama3.1:latest, llama3.1:8b, etc.) |
| **ChromaDB Vector Store** | Production-ready persistent vector database with cosine similarity |
| **Embedding Cache** | MD5-based caching prevents reprocessing the same content |
| **Model Weight Caching** | Keeps models in VRAM for faster repeated inference |
| **Parallel Batch Processing** | Processes multiple embeddings simultaneously |
| **Database Management** | Clear-cache and clear-DB commands for easy maintenance (NEW!) |
| **Filename Preservation** | Maintains original document names in converted files |
| **REST API** | Web server for multi-user/application integration |
| **Document Summarization** | AI-generated summaries of uploaded documents |
| **OCR Processing** | Extracts text from scanned documents using image recognition |
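The MD5-based embedding cache in the table comes down to keying embeddings by a hash of the content - a minimal sketch (the on-disk layout is illustrative, not FlockParse's actual cache format):

```python
import hashlib
import json
from pathlib import Path

import ollama

CACHE_DIR = Path("embedding_cache")
CACHE_DIR.mkdir(exist_ok=True)


def embed_cached(text: str, model: str = "mxbai-embed-large") -> list[float]:
    """Return the cached embedding if this exact content was seen before."""
    key = hashlib.md5(text.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    embedding = ollama.embeddings(model=model, prompt=text)["embedding"]
    cache_file.write_text(json.dumps(embedding))
    return embedding
```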
## **Comparing FlockParse Interfaces**

| Feature | **flock_webui.py** | flockparsecli.py | flock_ai_api.py | flock_mcp_server.py |
|---------|-------------------|----------------|-----------|---------------------|
| **Interface** | 🎨 Web Browser (Streamlit) | Command line | REST API over HTTP | Model Context Protocol |
| **Ease of Use** | ⭐⭐⭐⭐⭐ Easiest | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |
| **Use case** | Interactive GUI usage | Personal CLI processing | Service integration | AI Assistant integration |
| **Document formats** | Creates TXT, MD, DOCX, JSON | Creates TXT, MD, DOCX, JSON | Stores extracted text only | Creates TXT, MD, DOCX, JSON |
| **Interaction** | Point-and-click + chat | Interactive chat mode | Query/response via API | Tool calls from AI assistants |
| **Multi-user** | Single user (local) | Single user | Multiple users/applications | Single user (via AI assistant) |
| **Storage** | Local file-based | Local file-based | ChromaDB vector database | Local file-based |
| **Load Balancing** | ✅ Yes (visual dashboard) | ✅ Yes | ❌ No | ✅ Yes |
| **Node Discovery** | ✅ Yes (one-click) | ✅ Yes | ❌ No | ✅ Yes |
| **GPU Monitoring** | ✅ Yes (real-time charts) | ✅ Yes | ❌ No | ✅ Yes |
| **Batch Operations** | ⚠️ Multiple upload | ❌ No | ❌ No | ❌ No |
| **Privacy Level** | 🟢 100% Local | 🟢 100% Local | 🟡 Local Network | 🔴 Cloud (Claude) |
| **Best for** | **🌟 General users, GUI lovers** | Direct CLI usage | Integration with apps | Claude Desktop, AI workflows |

## **📁 Project Structure**

- `/converted_files` - Stores the converted document formats (flockparsecli.py)
- `/knowledge_base` - Legacy JSON storage (backwards compatibility only)
- `/chroma_db_cli` - **ChromaDB vector database for the CLI** (flockparsecli.py) - **Production storage**
- `/uploads` - Temporary storage for uploaded documents (flock_ai_api.py)
- `/chroma_db` - ChromaDB vector database (flock_ai_api.py)

## **🚀 Recent Additions**
- ✅ **GPU Auto-Optimization** - Background process ensures models use the GPU automatically (NEW!)
- ✅ **Programmatic GPU Control** - Force models to GPU/CPU across distributed nodes (NEW!)
- ✅ **Accurate VRAM Monitoring** - Real GPU memory tracking across distributed nodes
- ✅ **ChromaDB Production Integration** - Professional vector database for 100x faster search
- ✅ **Clear Cache & Clear DB Commands** - Manage embeddings and the database efficiently
- ✅ **Model Weight Caching** - Keep models in VRAM for 5-10x faster inference
- ✅ **Web UI** - Beautiful Streamlit interface for easy document management
- ✅ **Advanced OCR Support** - Automatic fallback to OCR for scanned documents
- ✅ **API Authentication** - Secure API key authentication for REST API endpoints
- ⬜ **Document versioning** - Track changes over time (coming soon)

## **📚 Complete Documentation**

### Core Documentation
- **[📖 Architecture Deep Dive](docs/architecture.md)** - System design, routing algorithms, technical decisions
- **[🌐 Distributed Setup Guide](DISTRIBUTED_SETUP.md)** - ⭐ **Set up your own multi-node cluster**
- **[📊 Performance Benchmarks](BENCHMARKS.md)** - Real-world performance data and scaling tests
- **[⚠️ Known Issues & Limitations](KNOWN_ISSUES.md)** - 🔴 **READ THIS** - Honest assessment of the current state
- **[🔒 Security Policy](SECURITY.md)** - Security best practices and vulnerability reporting
- **[🐛 Error Handling Guide](ERROR_HANDLING.md)** - Troubleshooting common issues
- **[🤝 Contributing Guide](CONTRIBUTING.md)** - How to contribute to the project
- **[📋 Code of Conduct](CODE_OF_CONDUCT.md)** - Community guidelines
- **[📝 Changelog](CHANGELOG.md)** - Version history

### Technical Guides
- **[⚡ Performance Optimization](PERFORMANCE_OPTIMIZATION.md)** - Tuning for maximum speed
- **[🔧 GPU Router Setup](GPU_ROUTER_SETUP.md)** - Distributed cluster configuration
- **[🤖 GPU Auto-Optimization](GPU_AUTO_OPTIMIZATION.md)** - Automatic GPU management
- **[📊 VRAM Monitoring](VRAM_MONITORING.md)** - GPU memory tracking
- **[🎯 Adaptive Parallelism](ADAPTIVE_PARALLELISM.md)** - Smart workload distribution
- **[🗄️ ChromaDB Production](CHROMADB_PRODUCTION.md)** - Vector database scaling
- **[💾 Model Caching](MODEL_CACHING.md)** - Performance through caching
- **[🖥️ Node Management](NODE_MANAGEMENT.md)** - Managing distributed nodes
- **[⚡ Quick Setup](QUICK_SETUP.md)** - Fast track to getting started

### Additional Resources
- **[🏛️ FlockParser-legacy](https://github.com/BenevolentJoker-JohnL/FlockParser-legacy)** - Original distributed inference implementation
- **[📦 Docker Setup](docker-compose.yml)** - Containerized deployment
- **[⚙️ Environment Config](.env.example)** - Configuration template
- **[🧪 Tests](tests/)** - Test suite and CI/CD

## **🔗 Integration with SynapticLlamas & SOLLOL**

FlockParser is designed to work seamlessly with **[SynapticLlamas](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** (multi-agent orchestration) and **[SOLLOL](https://github.com/BenevolentJoker-JohnL/SOLLOL)** (distributed inference platform) as a unified AI ecosystem.

### **The Complete Stack**

```
┌─────────────────────────────────────────────────────────────┐
│                  SynapticLlamas (v0.1.0+)                   │
│             Multi-Agent System & Orchestration              │
│  • Research agents  • Editor agents  • Storyteller agents   │
└───────────┬────────────────────────────────────┬────────────┘
            │                                    │
            │ RAG Queries                        │ Distributed
            │ (with pre-computed embeddings)     │ Inference
            │                                    │
     ┌──────▼──────────┐              ┌──────────▼────────────┐
     │   FlockParser   │              │        SOLLOL         │
     │  API (v1.0.4+)  │              │     Load Balancer     │
     │   Port: 8000    │              │      (v0.9.31+)       │
     └──────┬──────────┘              └──────────┬────────────┘
            │                                    │
            │ ChromaDB                           │ Intelligent
            │ Vector Store                       │ GPU/CPU Routing
            │                                    │
     ┌──────▼──────────┐              ┌──────────▼────────────┐
     │ Knowledge Base  │              │     Ollama Nodes      │
     │  41 Documents   │              │     (Distributed)     │
     │  6,141 Chunks   │              │      GPU + CPU        │
     └─────────────────┘              └───────────────────────┘
```

### **Why This Integration Matters**

**FlockParser** provides document RAG capabilities, **SynapticLlamas** orchestrates multi-agent workflows, and **SOLLOL** handles distributed inference with intelligent load balancing.

| Component | Role | Key Feature |
|-----------|------|-------------|
| **FlockParser** | Document RAG & Knowledge Base | ChromaDB vector store with 6,141+ chunks |
| **SynapticLlamas** | Agent Orchestration | Multi-agent workflows with RAG integration |
| **SOLLOL** | Distributed Inference | Load-balanced embedding & model inference |

### **Quick Start: Complete Ecosystem**

```bash
# Install all three packages (auto-installs dependencies)
pip install synaptic-llamas   # Pulls in flockparser>=1.0.4 and sollol>=0.9.31

# Start the FlockParser API (auto-starts with the CLI)
flockparse

# Configure SynapticLlamas for integration
synaptic-llamas --interactive --distributed
```

### **Integration Example: Load Balanced RAG**

```python
from flockparser_adapter import FlockParserAdapter
from sollol_load_balancer import SOLLOLLoadBalancer

# Initialize SOLLOL for distributed inference
sollol = SOLLOLLoadBalancer(
    rpc_backends=["http://gpu-node-1:50052", "http://gpu-node-2:50052"]
)

# Initialize the FlockParser adapter
flockparser = FlockParserAdapter("http://localhost:8000", remote_mode=True)

# Step 1: Generate the embedding using SOLLOL (load balanced!)
embedding = sollol.generate_embedding(
    model="mxbai-embed-large",
    prompt="quantum entanglement"
)
# SOLLOL routes to the fastest GPU automatically

# Step 2: Query FlockParser with the pre-computed embedding
results = flockparser.query_remote(
    query="quantum entanglement",
    embedding=embedding,   # Skips FlockParser's embedding generation
    n_results=5
)
# FlockParser returns relevant chunks from 41 documents

# Performance gain: 2-5x faster when SOLLOL has faster nodes!
```

### **New API Endpoints (v1.0.4+)**

FlockParser v1.0.4 adds **SynapticLlamas-compatible** public endpoints:

- **`GET /health`** - Check API availability and document count
- **`GET /stats`** - Get knowledge base statistics (41 docs, 6,141 chunks)
- **`POST /query`** - Query with pre-computed embeddings (critical for load-balanced RAG)

**These endpoints allow SynapticLlamas to bypass FlockParser's embedding generation and use SOLLOL's load balancer instead!**
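Over plain HTTP the hand-off looks roughly like this - the field names follow the adapter example above, but treat the exact payload shape as an assumption and check the integration guide:

```python
import ollama
import requests

# Any pre-computed embedding works; here one is generated locally for the demo
# (in the full stack, SOLLOL would produce this on the fastest GPU node)
embedding = ollama.embeddings(
    model="mxbai-embed-large", prompt="quantum entanglement"
)["embedding"]

response = requests.post(
    "http://localhost:8000/query",
    json={
        "query": "quantum entanglement",
        "embedding": embedding,   # skips FlockParser's own embedding step
        "n_results": 5,
    },
)
print(response.json())
```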
### **Learn More**

- **[📖 Complete Integration Guide](INTEGRATION_WITH_SYNAPTICLLAMAS.md)** - Full architecture, examples, and setup
- **[SynapticLlamas Repository](https://github.com/BenevolentJoker-JohnL/SynapticLlamas)** - Multi-agent orchestration
- **[SOLLOL Repository](https://github.com/BenevolentJoker-JohnL/SOLLOL)** - Distributed inference platform

---

## **📝 Development Process**

This project was developed iteratively using Claude and Claude Code as coding assistants. All design decisions, architecture choices, and integration strategy were directed and reviewed by me.

## **🤝 Contributing**
Contributions are welcome! Please feel free to submit a Pull Request.

## **📄 License**
This project is licensed under the MIT License - see the LICENSE file for details.
"bugtrack_url": null,
"license": "MIT",
"summary": "Distributed document RAG system with intelligent GPU/CPU orchestration",
"version": "1.0.5",
"project_urls": {
"Bug Tracker": "https://github.com/BenevolentJoker-JohnL/FlockParser/issues",
"Demo Video": "https://youtu.be/M-HjXkWYRLM",
"Documentation": "https://github.com/BenevolentJoker-JohnL/FlockParser#readme",
"Homepage": "https://github.com/BenevolentJoker-JohnL/FlockParser",
"Repository": "https://github.com/BenevolentJoker-JohnL/FlockParser"
},
"split_keywords": [
"rag",
" retrieval-augmented-generation",
" distributed-systems",
" document-processing",
" gpu-acceleration",
" ollama",
" chromadb",
" pdf-processing",
" ocr",
" ai",
" machine-learning",
" nlp"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "96f8962ee272c9cab279da8c0e734bc06a1c735bcac13a8ab3367cb6bb497b3b",
"md5": "2732ecc28e4957e38ae4660310d7d8fd",
"sha256": "830b3b19f207e80a654255c50704257413818e2d416e014c845203a67e7503c9"
},
"downloads": -1,
"filename": "flockparser-1.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2732ecc28e4957e38ae4660310d7d8fd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 79499,
"upload_time": "2025-10-21T19:53:43",
"upload_time_iso_8601": "2025-10-21T19:53:43.049380Z",
"url": "https://files.pythonhosted.org/packages/96/f8/962ee272c9cab279da8c0e734bc06a1c735bcac13a8ab3367cb6bb497b3b/flockparser-1.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e60fa61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b",
"md5": "bdefb61625288422f536eaf478cea1ac",
"sha256": "61c0945709cbf5a1f6b42e85924727ad5eb55ae67e65ab80f0b85a179abc3fbb"
},
"downloads": -1,
"filename": "flockparser-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "bdefb61625288422f536eaf478cea1ac",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 157426,
"upload_time": "2025-10-21T19:53:44",
"upload_time_iso_8601": "2025-10-21T19:53:44.809368Z",
"url": "https://files.pythonhosted.org/packages/e6/0f/a61b6efc1b1ba9fbfeb5cfad6d6edaa5643843afca6c1f6e74d633b7c26b/flockparser-1.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 19:53:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BenevolentJoker-JohnL",
"github_project": "FlockParser",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "fastapi",
"specs": [
[
">=",
"0.103.1"
]
]
},
{
"name": "uvicorn",
"specs": [
[
">=",
"0.23.2"
]
]
},
{
"name": "python-multipart",
"specs": [
[
">=",
"0.0.6"
]
]
},
{
"name": "pdfplumber",
"specs": [
[
">=",
"0.10.2"
]
]
},
{
"name": "PyPDF2",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pypdf",
"specs": [
[
">=",
"3.15.1"
]
]
},
{
"name": "pytesseract",
"specs": [
[
">=",
"0.3.10"
]
]
},
{
"name": "Pillow",
"specs": [
[
">=",
"10.0.0"
]
]
},
{
"name": "pdf2image",
"specs": [
[
">=",
"1.16.0"
]
]
},
{
"name": "python-docx",
"specs": [
[
">=",
"0.8.11"
]
]
},
{
"name": "markdown",
"specs": [
[
">=",
"3.4.4"
]
]
},
{
"name": "chromadb",
"specs": [
[
">=",
"0.4.13"
]
]
},
{
"name": "ollama",
"specs": [
[
">=",
"0.1.4"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.24.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "mcp",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "streamlit",
"specs": [
[
">=",
"1.40.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.1.0"
]
]
},
{
"name": "pytest-timeout",
"specs": [
[
">=",
"2.1.0"
]
]
}
],
"lcname": "flockparser"
}