kssrag

Name	kssrag
Version	0.2.1
home_page	https://github.com/Ksschkw/kssrag
Summary	A flexible Retrieval-Augmented Generation framework by Ksschkw
upload_time	2025-10-27 10:13:44
maintainer	None
docs_url	None
author	Ksschkw
requires_python	<4,>=3.8
license	None
keywords	rag retrieval generation ai nlp faiss bm25
requirements	fastapi uvicorn python-dotenv requests rank-bm25 numpy faiss-cpu sentence-transformers pydantic pydantic-settings rapidfuzz python-multipart pypdf scikit-learn scipy paddleocr Pillow paddlepaddle python-docx bm25S pystemmer stemmer openpyxl python-pptx

# KSS RAG - Knowledge Retrieval Augmented Generation Framework

<div align="center">

![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)
![Version](https://img.shields.io/badge/version-0.2.1-brightgreen)
![Framework](https://img.shields.io/badge/framework-RAG-orange)
![Documentation](https://img.shields.io/badge/docs-comprehensive-brightgreen)

**Enterprise-Grade Retrieval-Augmented Generation for Modern Applications**

[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [API Reference](#api-reference)

</div>

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Features](#features)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Core Concepts](#core-concepts)
- [Documentation](#documentation)
- [Examples](#examples)
- [API Reference](#api-reference)
- [Deployment](#deployment)
- [Contributing](#contributing)
- [Support](#support)
- [License](#license)

## Overview

KSS RAG is a Retrieval-Augmented Generation (RAG) framework designed for production document-processing workloads. It loads documents in several formats, splits them into chunks, indexes the chunks in a keyword, vector, or hybrid store, and passes the retrieved context to an LLM to answer questions.

### Key Capabilities

- **Multi-Format Document Processing**: Text, PDF, Office documents, images with OCR
- **Advanced Vector Search**: Multiple vector store implementations with hybrid approaches
- **Real-time Streaming**: Token-by-token response streaming for enhanced user experience
- **Enterprise Security**: Comprehensive security and input validation
- **Production Monitoring**: Health checks, metrics, and observability

## Architecture

```mermaid
graph TB
    A[Document Input] --> B[Document Loader]
    B --> C[Chunker]
    C --> D[Vector Store]
    D --> E[FAISS Index]
    D --> F[BM25 Index]
    D --> G[Hybrid Index]
    
    H[User Query] --> I[Query Processor]
    I --> J[Retriever]
    J --> K[Vector Store]
    J --> L[Context Builder]
    
    M[LLM Provider] --> N[OpenRouter]
    M --> O[Custom LLMs]
    
    L --> P[Prompt Engineer]
    P --> M
    M --> Q[Response Generator]
    Q --> R[Streaming Output]
    Q --> S[Standard Output]
    
    subgraph "Document Processing Pipeline"
        B --> C --> D
    end
    
    subgraph "Query Processing Pipeline"
        I --> J --> L --> P
    end
    
    style A fill:#e1f5fe
    style H fill:#f3e5f5
    style R fill:#e8f5e8
    style S fill:#e8f5e8
```

## Features

### 🎯 Core Capabilities

| Feature | Description | Status |
|---------|-------------|--------|
| **Multi-Format Support** | Text, PDF, JSON, DOCX, Excel, PowerPoint, Images | ✅ Production Ready |
| **Advanced OCR** | Handwritten (PaddleOCR) & Typed (Tesseract) text recognition | ✅ Production Ready |
| **Vector Stores** | BM25, BM25S, FAISS, TFIDF, Hybrid implementations | ✅ Production Ready |
| **Streaming Responses** | Real-time token streaming with OpenRouter | ✅ Production Ready |
| **REST API** | FastAPI with comprehensive endpoints | ✅ Production Ready |
| **CLI Interface** | Command-line tools for rapid development | ✅ Production Ready |

### 🔧 Technical Excellence

| Aspect | Implementation | Benefits |
|--------|----------------|----------|
| **Windows Compatibility** | No AVX2 dependencies, hybrid fallbacks | Enterprise deployment |
| **Extensible Architecture** | Plugin system for custom components | Future-proof design |
| **Performance Optimization** | Batch processing, caching, memory management | High throughput |
| **Error Resilience** | Smart fallbacks, retry mechanisms | Production reliability |

### 📊 Performance Metrics

| Operation | Average Latency | Throughput |
|-----------|-----------------|------------|
| Document Indexing | 2-5 sec/1000 chunks | 200+ docs/min |
| Query Processing | 500-1500 ms | 50+ QPS |
| OCR Processing | 1-3 sec/image (longer for handwritten text) | 20+ images/min (fewer for handwritten text) |
| Streaming Response | 50-200 ms/first token | Real-time |
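
The figures above are not tied to a stated machine or dataset, so treat them as rough guidance. Below is a minimal sketch for measuring query latency on your own setup. `measure_latency` and `fake_query` are hypothetical helpers, not part of kssrag; `fake_query` stands in for a call such as `rag.query(...)`.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call fn `runs` times and report latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_query() -> str:
    """Hypothetical stand-in for rag.query(...); sleeps briefly instead of calling an LLM."""
    time.sleep(0.001)
    return "answer"


if __name__ == "__main__":
    print(measure_latency(fake_query, runs=5))
```

To time the real framework, pass a closure such as `lambda: rag.query("...", top_k=5)` in place of `fake_query`.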

## Quick Start

### Installation

```bash
# Base installation
pip install kssrag

# With extended capabilities
# (quotes keep shells such as zsh from expanding the brackets)
pip install "kssrag[ocr,gpu,dev]"
```

### Basic Usage

```python
from kssrag import KSSRAG
import os

# Configure environment
os.environ["OPENROUTER_API_KEY"] = "your-api-key-here"

# Initialize framework
rag = KSSRAG()

# Load knowledge base
rag.load_document("technical_docs.pdf")
rag.load_document("product_specs.docx") 
rag.load_document("research_data.xlsx")

# Execute intelligent query
response = rag.query(
    "What are the technical specifications and key differentiators?",
    top_k=5
)
print(response)
```

### CLI Demonstration

```bash
# Stream processing with hybrid retrieval
python -m kssrag.cli query \
    --file enterprise_docs.pdf \
    --query "Architecture decisions and rationale" \
    --vector-store hybrid_online \
    --top-k 8 \
    --stream

# Production API server
python -m kssrag.cli server \
    --file knowledge_base.docx \
    --port 8080 \
    --host 0.0.0.0 \
    --vector-store faiss
```

## Installation

### System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| Python | 3.8+ | 3.11+ |
| RAM | 4 GB | 16 GB |
| Storage | 1 GB | 10 GB+ |
| OS | Windows 10+, Linux, macOS | Linux |

### Installation Methods

**Standard Installation**
```bash
pip install kssrag
```

**Extended Capabilities**
```bash
# OCR functionality (PaddleOCR + Tesseract)
pip install "kssrag[ocr]"

# GPU acceleration
pip install "kssrag[gpu]"

# Development tools
pip install "kssrag[dev]"

# All features
pip install "kssrag[all]"
```

**Source Installation**
```bash
git clone https://github.com/Ksschkw/kssrag
cd kssrag
pip install -e ".[all]"
```

### Verification

```python
# Verify installation
import kssrag
from kssrag import KSSRAG

print(f"KSS RAG Version: {kssrag.__version__}")

# Test basic functionality
rag = KSSRAG()
print("Framework initialized successfully")
```

## Core Concepts

### Document Processing Pipeline

```mermaid
sequenceDiagram
    participant User
    participant System
    participant Loader
    participant Chunker
    participant VectorStore
    participant Retriever
    participant LLM

    User->>System: load_document(file_path)
    System->>Loader: parse_document()
    Loader->>Chunker: chunk_content()
    Chunker->>VectorStore: add_documents()
    VectorStore->>System: indexing_complete()
    
    User->>System: query(question)
    System->>Retriever: retrieve_relevant()
    Retriever->>VectorStore: similarity_search()
    VectorStore->>Retriever: relevant_chunks()
    Retriever->>LLM: generate_response()
    LLM->>System: final_response()
    System->>User: display_result()
```
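
The `chunk_content()` step can be pictured as a fixed-size sliding window. This is a minimal character-based sketch, assuming `chunk_size` and `overlap` play the role of the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings used elsewhere in this README. `chunk_text` is a hypothetical helper; kssrag's own chunkers may split differently (for example on sentences or tokens).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of indexing some text twice.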

### Vector Store Architecture

```mermaid
graph LR
    A[Query] --> B{Vector Store Router}
    
    B --> C[BM25 Store]
    B --> D[BM25S Store]
    B --> E[FAISS Store]
    B --> F[TFIDF Store]
    B --> G[Hybrid Online]
    B --> H[Hybrid Offline]
    
    C --> I[Keyword Matching]
    D --> J[Stemmed Keywords]
    E --> K[Semantic Search]
    F --> L[Statistical Analysis]
    
    G --> M[FAISS + BM25 Fusion]
    H --> N[BM25 + TFIDF Fusion]
    
    I --> O[Results]
    J --> O
    K --> O
    L --> O
    M --> O
    N --> O
    
    style O fill:#c8e6c9
```
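
The diagram names two fusion paths (FAISS + BM25 and BM25 + TFIDF) but does not say how the scores are combined. One common approach is to min-max normalize each retriever's scores and take a weighted sum, and the sketch below assumes that approach. `fuse_scores` and the `alpha` weight are hypothetical, not kssrag's API.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so different retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in scores.items()}


def fuse_scores(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    alpha: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; alpha weights the semantic side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    sem = _min_max(semantic)
    kw = _min_max(keyword)
    fused = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Highest score first; ties are broken by document id for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    semantic_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
    keyword_scores = {"b": 10.0, "c": 0.0, "d": 5.0}
    print(fuse_scores(semantic_scores, keyword_scores))
```

A document missing from one retriever contributes 0 from that side, so documents found by both retrievers tend to rank higher.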

## Documentation

### Comprehensive Guides

- [**Configuration Guide**](docs/configuration.md) - Detailed configuration options and best practices
- [**API Reference**](docs/api_reference.md) - Complete API documentation with examples
- [**Deployment Guide**](docs/deployment.md) - Production deployment strategies
- [**Performance Tuning**](docs/performance.md) - Optimization techniques and benchmarks

### Tutorials

- [**Getting Started**](examples/basic_usage.py) - Basic framework usage
- [**Advanced Features**](examples/advanced_usage.py) - Custom configurations and extensions
- [**Custom Components**](examples/custom_config.py) - Building custom chunkers and vector stores

## Examples

### Basic Implementation

```python
"""
Basic KSS RAG implementation for document Q&A
"""
from kssrag import KSSRAG
import os

def main():
    # Configuration
    os.environ["OPENROUTER_API_KEY"] = "your-api-key"
    
    # Initialize
    rag = KSSRAG()
    
    # Load documents
    rag.load_document("technical_manual.pdf")
    rag.load_document("api_documentation.md")
    
    # Query system
    response = rag.query(
        "How do I implement the authentication system?",
        top_k=5
    )
    
    print("Response:", response)

if __name__ == "__main__":
    main()
```

### Advanced Configuration

```python
"""
Enterprise-grade configuration with custom components
"""
from kssrag import KSSRAG, Config, VectorStoreType, RetrieverType
from kssrag.core.agents import RAGAgent
from kssrag.models.openrouter import OpenRouterLLM
import os

def main():
    # Enterprise configuration
    config = Config(
        OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
        DEFAULT_MODEL="anthropic/claude-3-sonnet",
        VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,
        RETRIEVER_TYPE=RetrieverType.HYBRID,
        TOP_K=10,
        CHUNK_SIZE=1000,
        CHUNK_OVERLAP=150,
        BATCH_SIZE=32,
        ENABLE_CACHE=True,
        CACHE_DIR="/opt/kssrag/cache",
        LOG_LEVEL="INFO"
    )
    
    # Initialize with custom config
    rag = KSSRAG(config=config)
    
    # Load enterprise documents
    rag.load_document("product_requirements.pdf")
    rag.load_document("architecture_docs.docx")
    rag.load_document("user_research.json")
    
    # Custom expert prompt
    expert_prompt = """
    You are a senior technical expert analyzing documentation. 
    Provide authoritative, precise responses based on the source material.
    Focus on actionable insights and technical accuracy.
    """
    
    # Custom agent configuration
    llm = OpenRouterLLM(
        api_key=config.OPENROUTER_API_KEY,
        model=config.DEFAULT_MODEL,
        stream=True
    )
    
    rag.agent = RAGAgent(
        retriever=rag.retriever,
        llm=llm,
        system_prompt=expert_prompt
    )
    
    # Execute complex query
    query = """
    Analyze the technical architecture and identify:
    1. Key design decisions
    2. Potential scalability concerns  
    3. Recommended improvements
    """
    
    print("Processing complex query...")
    for chunk in rag.agent.query_stream(query, top_k=8):
        print(chunk, end="", flush=True)

if __name__ == "__main__":
    main()
```
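
When streaming, it is often useful to keep the full response as well as printing it incrementally. The sketch below assumes, as the loop above suggests, that `query_stream` yields plain string chunks. `collect_stream` and `fake_stream` are hypothetical helpers, used so that the example runs without an API key.

```python
from typing import Callable, Iterable, Iterator, List


def collect_stream(
    chunks: Iterable[str],
    on_chunk: Callable[[str], None] = lambda _chunk: None,
) -> str:
    """Forward each chunk to on_chunk as it arrives and return the joined text."""
    parts: List[str] = []
    for chunk in chunks:
        on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for rag.agent.query_stream(...)."""
    yield from ["Key ", "design ", "decisions..."]


if __name__ == "__main__":
    full_text = collect_stream(
        fake_stream(),
        on_chunk=lambda chunk: print(chunk, end="", flush=True),
    )
    print()
    print(f"Received {len(full_text)} characters")
```

With the real agent, pass `rag.agent.query_stream(query, top_k=8)` as the first argument.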

### OCR Integration

```python
"""
Advanced OCR processing for document digitization
"""
from kssrag import KSSRAG, Config
import os

def process_scanned_documents():
    config = Config(
        OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
        OCR_DEFAULT_MODE="handwritten"  # or "typed"
    )
    
    rag = KSSRAG(config=config)
    
    # Process various document types
    documents = [
        ("handwritten_notes.jpg", "handwritten"),
        ("typed_contract.jpg", "typed"), 
        ("mixed_document.png", "handwritten")
    ]
    
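    for doc_path, ocr_mode in documents:
        try:
            # ocr_mode is used for logging only: load_document() as documented
            # in the API Reference takes no per-file OCR mode, so the
            # OCR_DEFAULT_MODE set in the config applies to every image.
            print(f"Processing {doc_path} (expected content: {ocr_mode})...")
            rag.load_document(doc_path, format="image")
            print(f"Successfully processed {doc_path}")
        except Exception as e:
            print(f"Error processing {doc_path}: {e}")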
    
    # Query across all processed documents
    response = rag.query(
        "Extract and summarize all action items and deadlines",
        top_k=6
    )
    
    return response

if __name__ == "__main__":
    result = process_scanned_documents()
    print("OCR Processing Result:", result)
```

## API Reference

### Core Classes

#### KSSRAG
The primary interface for the RAG framework.

```python
class KSSRAG:
    """
    Main RAG framework class providing document processing and query capabilities.
    
    Attributes:
        config (Config): Framework configuration
        vector_store: Active vector store instance
        retriever: Document retriever instance  
        agent: RAG agent for query processing
        documents (List): Processed document chunks
    """
    
    def __init__(self, config: Optional[Config] = None):
        """Initialize RAG system with optional configuration"""
        
    def load_document(self, file_path: str, format: Optional[str] = None,
                     chunker: Optional[Any] = None, metadata: Optional[Dict[str, Any]] = None):
        """
        Load and process document for retrieval
        
        Args:
            file_path: Path to document file
            format: Document format (auto-detected if None)
            chunker: Custom chunker instance
            metadata: Additional document metadata
        """
        
    def query(self, question: str, top_k: Optional[int] = None) -> str:
        """
        Execute query against loaded documents
        
        Args:
            question: Natural language query
            top_k: Number of results to retrieve
            
        Returns:
            Generated response string
        """
        
    def create_server(self, server_config=None):
        """
        Create FastAPI server instance
        
        Returns:
            FastAPI application instance
        """
```

#### Configuration Management

```python
class Config(BaseSettings):
    """
    Comprehensive configuration management with validation
    
    Example:
        config = Config(
            OPENROUTER_API_KEY="key",
            VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,
            TOP_K=10
        )
    """
    
    # API Configuration
    OPENROUTER_API_KEY: str
    DEFAULT_MODEL: str = "anthropic/claude-3-sonnet"
    FALLBACK_MODELS: List[str] = ["deepseek/deepseek-chat-v3.1:free"]
    
    # Processing Configuration  
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    VECTOR_STORE_TYPE: VectorStoreType = VectorStoreType.HYBRID_OFFLINE
    
    # Performance Configuration
    BATCH_SIZE: int = 64
    ENABLE_CACHE: bool = True
    
    # Server Configuration
    SERVER_HOST: str = "localhost"
    SERVER_PORT: int = 8000
```
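
Because `Config` extends pydantic-settings' `BaseSettings`, fields can normally be supplied through environment variables of the same name. The sketch below imitates that lookup order (explicit argument, then environment, then default) with a plain dataclass, so it runs without extra dependencies. `EnvConfig` and `from_env` are hypothetical stand-ins, not kssrag classes, and unlike pydantic-settings this version matches variable names case-sensitively.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical subset of the settings listed above."""
    OPENROUTER_API_KEY: str
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    SERVER_PORT: int = 8000


def from_env(environ: Optional[Mapping[str, str]] = None, **overrides: Any) -> EnvConfig:
    """Build a config from explicit overrides, then the environment, then defaults."""
    env = os.environ if environ is None else environ
    values: Dict[str, Any] = {}
    for field in fields(EnvConfig):
        if field.name in overrides:
            values[field.name] = overrides[field.name]
        elif field.name in env:
            # Environment values are strings; convert integer fields.
            cast = int if field.type is int else str
            values[field.name] = cast(env[field.name])
    # Fields left out fall back to their defaults; a missing required
    # field (OPENROUTER_API_KEY) raises TypeError here.
    return EnvConfig(**values)


if __name__ == "__main__":
    demo_env = {"OPENROUTER_API_KEY": "demo-key", "CHUNK_SIZE": "1000"}
    print(from_env(demo_env, SERVER_PORT=9000))
```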

### Server Endpoints

| Endpoint | Method | Description | Parameters |
|----------|--------|-------------|------------|
| `/query` | POST | Execute RAG query | `query`, `session_id` |
| `/stream` | POST | Streaming query | `query`, `session_id` |
| `/health` | GET | System health | - |
| `/config` | GET | Server configuration | - |
| `/sessions/{session_id}/clear` | GET | Clear session | `session_id` (path) |

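A client call to `/query` might look like the sketch below. It assumes the server accepts a JSON body with the `query` and `session_id` fields listed in the table, returns a JSON object, and is reachable at `http://localhost:8000`. The exact request and response schema is not given in this README, so check the running server's OpenAPI page (FastAPI serves one at `/docs` by default) before relying on it. `build_query_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict


def build_query_request(base_url: str, query: str, session_id: str) -> urllib.request.Request:
    """Prepare a POST /query request; the JSON body shape is an assumption."""
    payload = json.dumps({"query": query, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the response, assumed to be a JSON object."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_query_request("http://localhost:8000", "What is covered?", "demo-session")
    print(req.get_method(), req.full_url)
    # With a server running, uncomment to send the request:
    # print(send(req))
```

Reusing the same `session_id` across calls presumably keeps them in one session until `/sessions/{id}/clear` is called.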
## Deployment

### Docker Deployment

```dockerfile
# Dockerfile
FROM python:3.11-slim

# Install system dependencies (curl is needed by the HEALTHCHECK below;
# the slim base image does not ship it)
RUN apt-get update && apt-get install -y \
    curl \
    tesseract-ocr \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-privileged user
RUN useradd -m -u 1000 kssrag
USER kssrag

EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

CMD ["python", "-m", "kssrag.cli", "server", "--host", "0.0.0.0", "--port", "8000"]
```

### Kubernetes Deployment

```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kssrag
  labels:
    app: kssrag
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kssrag
  template:
    metadata:
      labels:
        app: kssrag
    spec:
      containers:
      - name: kssrag
        image: kssrag:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENROUTER_API_KEY
          valueFrom:
            secretKeyRef:
              name: kssrag-secrets
              key: openrouter-api-key
        - name: VECTOR_STORE_TYPE
          value: "hybrid_offline"
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kssrag-service
spec:
  selector:
    app: kssrag
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
```

### Production Configuration

```python
# production_config.py
import os

from kssrag import Config, VectorStoreType

production_config = Config(
    OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
    VECTOR_STORE_TYPE=VectorStoreType.HYBRID_OFFLINE,
    CHUNK_SIZE=1000,
    TOP_K=8,
    BATCH_SIZE=32,
    ENABLE_CACHE=True,
    CACHE_DIR="/var/lib/kssrag/cache",
    LOG_LEVEL="INFO",
    SERVER_HOST="0.0.0.0",
    SERVER_PORT=8000,
    CORS_ORIGINS=[
        "https://app.company.com",
        "https://api.company.com"
    ]
)
```

## Contributing

We welcome contributions from the community. Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup

```bash
# Clone repository
git clone https://github.com/Ksschkw/kssrag
cd kssrag

# Install development dependencies
pip install -e ".[dev,ocr,all]"

# Run test suite
python -m pytest tests/ -v --cov=kssrag

# Code quality checks
black kssrag/ tests/
flake8 kssrag/
mypy kssrag/

# Build documentation
cd docs && make html
```

### Code Organization

```
kssrag/
├── core/                   # Core framework components
│   ├── chunkers.py         # Document segmentation strategies
│   ├── vectorstores.py     # Vector database implementations
│   ├── retrievers.py       # Information retrieval algorithms
│   └── agents.py           # RAG orchestration logic
├── models/                 # LLM provider integrations
│   ├── openrouter.py       # OpenRouter API client
│   └── local_llms.py       # Local LLM implementations
├── utils/                  # Utility functions
│   ├── helpers.py          # Common utilities
│   ├── document_loaders.py # Document parsing
│   ├── ocr_loader.py       # OCR processing (PaddleOCR/Tesseract)
│   └── preprocessors.py    # Text preprocessing
├── config.py               # Configuration management
├── server.py               # FastAPI web server
├── cli.py                  # Command-line interface
└── __init__.py             # Package exports
```

## Support

### Documentation
- [**Full Documentation**](https://github.com/Ksschkw/kssrag/docs)
- [**API Reference**](https://github.com/Ksschkw/kssrag/docs/api_reference.md)
- [**Examples Directory**](https://github.com/Ksschkw/kssrag/examples)

### Community
- [**GitHub Issues**](https://github.com/Ksschkw/kssrag/issues) - Bug reports and feature requests
- [**Discussions**](https://github.com/Ksschkw/kssrag/discussions) - Community support and ideas
- [**Releases**](https://github.com/Ksschkw/kssrag/releases) - Release notes and updates

### Acknowledgments

This project builds upon several outstanding open-source projects:

- [**FAISS**](https://github.com/facebookresearch/faiss) - Efficient similarity search
- [**PaddleOCR**](https://github.com/PaddlePaddle/PaddleOCR) - Advanced OCR capabilities
- [**SentenceTransformers**](https://github.com/UKPLab/sentence-transformers) - Text embeddings
- [**OpenRouter**](https://openrouter.ai/) - Unified LLM API access

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

<div align="center">

**KSS RAG** - Enterprise-Grade Retrieval-Augmented Generation  
*Built with precision for production environments*

[Get Started](#quick-start) • [Explore Features](#features) • [View Examples](#examples)

</div>

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Ksschkw/kssrag",
    "name": "kssrag",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": null,
    "keywords": "rag, retrieval, generation, ai, nlp, faiss, bm25",
    "author": "Ksschkw",
    "author_email": "kookafor893@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/af/9f/8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04/kssrag-0.2.1.tar.gz",
    "platform": null,
    "description": "# KSS RAG - Knowledge Retrieval Augmented Generation Framework\r\n\r\n<div align=\"center\">\r\n\r\n![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)\r\n![License](https://img.shields.io/badge/license-MIT-green)\r\n![Version](https://img.shields.io/badge/version-0.2.0-brightgreen)\r\n![Framework](https://img.shields.io/badge/framework-RAG-orange)\r\n![Documentation](https://img.shields.io/badge/docs-comprehensive-brightgreen)\r\n\r\n**Enterprise-Grade Retrieval-Augmented Generation for Modern Applications**\r\n\r\n[Quick Start](#quick-start) \u2022 [Features](#features) \u2022 [Documentation](#documentation) \u2022 [Examples](#examples) \u2022 [API Reference](#api-reference)\r\n\r\n</div>\r\n\r\n## Table of Contents\r\n\r\n- [Overview](#overview)\r\n- [Architecture](#architecture)\r\n- [Features](#features)\r\n- [Quick Start](#quick-start)\r\n- [Installation](#installation)\r\n- [Core Concepts](#core-concepts)\r\n- [Documentation](#documentation)\r\n- [Examples](#examples)\r\n- [API Reference](#api-reference)\r\n- [Deployment](#deployment)\r\n- [Contributing](#contributing)\r\n- [Support](#support)\r\n- [License](#license)\r\n\r\n## Overview\r\n\r\nKSS RAG is a production-ready Retrieval-Augmented Generation framework designed for enterprises requiring robust, scalable, and maintainable AI-powered document processing. 
Built with architectural excellence and engineering rigor, this framework provides comprehensive solutions for knowledge retrieval, document understanding, and intelligent question answering.\r\n\r\n### Key Capabilities\r\n\r\n- **Multi-Format Document Processing**: Text, PDF, Office documents, images with OCR\r\n- **Advanced Vector Search**: Multiple vector store implementations with hybrid approaches\r\n- **Real-time Streaming**: Token-by-token response streaming for enhanced user experience\r\n- **Enterprise Security**: Comprehensive security and input validation\r\n- **Production Monitoring**: Health checks, metrics, and observability\r\n\r\n## Architecture\r\n\r\n```mermaid\r\ngraph TB\r\n    A[Document Input] --> B[Document Loader]\r\n    B --> C[Chunker]\r\n    C --> D[Vector Store]\r\n    D --> E[FAISS Index]\r\n    D --> F[BM25 Index]\r\n    D --> G[Hybrid Index]\r\n    \r\n    H[User Query] --> I[Query Processor]\r\n    I --> J[Retriever]\r\n    J --> K[Vector Store]\r\n    J --> L[Context Builder]\r\n    \r\n    M[LLM Provider] --> N[OpenRouter]\r\n    M --> O[Custom LLMs]\r\n    \r\n    L --> P[Prompt Engineer]\r\n    P --> M\r\n    M --> Q[Response Generator]\r\n    Q --> R[Streaming Output]\r\n    Q --> S[Standard Output]\r\n    \r\n    subgraph \"Document Processing Pipeline\"\r\n        B --> C --> D\r\n    end\r\n    \r\n    subgraph \"Query Processing Pipeline\"\r\n        I --> J --> L --> P\r\n    end\r\n    \r\n    style A fill:#e1f5fe\r\n    style H fill:#f3e5f5\r\n    style R fill:#e8f5e8\r\n    style S fill:#e8f5e8\r\n```\r\n\r\n## Features\r\n\r\n### \ud83c\udfaf Core Capabilities\r\n\r\n| Feature | Description | Status |\r\n|---------|-------------|--------|\r\n| **Multi-Format Support** | Text, PDF, JSON, DOCX, Excel, PowerPoint, Images | \u2705 Production Ready |\r\n| **Advanced OCR** | Handwritten (PaddleOCR) & Typed (Tesseract) text recognition | \u2705 Production Ready |\r\n| **Vector Stores** | BM25, BM25S, FAISS, TFIDF, Hybrid 
implementations | \u2705 Production Ready |\r\n| **Streaming Responses** | Real-time token streaming with OpenRouter | \u2705 Production Ready |\r\n| **REST API** | FastAPI with comprehensive endpoints | \u2705 Production Ready |\r\n| **CLI Interface** | Command-line tools for rapid development | \u2705 Production Ready |\r\n\r\n### \ud83d\udd27 Technical Excellence\r\n\r\n| Aspect | Implementation | Benefits |\r\n|--------|----------------|----------|\r\n| **Windows Compatibility** | No AVX2 dependencies, hybrid fallbacks | Enterprise deployment |\r\n| **Extensible Architecture** | Plugin system for custom components | Future-proof design |\r\n| **Performance Optimization** | Batch processing, caching, memory management | High throughput |\r\n| **Error Resilience** | Smart fallbacks, retry mechanisms | Production reliability |\r\n\r\n### \ud83d\udcca Performance Metrics\r\n\r\n| Operation | Average Latency | Throughput |\r\n|-----------|-----------------|------------|\r\n| Document Indexing | 2-5 sec/1000 chunks | 200+ docs/min |\r\n| Query Processing | 500-1500 ms | 50+ QPS |\r\n| OCR Processing | 1-3 sec/image (Handwritten would take longer) | 20+ images/min (Handwritten would take longer) |\r\n| Streaming Response | 50-200 ms/first token | Real-time |\r\n\r\n## Quick Start\r\n\r\n### Installation\r\n\r\n```bash\r\n# Base installation\r\npip install kssrag\r\n\r\n# With extended capabilities\r\npip install kssrag[ocr,gpu,dev]\r\n```\r\n\r\n### Basic Usage\r\n\r\n```python\r\nfrom kssrag import KSSRAG\r\nimport os\r\n\r\n# Configure environment\r\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-api-key-here\"\r\n\r\n# Initialize framework\r\nrag = KSSRAG()\r\n\r\n# Load knowledge base\r\nrag.load_document(\"technical_docs.pdf\")\r\nrag.load_document(\"product_specs.docx\") \r\nrag.load_document(\"research_data.xlsx\")\r\n\r\n# Execute intelligent query\r\nresponse = rag.query(\r\n    \"What are the technical specifications and key differentiators?\",\r\n    
top_k=5\r\n)\r\nprint(response)\r\n```\r\n\r\n### CLI Demonstration\r\n\r\n```bash\r\n# Stream processing with hybrid retrieval\r\npython -m kssrag.cli query \\\r\n    --file enterprise_docs.pdf \\\r\n    --query \"Architecture decisions and rationale\" \\\r\n    --vector-store hybrid_online \\\r\n    --top-k 8 \\\r\n    --stream\r\n\r\n# Production API server\r\npython -m kssrag.cli server \\\r\n    --file knowledge_base.docx \\\r\n    --port 8080 \\\r\n    --host 0.0.0.0 \\\r\n    --vector-store faiss\r\n```\r\n\r\n## Installation\r\n\r\n### System Requirements\r\n\r\n| Component | Minimum | Recommended |\r\n|-----------|---------|-------------|\r\n| Python | 3.8+ | 3.11+ |\r\n| RAM | 4 GB | 16 GB |\r\n| Storage | 1 GB | 10 GB+ |\r\n| OS | Windows 10+, Linux, macOS | Linux |\r\n\r\n### Installation Methods\r\n\r\n**Standard Installation**\r\n```bash\r\npip install kssrag\r\n```\r\n\r\n**Extended Capabilities**\r\n```bash\r\n# OCR functionality (PaddleOCR + Tesseract)\r\npip install kssrag[ocr]\r\n\r\n# GPU acceleration\r\npip install kssrag[gpu]\r\n\r\n# Development tools\r\npip install kssrag[dev]\r\n\r\n# All features\r\npip install kssrag[all]\r\n```\r\n\r\n**Source Installation**\r\n```bash\r\ngit clone https://github.com/Ksschkw/kssrag\r\ncd kssrag\r\npip install -e .[all]\r\n```\r\n\r\n### Verification\r\n\r\n```python\r\n# Verify installation\r\nimport kssrag\r\nfrom kssrag import KSSRAG\r\n\r\nprint(f\"KSS RAG Version: {kssrag.__version__}\")\r\n\r\n# Test basic functionality\r\nrag = KSSRAG()\r\nprint(\"Framework initialized successfully\")\r\n```\r\n\r\n## Core Concepts\r\n\r\n### Document Processing Pipeline\r\n\r\n```mermaid\r\nsequenceDiagram\r\n    participant User\r\n    participant System\r\n    participant Loader\r\n    participant Chunker\r\n    participant VectorStore\r\n    participant Retriever\r\n    participant LLM\r\n\r\n    User->>System: load_document(file_path)\r\n    System->>Loader: parse_document()\r\n    Loader->>Chunker: 
chunk_content()\r\n    Chunker->>VectorStore: add_documents()\r\n    VectorStore->>System: indexing_complete()\r\n    \r\n    User->>System: query(question)\r\n    System->>Retriever: retrieve_relevant()\r\n    Retriever->>VectorStore: similarity_search()\r\n    VectorStore->>Retriever: relevant_chunks()\r\n    Retriever->>LLM: generate_response()\r\n    LLM->>System: final_response()\r\n    System->>User: display_result()\r\n```\r\n\r\n### Vector Store Architecture\r\n\r\n```mermaid\r\ngraph LR\r\n    A[Query] --> B{Vector Store Router}\r\n    \r\n    B --> C[BM25 Store]\r\n    B --> D[BM25S Store]\r\n    B --> E[FAISS Store]\r\n    B --> F[TFIDF Store]\r\n    B --> G[Hybrid Online]\r\n    B --> H[Hybrid Offline]\r\n    \r\n    C --> I[Keyword Matching]\r\n    D --> J[Stemmed Keywords]\r\n    E --> K[Semantic Search]\r\n    F --> L[Statistical Analysis]\r\n    \r\n    G --> M[FAISS + BM25 Fusion]\r\n    H --> N[BM25 + TFIDF Fusion]\r\n    \r\n    I --> O[Results]\r\n    J --> O\r\n    K --> O\r\n    L --> O\r\n    M --> O\r\n    N --> O\r\n    \r\n    style O fill:#c8e6c9\r\n```\r\n\r\n## Documentation\r\n\r\n### Comprehensive Guides\r\n\r\n- [**Configuration Guide**](docs/configuration.md) - Detailed configuration options and best practices\r\n- [**API Reference**](docs/api_reference.md) - Complete API documentation with examples\r\n- [**Deployment Guide**](docs/deployment.md) - Production deployment strategies\r\n- [**Performance Tuning**](docs/performance.md) - Optimization techniques and benchmarks\r\n\r\n### Tutorials\r\n\r\n- [**Getting Started**](examples/basic_usage.py) - Basic framework usage\r\n- [**Advanced Features**](examples/advanced_usage.py) - Custom configurations and extensions\r\n- [**Custom Components**](examples/custom_config.py) - Building custom chunkers and vector stores\r\n\r\n## Examples\r\n\r\n### Basic Implementation\r\n\r\n```python\r\n\"\"\"\r\nBasic KSS RAG implementation for document Q&A\r\n\"\"\"\r\nfrom kssrag import 
KSSRAG\r\nimport os\r\n\r\ndef main():\r\n    # Configuration\r\n    os.environ[\"OPENROUTER_API_KEY\"] = \"your-api-key\"\r\n    \r\n    # Initialize\r\n    rag = KSSRAG()\r\n    \r\n    # Load documents\r\n    rag.load_document(\"technical_manual.pdf\")\r\n    rag.load_document(\"api_documentation.md\")\r\n    \r\n    # Query system\r\n    response = rag.query(\r\n        \"How do I implement the authentication system?\",\r\n        top_k=5\r\n    )\r\n    \r\n    print(\"Response:\", response)\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n```\r\n\r\n### Advanced Configuration\r\n\r\n```python\r\n\"\"\"\r\nEnterprise-grade configuration with custom components\r\n\"\"\"\r\nfrom kssrag import KSSRAG, Config, VectorStoreType, RetrieverType\r\nfrom kssrag.core.agents import RAGAgent\r\nfrom kssrag.models.openrouter import OpenRouterLLM\r\nimport os\r\n\r\ndef main():\r\n    # Enterprise configuration\r\n    config = Config(\r\n        OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n        DEFAULT_MODEL=\"anthropic/claude-3-sonnet\",\r\n        VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,\r\n        RETRIEVER_TYPE=RetrieverType.HYBRID,\r\n        TOP_K=10,\r\n        CHUNK_SIZE=1000,\r\n        CHUNK_OVERLAP=150,\r\n        BATCH_SIZE=32,\r\n        ENABLE_CACHE=True,\r\n        CACHE_DIR=\"/opt/kssrag/cache\",\r\n        LOG_LEVEL=\"INFO\"\r\n    )\r\n    \r\n    # Initialize with custom config\r\n    rag = KSSRAG(config=config)\r\n    \r\n    # Load enterprise documents\r\n    rag.load_document(\"product_requirements.pdf\")\r\n    rag.load_document(\"architecture_docs.docx\")\r\n    rag.load_document(\"user_research.json\")\r\n    \r\n    # Custom expert prompt\r\n    expert_prompt = \"\"\"\r\n    You are a senior technical expert analyzing documentation. 
\r\n    Provide authoritative, precise responses based on the source material.\r\n    Focus on actionable insights and technical accuracy.\r\n    \"\"\"\r\n    \r\n    # Custom agent configuration\r\n    llm = OpenRouterLLM(\r\n        api_key=config.OPENROUTER_API_KEY,\r\n        model=config.DEFAULT_MODEL,\r\n        stream=True\r\n    )\r\n    \r\n    rag.agent = RAGAgent(\r\n        retriever=rag.retriever,\r\n        llm=llm,\r\n        system_prompt=expert_prompt\r\n    )\r\n    \r\n    # Execute complex query\r\n    query = \"\"\"\r\n    Analyze the technical architecture and identify:\r\n    1. Key design decisions\r\n    2. Potential scalability concerns  \r\n    3. Recommended improvements\r\n    \"\"\"\r\n    \r\n    print(\"Processing complex query...\")\r\n    for chunk in rag.agent.query_stream(query, top_k=8):\r\n        print(chunk, end=\"\", flush=True)\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n```\r\n\r\n### OCR Integration\r\n\r\n```python\r\n\"\"\"\r\nAdvanced OCR processing for document digitization\r\n\"\"\"\r\nfrom kssrag import KSSRAG, Config\r\nimport os\r\n\r\ndef process_scanned_documents():\r\n    config = Config(\r\n        OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n        OCR_DEFAULT_MODE=\"handwritten\"  # or \"typed\"\r\n    )\r\n    \r\n    rag = KSSRAG(config=config)\r\n    \r\n    # Process various document types\r\n    documents = [\r\n        (\"handwritten_notes.jpg\", \"handwritten\"),\r\n        (\"typed_contract.jpg\", \"typed\"), \r\n        (\"mixed_document.png\", \"handwritten\")\r\n    ]\r\n    \r\n    for doc_path, ocr_mode in documents:\r\n        try:\r\n            print(f\"Processing {doc_path} with {ocr_mode} OCR...\")\r\n            rag.load_document(doc_path, format=\"image\")\r\n            print(f\"Successfully processed {doc_path}\")\r\n        except Exception as e:\r\n            print(f\"Error processing {doc_path}: {str(e)}\")\r\n    \r\n    # Query across all processed 
documents\r\n    response = rag.query(\r\n        \"Extract and summarize all action items and deadlines\",\r\n        top_k=6\r\n    )\r\n    \r\n    return response\r\n\r\nif __name__ == \"__main__\":\r\n    result = process_scanned_documents()\r\n    print(\"OCR Processing Result:\", result)\r\n```\r\n\r\n## API Reference\r\n\r\n### Core Classes\r\n\r\n#### KSSRAG\r\nThe primary interface for the RAG framework.\r\n\r\n```python\r\nclass KSSRAG:\r\n    \"\"\"\r\n    Main RAG framework class providing document processing and query capabilities.\r\n    \r\n    Attributes:\r\n        config (Config): Framework configuration\r\n        vector_store: Active vector store instance\r\n        retriever: Document retriever instance  \r\n        agent: RAG agent for query processing\r\n        documents (List): Processed document chunks\r\n    \"\"\"\r\n    \r\n    def __init__(self, config: Optional[Config] = None):\r\n        \"\"\"Initialize RAG system with optional configuration\"\"\"\r\n        \r\n    def load_document(self, file_path: str, format: Optional[str] = None,\r\n                     chunker: Optional[Any] = None, metadata: Optional[Dict[str, Any]] = None):\r\n        \"\"\"\r\n        Load and process document for retrieval\r\n        \r\n        Args:\r\n            file_path: Path to document file\r\n            format: Document format (auto-detected if None)\r\n            chunker: Custom chunker instance\r\n            metadata: Additional document metadata\r\n        \"\"\"\r\n        \r\n    def query(self, question: str, top_k: Optional[int] = None) -> str:\r\n        \"\"\"\r\n        Execute query against loaded documents\r\n        \r\n        Args:\r\n            question: Natural language query\r\n            top_k: Number of results to retrieve\r\n            \r\n        Returns:\r\n            Generated response string\r\n        \"\"\"\r\n        \r\n    def create_server(self, server_config=None):\r\n        \"\"\"\r\n        Create FastAPI 
server instance\r\n        \r\n        Returns:\r\n            FastAPI application instance\r\n        \"\"\"\r\n```\r\n\r\n#### Configuration Management\r\n\r\n```python\r\nclass Config(BaseSettings):\r\n    \"\"\"\r\n    Comprehensive configuration management with validation\r\n    \r\n    Example:\r\n        config = Config(\r\n            OPENROUTER_API_KEY=\"key\",\r\n            VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,\r\n            TOP_K=10\r\n        )\r\n    \"\"\"\r\n    \r\n    # API Configuration\r\n    OPENROUTER_API_KEY: str\r\n    DEFAULT_MODEL: str = \"anthropic/claude-3-sonnet\"\r\n    FALLBACK_MODELS: List[str] = [\"deepseek/deepseek-chat-v3.1:free\"]\r\n    \r\n    # Processing Configuration  \r\n    CHUNK_SIZE: int = 800\r\n    CHUNK_OVERLAP: int = 100\r\n    VECTOR_STORE_TYPE: VectorStoreType = VectorStoreType.HYBRID_OFFLINE\r\n    \r\n    # Performance Configuration\r\n    BATCH_SIZE: int = 64\r\n    ENABLE_CACHE: bool = True\r\n    \r\n    # Server Configuration\r\n    SERVER_HOST: str = \"localhost\"\r\n    SERVER_PORT: int = 8000\r\n```\r\n\r\n### Server Endpoints\r\n\r\n| Endpoint | Method | Description | Parameters |\r\n|----------|--------|-------------|------------|\r\n| `/query` | POST | Execute RAG query | `query`, `session_id` |\r\n| `/stream` | POST | Streaming query | `query`, `session_id` |\r\n| `/health` | GET | System health | - |\r\n| `/config` | GET | Server configuration | - |\r\n| `/sessions/{id}/clear` | GET | Clear session | `session_id` |\r\n\r\n## Deployment\r\n\r\n### Docker Deployment\r\n\r\n```dockerfile\r\n# Dockerfile\r\nFROM python:3.11-slim\r\n\r\n# Install system dependencies\r\nRUN apt-get update && apt-get install -y \\\r\n    tesseract-ocr \\\r\n    libgl1 \\\r\n    && rm -rf /var/lib/apt/lists/*\r\n\r\nWORKDIR /app\r\n\r\n# Copy requirements and install\r\nCOPY requirements.txt .\r\nRUN pip install --no-cache-dir -r requirements.txt\r\n\r\n# Copy application\r\nCOPY . 
.\r\n\r\n# Create non-privileged user\r\nRUN useradd -m -u 1000 kssrag\r\nUSER kssrag\r\n\r\nEXPOSE 8000\r\n\r\n# Health check (uses Python's standard library; curl is not included in the slim base image)\r\nHEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \\\r\n    CMD python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health')\" || exit 1\r\n\r\nCMD [\"python\", \"-m\", \"kssrag.cli\", \"server\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\r\n```\r\n\r\n### Kubernetes Deployment\r\n\r\n```yaml\r\n# k8s-deployment.yaml\r\napiVersion: apps/v1\r\nkind: Deployment\r\nmetadata:\r\n  name: kssrag\r\n  labels:\r\n    app: kssrag\r\nspec:\r\n  replicas: 3\r\n  selector:\r\n    matchLabels:\r\n      app: kssrag\r\n  template:\r\n    metadata:\r\n      labels:\r\n        app: kssrag\r\n    spec:\r\n      containers:\r\n      - name: kssrag\r\n        image: kssrag:latest\r\n        ports:\r\n        - containerPort: 8000\r\n        env:\r\n        - name: OPENROUTER_API_KEY\r\n          valueFrom:\r\n            secretKeyRef:\r\n              name: kssrag-secrets\r\n              key: openrouter-api-key\r\n        - name: VECTOR_STORE_TYPE\r\n          value: \"hybrid_offline\"\r\n        resources:\r\n          requests:\r\n            memory: \"1Gi\"\r\n            cpu: \"500m\"\r\n          limits:\r\n            memory: \"2Gi\"\r\n            cpu: \"1000m\"\r\n        livenessProbe:\r\n          httpGet:\r\n            path: /health\r\n            port: 8000\r\n          initialDelaySeconds: 30\r\n          periodSeconds: 10\r\n        readinessProbe:\r\n          httpGet:\r\n            path: /health\r\n            port: 8000\r\n          initialDelaySeconds: 5\r\n          periodSeconds: 5\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: kssrag-service\r\nspec:\r\n  selector:\r\n    app: kssrag\r\n  ports:\r\n  - port: 80\r\n    targetPort: 8000\r\n  type: LoadBalancer\r\n```\r\n\r\n### Production Configuration\r\n\r\n```python\r\n# production_config.py\r\nimport os\r\n\r\nfrom kssrag import Config, VectorStoreType\r\n\r\nproduction_config = 
Config(\r\n    OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n    VECTOR_STORE_TYPE=VectorStoreType.HYBRID_OFFLINE,\r\n    CHUNK_SIZE=1000,\r\n    TOP_K=8,\r\n    BATCH_SIZE=32,\r\n    ENABLE_CACHE=True,\r\n    CACHE_DIR=\"/var/lib/kssrag/cache\",\r\n    LOG_LEVEL=\"INFO\",\r\n    SERVER_HOST=\"0.0.0.0\",\r\n    SERVER_PORT=8000,\r\n    CORS_ORIGINS=[\r\n        \"https://app.company.com\",\r\n        \"https://api.company.com\"\r\n    ]\r\n)\r\n```\r\n\r\n## Contributing\r\n\r\nWe welcome contributions from the community. Please see our [Contributing Guide](CONTRIBUTING.md) for details.\r\n\r\n### Development Setup\r\n\r\n```bash\r\n# Clone repository\r\ngit clone https://github.com/Ksschkw/kssrag\r\ncd kssrag\r\n\r\n# Install development dependencies\r\npip install -e .[dev,ocr,all]\r\n\r\n# Run test suite\r\npython -m pytest tests/ -v --cov=kssrag\r\n\r\n# Code quality checks\r\nblack kssrag/ tests/\r\nflake8 kssrag/\r\nmypy kssrag/\r\n\r\n# Build documentation\r\ncd docs && make html\r\n```\r\n\r\n### Code Organization\r\n\r\n```\r\nkssrag/\r\n\u251c\u2500\u2500 core/                   # Core framework components\r\n\u2502   \u251c\u2500\u2500 chunkers.py         # Document segmentation strategies\r\n\u2502   \u251c\u2500\u2500 vectorstores.py     # Vector database implementations\r\n\u2502   \u251c\u2500\u2500 retrievers.py       # Information retrieval algorithms\r\n\u2502   \u2514\u2500\u2500 agents.py           # RAG orchestration logic\r\n\u251c\u2500\u2500 models/                 # LLM provider integrations\r\n\u2502   \u251c\u2500\u2500 openrouter.py       # OpenRouter API client\r\n\u2502   \u2514\u2500\u2500 local_llms.py       # Local LLM implementations\r\n\u251c\u2500\u2500 utils/                  # Utility functions\r\n\u2502   \u251c\u2500\u2500 helpers.py          # Common utilities\r\n\u2502   \u251c\u2500\u2500 document_loaders.py # Document parsing\r\n\u2502   \u251c\u2500\u2500 ocr_loader.py       # OCR processing 
(PaddleOCR/Tesseract)\r\n\u2502   \u2514\u2500\u2500 preprocessors.py    # Text preprocessing\r\n\u251c\u2500\u2500 config.py               # Configuration management\r\n\u251c\u2500\u2500 server.py               # FastAPI web server\r\n\u251c\u2500\u2500 cli.py                  # Command-line interface\r\n\u2514\u2500\u2500 __init__.py             # Package exports\r\n```\r\n\r\n## Support\r\n\r\n### Documentation\r\n- [**Full Documentation**](https://github.com/Ksschkw/kssrag/docs)\r\n- [**API Reference**](https://github.com/Ksschkw/kssrag/docs/api_reference.md)\r\n- [**Examples Directory**](https://github.com/Ksschkw/kssrag/examples)\r\n\r\n### Community\r\n- [**GitHub Issues**](https://github.com/Ksschkw/kssrag/issues) - Bug reports and feature requests\r\n- [**Discussions**](https://github.com/Ksschkw/kssrag/discussions) - Community support and ideas\r\n- [**Releases**](https://github.com/Ksschkw/kssrag/releases) - Release notes and updates\r\n\r\n### Acknowledgments\r\n\r\nThis project builds upon several outstanding open-source projects:\r\n\r\n- [**FAISS**](https://github.com/facebookresearch/faiss) - Efficient similarity search\r\n- [**PaddleOCR**](https://github.com/PaddlePaddle/PaddleOCR) - Advanced OCR capabilities\r\n- [**SentenceTransformers**](https://github.com/UKPLab/sentence-transformers) - Text embeddings\r\n- [**OpenRouter**](https://openrouter.ai/) - Unified LLM API access\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\r\n\r\n---\r\n\r\n<div align=\"center\">\r\n\r\n**KSS RAG** - Enterprise-Grade Retrieval-Augmented Generation  \r\n*Built with precision for production environments*\r\n\r\n[Get Started](#quick-start) \u2022 [Explore Features](#features) \u2022 [View Examples](#examples)\r\n\r\n</div>\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A flexible Retrieval-Augmented Generation framework by Ksschkw",
    "version": "0.2.1",
    "project_urls": {
        "Bug Reports": "https://github.com/Ksschkw/kssrag/issues",
        "Documentation": "https://github.com/Ksschkw/kssrag/docs",
        "Homepage": "https://github.com/Ksschkw/kssrag",
        "Source": "https://github.com/Ksschkw/kssrag"
    },
    "split_keywords": [
        "rag",
        " retrieval",
        " generation",
        " ai",
        " nlp",
        " faiss",
        " bm25"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c6c61e7e8c42ce705d3623f7a2130c812f52f483c77f15615a2178a837bd84a4",
                "md5": "88081ca22294cedfd5798a584291419a",
                "sha256": "bcfe75427502d0671d8c8abde2b202819586e0dc9d5311645bd64a2808e7a011"
            },
            "downloads": -1,
            "filename": "kssrag-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "88081ca22294cedfd5798a584291419a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 41622,
            "upload_time": "2025-10-27T10:13:43",
            "upload_time_iso_8601": "2025-10-27T10:13:43.125771Z",
            "url": "https://files.pythonhosted.org/packages/c6/c6/1e7e8c42ce705d3623f7a2130c812f52f483c77f15615a2178a837bd84a4/kssrag-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "af9f8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04",
                "md5": "2527431c62cc51aafb5b4e0e14dcae14",
                "sha256": "feefe2579e784dde2fc750940433f388bcb10163dd4e94495dd491f0a0baf7ec"
            },
            "downloads": -1,
            "filename": "kssrag-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2527431c62cc51aafb5b4e0e14dcae14",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 41048,
            "upload_time": "2025-10-27T10:13:44",
            "upload_time_iso_8601": "2025-10-27T10:13:44.461884Z",
            "url": "https://files.pythonhosted.org/packages/af/9f/8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04/kssrag-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-27 10:13:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Ksschkw",
    "github_project": "kssrag",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "fastapi",
            "specs": [
                [
                    ">=",
                    "0.104.0"
                ]
            ]
        },
        {
            "name": "uvicorn",
            "specs": [
                [
                    ">=",
                    "0.24.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "rank-bm25",
            "specs": [
                [
                    ">=",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.26.0"
                ]
            ]
        },
        {
            "name": "faiss-cpu",
            "specs": [
                [
                    ">=",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "pydantic-settings",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rapidfuzz",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "python-multipart",
            "specs": [
                [
                    ">=",
                    "0.0.6"
                ]
            ]
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "paddleocr",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "paddlepaddle",
            "specs": []
        },
        {
            "name": "python-docx",
            "specs": []
        },
        {
            "name": "bm25S",
            "specs": []
        },
        {
            "name": "pystemmer",
            "specs": []
        },
        {
            "name": "stemmer",
            "specs": []
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    ">=",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "python-pptx",
            "specs": [
                [
                    ">=",
                    "1.0.2"
                ]
            ]
        }
    ],
    "lcname": "kssrag"
}

Ksschkw