# KSS RAG - Knowledge Retrieval Augmented Generation Framework
<div align="center">

**Enterprise-Grade Retrieval-Augmented Generation for Modern Applications**

[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [API Reference](#api-reference)

</div>

## Table of Contents
- [Overview](#overview)
- [Architecture](#architecture)
- [Features](#features)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Core Concepts](#core-concepts)
- [Documentation](#documentation)
- [Examples](#examples)
- [API Reference](#api-reference)
- [Deployment](#deployment)
- [Contributing](#contributing)
- [Support](#support)
- [License](#license)
## Overview
KSS RAG is a production-ready Retrieval-Augmented Generation (RAG) framework for enterprises that need robust, scalable, and maintainable AI-powered document processing. It covers the full pipeline: loading and chunking documents, indexing them in a vector store, retrieving the relevant chunks for a query, and generating an answer with an LLM.
### Key Capabilities
- **Multi-Format Document Processing**: Text, PDF, Office documents, images with OCR
- **Advanced Vector Search**: Multiple vector store implementations with hybrid approaches
- **Real-time Streaming**: Token-by-token response streaming for enhanced user experience
- **Enterprise Security**: Comprehensive security and input validation
- **Production Monitoring**: Health checks, metrics, and observability
## Architecture
```mermaid
graph TB
A[Document Input] --> B[Document Loader]
B --> C[Chunker]
C --> D[Vector Store]
D --> E[FAISS Index]
D --> F[BM25 Index]
D --> G[Hybrid Index]
H[User Query] --> I[Query Processor]
I --> J[Retriever]
J --> K[Vector Store]
J --> L[Context Builder]
M[LLM Provider] --> N[OpenRouter]
M --> O[Custom LLMs]
L --> P[Prompt Engineer]
P --> M
M --> Q[Response Generator]
Q --> R[Streaming Output]
Q --> S[Standard Output]
subgraph "Document Processing Pipeline"
B --> C --> D
end
subgraph "Query Processing Pipeline"
I --> J --> L --> P
end
style A fill:#e1f5fe
style H fill:#f3e5f5
style R fill:#e8f5e8
style S fill:#e8f5e8
```
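The "Context Builder" and "Prompt Engineer" nodes in the diagram can be pictured with a small sketch. This is not the kssrag implementation: `build_prompt`, the character budget, and the prompt wording are all assumptions made for illustration. It takes chunks already ranked by the retriever and packs as many as fit into the prompt.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_context_chars: int = 2000) -> str:
    """Pack ranked chunks into a prompt until the character budget is spent.

    Hypothetical helper: kssrag's own context builder may work differently.
    """
    selected = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_context_chars:
            break  # keep the higher-ranked chunks, drop the rest
        selected.append(entry)
        used += len(entry)
    context = "\n".join(selected)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt(
        "What is the default chunk size?",
        ["CHUNK_SIZE defaults to 800.", "CHUNK_OVERLAP defaults to 100."],
    ))
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports each statement; whether that is worth the extra prompt length depends on the application.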
## Features
### 🎯 Core Capabilities
| Feature | Description | Status |
|---------|-------------|--------|
| **Multi-Format Support** | Text, PDF, JSON, DOCX, Excel, PowerPoint, Images | ✅ Production Ready |
| **Advanced OCR** | Handwritten (PaddleOCR) & Typed (Tesseract) text recognition | ✅ Production Ready |
| **Vector Stores** | BM25, BM25S, FAISS, TFIDF, Hybrid implementations | ✅ Production Ready |
| **Streaming Responses** | Real-time token streaming with OpenRouter | ✅ Production Ready |
| **REST API** | FastAPI with comprehensive endpoints | ✅ Production Ready |
| **CLI Interface** | Command-line tools for rapid development | ✅ Production Ready |
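For readers unfamiliar with the keyword-based stores listed above, here is a minimal sketch of BM25 scoring in pure Python. It is an illustration of the general technique, not the code kssrag ships: the function name, the whitespace-tokenized inputs, and the `k1`/`b` defaults are assumptions, and the IDF uses the common "+1 inside the log" variant so scores are never negative.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against a tokenized query with BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "faiss index".split(),
        "bm25 keyword index".split(),
        "ocr image".split(),
    ]
    print(bm25_scores("bm25 index".split(), corpus))
```

Rare terms (low document frequency) contribute more than common ones, which is why a document matching "bm25" outranks one matching only "index" in the example corpus.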
### 🔧 Technical Excellence
| Aspect | Implementation | Benefits |
|--------|----------------|----------|
| **Windows Compatibility** | No AVX2 dependencies, hybrid fallbacks | Enterprise deployment |
| **Extensible Architecture** | Plugin system for custom components | Future-proof design |
| **Performance Optimization** | Batch processing, caching, memory management | High throughput |
| **Error Resilience** | Smart fallbacks, retry mechanisms | Production reliability |
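The "smart fallbacks, retry mechanisms" row can be illustrated with a generic pattern: retry each model with exponential backoff, then move on to the next model in a fallback list. The sketch below is an assumption about how such a mechanism might look, not kssrag's actual code; `call_with_fallbacks` and the `call_model` callable are hypothetical names.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallbacks(
    call_model: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str],
    retries_per_model: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying with exponential backoff.

    call_model(model_name, prompt) is a hypothetical callable that returns
    the response text or raises on failure.
    """
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # broad on purpose in this sketch
                last_error = exc
                if attempt < retries_per_model - 1:
                    sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("all models failed") from last_error
```

The `sleep` parameter is injectable so the function can be tested without real delays. In production code it is usually better to catch only transient errors (timeouts, rate limits) so that programming errors fail fast.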
### 📊 Performance Metrics
| Operation | Average Latency | Throughput |
|-----------|-----------------|------------|
| Document Indexing | 2-5 sec/1000 chunks | 200+ docs/min |
| Query Processing | 500-1500 ms | 50+ QPS |
| OCR Processing | 1-3 sec/image (longer for handwritten text) | 20+ images/min (fewer for handwritten text) |
| Streaming Response | 50-200 ms/first token | Real-time |
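The table's figures can be turned into rough capacity estimates. The sketch below is plain arithmetic on those numbers; the function names are illustrative and not part of kssrag, and the table does not state the hardware or conditions under which the figures were measured. `required_concurrency` applies Little's law (requests in flight = throughput × latency), which shows that 50 QPS at 500-1500 ms per query implies many queries being served concurrently.

```python
from typing import Tuple


def indexing_time_range(num_chunks: int,
                        sec_per_1000: Tuple[float, float] = (2.0, 5.0)) -> Tuple[float, float]:
    """Estimated (best, worst) indexing time in seconds for num_chunks chunks."""
    low, high = sec_per_1000
    return (num_chunks / 1000 * low, num_chunks / 1000 * high)


def required_concurrency(qps: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight."""
    return qps * latency_s


if __name__ == "__main__":
    print(indexing_time_range(25_000))     # (50.0, 125.0) seconds
    print(required_concurrency(50, 0.5))   # 25.0 requests in flight
    print(required_concurrency(50, 1.5))   # 75.0 requests in flight
```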
## Quick Start
### Installation
```bash
# Base installation
pip install kssrag
# With extended capabilities
# (quotes keep shells such as zsh from treating the brackets as a glob)
pip install "kssrag[ocr,gpu,dev]"
```
### Basic Usage
```python
from kssrag import KSSRAG
import os
# Configure environment
os.environ["OPENROUTER_API_KEY"] = "your-api-key-here"
# Initialize framework
rag = KSSRAG()
# Load knowledge base
rag.load_document("technical_docs.pdf")
rag.load_document("product_specs.docx")
rag.load_document("research_data.xlsx")
# Execute intelligent query
response = rag.query(
    "What are the technical specifications and key differentiators?",
    top_k=5
)
print(response)
```
### CLI Demonstration
```bash
# Stream processing with hybrid retrieval
python -m kssrag.cli query \
    --file enterprise_docs.pdf \
    --query "Architecture decisions and rationale" \
    --vector-store hybrid_online \
    --top-k 8 \
    --stream
# Production API server
python -m kssrag.cli server \
    --file knowledge_base.docx \
    --port 8080 \
    --host 0.0.0.0 \
    --vector-store faiss
```
## Installation
### System Requirements
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| Python | 3.8+ | 3.11+ |
| RAM | 4 GB | 16 GB |
| Storage | 1 GB | 10 GB+ |
| OS | Windows 10+, Linux, macOS | Linux |
### Installation Methods
**Standard Installation**
```bash
pip install kssrag
```
**Extended Capabilities**
```bash
# OCR functionality (PaddleOCR + Tesseract)
pip install "kssrag[ocr]"
# GPU acceleration
pip install "kssrag[gpu]"
# Development tools
pip install "kssrag[dev]"
# All features
pip install "kssrag[all]"
```
**Source Installation**
```bash
git clone https://github.com/Ksschkw/kssrag
cd kssrag
pip install -e ".[all]"
```
### Verification
```python
# Verify installation
import kssrag
from kssrag import KSSRAG
print(f"KSS RAG Version: {kssrag.__version__}")
# Test basic functionality
rag = KSSRAG()
print("Framework initialized successfully")
```
## Core Concepts
### Document Processing Pipeline
```mermaid
sequenceDiagram
participant User
participant System
participant Loader
participant Chunker
participant VectorStore
participant Retriever
participant LLM
User->>System: load_document(file_path)
System->>Loader: parse_document()
Loader->>Chunker: chunk_content()
Chunker->>VectorStore: add_documents()
VectorStore->>System: indexing_complete()
User->>System: query(question)
System->>Retriever: retrieve_relevant()
Retriever->>VectorStore: similarity_search()
VectorStore->>Retriever: relevant_chunks()
Retriever->>LLM: generate_response()
LLM->>System: final_response()
System->>User: display_result()
```
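The `chunk_content()` step in the diagram can be sketched as a sliding window over the text. This is a minimal character-based version, assuming the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings mean "characters per chunk" and "characters shared between neighbours"; kssrag's own chunkers may split on tokens, sentences, or document structure instead, and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2000))
    parts = chunk_text(sample)
    print([len(p) for p in parts])  # [800, 800, 600]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some text twice.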
### Vector Store Architecture
```mermaid
graph LR
A[Query] --> B{Vector Store Router}
B --> C[BM25 Store]
B --> D[BM25S Store]
B --> E[FAISS Store]
B --> F[TFIDF Store]
B --> G[Hybrid Online]
B --> H[Hybrid Offline]
C --> I[Keyword Matching]
D --> J[Stemmed Keywords]
E --> K[Semantic Search]
F --> L[Statistical Analysis]
G --> M[FAISS + BM25 Fusion]
H --> N[BM25 + TFIDF Fusion]
I --> O[Results]
J --> O
K --> O
L --> O
M --> O
N --> O
style O fill:#c8e6c9
```
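The diagram names "FAISS + BM25 Fusion" and "BM25 + TFIDF Fusion" but does not say how the two result lists are combined. One widely used technique is reciprocal rank fusion (RRF), shown below as an assumption rather than as kssrag's actual method: each document earns `1 / (k + rank)` from every ranking it appears in, and the sums decide the final order. The function name and the conventional `k = 60` are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic result.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    semantic = ["c2", "c1", "c3"]   # e.g. from an embedding index
    keyword = ["c1", "c4", "c2"]    # e.g. from a BM25 index
    print(reciprocal_rank_fusion([semantic, keyword]))
```

RRF needs only ranks, not raw scores, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales. A weighted sum of normalized scores is the usual alternative when one retriever should dominate.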
## Documentation
### Comprehensive Guides
- [**Configuration Guide**](docs/configuration.md) - Detailed configuration options and best practices
- [**API Reference**](docs/api_reference.md) - Complete API documentation with examples
- [**Deployment Guide**](docs/deployment.md) - Production deployment strategies
- [**Performance Tuning**](docs/performance.md) - Optimization techniques and benchmarks
### Tutorials
- [**Getting Started**](examples/basic_usage.py) - Basic framework usage
- [**Advanced Features**](examples/advanced_usage.py) - Custom configurations and extensions
- [**Custom Components**](examples/custom_config.py) - Building custom chunkers and vector stores
## Examples
### Basic Implementation
```python
"""
Basic KSS RAG implementation for document Q&A
"""
from kssrag import KSSRAG
import os

def main():
    # Configuration
    os.environ["OPENROUTER_API_KEY"] = "your-api-key"
    # Initialize
    rag = KSSRAG()
    # Load documents
    rag.load_document("technical_manual.pdf")
    rag.load_document("api_documentation.md")
    # Query system
    response = rag.query(
        "How do I implement the authentication system?",
        top_k=5
    )
    print("Response:", response)

if __name__ == "__main__":
    main()
```
### Advanced Configuration
```python
"""
Enterprise-grade configuration with custom components
"""
from kssrag import KSSRAG, Config, VectorStoreType, RetrieverType
from kssrag.core.agents import RAGAgent
from kssrag.models.openrouter import OpenRouterLLM
import os

def main():
    # Enterprise configuration
    config = Config(
        OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
        DEFAULT_MODEL="anthropic/claude-3-sonnet",
        VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,
        RETRIEVER_TYPE=RetrieverType.HYBRID,
        TOP_K=10,
        CHUNK_SIZE=1000,
        CHUNK_OVERLAP=150,
        BATCH_SIZE=32,
        ENABLE_CACHE=True,
        CACHE_DIR="/opt/kssrag/cache",
        LOG_LEVEL="INFO"
    )
    # Initialize with custom config
    rag = KSSRAG(config=config)
    # Load enterprise documents
    rag.load_document("product_requirements.pdf")
    rag.load_document("architecture_docs.docx")
    rag.load_document("user_research.json")
    # Custom expert prompt
    expert_prompt = """
    You are a senior technical expert analyzing documentation.
    Provide authoritative, precise responses based on the source material.
    Focus on actionable insights and technical accuracy.
    """
    # Custom agent configuration
    llm = OpenRouterLLM(
        api_key=config.OPENROUTER_API_KEY,
        model=config.DEFAULT_MODEL,
        stream=True
    )
    rag.agent = RAGAgent(
        retriever=rag.retriever,
        llm=llm,
        system_prompt=expert_prompt
    )
    # Execute complex query
    query = """
    Analyze the technical architecture and identify:
    1. Key design decisions
    2. Potential scalability concerns
    3. Recommended improvements
    """
    print("Processing complex query...")
    for chunk in rag.agent.query_stream(query, top_k=8):
        print(chunk, end="", flush=True)

if __name__ == "__main__":
    main()
```
### OCR Integration
```python
"""
Advanced OCR processing for document digitization
"""
from kssrag import KSSRAG, Config
import os

def process_scanned_documents():
    config = Config(
        OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
        OCR_DEFAULT_MODE="handwritten"  # or "typed"
    )
    rag = KSSRAG(config=config)
    # Process various document types
    documents = [
        ("handwritten_notes.jpg", "handwritten"),
        ("typed_contract.jpg", "typed"),
        ("mixed_document.png", "handwritten")
    ]
    for doc_path, ocr_mode in documents:
        try:
            print(f"Processing {doc_path} with {ocr_mode} OCR...")
            # Note: ocr_mode is only used in the log message; this call does not
            # pass it on, so OCR_DEFAULT_MODE above presumably decides the engine.
            rag.load_document(doc_path, format="image")
            print(f"Successfully processed {doc_path}")
        except Exception as e:
            print(f"Error processing {doc_path}: {str(e)}")
    # Query across all processed documents
    response = rag.query(
        "Extract and summarize all action items and deadlines",
        top_k=6
    )
    return response

if __name__ == "__main__":
    result = process_scanned_documents()
    print("OCR Processing Result:", result)
```
## API Reference
### Core Classes
#### KSSRAG
The primary interface for the RAG framework.
```python
class KSSRAG:
    """
    Main RAG framework class providing document processing and query capabilities.

    Attributes:
        config (Config): Framework configuration
        vector_store: Active vector store instance
        retriever: Document retriever instance
        agent: RAG agent for query processing
        documents (List): Processed document chunks
    """

    def __init__(self, config: Optional[Config] = None):
        """Initialize RAG system with optional configuration"""

    def load_document(self, file_path: str, format: Optional[str] = None,
                      chunker: Optional[Any] = None, metadata: Optional[Dict[str, Any]] = None):
        """
        Load and process document for retrieval

        Args:
            file_path: Path to document file
            format: Document format (auto-detected if None)
            chunker: Custom chunker instance
            metadata: Additional document metadata
        """

    def query(self, question: str, top_k: Optional[int] = None) -> str:
        """
        Execute query against loaded documents

        Args:
            question: Natural language query
            top_k: Number of results to retrieve

        Returns:
            Generated response string
        """

    def create_server(self, server_config=None):
        """
        Create FastAPI server instance

        Returns:
            FastAPI application instance
        """
```
#### Configuration Management
```python
class Config(BaseSettings):
    """
    Comprehensive configuration management with validation

    Example:
        config = Config(
            OPENROUTER_API_KEY="key",
            VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,
            TOP_K=10
        )
    """
    # API Configuration
    OPENROUTER_API_KEY: str
    DEFAULT_MODEL: str = "anthropic/claude-3-sonnet"
    FALLBACK_MODELS: List[str] = ["deepseek/deepseek-chat-v3.1:free"]
    # Processing Configuration
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    VECTOR_STORE_TYPE: VectorStoreType = VectorStoreType.HYBRID_OFFLINE
    # Performance Configuration
    BATCH_SIZE: int = 64
    ENABLE_CACHE: bool = True
    # Server Configuration
    SERVER_HOST: str = "localhost"
    SERVER_PORT: int = 8000
```
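Since `Config` derives from `BaseSettings`, its fields can presumably also be supplied through environment variables, as the Kubernetes manifest in this README does with `VECTOR_STORE_TYPE`. The standard-library sketch below only illustrates that idea of mapping variables to typed, validated settings; `SettingsSketch` is a hypothetical stand-in and does not reproduce how the real `Config` class loads or validates values.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical stand-in for a few Config fields, for illustration only."""
    openrouter_api_key: str
    chunk_size: int = 800
    chunk_overlap: int = 100
    server_port: int = 8000

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "SettingsSketch":
        key = env.get("OPENROUTER_API_KEY")
        if not key:
            raise ValueError("OPENROUTER_API_KEY is required")
        settings = cls(
            openrouter_api_key=key,
            chunk_size=int(env.get("CHUNK_SIZE", "800")),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", "100")),
            server_port=int(env.get("SERVER_PORT", "8000")),
        )
        # An overlap as large as the chunk itself would never advance the window.
        if settings.chunk_overlap >= settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
        return settings


if __name__ == "__main__":
    demo = SettingsSketch.from_env({"OPENROUTER_API_KEY": "key", "CHUNK_SIZE": "1000"})
    print(demo.chunk_size, demo.chunk_overlap, demo.server_port)
```

Failing at startup on a missing key or an inconsistent chunk configuration is cheaper than discovering the problem on the first query.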
### Server Endpoints
| Endpoint | Method | Description | Parameters |
|----------|--------|-------------|------------|
| `/query` | POST | Execute RAG query | `query`, `session_id` |
| `/stream` | POST | Streaming query | `query`, `session_id` |
| `/health` | GET | System health | - |
| `/config` | GET | Server configuration | - |
| `/sessions/{id}/clear` | GET | Clear session | `session_id` |
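A client call to `/query` might look like the sketch below, which uses only the standard library. The table lists `query` and `session_id` as parameters but does not show the request schema, so sending them as a JSON body is an assumption, as are the helper names and the `localhost:8000` address (taken from the default server settings).

```python
import json
import urllib.request


def build_query_request(base_url: str, query: str, session_id: str) -> urllib.request.Request:
    """Build a POST request for /query (assumed JSON body with query and session_id)."""
    payload = json.dumps({"query": query, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server running locally, `send(build_query_request("http://localhost:8000", "What is the refund policy?", "demo-session"))` would return the raw response body; its structure depends on the server and is not documented here.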
## Deployment
### Docker Deployment
```dockerfile
# Dockerfile
FROM python:3.11-slim
# Install system dependencies (curl is needed by the HEALTHCHECK below;
# the slim base image does not include it)
RUN apt-get update && apt-get install -y \
    curl \
    tesseract-ocr \
    libgl1 \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-privileged user
RUN useradd -m -u 1000 kssrag
USER kssrag
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
CMD ["python", "-m", "kssrag.cli", "server", "--host", "0.0.0.0", "--port", "8000"]
```
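A health probe can also be written in Python, which is already present in the image, as an alternative to a `curl`-based check. The sketch below assumes only that `/health` answers with HTTP 200 when the service is up; the function names are illustrative and not part of kssrag.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code


def is_healthy(url: str, get_status: Callable[[str], int] = http_status) -> bool:
    """True only if the endpoint answers with HTTP 200."""
    try:
        return get_status(url) == 200
    except OSError:
        # Connection refused, DNS failure, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    import sys
    sys.exit(0 if is_healthy("http://localhost:8000/health") else 1)
```

Saved as a script, this exits with 0 when healthy and 1 otherwise, which is the contract Docker's `HEALTHCHECK` expects from its command.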
### Kubernetes Deployment
```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kssrag
  labels:
    app: kssrag
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kssrag
  template:
    metadata:
      labels:
        app: kssrag
    spec:
      containers:
      - name: kssrag
        image: kssrag:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENROUTER_API_KEY
          valueFrom:
            secretKeyRef:
              name: kssrag-secrets
              key: openrouter-api-key
        - name: VECTOR_STORE_TYPE
          value: "hybrid_offline"
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kssrag-service
spec:
  selector:
    app: kssrag
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
```
### Production Configuration
```python
# production_config.py
import os

from kssrag import Config, VectorStoreType

production_config = Config(
    OPENROUTER_API_KEY=os.getenv("OPENROUTER_API_KEY"),
    VECTOR_STORE_TYPE=VectorStoreType.HYBRID_OFFLINE,
    CHUNK_SIZE=1000,
    TOP_K=8,
    BATCH_SIZE=32,
    ENABLE_CACHE=True,
    CACHE_DIR="/var/lib/kssrag/cache",
    LOG_LEVEL="INFO",
    SERVER_HOST="0.0.0.0",
    SERVER_PORT=8000,
    CORS_ORIGINS=[
        "https://app.company.com",
        "https://api.company.com"
    ]
)
```
## Contributing
We welcome contributions from the community. Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone repository
git clone https://github.com/Ksschkw/kssrag
cd kssrag
# Install development dependencies
pip install -e ".[dev,ocr,all]"
# Run test suite
python -m pytest tests/ -v --cov=kssrag
# Code quality checks
black kssrag/ tests/
flake8 kssrag/
mypy kssrag/
# Build documentation
cd docs && make html
```
### Code Organization
```
kssrag/
├── core/                    # Core framework components
│   ├── chunkers.py          # Document segmentation strategies
│   ├── vectorstores.py      # Vector database implementations
│   ├── retrievers.py        # Information retrieval algorithms
│   └── agents.py            # RAG orchestration logic
├── models/                  # LLM provider integrations
│   ├── openrouter.py        # OpenRouter API client
│   └── local_llms.py        # Local LLM implementations
├── utils/                   # Utility functions
│   ├── helpers.py           # Common utilities
│   ├── document_loaders.py  # Document parsing
│   ├── ocr_loader.py        # OCR processing (PaddleOCR/Tesseract)
│   └── preprocessors.py     # Text preprocessing
├── config.py                # Configuration management
├── server.py                # FastAPI web server
├── cli.py                   # Command-line interface
└── __init__.py              # Package exports
```
## Support
### Documentation
- [**Full Documentation**](https://github.com/Ksschkw/kssrag/docs)
- [**API Reference**](https://github.com/Ksschkw/kssrag/docs/api_reference.md)
- [**Examples Directory**](https://github.com/Ksschkw/kssrag/examples)
### Community
- [**GitHub Issues**](https://github.com/Ksschkw/kssrag/issues) - Bug reports and feature requests
- [**Discussions**](https://github.com/Ksschkw/kssrag/discussions) - Community support and ideas
- [**Releases**](https://github.com/Ksschkw/kssrag/releases) - Release notes and updates
### Acknowledgments
This project builds upon several outstanding open-source projects:
- [**FAISS**](https://github.com/facebookresearch/faiss) - Efficient similarity search
- [**PaddleOCR**](https://github.com/PaddlePaddle/PaddleOCR) - Advanced OCR capabilities
- [**SentenceTransformers**](https://github.com/UKPLab/sentence-transformers) - Text embeddings
- [**OpenRouter**](https://openrouter.ai/) - Unified LLM API access
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

<div align="center">

**KSS RAG** - Enterprise-Grade Retrieval-Augmented Generation

*Built with precision for production environments*

[Get Started](#quick-start) • [Explore Features](#features) • [View Examples](#examples)

</div>

Raw data
{
"_id": null,
"home_page": "https://github.com/Ksschkw/kssrag",
"name": "kssrag",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.8",
"maintainer_email": null,
"keywords": "rag, retrieval, generation, ai, nlp, faiss, bm25",
"author": "Ksschkw",
"author_email": "kookafor893@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/af/9f/8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04/kssrag-0.2.1.tar.gz",
"platform": null,
"description": "# KSS RAG - Knowledge Retrieval Augmented Generation Framework\r\n\r\n<div align=\"center\">\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n**Enterprise-Grade Retrieval-Augmented Generation for Modern Applications**\r\n\r\n[Quick Start](#quick-start) \u2022 [Features](#features) \u2022 [Documentation](#documentation) \u2022 [Examples](#examples) \u2022 [API Reference](#api-reference)\r\n\r\n</div>\r\n\r\n## Table of Contents\r\n\r\n- [Overview](#overview)\r\n- [Architecture](#architecture)\r\n- [Features](#features)\r\n- [Quick Start](#quick-start)\r\n- [Installation](#installation)\r\n- [Core Concepts](#core-concepts)\r\n- [Documentation](#documentation)\r\n- [Examples](#examples)\r\n- [API Reference](#api-reference)\r\n- [Deployment](#deployment)\r\n- [Contributing](#contributing)\r\n- [Support](#support)\r\n- [License](#license)\r\n\r\n## Overview\r\n\r\nKSS RAG is a production-ready Retrieval-Augmented Generation framework designed for enterprises requiring robust, scalable, and maintainable AI-powered document processing. 
Built with architectural excellence and engineering rigor, this framework provides comprehensive solutions for knowledge retrieval, document understanding, and intelligent question answering.\r\n\r\n### Key Capabilities\r\n\r\n- **Multi-Format Document Processing**: Text, PDF, Office documents, images with OCR\r\n- **Advanced Vector Search**: Multiple vector store implementations with hybrid approaches\r\n- **Real-time Streaming**: Token-by-token response streaming for enhanced user experience\r\n- **Enterprise Security**: Comprehensive security and input validation\r\n- **Production Monitoring**: Health checks, metrics, and observability\r\n\r\n## Architecture\r\n\r\n```mermaid\r\ngraph TB\r\n A[Document Input] --> B[Document Loader]\r\n B --> C[Chunker]\r\n C --> D[Vector Store]\r\n D --> E[FAISS Index]\r\n D --> F[BM25 Index]\r\n D --> G[Hybrid Index]\r\n \r\n H[User Query] --> I[Query Processor]\r\n I --> J[Retriever]\r\n J --> K[Vector Store]\r\n J --> L[Context Builder]\r\n \r\n M[LLM Provider] --> N[OpenRouter]\r\n M --> O[Custom LLMs]\r\n \r\n L --> P[Prompt Engineer]\r\n P --> M\r\n M --> Q[Response Generator]\r\n Q --> R[Streaming Output]\r\n Q --> S[Standard Output]\r\n \r\n subgraph \"Document Processing Pipeline\"\r\n B --> C --> D\r\n end\r\n \r\n subgraph \"Query Processing Pipeline\"\r\n I --> J --> L --> P\r\n end\r\n \r\n style A fill:#e1f5fe\r\n style H fill:#f3e5f5\r\n style R fill:#e8f5e8\r\n style S fill:#e8f5e8\r\n```\r\n\r\n## Features\r\n\r\n### \ud83c\udfaf Core Capabilities\r\n\r\n| Feature | Description | Status |\r\n|---------|-------------|--------|\r\n| **Multi-Format Support** | Text, PDF, JSON, DOCX, Excel, PowerPoint, Images | \u2705 Production Ready |\r\n| **Advanced OCR** | Handwritten (PaddleOCR) & Typed (Tesseract) text recognition | \u2705 Production Ready |\r\n| **Vector Stores** | BM25, BM25S, FAISS, TFIDF, Hybrid implementations | \u2705 Production Ready |\r\n| **Streaming Responses** | Real-time token streaming with 
OpenRouter | \u2705 Production Ready |\r\n| **REST API** | FastAPI with comprehensive endpoints | \u2705 Production Ready |\r\n| **CLI Interface** | Command-line tools for rapid development | \u2705 Production Ready |\r\n\r\n### \ud83d\udd27 Technical Excellence\r\n\r\n| Aspect | Implementation | Benefits |\r\n|--------|----------------|----------|\r\n| **Windows Compatibility** | No AVX2 dependencies, hybrid fallbacks | Enterprise deployment |\r\n| **Extensible Architecture** | Plugin system for custom components | Future-proof design |\r\n| **Performance Optimization** | Batch processing, caching, memory management | High throughput |\r\n| **Error Resilience** | Smart fallbacks, retry mechanisms | Production reliability |\r\n\r\n### \ud83d\udcca Performance Metrics\r\n\r\n| Operation | Average Latency | Throughput |\r\n|-----------|-----------------|------------|\r\n| Document Indexing | 2-5 sec/1000 chunks | 200+ docs/min |\r\n| Query Processing | 500-1500 ms | 50+ QPS |\r\n| OCR Processing | 1-3 sec/image (Handwritten would take longer) | 20+ images/min (Handwritten would take longer) |\r\n| Streaming Response | 50-200 ms/first token | Real-time |\r\n\r\n## Quick Start\r\n\r\n### Installation\r\n\r\n```bash\r\n# Base installation\r\npip install kssrag\r\n\r\n# With extended capabilities\r\npip install kssrag[ocr,gpu,dev]\r\n```\r\n\r\n### Basic Usage\r\n\r\n```python\r\nfrom kssrag import KSSRAG\r\nimport os\r\n\r\n# Configure environment\r\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-api-key-here\"\r\n\r\n# Initialize framework\r\nrag = KSSRAG()\r\n\r\n# Load knowledge base\r\nrag.load_document(\"technical_docs.pdf\")\r\nrag.load_document(\"product_specs.docx\") \r\nrag.load_document(\"research_data.xlsx\")\r\n\r\n# Execute intelligent query\r\nresponse = rag.query(\r\n \"What are the technical specifications and key differentiators?\",\r\n top_k=5\r\n)\r\nprint(response)\r\n```\r\n\r\n### CLI Demonstration\r\n\r\n```bash\r\n# Stream processing with hybrid 
retrieval\r\npython -m kssrag.cli query \\\r\n --file enterprise_docs.pdf \\\r\n --query \"Architecture decisions and rationale\" \\\r\n --vector-store hybrid_online \\\r\n --top-k 8 \\\r\n --stream\r\n\r\n# Production API server\r\npython -m kssrag.cli server \\\r\n --file knowledge_base.docx \\\r\n --port 8080 \\\r\n --host 0.0.0.0 \\\r\n --vector-store faiss\r\n```\r\n\r\n## Installation\r\n\r\n### System Requirements\r\n\r\n| Component | Minimum | Recommended |\r\n|-----------|---------|-------------|\r\n| Python | 3.8+ | 3.11+ |\r\n| RAM | 4 GB | 16 GB |\r\n| Storage | 1 GB | 10 GB+ |\r\n| OS | Windows 10+, Linux, macOS | Linux |\r\n\r\n### Installation Methods\r\n\r\n**Standard Installation**\r\n```bash\r\npip install kssrag\r\n```\r\n\r\n**Extended Capabilities**\r\n```bash\r\n# OCR functionality (PaddleOCR + Tesseract)\r\npip install kssrag[ocr]\r\n\r\n# GPU acceleration\r\npip install kssrag[gpu]\r\n\r\n# Development tools\r\npip install kssrag[dev]\r\n\r\n# All features\r\npip install kssrag[all]\r\n```\r\n\r\n**Source Installation**\r\n```bash\r\ngit clone https://github.com/Ksschkw/kssrag\r\ncd kssrag\r\npip install -e .[all]\r\n```\r\n\r\n### Verification\r\n\r\n```python\r\n# Verify installation\r\nimport kssrag\r\nfrom kssrag import KSSRAG\r\n\r\nprint(f\"KSS RAG Version: {kssrag.__version__}\")\r\n\r\n# Test basic functionality\r\nrag = KSSRAG()\r\nprint(\"Framework initialized successfully\")\r\n```\r\n\r\n## Core Concepts\r\n\r\n### Document Processing Pipeline\r\n\r\n```mermaid\r\nsequenceDiagram\r\n participant User\r\n participant System\r\n participant Loader\r\n participant Chunker\r\n participant VectorStore\r\n participant Retriever\r\n participant LLM\r\n\r\n User->>System: load_document(file_path)\r\n System->>Loader: parse_document()\r\n Loader->>Chunker: chunk_content()\r\n Chunker->>VectorStore: add_documents()\r\n VectorStore->>System: indexing_complete()\r\n \r\n User->>System: query(question)\r\n System->>Retriever: 
retrieve_relevant()\r\n Retriever->>VectorStore: similarity_search()\r\n VectorStore->>Retriever: relevant_chunks()\r\n Retriever->>LLM: generate_response()\r\n LLM->>System: final_response()\r\n System->>User: display_result()\r\n```\r\n\r\n### Vector Store Architecture\r\n\r\n```mermaid\r\ngraph LR\r\n A[Query] --> B{Vector Store Router}\r\n \r\n B --> C[BM25 Store]\r\n B --> D[BM25S Store]\r\n B --> E[FAISS Store]\r\n B --> F[TFIDF Store]\r\n B --> G[Hybrid Online]\r\n B --> H[Hybrid Offline]\r\n \r\n C --> I[Keyword Matching]\r\n D --> J[Stemmed Keywords]\r\n E --> K[Semantic Search]\r\n F --> L[Statistical Analysis]\r\n \r\n G --> M[FAISS + BM25 Fusion]\r\n H --> N[BM25 + TFIDF Fusion]\r\n \r\n I --> O[Results]\r\n J --> O\r\n K --> O\r\n L --> O\r\n M --> O\r\n N --> O\r\n \r\n style O fill:#c8e6c9\r\n```\r\n\r\n## Documentation\r\n\r\n### Comprehensive Guides\r\n\r\n- [**Configuration Guide**](docs/configuration.md) - Detailed configuration options and best practices\r\n- [**API Reference**](docs/api_reference.md) - Complete API documentation with examples\r\n- [**Deployment Guide**](docs/deployment.md) - Production deployment strategies\r\n- [**Performance Tuning**](docs/performance.md) - Optimization techniques and benchmarks\r\n\r\n### Tutorials\r\n\r\n- [**Getting Started**](examples/basic_usage.py) - Basic framework usage\r\n- [**Advanced Features**](examples/advanced_usage.py) - Custom configurations and extensions\r\n- [**Custom Components**](examples/custom_config.py) - Building custom chunkers and vector stores\r\n\r\n## Examples\r\n\r\n### Basic Implementation\r\n\r\n```python\r\n\"\"\"\r\nBasic KSS RAG implementation for document Q&A\r\n\"\"\"\r\nfrom kssrag import KSSRAG\r\nimport os\r\n\r\ndef main():\r\n # Configuration\r\n os.environ[\"OPENROUTER_API_KEY\"] = \"your-api-key\"\r\n \r\n # Initialize\r\n rag = KSSRAG()\r\n \r\n # Load documents\r\n rag.load_document(\"technical_manual.pdf\")\r\n rag.load_document(\"api_documentation.md\")\r\n 
\r\n # Query system\r\n response = rag.query(\r\n \"How do I implement the authentication system?\",\r\n top_k=5\r\n )\r\n \r\n print(\"Response:\", response)\r\n\r\nif __name__ == \"__main__\":\r\n main()\r\n```\r\n\r\n### Advanced Configuration\r\n\r\n```python\r\n\"\"\"\r\nEnterprise-grade configuration with custom components\r\n\"\"\"\r\nfrom kssrag import KSSRAG, Config, VectorStoreType, RetrieverType\r\nfrom kssrag.core.agents import RAGAgent\r\nfrom kssrag.models.openrouter import OpenRouterLLM\r\nimport os\r\n\r\ndef main():\r\n # Enterprise configuration\r\n config = Config(\r\n OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n DEFAULT_MODEL=\"anthropic/claude-3-sonnet\",\r\n VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,\r\n RETRIEVER_TYPE=RetrieverType.HYBRID,\r\n TOP_K=10,\r\n CHUNK_SIZE=1000,\r\n CHUNK_OVERLAP=150,\r\n BATCH_SIZE=32,\r\n ENABLE_CACHE=True,\r\n CACHE_DIR=\"/opt/kssrag/cache\",\r\n LOG_LEVEL=\"INFO\"\r\n )\r\n \r\n # Initialize with custom config\r\n rag = KSSRAG(config=config)\r\n \r\n # Load enterprise documents\r\n rag.load_document(\"product_requirements.pdf\")\r\n rag.load_document(\"architecture_docs.docx\")\r\n rag.load_document(\"user_research.json\")\r\n \r\n # Custom expert prompt\r\n expert_prompt = \"\"\"\r\n You are a senior technical expert analyzing documentation. \r\n Provide authoritative, precise responses based on the source material.\r\n Focus on actionable insights and technical accuracy.\r\n \"\"\"\r\n \r\n # Custom agent configuration\r\n llm = OpenRouterLLM(\r\n api_key=config.OPENROUTER_API_KEY,\r\n model=config.DEFAULT_MODEL,\r\n stream=True\r\n )\r\n \r\n rag.agent = RAGAgent(\r\n retriever=rag.retriever,\r\n llm=llm,\r\n system_prompt=expert_prompt\r\n )\r\n \r\n # Execute complex query\r\n query = \"\"\"\r\n Analyze the technical architecture and identify:\r\n 1. Key design decisions\r\n 2. Potential scalability concerns \r\n 3. 
Recommended improvements\r\n \"\"\"\r\n \r\n print(\"Processing complex query...\")\r\n for chunk in rag.agent.query_stream(query, top_k=8):\r\n print(chunk, end=\"\", flush=True)\r\n\r\nif __name__ == \"__main__\":\r\n main()\r\n```\r\n\r\n### OCR Integration\r\n\r\n```python\r\n\"\"\"\r\nAdvanced OCR processing for document digitization\r\n\"\"\"\r\nfrom kssrag import KSSRAG, Config\r\nimport os\r\n\r\ndef process_scanned_documents():\r\n config = Config(\r\n OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n OCR_DEFAULT_MODE=\"handwritten\" # or \"typed\"\r\n )\r\n \r\n rag = KSSRAG(config=config)\r\n \r\n # Process various document types\r\n documents = [\r\n (\"handwritten_notes.jpg\", \"handwritten\"),\r\n (\"typed_contract.jpg\", \"typed\"), \r\n (\"mixed_document.png\", \"handwritten\")\r\n ]\r\n \r\n for doc_path, ocr_mode in documents:\r\n try:\r\n print(f\"Processing {doc_path} with {ocr_mode} OCR...\")\r\n rag.load_document(doc_path, format=\"image\")\r\n print(f\"Successfully processed {doc_path}\")\r\n except Exception as e:\r\n print(f\"Error processing {doc_path}: {str(e)}\")\r\n \r\n # Query across all processed documents\r\n response = rag.query(\r\n \"Extract and summarize all action items and deadlines\",\r\n top_k=6\r\n )\r\n \r\n return response\r\n\r\nif __name__ == \"__main__\":\r\n result = process_scanned_documents()\r\n print(\"OCR Processing Result:\", result)\r\n```\r\n\r\n## API Reference\r\n\r\n### Core Classes\r\n\r\n#### KSSRAG\r\nThe primary interface for the RAG framework.\r\n\r\n```python\r\nclass KSSRAG:\r\n \"\"\"\r\n Main RAG framework class providing document processing and query capabilities.\r\n \r\n Attributes:\r\n config (Config): Framework configuration\r\n vector_store: Active vector store instance\r\n retriever: Document retriever instance \r\n agent: RAG agent for query processing\r\n documents (List): Processed document chunks\r\n \"\"\"\r\n \r\n def __init__(self, config: Optional[Config] = None):\r\n 
\"\"\"Initialize RAG system with optional configuration\"\"\"\r\n \r\n def load_document(self, file_path: str, format: Optional[str] = None,\r\n chunker: Optional[Any] = None, metadata: Optional[Dict[str, Any]] = None):\r\n \"\"\"\r\n Load and process document for retrieval\r\n \r\n Args:\r\n file_path: Path to document file\r\n format: Document format (auto-detected if None)\r\n chunker: Custom chunker instance\r\n metadata: Additional document metadata\r\n \"\"\"\r\n \r\n def query(self, question: str, top_k: Optional[int] = None) -> str:\r\n \"\"\"\r\n Execute query against loaded documents\r\n \r\n Args:\r\n question: Natural language query\r\n top_k: Number of results to retrieve\r\n \r\n Returns:\r\n Generated response string\r\n \"\"\"\r\n \r\n def create_server(self, server_config=None):\r\n \"\"\"\r\n Create FastAPI server instance\r\n \r\n Returns:\r\n FastAPI application instance\r\n \"\"\"\r\n```\r\n\r\n#### Configuration Management\r\n\r\n```python\r\nclass Config(BaseSettings):\r\n \"\"\"\r\n Comprehensive configuration management with validation\r\n \r\n Example:\r\n config = Config(\r\n OPENROUTER_API_KEY=\"key\",\r\n VECTOR_STORE_TYPE=VectorStoreType.HYBRID_ONLINE,\r\n TOP_K=10\r\n )\r\n \"\"\"\r\n \r\n # API Configuration\r\n OPENROUTER_API_KEY: str\r\n DEFAULT_MODEL: str = \"anthropic/claude-3-sonnet\"\r\n FALLBACK_MODELS: List[str] = [\"deepseek/deepseek-chat-v3.1:free\"]\r\n \r\n # Processing Configuration \r\n CHUNK_SIZE: int = 800\r\n CHUNK_OVERLAP: int = 100\r\n VECTOR_STORE_TYPE: VectorStoreType = VectorStoreType.HYBRID_OFFLINE\r\n \r\n # Performance Configuration\r\n BATCH_SIZE: int = 64\r\n ENABLE_CACHE: bool = True\r\n \r\n # Server Configuration\r\n SERVER_HOST: str = \"localhost\"\r\n SERVER_PORT: int = 8000\r\n```\r\n\r\n### Server Endpoints\r\n\r\n| Endpoint | Method | Description | Parameters |\r\n|----------|--------|-------------|------------|\r\n| `/query` | POST | Execute RAG query | `query`, `session_id` |\r\n| `/stream` | POST 
| Streaming query | `query`, `session_id` |\r\n| `/health` | GET | System health | - |\r\n| `/config` | GET | Server configuration | - |\r\n| `/sessions/{id}/clear` | GET | Clear session | `session_id` |\r\n\r\n## Deployment\r\n\r\n### Docker Deployment\r\n\r\n```dockerfile\r\n# Dockerfile\r\nFROM python:3.11-slim\r\n\r\n# Install system dependencies (curl is used by the HEALTHCHECK below)\r\nRUN apt-get update && apt-get install -y \\\r\n    curl \\\r\n    tesseract-ocr \\\r\n    libgl1 \\\r\n    && rm -rf /var/lib/apt/lists/*\r\n\r\nWORKDIR /app\r\n\r\n# Copy requirements and install\r\nCOPY requirements.txt .\r\nRUN pip install --no-cache-dir -r requirements.txt\r\n\r\n# Copy application\r\nCOPY . .\r\n\r\n# Create non-privileged user\r\nRUN useradd -m -u 1000 kssrag\r\nUSER kssrag\r\n\r\nEXPOSE 8000\r\n\r\n# Health check\r\nHEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \\\r\n    CMD curl -f http://localhost:8000/health || exit 1\r\n\r\nCMD [\"python\", \"-m\", \"kssrag.cli\", \"server\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\r\n```\r\n\r\n### Kubernetes Deployment\r\n\r\n```yaml\r\n# k8s-deployment.yaml\r\napiVersion: apps/v1\r\nkind: Deployment\r\nmetadata:\r\n  name: kssrag\r\n  labels:\r\n    app: kssrag\r\nspec:\r\n  replicas: 3\r\n  selector:\r\n    matchLabels:\r\n      app: kssrag\r\n  template:\r\n    metadata:\r\n      labels:\r\n        app: kssrag\r\n    spec:\r\n      containers:\r\n      - name: kssrag\r\n        image: kssrag:latest\r\n        ports:\r\n        - containerPort: 8000\r\n        env:\r\n        - name: OPENROUTER_API_KEY\r\n          valueFrom:\r\n            secretKeyRef:\r\n              name: kssrag-secrets\r\n              key: openrouter-api-key\r\n        - name: VECTOR_STORE_TYPE\r\n          value: \"hybrid_offline\"\r\n        resources:\r\n          requests:\r\n            memory: \"1Gi\"\r\n            cpu: \"500m\"\r\n          limits:\r\n            memory: \"2Gi\"\r\n            cpu: \"1000m\"\r\n        livenessProbe:\r\n          httpGet:\r\n            path: /health\r\n            port: 8000\r\n          initialDelaySeconds: 30\r\n          periodSeconds: 10\r\n        readinessProbe:\r\n          httpGet:\r\n            path: /health\r\n            port: 8000\r\n          initialDelaySeconds: 5\r\n          periodSeconds: 5\r\n---\r\napiVersion: v1\r\nkind: 
Service\r\nmetadata:\r\n  name: kssrag-service\r\nspec:\r\n  selector:\r\n    app: kssrag\r\n  ports:\r\n  - port: 80\r\n    targetPort: 8000\r\n  type: LoadBalancer\r\n```\r\n\r\n### Production Configuration\r\n\r\n```python\r\n# production_config.py\r\nimport os\r\n\r\nfrom kssrag import Config, VectorStoreType\r\n\r\nproduction_config = Config(\r\n    OPENROUTER_API_KEY=os.getenv(\"OPENROUTER_API_KEY\"),\r\n    VECTOR_STORE_TYPE=VectorStoreType.HYBRID_OFFLINE,\r\n    CHUNK_SIZE=1000,\r\n    TOP_K=8,\r\n    BATCH_SIZE=32,\r\n    ENABLE_CACHE=True,\r\n    CACHE_DIR=\"/var/lib/kssrag/cache\",\r\n    LOG_LEVEL=\"INFO\",\r\n    SERVER_HOST=\"0.0.0.0\",\r\n    SERVER_PORT=8000,\r\n    CORS_ORIGINS=[\r\n        \"https://app.company.com\",\r\n        \"https://api.company.com\"\r\n    ]\r\n)\r\n```\r\n\r\n## Contributing\r\n\r\nWe welcome contributions from the community. Please see our [Contributing Guide](CONTRIBUTING.md) for details.\r\n\r\n### Development Setup\r\n\r\n```bash\r\n# Clone repository\r\ngit clone https://github.com/Ksschkw/kssrag\r\ncd kssrag\r\n\r\n# Install development dependencies (quoted so shells such as zsh do not glob the brackets)\r\npip install -e \".[dev,ocr,all]\"\r\n\r\n# Run test suite\r\npython -m pytest tests/ -v --cov=kssrag\r\n\r\n# Code quality checks\r\nblack kssrag/ tests/\r\nflake8 kssrag/\r\nmypy kssrag/\r\n\r\n# Build documentation\r\ncd docs && make html\r\n```\r\n\r\n### Code Organization\r\n\r\n```\r\nkssrag/\r\n\u251c\u2500\u2500 core/ # Core framework components\r\n\u2502 \u251c\u2500\u2500 chunkers.py # Document segmentation strategies\r\n\u2502 \u251c\u2500\u2500 vectorstores.py # Vector database implementations\r\n\u2502 \u251c\u2500\u2500 retrievers.py # Information retrieval algorithms\r\n\u2502 \u2514\u2500\u2500 agents.py # RAG orchestration logic\r\n\u251c\u2500\u2500 models/ # LLM provider integrations\r\n\u2502 \u251c\u2500\u2500 openrouter.py # OpenRouter API client\r\n\u2502 \u2514\u2500\u2500 local_llms.py # Local LLM implementations\r\n\u251c\u2500\u2500 utils/ # Utility functions\r\n\u2502 \u251c\u2500\u2500 helpers.py # Common 
utilities\r\n\u2502 \u251c\u2500\u2500 document_loaders.py # Document parsing\r\n\u2502 \u251c\u2500\u2500 ocr_loader.py # OCR processing (PaddleOCR/Tesseract)\r\n\u2502 \u2514\u2500\u2500 preprocessors.py # Text preprocessing\r\n\u251c\u2500\u2500 config.py # Configuration management\r\n\u251c\u2500\u2500 server.py # FastAPI web server\r\n\u251c\u2500\u2500 cli.py # Command-line interface\r\n\u2514\u2500\u2500 __init__.py # Package exports\r\n```\r\n\r\n## Support\r\n\r\n### Documentation\r\n- [**Full Documentation**](https://github.com/Ksschkw/kssrag/docs)\r\n- [**API Reference**](https://github.com/Ksschkw/kssrag/docs/api_reference.md)\r\n- [**Examples Directory**](https://github.com/Ksschkw/kssrag/examples)\r\n\r\n### Community\r\n- [**GitHub Issues**](https://github.com/Ksschkw/kssrag/issues) - Bug reports and feature requests\r\n- [**Discussions**](https://github.com/Ksschkw/kssrag/discussions) - Community support and ideas\r\n- [**Releases**](https://github.com/Ksschkw/kssrag/releases) - Release notes and updates\r\n\r\n### Acknowledgments\r\n\r\nThis project builds upon several outstanding open-source projects:\r\n\r\n- [**FAISS**](https://github.com/facebookresearch/faiss) - Efficient similarity search\r\n- [**PaddleOCR**](https://github.com/PaddlePaddle/PaddleOCR) - Advanced OCR capabilities\r\n- [**SentenceTransformers**](https://github.com/UKPLab/sentence-transformers) - Text embeddings\r\n- [**OpenRouter**](https://openrouter.ai/) - Unified LLM API access\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\r\n\r\n---\r\n\r\n<div align=\"center\">\r\n\r\n**KSS RAG** - Enterprise-Grade Retrieval-Augmented Generation \r\n*Built with precision for production environments*\r\n\r\n[Get Started](#quick-start) \u2022 [Explore Features](#features) \u2022 [View Examples](#examples)\r\n\r\n</div>\r\n",
"bugtrack_url": null,
"license": null,
"summary": "A flexible Retrieval-Augmented Generation framework by Ksschkw",
"version": "0.2.1",
"project_urls": {
"Bug Reports": "https://github.com/Ksschkw/kssrag/issues",
"Documentation": "https://github.com/Ksschkw/kssrag/docs",
"Homepage": "https://github.com/Ksschkw/kssrag",
"Source": "https://github.com/Ksschkw/kssrag"
},
"split_keywords": [
"rag",
" retrieval",
" generation",
" ai",
" nlp",
" faiss",
" bm25"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c6c61e7e8c42ce705d3623f7a2130c812f52f483c77f15615a2178a837bd84a4",
"md5": "88081ca22294cedfd5798a584291419a",
"sha256": "bcfe75427502d0671d8c8abde2b202819586e0dc9d5311645bd64a2808e7a011"
},
"downloads": -1,
"filename": "kssrag-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "88081ca22294cedfd5798a584291419a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.8",
"size": 41622,
"upload_time": "2025-10-27T10:13:43",
"upload_time_iso_8601": "2025-10-27T10:13:43.125771Z",
"url": "https://files.pythonhosted.org/packages/c6/c6/1e7e8c42ce705d3623f7a2130c812f52f483c77f15615a2178a837bd84a4/kssrag-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "af9f8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04",
"md5": "2527431c62cc51aafb5b4e0e14dcae14",
"sha256": "feefe2579e784dde2fc750940433f388bcb10163dd4e94495dd491f0a0baf7ec"
},
"downloads": -1,
"filename": "kssrag-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "2527431c62cc51aafb5b4e0e14dcae14",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.8",
"size": 41048,
"upload_time": "2025-10-27T10:13:44",
"upload_time_iso_8601": "2025-10-27T10:13:44.461884Z",
"url": "https://files.pythonhosted.org/packages/af/9f/8bcdcf9f282041d51a01ff00cf8aecb7aa59fb4f0ffda3d39d8950ddea04/kssrag-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-27 10:13:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Ksschkw",
"github_project": "kssrag",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "fastapi",
"specs": [
[
">=",
"0.104.0"
]
]
},
{
"name": "uvicorn",
"specs": [
[
">=",
"0.24.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "rank-bm25",
"specs": [
[
">=",
"0.2.2"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.26.0"
]
]
},
{
"name": "faiss-cpu",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "sentence-transformers",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "rapidfuzz",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "python-multipart",
"specs": [
[
">=",
"0.0.6"
]
]
},
{
"name": "pypdf",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.7.0"
]
]
},
{
"name": "paddleocr",
"specs": []
},
{
"name": "Pillow",
"specs": []
},
{
"name": "paddlepaddle",
"specs": []
},
{
"name": "python-docx",
"specs": []
},
{
"name": "bm25S",
"specs": []
},
{
"name": "pystemmer",
"specs": []
},
{
"name": "stemmer",
"specs": []
},
{
"name": "openpyxl",
"specs": [
[
">=",
"3.1.5"
]
]
},
{
"name": "python-pptx",
"specs": [
[
">=",
"1.0.2"
]
]
}
],
"lcname": "kssrag"
}