# RAGPDF
A Python package for Retrieval-Augmented Generation (RAG) using PDFs. RAGPDF makes it easy to extract, embed, and query content from PDF documents using modern language models.
## Features
- **Easy to Use**: Simple API for adding PDFs and querying their content
- **PDF Processing**: Automatic text extraction and chunking from PDF documents
- **Vector Search**: Fast similarity search using FAISS
- **Async Support**: Built with asyncio for high performance
- **LLM Integration**: Seamless integration with various LLM providers through litellm
- **Configurable**: Flexible configuration for embedding and LLM models
- **Persistent Storage**: Optional FAISS index persistence
- **Context Inspection**: Access and analyze intermediate context for better control
## Installation
```bash
pip install ragpdf
```
## Quick Start
```python
import asyncio
from ragpdf import RAGPDF, EmbeddingConfig, LLMConfig
# Configure your models
embedding_config = EmbeddingConfig(
    model="text-embedding-ada-002",       # OpenAI embedding model
    api_key="your-api-key",
    api_base="https://api.openai.com/v1"  # Optional: default OpenAI base URL
)

llm_config = LLMConfig(
    model="gpt-3.5-turbo",                 # OpenAI chat model
    api_key="your-api-key",
    api_base="https://api.openai.com/v1",  # Optional: default OpenAI base URL
    temperature=0.7
)

# Create RAGPDF instance
rag = RAGPDF(embedding_config, llm_config)

async def main():
    # Add a PDF
    await rag.add("document.pdf")

    # Get and inspect context
    context = await rag.context("What is this document about?")

    # View context in different formats
    print("\nFormatted context:")
    print(context.to_string())  # Human-readable format

    print("\nJSON format for detailed inspection:")
    print(context.to_json())  # Structured format for analysis

    # Use the context for chat
    response = await rag.chat("Summarize the key points")
    print("\nAI Response:")
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
```
## Context Inspection
RAGPDF exposes the intermediate context used for RAG, so you can examine and validate the retrieved chunks before they reach the LLM. This is particularly useful during development and debugging.
### RAGContext Class
```python
class RAGContext:
"""Context information for RAG operations."""
query: str # Original query
chunks: List[DocumentChunk] # Retrieved text chunks
files: List[str] # Source PDF files
total_chunks: int # Total chunks found
def to_string(self) -> str:
"""Convert context to human-readable format."""
# Example output:
# Query: What is the main topic?
# Found 3 relevant chunks from 2 files:
# document1.pdf, document2.pdf
#
# From document1.pdf (page 1):
# [chunk content...]
def to_json(self) -> str:
"""Convert context to JSON for detailed analysis."""
# Returns structured JSON with all context details
```
### Development Workflow
```python
async def development_workflow():
    rag = RAGPDF(embedding_config, llm_config)
    await rag.add("document.pdf")

    # 1. Inspect retrieved context
    context = await rag.context("What is the main topic?")

    # Check which files were used
    print(f"Retrieved chunks from: {context.files}")

    # Examine individual chunks
    for chunk in context.chunks:
        print(f"\nFrom {chunk.file}" +
              (f" (page {chunk.page})" if chunk.page else ""))
        print(chunk.content)

    # 2. Validate context quality
    if not any("relevant keyword" in chunk.content
               for chunk in context.chunks):
        print("Warning: Expected content not found in context")

    # 3. Generate response with validated context
    response = await rag.chat("What is the main topic?")
    print("\nAI Response:", response)
```
### Context Analysis Examples
```python
async def analyze_context():
    rag = RAGPDF(embedding_config, llm_config)

    # Add multiple PDFs
    for pdf in ["doc1.pdf", "doc2.pdf"]:
        await rag.add(pdf)

    # Get context for analysis
    context = await rag.context("What are the key findings?")

    # 1. Source distribution analysis
    file_distribution = {}
    for chunk in context.chunks:
        file_distribution[chunk.file] = file_distribution.get(chunk.file, 0) + 1

    print("\nChunk distribution across files:")
    for file, count in file_distribution.items():
        print(f"{file}: {count} chunks")

    # 2. Content relevance check
    query_terms = set(context.query.lower().split())
    relevant_chunks = []

    for chunk in context.chunks:
        chunk_terms = set(chunk.content.lower().split())
        overlap = len(query_terms & chunk_terms)
        relevant_chunks.append({
            'file': chunk.file,
            'page': chunk.page,
            'term_overlap': overlap
        })

    print("\nChunk relevance analysis:")
    for chunk in sorted(relevant_chunks,
                        key=lambda x: x['term_overlap'],
                        reverse=True):
        print(f"File: {chunk['file']}, "
              f"Page: {chunk['page']}, "
              f"Term overlap: {chunk['term_overlap']}")
```
## Model Configuration
RAGPDF uses litellm under the hood, making it compatible with any LLM provider supported by litellm. The model name and configuration must follow litellm's format.
### OpenAI
```python
# OpenAI API
config = LLMConfig(
    model="gpt-3.5-turbo",
    api_key="your-openai-key",
    api_base="https://api.openai.com/v1"  # Default OpenAI base URL
)

# Azure OpenAI
config = LLMConfig(
    model="azure/gpt-35-turbo",  # Prefix with 'azure/'
    api_key="your-azure-key",
    api_base="https://your-endpoint.openai.azure.com"
)
```
### Anthropic
```python
config = LLMConfig(
    model="claude-2",
    api_key="your-anthropic-key",
    api_base="https://api.anthropic.com"  # Default Anthropic base URL
)
```
### Google
```python
config = LLMConfig(
    model="gemini/gemini-pro",  # Prefix with 'gemini/'
    api_key="your-google-key",
    api_base="https://generativelanguage.googleapis.com"
)
```
### Ollama
```python
config = LLMConfig(
    model="ollama/llama2",             # Prefix with 'ollama/'
    api_base="http://localhost:11434"  # Local Ollama server
)
```
### Custom Endpoints
```python
# Self-hosted LLM API
config = LLMConfig(
    model="your-model-name",
    api_base="http://your-custom-endpoint:8000/v1",
    api_key="optional-key"  # Optional for self-hosted
)
```
## Environment Variables
RAGPDF supports configuration through environment variables. The `*_BASE_URL` variables are optional and default to each provider's standard endpoint:
```env
# OpenAI
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_API_KEY=your-openai-key
EMBEDDING_BASE_URL=https://api.openai.com/v1

LLM_MODEL=gpt-3.5-turbo
LLM_API_KEY=your-openai-key
LLM_BASE_URL=https://api.openai.com/v1

# Azure OpenAI
LLM_MODEL=azure/gpt-35-turbo
LLM_API_KEY=your-azure-key
LLM_BASE_URL=https://your-endpoint.openai.azure.com

# Anthropic
LLM_MODEL=claude-2
LLM_API_KEY=your-anthropic-key
LLM_BASE_URL=https://api.anthropic.com

# Google
LLM_MODEL=gemini/gemini-pro
LLM_API_KEY=your-google-key
LLM_BASE_URL=https://generativelanguage.googleapis.com

# Ollama
LLM_MODEL=ollama/llama2
LLM_BASE_URL=http://localhost:11434
```
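
One plausible way to wire these variables into configs is to load them explicitly with python-dotenv and `os.environ`; this is a sketch under that assumption, not a confirmed auto-loading mechanism of RAGPDF:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from ragpdf import RAGPDF, EmbeddingConfig, LLMConfig

load_dotenv()  # pull the variables above from a local .env file

embedding_config = EmbeddingConfig(
    model=os.environ["EMBEDDING_MODEL"],
    api_key=os.environ.get("EMBEDDING_API_KEY", ""),
    api_base=os.environ.get("EMBEDDING_BASE_URL"),
)
llm_config = LLMConfig(
    model=os.environ["LLM_MODEL"],
    api_key=os.environ.get("LLM_API_KEY", ""),
    api_base=os.environ.get("LLM_BASE_URL"),
)
rag = RAGPDF(embedding_config, llm_config)
```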
## API Reference
### RAGPDF Class
```python
class RAGPDF:
    def __init__(self,
                 embedding_config: Union[Dict[str, Any], EmbeddingConfig],
                 llm_config: Optional[Union[Dict[str, Any], LLMConfig]] = None,
                 index_path: Optional[str] = None):
        """Initialize RAGPDF with embedding and LLM configurations."""

    async def add(self, pdf_path: str) -> None:
        """Add a PDF document to the system."""

    async def context(self, query: str, k: int = 5) -> RAGContext:
        """Get relevant context for a query."""

    async def chat(self, prompt: str, k: int = 5, stream: bool = False) -> Union[str, AsyncIterator[str]]:
        """Generate a response using the LLM based on context."""
```
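
The `stream` parameter above implies token-by-token output. A minimal consumption sketch, assuming `stream=True` makes `chat` return the `AsyncIterator[str]` branch of its signature:

```python
async def stream_answer(rag: RAGPDF) -> None:
    # With stream=True, chat should yield response text incrementally
    # (per the Union[str, AsyncIterator[str]] return type above).
    async for token in await rag.chat("Summarize the document", stream=True):
        print(token, end="", flush=True)
    print()
```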
### Configuration Models
```python
class BaseConfig:
    """Base configuration for API models."""
    model: str                      # Model name (litellm compatible)
    api_key: str = ""               # API key (optional)
    api_base: Optional[str] = None  # API base URL (optional)

class EmbeddingConfig(BaseConfig):
    """Configuration for embedding model."""
    pass

class LLMConfig(BaseConfig):
    """Configuration for language model."""
    temperature: float = 0.7          # Response temperature (optional)
    max_tokens: Optional[int] = None  # Maximum response length (optional)
```
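
Because `__init__` accepts `Union[Dict[str, Any], ...]`, plain dicts should work in place of the config objects; a sketch, assuming the keys mirror the field names above:

```python
# Dict-based configuration; keys mirror the config model fields.
rag = RAGPDF(
    embedding_config={"model": "text-embedding-ada-002", "api_key": "your-api-key"},
    llm_config={"model": "gpt-3.5-turbo", "api_key": "your-api-key", "temperature": 0.2},
)
```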
## Examples
### Using Different LLM Providers
```python
# OpenAI
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="text-embedding-ada-002",
        api_key="your-openai-key"
    ),
    llm_config=LLMConfig(
        model="gpt-3.5-turbo",
        api_key="your-openai-key"
    )
)

# Ollama (local)
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="ollama/nomic-embed-text",
        api_base="http://localhost:11434"
    ),
    llm_config=LLMConfig(
        model="ollama/llama2",
        api_base="http://localhost:11434"
    )
)

# Azure OpenAI
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="azure/text-embedding-ada-002",
        api_key="your-azure-key",
        api_base="https://your-endpoint.openai.azure.com"
    ),
    llm_config=LLMConfig(
        model="azure/gpt-35-turbo",
        api_key="your-azure-key",
        api_base="https://your-endpoint.openai.azure.com"
    )
)
```
### Persistent Storage
```python
# Initialize with index storage
rag = RAGPDF(
    embedding_config=embedding_config,
    llm_config=llm_config,
    index_path="data/faiss_index.bin"
)
```
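
A likely follow-on (an assumption, not confirmed here) is that pointing `index_path` at an existing file reloads the saved index, so documents added in a previous run stay queryable:

```python
import asyncio

async def resume():
    # Assumption: an existing file at index_path is loaded rather than
    # overwritten, preserving previously added PDFs across runs.
    rag = RAGPDF(
        embedding_config=embedding_config,
        llm_config=llm_config,
        index_path="data/faiss_index.bin",
    )
    context = await rag.context("What did we index last run?")
    print(context.to_string())

asyncio.run(resume())
```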
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.