# RAGPDF
A Python package for Retrieval-Augmented Generation (RAG) using PDFs. RAGPDF makes it easy to extract, embed, and query content from PDF documents using modern language models.
## Features
- **Easy to Use**: Simple API for adding PDFs and querying their content
- **PDF Processing**: Automatic text extraction and chunking from PDF documents
- **Vector Search**: Fast similarity search using FAISS
- **Async Support**: Built with asyncio for high performance
- **LLM Integration**: Seamless integration with various LLM providers through litellm
- **Configurable**: Flexible configuration for embedding and LLM models
- **Persistent Storage**: Optional FAISS index persistence
- **Context Inspection**: Access and analyze intermediate context for better control
## Installation
```bash
pip install ragpdf
```
## Quick Start
```python
import asyncio
from ragpdf import RAGPDF, EmbeddingConfig, LLMConfig
# Configure your models
embedding_config = EmbeddingConfig(
    model="text-embedding-ada-002",       # OpenAI embedding model
    api_key="your-api-key",
    api_base="https://api.openai.com/v1"  # Optional: default OpenAI base URL
)

llm_config = LLMConfig(
    model="gpt-3.5-turbo",                 # OpenAI chat model
    api_key="your-api-key",
    api_base="https://api.openai.com/v1",  # Optional: default OpenAI base URL
    temperature=0.7
)

# Create RAGPDF instance
rag = RAGPDF(embedding_config, llm_config)

async def main():
    # Add a PDF
    await rag.add("document.pdf")

    # Get and inspect context
    context = await rag.context("What is this document about?")

    # View context in different formats
    print("\nFormatted context:")
    print(context.to_string())  # Human-readable format

    print("\nJSON format for detailed inspection:")
    print(context.to_json())  # Structured format for analysis

    # Use the context for chat
    response = await rag.chat("Summarize the key points")
    print("\nAI Response:")
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
```
## Context Inspection
RAGPDF exposes the intermediate context used for RAG, so you can examine and validate the retrieved chunks before they reach the LLM. This is particularly useful during development and debugging.
### RAGContext Class
```python
class RAGContext:
"""Context information for RAG operations."""
query: str # Original query
chunks: List[DocumentChunk] # Retrieved text chunks
files: List[str] # Source PDF files
total_chunks: int # Total chunks found
def to_string(self) -> str:
"""Convert context to human-readable format."""
# Example output:
# Query: What is the main topic?
# Found 3 relevant chunks from 2 files:
# document1.pdf, document2.pdf
#
# From document1.pdf (page 1):
# [chunk content...]
def to_json(self) -> str:
"""Convert context to JSON for detailed analysis."""
# Returns structured JSON with all context details
```
### Development Workflow
```python
async def development_workflow():
    rag = RAGPDF(embedding_config, llm_config)
    await rag.add("document.pdf")

    # 1. Inspect retrieved context
    context = await rag.context("What is the main topic?")

    # Check which files were used
    print(f"Retrieved chunks from: {context.files}")

    # Examine individual chunks
    for chunk in context.chunks:
        print(f"\nFrom {chunk.file}" +
              (f" (page {chunk.page})" if chunk.page else ""))
        print(chunk.content)

    # 2. Validate context quality
    if not any("relevant keyword" in chunk.content
               for chunk in context.chunks):
        print("Warning: Expected content not found in context")

    # 3. Generate response with validated context
    response = await rag.chat("What is the main topic?")
    print("\nAI Response:", response)
```
### Context Analysis Examples
```python
async def analyze_context():
    rag = RAGPDF(embedding_config, llm_config)

    # Add multiple PDFs
    for pdf in ["doc1.pdf", "doc2.pdf"]:
        await rag.add(pdf)

    # Get context for analysis
    context = await rag.context("What are the key findings?")

    # 1. Source distribution analysis
    file_distribution = {}
    for chunk in context.chunks:
        file_distribution[chunk.file] = file_distribution.get(chunk.file, 0) + 1

    print("\nChunk distribution across files:")
    for file, count in file_distribution.items():
        print(f"{file}: {count} chunks")

    # 2. Content relevance check
    query_terms = set(context.query.lower().split())
    relevant_chunks = []

    for chunk in context.chunks:
        chunk_terms = set(chunk.content.lower().split())
        overlap = len(query_terms & chunk_terms)
        relevant_chunks.append({
            'file': chunk.file,
            'page': chunk.page,
            'term_overlap': overlap
        })

    print("\nChunk relevance analysis:")
    for chunk in sorted(relevant_chunks,
                        key=lambda x: x['term_overlap'],
                        reverse=True):
        print(f"File: {chunk['file']}, "
              f"Page: {chunk['page']}, "
              f"Term overlap: {chunk['term_overlap']}")
```
## Model Configuration
RAGPDF uses litellm under the hood, making it compatible with any LLM provider supported by litellm. The model name and configuration must follow litellm's format.
### OpenAI
```python
# OpenAI API
config = LLMConfig(
    model="gpt-3.5-turbo",
    api_key="your-openai-key",
    api_base="https://api.openai.com/v1"  # Default OpenAI base URL
)

# Azure OpenAI
config = LLMConfig(
    model="azure/gpt-35-turbo",  # Prefix with 'azure/'
    api_key="your-azure-key",
    api_base="https://your-endpoint.openai.azure.com"
)
```
### Anthropic
```python
config = LLMConfig(
    model="claude-2",
    api_key="your-anthropic-key",
    api_base="https://api.anthropic.com"  # Default Anthropic base URL
)
```
### Google
```python
config = LLMConfig(
    model="gemini/gemini-pro",  # Prefix with 'gemini/'
    api_key="your-google-key",
    api_base="https://generativelanguage.googleapis.com"
)
```
### Ollama
```python
config = LLMConfig(
    model="ollama/llama2",             # Prefix with 'ollama/'
    api_base="http://localhost:11434"  # Local Ollama server
)
```
### Custom Endpoints
```python
# Self-hosted LLM API
config = LLMConfig(
    model="your-model-name",
    api_base="http://your-custom-endpoint:8000/v1",
    api_key="optional-key"  # Optional for self-hosted
)
```
## Environment Variables
RAGPDF supports configuration through environment variables. The `*_BASE_URL` variables are optional and default to each provider's standard endpoint:
```env
# OpenAI
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_API_KEY=your-openai-key
EMBEDDING_BASE_URL=https://api.openai.com/v1

LLM_MODEL=gpt-3.5-turbo
LLM_API_KEY=your-openai-key
LLM_BASE_URL=https://api.openai.com/v1

# Azure OpenAI
LLM_MODEL=azure/gpt-35-turbo
LLM_API_KEY=your-azure-key
LLM_BASE_URL=https://your-endpoint.openai.azure.com

# Anthropic
LLM_MODEL=claude-2
LLM_API_KEY=your-anthropic-key
LLM_BASE_URL=https://api.anthropic.com

# Google
LLM_MODEL=gemini/gemini-pro
LLM_API_KEY=your-google-key
LLM_BASE_URL=https://generativelanguage.googleapis.com

# Ollama
LLM_MODEL=ollama/llama2
LLM_BASE_URL=http://localhost:11434
```
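
One plausible way to wire these variables into configs is to load them explicitly with python-dotenv and `os.environ`; this is a sketch under that assumption, not a confirmed auto-loading mechanism of RAGPDF:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from ragpdf import RAGPDF, EmbeddingConfig, LLMConfig

load_dotenv()  # pull the variables above from a local .env file

embedding_config = EmbeddingConfig(
    model=os.environ["EMBEDDING_MODEL"],
    api_key=os.environ.get("EMBEDDING_API_KEY", ""),
    api_base=os.environ.get("EMBEDDING_BASE_URL"),
)
llm_config = LLMConfig(
    model=os.environ["LLM_MODEL"],
    api_key=os.environ.get("LLM_API_KEY", ""),
    api_base=os.environ.get("LLM_BASE_URL"),
)
rag = RAGPDF(embedding_config, llm_config)
```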
## API Reference
### RAGPDF Class
```python
class RAGPDF:
    def __init__(self,
                 embedding_config: Union[Dict[str, Any], EmbeddingConfig],
                 llm_config: Optional[Union[Dict[str, Any], LLMConfig]] = None,
                 index_path: Optional[str] = None):
        """Initialize RAGPDF with embedding and LLM configurations."""

    async def add(self, pdf_path: str) -> None:
        """Add a PDF document to the system."""

    async def context(self, query: str, k: int = 5) -> RAGContext:
        """Get relevant context for a query."""

    async def chat(self, prompt: str, k: int = 5, stream: bool = False) -> Union[str, AsyncIterator[str]]:
        """Generate a response using the LLM based on context."""
```
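
The `stream` parameter above implies token-by-token output. A minimal consumption sketch, assuming `stream=True` makes `chat` return the `AsyncIterator[str]` branch of its signature:

```python
async def stream_answer(rag: RAGPDF) -> None:
    # With stream=True, chat should yield response text incrementally
    # (per the Union[str, AsyncIterator[str]] return type above).
    async for token in await rag.chat("Summarize the document", stream=True):
        print(token, end="", flush=True)
    print()
```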
### Configuration Models
```python
class BaseConfig:
    """Base configuration for API models."""
    model: str                      # Model name (litellm compatible)
    api_key: str = ""               # API key (optional)
    api_base: Optional[str] = None  # API base URL (optional)

class EmbeddingConfig(BaseConfig):
    """Configuration for embedding model."""
    pass

class LLMConfig(BaseConfig):
    """Configuration for language model."""
    temperature: float = 0.7          # Response temperature (optional)
    max_tokens: Optional[int] = None  # Maximum response length (optional)
```
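
Because `__init__` accepts `Union[Dict[str, Any], ...]`, plain dicts should work in place of the config objects; a sketch, assuming the keys mirror the field names above:

```python
# Dict-based configuration; keys mirror the config model fields.
rag = RAGPDF(
    embedding_config={"model": "text-embedding-ada-002", "api_key": "your-api-key"},
    llm_config={"model": "gpt-3.5-turbo", "api_key": "your-api-key", "temperature": 0.2},
)
```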
## Examples
### Using Different LLM Providers
```python
# OpenAI
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="text-embedding-ada-002",
        api_key="your-openai-key"
    ),
    llm_config=LLMConfig(
        model="gpt-3.5-turbo",
        api_key="your-openai-key"
    )
)

# Ollama (local)
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="ollama/nomic-embed-text",
        api_base="http://localhost:11434"
    ),
    llm_config=LLMConfig(
        model="ollama/llama2",
        api_base="http://localhost:11434"
    )
)

# Azure OpenAI
rag = RAGPDF(
    embedding_config=EmbeddingConfig(
        model="azure/text-embedding-ada-002",
        api_key="your-azure-key",
        api_base="https://your-endpoint.openai.azure.com"
    ),
    llm_config=LLMConfig(
        model="azure/gpt-35-turbo",
        api_key="your-azure-key",
        api_base="https://your-endpoint.openai.azure.com"
    )
)
```
### Persistent Storage
```python
# Initialize with index storage
rag = RAGPDF(
    embedding_config=embedding_config,
    llm_config=llm_config,
    index_path="data/faiss_index.bin"
)
```
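
A likely follow-on (an assumption, not confirmed here) is that pointing `index_path` at an existing file reloads the saved index, so documents added in a previous run stay queryable:

```python
import asyncio

async def resume():
    # Assumption: an existing file at index_path is loaded rather than
    # overwritten, preserving previously added PDFs across runs.
    rag = RAGPDF(
        embedding_config=embedding_config,
        llm_config=llm_config,
        index_path="data/faiss_index.bin",
    )
    context = await rag.context("What did we index last run?")
    print(context.to_string())

asyncio.run(resume())
```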
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.