pyvisionai

Name: pyvisionai
Version: 0.2.4
Home page: https://github.com/MDGrey33/content-extractor-with-vision
Summary: A Python library for extracting and describing content from documents using Vision LLMs
Upload time: 2025-01-15 20:52:54
Author: MDGrey33
Requires Python: <4.0,>=3.11
License: Apache-2.0
Keywords: pdf, docx, pptx, html, vision, llm, extraction
# PyVisionAI

Content Extractor and Image Description with Vision LLM

Extract and describe content from documents using Vision Language Models.

## Requirements

- Python 3.11 or higher (the package declares `requires_python >=3.11,<4.0`)
- Operating system: Windows, macOS, or Linux
- Disk space: At least 1GB free space (more if using local Llama model)

## Features

- Extract text and images from PDF, DOCX, PPTX, and HTML files
- Capture interactive HTML pages as images with full rendering
- Describe images using local (Ollama) or cloud-based (OpenAI) Vision Language Models
- Save extracted text and image descriptions in markdown format
- Support for both CLI and library usage
- Multiple extraction methods for different use cases
- Detailed logging with timestamps for all operations
- Customizable image description prompts

## Installation

1. **Install System Dependencies**
   ```bash
   # macOS (using Homebrew)
   brew install --cask libreoffice  # Required for DOCX/PPTX processing
   brew install poppler             # Required for PDF processing
   pip install playwright          # Required for HTML processing
   playwright install              # Install browser dependencies

   # Ubuntu/Debian
   sudo apt-get update
   sudo apt-get install -y libreoffice  # Required for DOCX/PPTX processing
   sudo apt-get install -y poppler-utils # Required for PDF processing
   pip install playwright               # Required for HTML processing
   playwright install                   # Install browser dependencies

   # Windows
   # Download and install:
   # - LibreOffice: https://www.libreoffice.org/download/download/
   # - Poppler: http://blog.alivate.com.au/poppler-windows/
   # Add poppler's bin directory to your system PATH
   pip install playwright
   playwright install
   ```

2. **Install PyVisionAI**
   ```bash
   # Using pip
   pip install pyvisionai

   # Using poetry (will automatically install playwright as a dependency)
   poetry add pyvisionai
   poetry run playwright install  # Install browser dependencies
   ```
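
   After installation, a quick sanity check that the package is importable, using only the standard library:

   ```python
   from importlib.metadata import version

   # Confirm PyVisionAI is installed and report the resolved version.
   print(version("pyvisionai"))  # e.g. "0.2.4"
   ```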

## Directory Structure

By default, PyVisionAI uses the following directory structure:
```
content/
├── source/      # Default input directory for files to process
├── extracted/   # Default output directory for processed files
└── log/         # Directory for log files and benchmarks
```

These directories are created automatically when needed, but you can:
1. Create them manually:
   ```bash
   mkdir -p content/source content/extracted content/log
   ```
2. Override them with custom paths:
   ```bash
   # Specify custom input and output directories
   file-extract -t pdf -s /path/to/inputs -o /path/to/outputs

   # Process a single file with custom output
   file-extract -t pdf -s ~/documents/file.pdf -o ~/results
   ```

Note: While the default directories provide an organized structure, you're free to use any directory layout that suits your needs by specifying custom paths with the `-s` (source) and `-o` (output) options.
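
If you prefer to prepare the layout from Python rather than the shell, here is a minimal equivalent of the `mkdir -p` step above:

```python
from pathlib import Path

# Create the default PyVisionAI directory layout (same effect as `mkdir -p`).
for sub in ("source", "extracted", "log"):
    Path("content", sub).mkdir(parents=True, exist_ok=True)
```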

## Setup for Image Description

For cloud image description (default, recommended):
```bash
# Set OpenAI API key
export OPENAI_API_KEY='your-api-key'
```

For local image description (optional):
```bash
# Start Ollama server
ollama serve

# Pull the required model
ollama pull llama3.2-vision
```
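
Before a large batch run, it can help to verify both backends from Python. A minimal pre-flight sketch; the Ollama endpoint shown is the default from the Environment Variables section below:

```python
import os
import urllib.request

# Cloud backend: OPENAI_API_KEY must be set for the default OpenAI path.
if not os.environ.get("OPENAI_API_KEY"):
    print("OPENAI_API_KEY is not set; cloud image description will fail.")

# Local backend: a running Ollama server answers a plain HTTP GET.
try:
    urllib.request.urlopen("http://localhost:11434", timeout=2)
    print("Ollama server is reachable.")
except OSError as exc:
    print(f"Ollama not reachable ({exc}); local description is unavailable.")
```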

## Usage

### Command Line Interface

1. **Extract Content from Files**
   ```bash
   # Process a single file (using default page-as-image method)
   file-extract -t pdf -s path/to/file.pdf -o output_dir
   file-extract -t docx -s path/to/file.docx -o output_dir
   file-extract -t pptx -s path/to/file.pptx -o output_dir
   file-extract -t html -s path/to/file.html -o output_dir

   # Process with specific extractor
   file-extract -t pdf -s input.pdf -o output_dir -e text_and_images

   # Process all files in a directory
   file-extract -t pdf -s input_dir -o output_dir

   # Example with custom prompt
   file-extract -t pdf -s document.pdf -o output_dir -p "Extract the exact text as present in the image and write one sentence about each visual in the image"
   ```

   **Note:** The custom prompt for file extraction shapes the content of the output document. With the `page_as_image` method, the prompt should instruct the model both to extract text and to describe visuals; variations are fine as long as they cover both tasks. Avoid prompts like "What's the color of this picture?", as they won't yield the desired results.

2. **Describe Images**
   ```bash
   # Using GPT-4 Vision (default, recommended)
   describe-image -i path/to/image.jpg

   # Using local Llama model
   describe-image -i path/to/image.jpg -m llama

   # Using custom prompt
   describe-image -i image.jpg -p "List the main colors in this image"

   # Additional options
   describe-image -i image.jpg -v  # Verbose output
   ```
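
The CLI can also be driven from a script. Note that `file-extract` already accepts a directory for `-s`; the sketch below (paths are placeholders) is only useful when you want per-file control, for example to skip or retry individual documents:

```python
import subprocess
from pathlib import Path

# Invoke the documented `file-extract` CLI once per PDF for per-file control.
for pdf in sorted(Path("content/source").glob("*.pdf")):
    subprocess.run(
        ["file-extract", "-t", "pdf", "-s", str(pdf), "-o", "content/extracted"],
        check=True,  # raise if any single extraction fails
    )
```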

### Library Usage

```python
from pyvisionai import create_extractor, describe_image_openai, describe_image_ollama

# 1. Extract content from files
extractor = create_extractor("pdf")  # or "docx", "pptx", or "html"
output_path = extractor.extract("input.pdf", "output_dir")

# With specific extraction method
extractor = create_extractor("pdf", extractor_type="text_and_images")
output_path = extractor.extract("input.pdf", "output_dir")

# Extract from HTML (always uses page_as_image method)
extractor = create_extractor("html")
output_path = extractor.extract("page.html", "output_dir")

# 2. Describe images
# Using GPT-4 Vision (default, recommended)
description = describe_image_openai(
    "image.jpg",
    model="gpt-4o-mini",  # default
    api_key="your-api-key",  # optional if set in environment
    max_tokens=300,  # default
    prompt="Describe this image focusing on colors and textures"  # optional custom prompt
)

# Using local Llama model
description = describe_image_ollama(
    "image.jpg",
    model="llama3.2-vision",  # default
    prompt="List the main objects in this image"  # optional custom prompt
)
```
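
Putting the pieces together, an end-to-end sketch. That `extract` returns the path of a single markdown file is an assumption based on the output format described above:

```python
from pathlib import Path
from pyvisionai import create_extractor

# Extract a PDF, then preview the generated markdown output.
extractor = create_extractor("pdf")
output_path = extractor.extract("content/source/input.pdf", "content/extracted")
print(Path(output_path).read_text(encoding="utf-8")[:500])  # first 500 chars
```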

## Logging

The application maintains detailed logs of all operations:
- By default, logs are stored in `content/log/` with timestamp-based filenames
- Each run creates a new log file: `pyvisionai_YYYYMMDD_HHMMSS.log`
- Logs include:
  - Timestamp for each operation
  - Processing steps and their status
  - Error messages and warnings
  - Extraction method used
  - Input and output file paths
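
Because each run writes a fresh timestamped file, the newest log sorts last lexicographically. A small helper to inspect the latest run, assuming the default `content/log/` location:

```python
from pathlib import Path

# Timestamped names (pyvisionai_YYYYMMDD_HHMMSS.log) sort chronologically,
# so the lexicographically last file is the most recent run.
logs = sorted(Path("content/log").glob("pyvisionai_*.log"))
if logs:
    print(f"Latest log: {logs[-1]}")
    print("\n".join(logs[-1].read_text(encoding="utf-8").splitlines()[-5:]))
```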

## Environment Variables

```bash
# Required for OpenAI Vision (if using cloud description)
export OPENAI_API_KEY='your-api-key'

# Optional: Ollama host (if using local description)
export OLLAMA_HOST='http://localhost:11434'
```
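
In a notebook or script where exporting shell variables is awkward, the same configuration can be applied to the current process (values are placeholders):

```python
import os

# Process-local equivalent of the exports above; load real secrets from a
# secure store rather than hard-coding them.
os.environ.setdefault("OPENAI_API_KEY", "your-api-key")
os.environ.setdefault("OLLAMA_HOST", "http://localhost:11434")
```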

## License

This project is licensed under the [Apache License 2.0](LICENSE).

## Command Parameters

### `file-extract` Command
```bash
file-extract [-h] -t TYPE -s SOURCE -o OUTPUT [-e EXTRACTOR] [-m MODEL] [-k API_KEY] [-p PROMPT] [-v]

Required Arguments:
  -t, --type TYPE         File type to process (pdf, docx, pptx, html)
  -s, --source SOURCE     Source file or directory path
  -o, --output OUTPUT     Output directory path

Optional Arguments:
  -h, --help             Show help message and exit
  -e, --extractor TYPE   Extraction method:
                         - page_as_image: Convert pages to images (default)
                         - text_and_images: Extract text and images separately
                         Note: HTML only supports page_as_image
  -m, --model MODEL      Vision model for image description:
                         - gpt4: GPT-4 Vision (default, recommended)
                         - llama: Local Llama model
  -k, --api-key KEY      OpenAI API key (can also be set via OPENAI_API_KEY env var)
  -v, --verbose          Enable verbose logging
  -p, --prompt TEXT      Custom prompt for image description
```

### `describe-image` Command
```bash
describe-image [-h] -i IMAGE [-m MODEL] [-k API_KEY] [-t MAX_TOKENS] [-v] [-p PROMPT]

Required Arguments:
  -i, --image IMAGE      Path to image file

Optional Arguments:
  -h, --help            Show help message and exit
  -m, --model MODEL     Vision model to use:
                        - gpt4: GPT-4 Vision (default, recommended)
                        - llama: Local Llama model
  -k, --api-key KEY     OpenAI API key (can also be set via OPENAI_API_KEY env var)
  -t, --max-tokens NUM  Maximum tokens for response (default: 300)
  -p, --prompt TEXT     Custom prompt for image description
  -v, --verbose         Enable verbose logging
```

## Examples

### File Extraction Examples
```bash
# Basic usage with defaults (page_as_image method, GPT-4 Vision)
file-extract -t pdf -s document.pdf -o output_dir
file-extract -t html -s webpage.html -o output_dir  # HTML always uses page_as_image

# Specify extraction method (not applicable for HTML)
file-extract -t docx -s document.docx -o output_dir -e text_and_images

# Use local Llama model for image description
file-extract -t pptx -s slides.pptx -o output_dir -m llama

# Process all PDFs in a directory with verbose logging
file-extract -t pdf -s input_dir -o output_dir -v

# Use custom OpenAI API key
file-extract -t pdf -s document.pdf -o output_dir -k "your-api-key"

# Use custom prompt for image descriptions
file-extract -t pdf -s document.pdf -o output_dir -p "Focus on text content and layout"
```

### Image Description Examples
```bash
# Basic usage with defaults (GPT-4 Vision)
describe-image -i photo.jpg

# Use local Llama model
describe-image -i photo.jpg -m llama

# Use custom prompt
describe-image -i photo.jpg -p "List the main colors and their proportions"

# Customize token limit
describe-image -i photo.jpg -t 500

# Enable verbose logging
describe-image -i photo.jpg -v

# Use custom OpenAI API key
describe-image -i photo.jpg -k "your-api-key"

# Combine options
describe-image -i photo.jpg -m llama -p "Describe the lighting and shadows" -v
```

## Custom Prompts

PyVisionAI supports custom prompts for both file extraction and image description. Custom prompts allow you to control how content is extracted and described.

### Using Custom Prompts

1. **CLI Usage**
   ```bash
   # File extraction with custom prompt
   file-extract -t pdf -s document.pdf -o output_dir -p "Extract all text verbatim and describe any diagrams or images in detail"

   # Image description with custom prompt
   describe-image -i image.jpg -p "List the main colors and describe the layout of elements"
   ```

2. **Library Usage**
   ```python
   # File extraction with custom prompt
   extractor = create_extractor(
       "pdf",
       extractor_type="page_as_image",
       prompt="Extract all text exactly as it appears and provide detailed descriptions of any charts or diagrams"
   )
   output_path = extractor.extract("input.pdf", "output_dir")

   # Image description with custom prompt
   description = describe_image_openai(
       "image.jpg",
       prompt="Focus on spatial relationships between objects and any text content"
   )
   ```

3. **Environment Variable**
   ```bash
   # Set default prompt via environment variable
   export FILE_EXTRACTOR_PROMPT="Extract text and describe visual elements with emphasis on layout"
   ```
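
Whether the library reads `FILE_EXTRACTOR_PROMPT` automatically is not specified here, so a conservative pattern is to resolve the prompt yourself and pass it explicitly, using only the documented `create_extractor` API:

```python
import os
from pyvisionai import create_extractor

# Resolve the prompt by hand: environment variable first, generic fallback second.
prompt = os.environ.get(
    "FILE_EXTRACTOR_PROMPT",
    "Extract text and describe visual elements with emphasis on layout",
)
extractor = create_extractor("pdf", extractor_type="page_as_image", prompt=prompt)
output_path = extractor.extract("content/source/input.pdf", "content/extracted")
```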

### Writing Effective Prompts

1. **For Page-as-Image Method**
   - Include instructions for both text extraction and visual description since the entire page is processed as an image
   - Example: "Extract the exact text as it appears on the page and describe any images, diagrams, or visual elements in detail"

2. **For Text-and-Images Method**
   - Focus only on image description since text is extracted separately
   - The model only sees the images, not the text content
   - Example: "Describe the visual content, focusing on what the image represents and any visual elements it contains"

3. **For Image Description**
   - Be specific about what aspects to focus on
   - Example: "Describe the main elements, their arrangement, and any text visible in the image"

Note: For page-as-image method, prompts must include both text extraction and visual description instructions as the entire page is processed as an image. For text-and-images method, prompts should focus solely on image description as text is handled separately.
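
Expressed in code, the method-specific guidance looks like this (prompt text adapted from the examples above):

```python
from pyvisionai import create_extractor

# page_as_image: the model sees the whole page, so ask for text AND visuals.
page_extractor = create_extractor(
    "pdf",
    extractor_type="page_as_image",
    prompt="Extract the exact text as it appears and describe any images or diagrams",
)

# text_and_images: the model sees only the embedded images, so describe visuals only.
hybrid_extractor = create_extractor(
    "pdf",
    extractor_type="text_and_images",
    prompt="Describe the visual content, focusing on what the image represents",
)
```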

            
