pdf-searchable-ocr

Name	pdf-searchable-ocr
Version	0.1.1
Summary	A simple Python package for OCR with searchable PDF generation using PaddleOCR
upload_time	2025-10-09 12:39:52
requires_python	>=3.8
license	MIT
keywords	computer-vision ocr paddleocr pdf searchable-pdf text-recognition

# pdf-searchable-ocr

A simple and powerful Python package for Optical Character Recognition (OCR) with searchable PDF generation using PaddleOCR.

## Features

- 🔍 **High-accuracy OCR** using PaddleOCR
- 📄 **Searchable PDF generation** with invisible text layers
- 🎨 **Bounding box visualization** for OCR results
- 🌍 **Multi-language support** (80+ languages)
- ⚡ **GPU acceleration** support
- 🔧 **Simple class-based API**
- 📦 **Easy installation and usage**

## Installation

### Using pip (recommended)

```bash
pip install pdf-searchable-ocr
```

### Using uv (for development)

```bash
git clone https://github.com/jasminmistry/pdf-searchable-ocr.git
cd pdf-searchable-ocr
uv sync
```

## Quick Start

### Basic Usage

```python
from py_ocr import OCRProcessor

# Initialize the OCR processor
ocr = OCRProcessor(lang='en', verbose=True)

# Process an image
ocr_result = ocr.process_image('path/to/your/image.jpg')

# Create a searchable PDF
pdf_path = ocr.create_searchable_pdf('path/to/your/image.jpg', ocr_result)

# Draw bounding boxes for visualization
boxed_image = ocr.draw_bounding_boxes('path/to/your/image.jpg', ocr_result)

print(f"Searchable PDF created: {pdf_path}")
print(f"Image with bounding boxes: {boxed_image}")
```

### CLI Usage

The package also provides a command-line tool:

```bash
# Basic usage - creates searchable PDF only
pdf-searchable-ocr input.jpg

# Specify custom output PDF name
pdf-searchable-ocr input.jpg --output-pdf my_document.pdf

# Enable bounding box visualization
pdf-searchable-ocr input.jpg --bounding-boxes

# Full options with custom names
pdf-searchable-ocr invoice.jpg \
    --output-pdf invoice_searchable.pdf \
    --output-prefix invoice_processed \
    --bounding-boxes \
    --lang en
```

### Complete Workflow

```python
from py_ocr import OCRProcessor

# Initialize processor
ocr = OCRProcessor(lang='en', use_gpu=False, verbose=True)

# Process image with custom PDF name and bounding boxes enabled
results = ocr.process_and_generate_all(
    'invoice.jpg', 
    output_pdf='invoice_searchable.pdf',
    output_prefix='invoice_processed',
    bounding_boxes=True
)

if results['searchable_pdf']:
    print(f"✅ Searchable PDF: {results['searchable_pdf']}")
if results['boxed_image']:
    print(f"✅ Visualization: {results['boxed_image']}")

# Or use defaults (no bounding boxes)
results = ocr.process_and_generate_all('document.jpg')
```

### Using Sample Images

```python
from py_ocr import OCRProcessor

# Initialize processor
ocr = OCRProcessor()

# Download a sample image for testing
image_path = ocr.download_sample_image()

# Process with custom settings
results = ocr.process_and_generate_all(
    image_path,
    output_pdf='sample_searchable.pdf',
    output_prefix='sample',
    bounding_boxes=True  # Enable visualization
)
```

## API Reference

### OCRProcessor Class

#### `__init__(lang='en', use_gpu=False, verbose=True, **kwargs)`

Initialize the OCR processor.

**Parameters:**
- `lang` (str): Language for OCR recognition (default: 'en')
- `use_gpu` (bool): Whether to use GPU acceleration (default: False)
- `verbose` (bool): Whether to print verbose output (default: True)
- `**kwargs`: Additional arguments passed to PaddleOCR

#### `process_image(image_path: str) -> Optional[Dict[str, Any]]`

Perform OCR on an image.

**Parameters:**
- `image_path` (str): Path to the image file

**Returns:**
- `dict`: OCR results containing texts, scores, and bounding boxes
- `None`: If OCR failed
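
The description above does not name the keys of the result dictionary. As a minimal sketch, the helper below assumes the texts and scores are stored under `rec_texts` and `rec_scores`. Both key names are assumptions, so inspect an actual result before relying on them. It filters out low-confidence text and also handles the `None` returned on failure.

```python
from statistics import mean
from typing import Any, Dict, List, Optional, Tuple


def summarize_result(
    ocr_result: Optional[Dict[str, Any]],
    min_score: float = 0.5,
    text_key: str = "rec_texts",    # assumed key name
    score_key: str = "rec_scores",  # assumed key name
) -> Tuple[List[str], float]:
    """Return texts at or above min_score and the mean score of all blocks."""
    if not ocr_result:
        return [], 0.0  # OCR failed or produced nothing
    texts = list(ocr_result.get(text_key, []))
    scores = list(ocr_result.get(score_key, []))
    kept = [t for t, s in zip(texts, scores) if s >= min_score]
    average = mean(scores) if len(scores) > 0 else 0.0
    return kept, average


# Hand-written stand-in for a process_image() result
sample = {"rec_texts": ["Invoice", "T0tal"], "rec_scores": [0.98, 0.40]}
kept, average = summarize_result(sample)
print(kept, round(average, 2))
```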

#### `create_searchable_pdf(image_path: str, ocr_result: dict, output_pdf: str = "searchable_output.pdf") -> Optional[str]`

Create a searchable PDF with invisible text layers.

**Parameters:**
- `image_path` (str): Path to the source image
- `ocr_result` (dict): OCR results from `process_image()`
- `output_pdf` (str): Output PDF filename (default: "searchable_output.pdf")

**Returns:**
- `str`: Path to the created PDF
- `None`: If creation failed
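
An invisible text layer is usually built by drawing the page image and then writing each recognized string at the position of its bounding box in a non-rendering text mode (PDF text render mode 3). Image coordinates start at the top-left corner and are measured in pixels, while PDF coordinates start at the bottom-left and are measured in points, so each box has to be flipped and scaled. The sketch below illustrates that conversion only. The function name, the four-point polygon format, and the 72 dpi default are assumptions, not the package's actual code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_pdf_rect(
    polygon: Sequence[Point],
    image_height_px: float,
    dpi: float = 72.0,  # assumed scan resolution; 72 dpi maps 1 px to 1 pt
) -> Tuple[float, float, float, float]:
    """Convert a pixel-space polygon to (x, y, width, height) in PDF points.

    The returned (x, y) is the lower-left corner of the box, measured from
    the bottom-left of the page as PDF expects.
    """
    scale = 72.0 / dpi  # points per pixel
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    x = left * scale
    y = (image_height_px - bottom) * scale  # flip the vertical axis
    return x, y, (right - left) * scale, (bottom - top) * scale


box = [(10, 20), (110, 20), (110, 50), (10, 50)]
print(polygon_to_pdf_rect(box, image_height_px=200))
```

The height of the returned rectangle is a reasonable starting point for the font size of the hidden text, so that selecting text in a PDF viewer roughly matches the visible glyphs.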

#### `draw_bounding_boxes(image_path: str, ocr_result: dict, output_image: str = "image_with_boxes.jpg") -> Optional[str]`

Draw bounding boxes on the image to visualize OCR detection.

**Parameters:**
- `image_path` (str): Path to the source image
- `ocr_result` (dict): OCR results from `process_image()`
- `output_image` (str): Output image filename (default: "image_with_boxes.jpg")

**Returns:**
- `str`: Path to the image with bounding boxes
- `None`: If creation failed

#### `process_and_generate_all(image_path: str, output_pdf: str = "searchable_output.pdf", output_prefix: str = "output", bounding_boxes: bool = False) -> dict`

Complete workflow: OCR + Searchable PDF + Optional Bounding Box Image.

**Parameters:**
- `image_path` (str): Path to the input image
- `output_pdf` (str): Output PDF filename (default: "searchable_output.pdf")
- `output_prefix` (str): Prefix for output files (default: "output")
- `bounding_boxes` (bool): Whether to generate bounding box visualization (default: False)

**Returns:**
- `dict`: Dictionary containing paths to all generated files
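
The Complete Workflow example reads the keys `searchable_pdf` and `boxed_image` from this dictionary, and either may be empty when a step is skipped or fails. As a sketch, a small hypothetical helper (not part of the package) can keep only the outputs that were actually written to disk:

```python
import os
from typing import Any, Dict


def existing_outputs(results: Dict[str, Any]) -> Dict[str, str]:
    """Keep only entries whose value is a path to a file that exists."""
    return {
        name: path
        for name, path in results.items()
        if isinstance(path, str) and os.path.isfile(path)
    }


# Stand-in for the return value of process_and_generate_all()
results = {"searchable_pdf": "searchable_output.pdf", "boxed_image": None}
for name, path in existing_outputs(results).items():
    print(f"{name}: {path}")
```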

## Supported Languages

pdf-searchable-ocr supports 80+ languages through PaddleOCR. Some popular ones include:

- `en` - English
- `ch` - Chinese (Simplified)
- `french` - French
- `german` - German
- `korean` - Korean
- `japan` - Japanese
- `it` - Italian
- `es` - Spanish
- `ru` - Russian
- `ar` - Arabic

For the complete list, see [PaddleOCR documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/multi_languages_en.md).

## Advanced Configuration

### Output Control

```python
# Control output files and features
ocr = OCRProcessor(lang='en')

# Minimal processing - only searchable PDF
results = ocr.process_and_generate_all(
    'document.jpg',
    output_pdf='my_document.pdf',
    bounding_boxes=False  # Skip visualization
)

# Full processing with custom names
results = ocr.process_and_generate_all(
    'invoice.jpg',
    output_pdf='invoice_searchable.pdf',
    output_prefix='invoice_analysis',
    bounding_boxes=True  # Include visualization
)

# Generated files:
# - invoice_searchable.pdf (searchable PDF)
# - invoice_analysis_with_boxes.jpg (visualization)
```

### GPU Acceleration

```python
# Enable GPU acceleration (requires CUDA)
ocr = OCRProcessor(lang='en', use_gpu=True)
```
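
Setting `use_gpu=True` on a machine without a CUDA-enabled PaddlePaddle build is likely to fail, so it can help to probe for a GPU first. The sketch below is one way to do that with PaddlePaddle's device helpers. The function name `cuda_available` is hypothetical, and falling back to CPU on any error is a design choice of this example rather than documented package behavior.

```python
def cuda_available() -> bool:
    """Best-effort check for a usable CUDA device via PaddlePaddle."""
    try:
        import paddle
    except Exception:
        return False  # PaddlePaddle is missing or could not be loaded
    try:
        return bool(
            paddle.device.is_compiled_with_cuda()
            and paddle.device.cuda.device_count() > 0
        )
    except Exception:
        return False  # treat any probing error as "no GPU"


use_gpu = cuda_available()
print(f"use_gpu={use_gpu}")
# ocr = OCRProcessor(lang='en', use_gpu=use_gpu)
```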

### Custom PaddleOCR Settings

```python
# Pass additional PaddleOCR parameters
ocr = OCRProcessor(
    lang='en',
    use_angle_cls=True,              # Enable angle classification
    use_textline_orientation=True,   # Enable text line orientation
    det_model_dir='custom/det/path', # Custom detection model
    rec_model_dir='custom/rec/path'  # Custom recognition model
)
```

### Batch Processing

```python
from py_ocr import OCRProcessor
import os

ocr = OCRProcessor(lang='en')

# Process multiple images
image_folder = 'path/to/images'
for filename in os.listdir(image_folder):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff')):
        image_path = os.path.join(image_folder, filename)
        base_name = os.path.splitext(filename)[0]
        
        # Process with custom output names
        results = ocr.process_and_generate_all(
            image_path, 
            output_pdf=f"searchable_{base_name}.pdf",
            output_prefix=f"processed_{base_name}",
            bounding_boxes=True  # Generate visualizations
        )
        print(f"Processed: {filename}")
```
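
In a long batch run, one unreadable image should not stop the remaining files. The variant below is a sketch of that idea: it takes the processing function as a parameter, walks the folder in sorted order, and records failures instead of raising. The function name `process_folder` and its return shape are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def process_folder(
    folder: str,
    process: Callable[..., Dict],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run `process` on every image in `folder`, in sorted order.

    Returns (names processed, [(name, error message), ...]).
    """
    done: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip sub-folders and non-image files
        try:
            process(
                str(path),
                output_pdf=f"searchable_{path.stem}.pdf",
                output_prefix=f"processed_{path.stem}",
                bounding_boxes=False,
            )
            done.append(path.name)
        except Exception as exc:  # keep going; report the failure afterwards
            failed.append((path.name, str(exc)))
    return done, failed


# Usage, with the processor from the example above:
# done, failed = process_folder('path/to/images', ocr.process_and_generate_all)
```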

## Output Examples

### Console Output
```
🔧 Initializing OCR engine with language: en
✅ OCR engine initialized successfully
📁 Using existing sample image: sample_image.jpg
🔍 Processing image: sample_image.jpg
📊 Text blocks detected: 48 | Average confidence: 0.984
✅ Searchable PDF saved as: my_document.pdf
🎨 Drawing 48 bounding boxes on image...
✅ Image with bounding boxes saved as: output_with_boxes.jpg
```

### Generated Files
When using `bounding_boxes=True`:
- `my_document.pdf` - Searchable PDF with invisible text layers
- `output_with_boxes.jpg` - Original image with colored bounding boxes

When using `bounding_boxes=False` (default):
- `my_document.pdf` - Searchable PDF only (faster processing)

## Requirements

- Python >= 3.8
- PaddleOCR >= 2.7.0
- OpenCV >= 4.0
- ReportLab >= 4.0
- Pillow >= 8.0
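
To see which of these dependencies are present in the current environment, the standard library's `importlib.metadata` can report installed versions. This is a minimal sketch. The distribution names in the list are assumptions about how the packages are published, and OpenCV in particular ships under several names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names; OpenCV may instead be installed as
# opencv-python-headless or opencv-contrib-python.
DISTRIBUTIONS = ["paddleocr", "opencv-python", "reportlab", "pillow"]


def installed_versions(
    names: Iterable[str] = DISTRIBUTIONS,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```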

## Installation from Source

```bash
# Clone the repository
git clone https://github.com/jasminmistry/pdf-searchable-ocr.git
cd pdf-searchable-ocr

# Install with uv (recommended for development)
uv sync

# Or install with pip
pip install -e .
```

## Development

### Running Tests
```bash
uv run python -m pytest tests/
```

### Code Formatting
```bash
uv run black py_ocr/
uv run isort py_ocr/
```

### Type Checking
```bash
uv run mypy py_ocr/
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) for the excellent OCR engine
- [ReportLab](https://www.reportlab.com/) for PDF generation capabilities
- [OpenCV](https://opencv.org/) for image processing

## Changelog

### v0.1.0
- Initial release
- Basic OCR functionality
- Searchable PDF generation
- Bounding box visualization
- Multi-language support

## Support

If you encounter any issues or have questions:

1. Check the [Issues](https://github.com/jasminmistry/pdf-searchable-ocr/issues) page
2. Create a new issue with detailed information
3. Contact the maintainers

## Roadmap

- [ ] Web interface for easy usage
- [ ] Batch processing CLI tool
- [ ] Docker container
- [ ] Additional output formats (Excel, Word)
- [ ] OCR result caching
- [ ] Performance optimizations

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pdf-searchable-ocr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Jasmin Mistry <mistry.jasmin@gmail.com>",
    "keywords": "computer-vision, ocr, paddleocr, pdf, searchable-pdf, text-recognition",
    "author": null,
    "author_email": "Jasmin Mistry <mistry.jasmin@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/00/ca/3a1fded56e0e65e4f961e4d9b62715eabe3da7a93fb150495b039d8ecb04/pdf_searchable_ocr-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple Python package for OCR with searchable PDF generation using PaddleOCR",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/jasminmistry/pdf-searchable-ocr/issues",
        "Documentation": "https://github.com/jasminmistry/pdf-searchable-ocr#readme",
        "Homepage": "https://github.com/jasminmistry/pdf-searchable-ocr",
        "Repository": "https://github.com/jasminmistry/pdf-searchable-ocr.git"
    },
    "split_keywords": [
        "computer-vision",
        " ocr",
        " paddleocr",
        " pdf",
        " searchable-pdf",
        " text-recognition"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0bf7294005d2fa843219f2f57fb19db6d520d6e413917caa2912f874441bb222",
                "md5": "030b392f2befd0fa71336ae8326a9016",
                "sha256": "5cb5b709155b89348590e444f8232fa429767ab94104528d97122da7a01753f1"
            },
            "downloads": -1,
            "filename": "pdf_searchable_ocr-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "030b392f2befd0fa71336ae8326a9016",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10987,
            "upload_time": "2025-10-09T12:39:50",
            "upload_time_iso_8601": "2025-10-09T12:39:50.620948Z",
            "url": "https://files.pythonhosted.org/packages/0b/f7/294005d2fa843219f2f57fb19db6d520d6e413917caa2912f874441bb222/pdf_searchable_ocr-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "00ca3a1fded56e0e65e4f961e4d9b62715eabe3da7a93fb150495b039d8ecb04",
                "md5": "d3c62010d081655de4c1112536eee6eb",
                "sha256": "103611c1e4f231d38883d888692baf99ef2654bc90e96bf04ee73f002ec11941"
            },
            "downloads": -1,
            "filename": "pdf_searchable_ocr-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d3c62010d081655de4c1112536eee6eb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 1129833,
            "upload_time": "2025-10-09T12:39:52",
            "upload_time_iso_8601": "2025-10-09T12:39:52.238544Z",
            "url": "https://files.pythonhosted.org/packages/00/ca/3a1fded56e0e65e4f961e4d9b62715eabe3da7a93fb150495b039d8ecb04/pdf_searchable_ocr-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-09 12:39:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jasminmistry",
    "github_project": "pdf-searchable-ocr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pdf-searchable-ocr"
}
