| Field | Value |
|---|---|
| Name | miniocr |
| Version | 0.0.4 |
| home_page | None |
| Summary | A small OCR package |
| upload_time | 2025-07-10 15:22:51 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# MiniOCR
A powerful and easy-to-use Python package for performing Optical Character Recognition (OCR) on images, PDF documents, and PowerPoint presentations using OpenAI's Vision API.
## Features
- 🖼️ **Multi-format support**: Process images (PNG, JPG, JPEG, GIF, BMP, TIFF, WebP), PDF files, and PPTX presentations
- ⚡ **Parallel processing**: Concurrent processing of multiple pages/slides for improved performance
- 🌐 **Cross-platform**: Works on Windows, macOS, and Linux
- 📄 **Visual OCR for PPTX**: Converts PowerPoint slides to images for accurate visual content extraction
- 🔄 **Async support**: Built with asyncio for efficient processing
- 📝 **Markdown output**: Converts documents to clean, structured markdown format
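
The parallel processing listed above is commonly built on an `asyncio.Semaphore` that caps how many pages are in flight at once. The following is a minimal sketch of that pattern, not MiniOCR's actual implementation; `process_pages` and `fake_ocr` are hypothetical names used only for illustration, and `fake_ocr` stands in for a real Vision API call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_pages(
    pages: List[str],
    worker: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> List[str]:
    """Run `worker` over every page, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(page: str) -> str:
        # Only `concurrency` coroutines can hold the semaphore at once.
        async with semaphore:
            return await worker(page)

    # gather() returns results in input order, so page order is preserved.
    return await asyncio.gather(*(bounded(p) for p in pages))


async def fake_ocr(page: str) -> str:
    # Hypothetical stand-in for an API request; not part of MiniOCR.
    await asyncio.sleep(0.01)
    return f"text of {page}"


if __name__ == "__main__":
    print(asyncio.run(process_pages(["p1", "p2", "p3"], fake_ocr, concurrency=2)))
```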
## Installation
### System Requirements
- **Python**: 3.8 or higher
- **Operating System**: Windows, macOS, or Linux
### Dependencies
#### Python Dependencies (Auto-installed)
MiniOCR automatically installs these Python packages:
- **openai** - OpenAI API client for Vision processing
- **aiohttp** - Async HTTP client for file downloads
- **aiofiles** - Async file operations
- **pdf2image** - PDF to image conversion
- **python-pptx** - PowerPoint file parsing
- **Pillow** - Image processing and manipulation
#### System Dependencies (Manual Installation Required)
##### For PDF Processing - Poppler
**macOS:**
```bash
brew install poppler
```
**Ubuntu/Debian:**
```bash
sudo apt-get install poppler-utils
```
**Windows:**
Download and install from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)
##### For PPTX Visual Processing - LibreOffice
**macOS:**
```bash
brew install --cask libreoffice
```
**Ubuntu/Debian:**
```bash
sudo apt-get install libreoffice
```
**Windows:**
Download and install from [LibreOffice official website](https://www.libreoffice.org/download/download/)
> **Note**: LibreOffice is required for high-quality PPTX processing with visual content extraction. Without it, MiniOCR will fall back to text-only extraction. On Windows, MiniOCR automatically detects LibreOffice in common installation paths.
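
To see how detection "in common installation paths" might work, here is a rough sketch using only the standard library. It is an illustration under stated assumptions, not MiniOCR's real lookup: `find_soffice` is a hypothetical helper, and the candidate paths are typical default install locations that may differ on your machine.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default install locations; adjust for your system.
CANDIDATE_PATHS = (
    r"C:\Program Files\LibreOffice\program\soffice.exe",
    r"C:\Program Files (x86)\LibreOffice\program\soffice.exe",
    "/Applications/LibreOffice.app/Contents/MacOS/soffice",
)


def find_soffice(candidates: Iterable[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return a path to the LibreOffice executable, or None if not found."""
    # Prefer whatever is already on PATH.
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    # Fall back to well-known install locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_soffice() or "LibreOffice not found")
```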
### Install MiniOCR
#### From PyPI (Recommended)
```bash
pip install miniocr
```
#### From Source
```bash
git clone https://github.com/w95/miniocr.git
cd miniocr
pip install -e .
```
### Verify Installation
```python
from miniocr import MiniOCR, __version__
print(f"MiniOCR v{__version__} installed successfully!")
```
## Quick Start
### Setup
First, you'll need an OpenAI API key. Set it as an environment variable:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
Or pass it directly when initializing the class.
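
If you want to fail early when no key is configured, a small helper can resolve the key before constructing `MiniOCR`. This is a sketch only: `resolve_api_key` is a hypothetical function, and the precedence shown (explicit argument first, then `OPENAI_API_KEY`) is an assumption based on the description above.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else the OPENAI_API_KEY variable."""
    key = explicit or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass api_key=...")
    return key
```

The returned value can then be passed as `MiniOCR(api_key=resolve_api_key())`.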
### Basic Usage
```python
import asyncio
from miniocr import MiniOCR

async def main():
    # Initialize with API key (or use environment variable)
    ocr = MiniOCR(api_key="your-api-key-here")

    # Process an image
    result = await ocr.ocr("path/to/image.jpg")
    print(result["content"])

    # Process a PDF
    result = await ocr.ocr("path/to/document.pdf")
    print(f"Processed {result['pages']} pages")
    print(result["content"])

    # Process a PowerPoint presentation
    result = await ocr.ocr("path/to/presentation.pptx")
    print(result["content"])

if __name__ == "__main__":
    asyncio.run(main())
```
### Advanced Usage
```python
import asyncio
from miniocr import MiniOCR

async def advanced_example():
    ocr = MiniOCR()

    # Process with custom settings
    result = await ocr.ocr(
        file_path="document.pdf",
        model="gpt-4o",         # Use different OpenAI model
        concurrency=10,         # Process up to 10 pages simultaneously
        output_dir="./output",  # Save markdown to file
        cleanup=True            # Clean up temporary files
    )

    print(f"File: {result['file_name']}")
    print(f"Pages processed: {result['pages']}")
    print(f"Content length: {len(result['content'])} characters")

asyncio.run(advanced_example())
```
### Processing URLs
```python
import asyncio
from miniocr import MiniOCR

async def process_url():
    ocr = MiniOCR()

    # Process a file from URL
    result = await ocr.ocr("https://example.com/document.pdf")
    print(result["content"])

asyncio.run(process_url())
```
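
`ocr()` takes a local path or a URL in the same argument. How MiniOCR tells them apart is not documented here; one plausible check, sketched with the standard library, looks like this. `is_url` is a hypothetical helper, useful if you want to validate inputs yourself before calling `ocr()`.

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """True for http(s) URLs with a host, False for local paths."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/document.pdf", "path/to/image.jpg"):
        print(candidate, is_url(candidate))
```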
## API Reference
### MiniOCR Class
#### `__init__(api_key: str = None)`
Initialize the MiniOCR instance.
**Parameters:**
- `api_key` (str, optional): OpenAI API key. If not provided, the `OPENAI_API_KEY` environment variable is used.
#### `async ocr(file_path, model="gpt-4o-mini", concurrency=5, output_dir=None, cleanup=True)`
Process a file and extract text using OCR.
**Parameters:**
- `file_path` (str): Path or URL to the file to process
- `model` (str): OpenAI model to use (default: "gpt-4o-mini")
- `concurrency` (int): Number of concurrent API requests (default: 5)
- `output_dir` (str, optional): Directory to save markdown output
- `cleanup` (bool): Whether to clean up temporary files (default: True)
**Returns:**
- `dict`: Dictionary containing:
  - `content` (str): Extracted text in markdown format
  - `pages` (int): Number of pages/slides processed
  - `file_name` (str): Name of the processed file
**Supported file types:**
- Images: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.webp`
- Documents: `.pdf`
- Presentations: `.pptx`
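
The extension list above can be turned into a quick pre-flight check that runs before any API call is made. The sketch below is an assumption about how such routing could look, not MiniOCR's internal code; `classify` is a hypothetical helper, and raising `ValueError` for unknown types is chosen to match the Error Handling example.

```python
import os
from urllib.parse import urlparse

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}


def classify(source: str) -> str:
    """Map a path or URL to 'image', 'pdf', or 'pptx' by its extension."""
    # For URLs, ignore the query string and fragment when reading the suffix.
    path = urlparse(source).path if "://" in source else source
    suffix = os.path.splitext(path)[1].lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".pptx":
        return "pptx"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```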
## Configuration
### Required Configuration
#### OpenAI API Key
You **must** provide an OpenAI API key to use MiniOCR. You can set it in two ways:
**Option 1: Environment Variable (Recommended)**
```bash
export OPENAI_API_KEY="your-api-key-here"
```
**Option 2: Pass directly to class**
```python
from miniocr import MiniOCR
ocr = MiniOCR(api_key="your-api-key-here")
```
### Optional Dependencies Behavior
#### Without Poppler (PDF Processing)
- **Effect**: PDF processing will fail
- **Error**: `pdf2image` will raise an exception
- **Solution**: Install Poppler following the instructions above
#### Without LibreOffice (PPTX Processing)
- **Effect**: Falls back to text-only extraction from PPTX files
- **Warning**: Visual content (charts, images, formatting) will be lost
- **Solution**: Install LibreOffice for full visual processing capabilities
### Model Options
MiniOCR supports various OpenAI models:
- `gpt-4o-mini` (default, cost-effective)
- `gpt-4o` (higher accuracy)
- `gpt-4-turbo`
## Output Format
MiniOCR converts documents to clean markdown with the following features:
- **Tables**: Converted to HTML format for better structure
- **Checkboxes**: Represented as ☐ (unchecked) and ☑ (checked)
- **Special elements**: Logos, watermarks, and page numbers are wrapped in brackets
- **Charts and infographics**: Interpreted and converted to markdown tables when applicable
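
Because these conventions are plain text, the returned `content` string can be inspected with ordinary string operations. A minimal sketch, assuming the output follows the conventions listed above; `summarize_output` and the sample string are made up for illustration.

```python
import re


def summarize_output(markdown: str) -> dict:
    """Count checkbox glyphs and HTML tables in OCR markdown output."""
    return {
        "checked": markdown.count("☑"),
        "unchecked": markdown.count("☐"),
        "html_tables": len(re.findall(r"<table\b", markdown, flags=re.IGNORECASE)),
    }


if __name__ == "__main__":
    sample = "☑ Approved\n☐ Rejected\n<table><tr><td>1</td></tr></table>"
    print(summarize_output(sample))
```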
## Error Handling
```python
import asyncio
from miniocr import MiniOCR

async def handle_errors():
    ocr = MiniOCR()

    try:
        result = await ocr.ocr("nonexistent.pdf")
    except ValueError as e:
        print(f"Unsupported file type: {e}")
    except Exception as e:
        print(f"Processing error: {e}")

asyncio.run(handle_errors())
```
## Troubleshooting
### Common Issues
#### "No module named 'pdf2image'" or PDF processing fails
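**Solution**: If the import itself fails, reinstall the Python package (`pip install pdf2image`). If the import works but PDF conversion fails, install the Poppler system dependency: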
```bash
# macOS
brew install poppler
# Ubuntu/Debian
sudo apt-get install poppler-utils
```
#### "LibreOffice conversion failed" for PPTX files
**Solution**: Install LibreOffice
```bash
# macOS
brew install --cask libreoffice
# Ubuntu/Debian
sudo apt-get install libreoffice
```
#### "Invalid API key" or OpenAI authentication errors
**Solution**: Verify your OpenAI API key
```bash
# Check if environment variable is set
echo $OPENAI_API_KEY
# Or confirm the key is accepted by making a real API request
python -c "from openai import OpenAI; client = OpenAI(); client.models.list(); print('API key is valid')"
```
#### PPTX files show only text content, missing charts/images
**Cause**: LibreOffice not installed, falling back to text-only extraction
**Solution**: Install LibreOffice for full visual processing capabilities
#### "soffice command not found" errors
**Cause**: LibreOffice not in system PATH
**Solutions**:
- **macOS**: Ensure LibreOffice is installed via Homebrew: `brew install --cask libreoffice`
- **Linux**: Install via package manager: `sudo apt-get install libreoffice`
- **Windows**: Add LibreOffice to PATH or reinstall
#### Rate limiting or quota exceeded errors
**Cause**: Too many requests to OpenAI API
**Solutions**:
- Reduce `concurrency` parameter (try 1-3 for free tier)
- Add delays between processing batches
- Upgrade your OpenAI plan for higher rate limits
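
One way to add delays is to wrap each call in a retry loop with exponential backoff. The helper below is a generic sketch, not part of MiniOCR: `with_backoff` is a hypothetical name, and which exception type reaches your code on a rate limit is an assumption you should verify (with the `openai` client it is typically `openai.RateLimitError`).

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    retries: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await `call()`, retrying on failure with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == retries:
                raise  # Out of retries: surface the last error.
            # Double the wait each attempt and add jitter to avoid bursts.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

A call such as `await with_backoff(lambda: ocr.ocr("document.pdf", concurrency=2))` would then retry the whole document on failure; narrow `retry_on` so that genuine errors are not retried.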
### Dependencies Verification
#### Check Python Dependencies
```python
# Verify MiniOCR installation
from miniocr import MiniOCR, __version__
print(f"MiniOCR v{__version__} installed")
# Check key dependencies
import openai, aiohttp, pdf2image, pptx
print("All Python dependencies available")
```
#### Check System Dependencies
```bash
# Test Poppler installation
pdftoppm -h
# Test LibreOffice installation
soffice --version
```
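
The same check can be done from Python, which is handy in a startup script. This sketch only looks for the two executables on `PATH`, so it can miss a LibreOffice install that lives elsewhere; `check_system_dependencies` is a hypothetical helper.

```python
import shutil
from typing import Dict, Optional


def check_system_dependencies() -> Dict[str, Optional[str]]:
    """Map each required tool to its resolved path, or None if missing."""
    return {tool: shutil.which(tool) for tool in ("pdftoppm", "soffice")}


if __name__ == "__main__":
    for tool, path in check_system_dependencies().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```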
## Performance Tips
1. **Adjust concurrency**: Increase the `concurrency` parameter for faster processing of multi-page documents, as far as your OpenAI rate limits allow
2. **Use appropriate models**: `gpt-4o-mini` for cost-effectiveness, `gpt-4o` for higher accuracy
3. **Process in batches**: For large numbers of files, process them in batches to avoid rate limits
4. **Local processing**: Keep files local when possible to avoid download overhead
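
Tip 3 can be sketched as a small batching loop. This is an illustration under assumptions, not a MiniOCR feature: `process_in_batches` is a hypothetical helper, the batch size and pause are arbitrary, and `fake_ocr` stands in for `ocr.ocr` so the example runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def process_in_batches(
    files: Sequence[str],
    worker: Callable[[str], Awaitable[Any]],
    batch_size: int = 3,
    pause: float = 1.0,
) -> List[Any]:
    """Process files in fixed-size batches, pausing between batches."""
    results: List[Any] = []
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        # return_exceptions=True keeps one failed file from aborting the batch.
        results.extend(
            await asyncio.gather(*(worker(f) for f in batch), return_exceptions=True)
        )
        if start + batch_size < len(files):
            await asyncio.sleep(pause)  # Give the rate limiter room to recover.
    return results


async def fake_ocr(path: str) -> dict:
    # Hypothetical stand-in returning the documented result shape.
    await asyncio.sleep(0)
    return {"file_name": path, "pages": 1, "content": ""}


if __name__ == "__main__":
    print(asyncio.run(process_in_batches(["a.pdf", "b.pdf"], fake_ocr, pause=0)))
```

With a real instance, `worker` would be `ocr.ocr`; failed files come back as exception objects in the result list, so check each entry before using it.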
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Testing
Run the test suite:
```bash
pytest tests/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
### v0.0.4
- **Enhanced Windows Support**: Improved LibreOffice detection on Windows systems
- **Cross-platform Compatibility**: Automatically finds LibreOffice in common installation paths
- **Robust Error Handling**: Better timeout and fallback mechanisms for PPTX processing
- **Improved Reliability**: More resilient LibreOffice executable detection across platforms
### v0.0.3
- **Enhanced PPTX Processing**: Now uses LibreOffice for PDF conversion and OpenAI Vision API
- **Visual Content Extraction**: Captures charts, images, and formatting from PowerPoint slides
- **Improved Accuracy**: Better OCR results for complex PPTX layouts
- **Fallback Support**: Graceful degradation to text extraction if LibreOffice unavailable
- **Updated Documentation**: Comprehensive dependency and troubleshooting information
### v0.0.2
- **PyPI Publication**: Package published to Python Package Index
- **Improved Package Structure**: Better organization and imports
- **Enhanced README**: Complete documentation with examples
- **Testing Infrastructure**: Comprehensive test suite
### v0.0.1
- **Initial Release**: Basic OCR functionality
- **Multi-format Support**: Images, PDF, and PPTX files
- **Async Processing**: Concurrent processing with configurable limits
- **Cross-platform Compatibility**: Windows, macOS, and Linux support
## Support
If you encounter any issues or have questions, please [open an issue](https://github.com/w95/miniocr/issues) on GitHub.
## Acknowledgments
- Built with [OpenAI's Vision API](https://platform.openai.com/docs/guides/vision)
- Uses [pdf2image](https://github.com/Belval/pdf2image) for PDF processing
- Uses [python-pptx](https://github.com/scanny/python-pptx) for PowerPoint processing
Raw data
{
"_id": null,
"home_page": null,
"name": "miniocr",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Enrike Nur <enrike.nur@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/8d/f3/02b685c7e466bc65457bc89dc10b18c8825a76ea2d740ea49303d938b485/miniocr-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A small OCR package",
"version": "0.0.4",
"project_urls": {
"Bug Tracker": "https://github.com/w95/miniocr/issues",
"Homepage": "https://github.com/w95/miniocr"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5ede2b6b7cffba8c6d9f91cb017497b9ebb516c5fdd71510c3068d951a2f668b",
"md5": "16d26273cb4196b76ec3711b811bb8cf",
"sha256": "5ed1eded2d1e7fec9b8054de685fe0ecafe2ae8b14c2c4e84469879ab451923d"
},
"downloads": -1,
"filename": "miniocr-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "16d26273cb4196b76ec3711b811bb8cf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 9654,
"upload_time": "2025-07-10T15:22:50",
"upload_time_iso_8601": "2025-07-10T15:22:50.087117Z",
"url": "https://files.pythonhosted.org/packages/5e/de/2b6b7cffba8c6d9f91cb017497b9ebb516c5fdd71510c3068d951a2f668b/miniocr-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8df302b685c7e466bc65457bc89dc10b18c8825a76ea2d740ea49303d938b485",
"md5": "c5a33bf3194e36771e0c03249aab308a",
"sha256": "ec5ad39ae9bf930e982703a6f86cd53acde1979d8f18bc0dc104720b54a6d06f"
},
"downloads": -1,
"filename": "miniocr-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "c5a33bf3194e36771e0c03249aab308a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 13347,
"upload_time": "2025-07-10T15:22:51",
"upload_time_iso_8601": "2025-07-10T15:22:51.225804Z",
"url": "https://files.pythonhosted.org/packages/8d/f3/02b685c7e466bc65457bc89dc10b18c8825a76ea2d740ea49303d938b485/miniocr-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 15:22:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "w95",
"github_project": "miniocr",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "miniocr"
}