Name | llama-ocr-py JSON |
Version |
0.1.0
JSON |
| download |
home_page | https://github.com/princexoleo/llama_ocr_py |
Summary | A package for document extraction using Ollama Vision model |
upload_time | 2024-12-02 05:14:24 |
maintainer | None |
docs_url | None |
author | Mazharul Islam Leon |
requires_python | >=3.7 |
license | MIT License Copyright (c) 2024 Prince Leo Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords |
ocr
computer vision
document extraction
ollama
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# llama_ocr
A Python package for document text extraction using Ollama Vision models, with a focus on OCR (Optical Character Recognition) capabilities.
## Features
- 🔍 Text extraction from images using Ollama Vision models
- 🖼️ Advanced image preprocessing for better OCR results
- 🛠️ Configurable settings and parameters
- 🔧 Extensible architecture with dependency injection
- 📝 Comprehensive logging and error handling
## Installation
```bash
pip install llama_ocr
```
Or install from source:
```bash
git clone https://github.com/princexoleo/llama_ocr.git
cd llama_ocr
pip install -e .
```
## Quick Start
```python
from llama_ocr import DocumentExtractor
# Initialize extractor
extractor = DocumentExtractor()
# Extract text from an image
result = extractor.extract_from_image(
"path/to/image.jpg",
prompt="Extract all text from this document"
)
# Print extracted text
print(result["text"])
```
## Advanced Usage
### Custom Configuration
```python
from llama_ocr import DocumentExtractor, OCRConfig
config = OCRConfig(
model_name="llama3.2-vision",
preprocess_images=True,
optimize_for_ocr=True,
log_level="INFO"
)
extractor = DocumentExtractor(config=config)
```
### Image Processing Options
```python
# Extract with OCR optimization
result = extractor.extract_from_image(
"path/to/image.jpg",
save_processed=True # Save processed image for inspection
)
```
## Architecture
The package follows SOLID principles and uses dependency injection for flexibility:
- `DocumentExtractor`: Main class orchestrating the extraction process
- `VisionClient`: Abstract base class for vision model interactions
- `ImagePreprocessor`: Abstract base class for image processing
- `OCRConfig`: Configuration management using dataclasses
### Components
1. **Core Module**
- `extractor.py`: Main document extraction logic
- `vision_client.py`: Ollama Vision API integration
- `image_processor.py`: Image preprocessing utilities
- `base.py`: Abstract base classes and interfaces
2. **Configuration**
- `config.py`: Configuration management using dataclasses
## Dependencies
- ollama>=0.1.27
- Pillow>=10.1.0
- python-dotenv>=1.0.0
- opencv-python>=4.8.1.78
- numpy>=1.21.0
## Development
### Setting up Development Environment
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
# Install development dependencies
pip install -e ".[dev]"
```
### Running Tests
```bash
python -m unittest discover -s tests
```
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Thanks to the Ollama team for providing the vision models
- Built with ❤️ by Mazharul Islam Leon
Raw data
{
"_id": null,
"home_page": "https://github.com/princexoleo/llama_ocr_py",
"name": "llama-ocr-py",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "ocr, computer vision, document extraction, ollama",
"author": "Mazharul Islam Leon",
"author_email": "Mazharul Islam Leon <xtleon.xl@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/8b/d8/6127f444ca4c9759bb8ed2bcb44f869cf017f65b7e00614342ff9bae0a09/llama_ocr_py-0.1.0.tar.gz",
"platform": null,
"description": "# llama_ocr\r\n\r\nA Python package for document text extraction using Ollama Vision models, with a focus on OCR (Optical Character Recognition) capabilities.\r\n\r\n## Features\r\n\r\n- \ud83d\udd0d Text extraction from images using Ollama Vision models\r\n- \ud83d\uddbc\ufe0f Advanced image preprocessing for better OCR results\r\n- \ud83d\udee0\ufe0f Configurable settings and parameters\r\n- \ud83d\udd27 Extensible architecture with dependency injection\r\n- \ud83d\udcdd Comprehensive logging and error handling\r\n\r\n## Installation\r\n\r\n```bash\r\npip install llama_ocr\r\n```\r\n\r\nOr install from source:\r\n\r\n```bash\r\ngit clone https://github.com/princexoleo/llama_ocr.git\r\ncd llama_ocr\r\npip install -e .\r\n```\r\n\r\n## Quick Start\r\n\r\n```python\r\nfrom llama_ocr import DocumentExtractor\r\n\r\n# Initialize extractor\r\nextractor = DocumentExtractor()\r\n\r\n# Extract text from an image\r\nresult = extractor.extract_from_image(\r\n \"path/to/image.jpg\",\r\n prompt=\"Extract all text from this document\"\r\n)\r\n\r\n# Print extracted text\r\nprint(result[\"text\"])\r\n```\r\n\r\n## Advanced Usage\r\n\r\n### Custom Configuration\r\n\r\n```python\r\nfrom llama_ocr import DocumentExtractor, OCRConfig\r\n\r\nconfig = OCRConfig(\r\n model_name=\"llama3.2-vision\",\r\n preprocess_images=True,\r\n optimize_for_ocr=True,\r\n log_level=\"INFO\"\r\n)\r\n\r\nextractor = DocumentExtractor(config=config)\r\n```\r\n\r\n### Image Processing Options\r\n\r\n```python\r\n# Extract with OCR optimization\r\nresult = extractor.extract_from_image(\r\n \"path/to/image.jpg\",\r\n save_processed=True # Save processed image for inspection\r\n)\r\n```\r\n\r\n## Architecture\r\n\r\nThe package follows SOLID principles and uses dependency injection for flexibility:\r\n\r\n- `DocumentExtractor`: Main class orchestrating the extraction process\r\n- `VisionClient`: Abstract base class for vision model interactions\r\n- `ImagePreprocessor`: Abstract base class for image processing\r\n- `OCRConfig`: Configuration management using dataclasses\r\n\r\n### Components\r\n\r\n1. **Core Module**\r\n - `extractor.py`: Main document extraction logic\r\n - `vision_client.py`: Ollama Vision API integration\r\n - `image_processor.py`: Image preprocessing utilities\r\n - `base.py`: Abstract base classes and interfaces\r\n\r\n2. **Configuration**\r\n - `config.py`: Configuration management using dataclasses\r\n\r\n## Dependencies\r\n\r\n- ollama>=0.1.27\r\n- Pillow>=10.1.0\r\n- python-dotenv>=1.0.0\r\n- opencv-python>=4.8.1.78\r\n- numpy>=1.21.0\r\n\r\n## Development\r\n\r\n### Setting up Development Environment\r\n\r\n```bash\r\n# Create virtual environment\r\npython -m venv venv\r\nsource venv/bin/activate # Linux/Mac\r\n# or\r\n.\\venv\\Scripts\\activate # Windows\r\n\r\n# Install development dependencies\r\npip install -e \".[dev]\"\r\n```\r\n\r\n### Running Tests\r\n\r\n```bash\r\npython -m unittest discover -s tests\r\n```\r\n\r\n## Contributing\r\n\r\n1. Fork the repository\r\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\r\n3. Commit your changes (`git commit -m 'Add some amazing feature'`)\r\n4. Push to the branch (`git push origin feature/amazing-feature`)\r\n5. Open a Pull Request\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\r\n\r\n## Acknowledgments\r\n\r\n- Thanks to the Ollama team for providing the vision models\r\n- Built with \u2764\ufe0f by Mazharul Islam Leon\r\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Prince Leo Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "A package for document extraction using Ollama Vision model",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/princexoleo/llama_ocr"
},
"split_keywords": [
"ocr",
" computer vision",
" document extraction",
" ollama"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "36173b133acb394df9865654bfb026fe4bd85c1d6b9d71c41f006c7bd6c4078a",
"md5": "6f328abe96af8e6d4013b0438962458e",
"sha256": "7fec80f86967950c700330f7483a07463b4b69df2030cee245ce9655c53264e9"
},
"downloads": -1,
"filename": "llama_ocr_py-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6f328abe96af8e6d4013b0438962458e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 13048,
"upload_time": "2024-12-02T05:14:23",
"upload_time_iso_8601": "2024-12-02T05:14:23.107521Z",
"url": "https://files.pythonhosted.org/packages/36/17/3b133acb394df9865654bfb026fe4bd85c1d6b9d71c41f006c7bd6c4078a/llama_ocr_py-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8bd86127f444ca4c9759bb8ed2bcb44f869cf017f65b7e00614342ff9bae0a09",
"md5": "19513278ed41ae63896919adcd53abf5",
"sha256": "3f75195157565e7f557f2e0bb293e0b64f21fd310d21c7f4a40e2065a4c1a161"
},
"downloads": -1,
"filename": "llama_ocr_py-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "19513278ed41ae63896919adcd53abf5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 11711,
"upload_time": "2024-12-02T05:14:24",
"upload_time_iso_8601": "2024-12-02T05:14:24.402889Z",
"url": "https://files.pythonhosted.org/packages/8b/d8/6127f444ca4c9759bb8ed2bcb44f869cf017f65b7e00614342ff9bae0a09/llama_ocr_py-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-02 05:14:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "princexoleo",
"github_project": "llama_ocr_py",
"github_not_found": true,
"lcname": "llama-ocr-py"
}