# Vlense
A Python package for extracting text from images and PDFs using Vision Language Models (VLMs).
## Features
- Extract text from images and PDFs
- Supports JSON, HTML, and Markdown output formats
- Easy integration with Vision Language Models
- Asynchronous processing with batch support
- Custom JSON schema for structured output
## Installation
```bash
pip install vlense
```
Requires Python 3.10 or later.
## Usage
```python
import asyncio
import os

from vlense import Vlense

# One or more image/PDF paths to process.
path = ["./images/image1.jpg", "test.pdf"]
output_dir = "./output"
model = "gemini/gemini-1.5-flash"
temp_dir = "./temp_images"

# Replace the placeholder with your own API key.
os.environ["GEMINI_API_KEY"] = "YOUR_API_KEY"


async def main():
    vlense = Vlense()
    # ocr() is awaited, so it has to be called from inside a coroutine.
    responses = await vlense.ocr(
        file_path=path,
        model=model,
        output_dir=output_dir,
        temp_dir=temp_dir,
        batch_size=3,
        clean_temp_files=False,  # keep temporary files after processing
    )
    return responses


if __name__ == "__main__":
    asyncio.run(main())
```
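The `batch_size` argument in the call above limits how many files are processed concurrently. How Vlense implements this internally is not documented here; the sketch below only illustrates one common way to bound concurrency, using `asyncio.Semaphore`. The names `process_file` and `run_in_batches` are hypothetical and not part of Vlense, and the short sleep stands in for a real model call.

```python
import asyncio
from typing import Dict, List, Union


async def process_file(path: str) -> str:
    """Hypothetical stand-in for a per-file OCR request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"text from {path}"


async def run_in_batches(
    file_path: Union[str, List[str]], batch_size: int = 3
) -> Dict[str, str]:
    # Accept a single path or a list of paths, like the file_path parameter.
    paths = [file_path] if isinstance(file_path, str) else list(file_path)
    semaphore = asyncio.Semaphore(batch_size)

    async def bounded(path: str) -> str:
        # At most batch_size coroutines can hold the semaphore at once.
        async with semaphore:
            return await process_file(path)

    # gather() preserves input order, so results line up with paths.
    results = await asyncio.gather(*(bounded(p) for p in paths))
    return dict(zip(paths, results))


results = asyncio.run(run_in_batches(["a.jpg", "b.pdf"], batch_size=1))
```

With `batch_size=1` the files are handled one at a time; larger values let more requests overlap, at the cost of more simultaneous API calls.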
## API
### Vlense.ocr()
Performs OCR on the provided files.
**Parameters:**
- `file_path` (`Union[str, List[str]]`): Path or list of paths to PDF/image files.
- `model` (`str`, optional): Model name for generating completions. Defaults to `"gemini-1.5-flash"`.
- `output_dir` (`Optional[str]`, optional): Directory to save output. Defaults to `None`.
- `temp_dir` (`Optional[str]`, optional): Directory for temporary files. Defaults to the system temp directory.
- `batch_size` (`int`, optional): Number of files to process concurrently. Defaults to `3`.
- `format` (`str`, optional): Output format (`'markdown'`, `'html'`, or `'json'`). Defaults to `'markdown'`.
- `json_schema` (`Optional[Type[BaseModel]]`, optional): Pydantic model describing the JSON output. Required if `format` is `'json'`.
- `clean_temp_files` (`Optional[bool]`, optional): Whether to clean up temporary files after processing. Defaults to `True`.
**Returns:**
- `Dict[str, VlenseResponse]`: Generated content.
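For structured output, the `format` and `json_schema` parameters described above can be combined. The following is a minimal sketch rather than a verified recipe: the `Invoice` schema, its field names, and the `extract_invoice` helper are hypothetical, and the model string is simply the one from the usage example.

```python
from pydantic import BaseModel


class Invoice(BaseModel):
    """Hypothetical schema; the field names are illustrative only."""

    vendor: str
    total: float


async def extract_invoice(path: str):
    # Imported lazily so the schema can be defined without vlense installed.
    from vlense import Vlense

    vlense = Vlense()
    # format="json" requires a Pydantic model in json_schema (see Parameters).
    return await vlense.ocr(
        file_path=path,
        model="gemini/gemini-1.5-flash",
        format="json",
        json_schema=Invoice,
    )
```

To run it, set `GEMINI_API_KEY` and call `asyncio.run(extract_invoice("invoice.pdf"))` with a real file path; the return value is the `Dict[str, VlenseResponse]` described under Returns.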
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Contact
- Author: Aditya Miskin
- Email: [adityamiskin98@gmail.com](mailto:adityamiskin98@gmail.com)
- Repository: [https://github.com/adityamiskin/vlense](https://github.com/adityamiskin/vlense)