vlense


| Field | Value |
| --- | --- |
| Name | vlense |
| Version | 0.1.4 |
| home_page | None |
| Summary | A Python package to extract text from images and PDFs using Vision Language Model (VLM). |
| upload_time | 2024-11-06 10:51:15 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | MIT |
| keywords | vision-language-model, ocr, text-extraction, pdf-processing, image-processing |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Vlense

A Python package to extract text from images and PDFs using Vision Language Models (VLM).

## Features

- Extract text from images and PDFs
- Supports JSON, HTML, and Markdown formats
- Easy integration with Vision Language Models
- Asynchronous processing with batch support
- Custom JSON schema for structured output

## Installation

```bash
pip install vlense
```

## Usage

```python
import asyncio
import os

from vlense import Vlense

# Input files: any mix of images and PDFs.
path = ["./images/image1.jpg", "test.pdf"]
output_dir = "./output"
model = "gemini/gemini-1.5-flash"
temp_dir = "./temp_images"

# Replace the placeholder with your own API key.
os.environ["GEMINI_API_KEY"] = "YOUR_API_KEY"


async def main():
    vlense = Vlense()
    responses = await vlense.ocr(
        file_path=path,
        model=model,
        output_dir=output_dir,
        temp_dir=temp_dir,
        batch_size=3,
        clean_temp_files=False,  # keep the temporary files in temp_dir
    )
    return responses


if __name__ == "__main__":
    asyncio.run(main())
```
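
Since `file_path` accepts a list of paths, a small helper can gather inputs from a directory before calling `ocr()`. The sketch below uses only the standard library. The `collect_inputs` name and the extension set are assumptions made for illustration; they are not part of Vlense, and the formats it actually accepts may differ.

```python
from pathlib import Path

# Assumed extension list for illustration; adjust to the formats you need.
DEFAULT_EXTENSIONS = frozenset({".pdf", ".png", ".jpg", ".jpeg"})


def collect_inputs(directory, extensions=DEFAULT_EXTENSIONS):
    """Return sorted paths of files under `directory` with a matching extension."""
    root = Path(directory)
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

The resulting list can then be passed directly, for example `file_path=collect_inputs("./images")`.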

## API

### Vlense.ocr()

Performs OCR on the provided files.

**Parameters:**

- `file_path` (`Union[str, List[str]]`): Path or list of paths to PDF/image files.

- `model` (`str`, optional): Model name for generating completions. Defaults to `"gemini-1.5-flash"`.

- `output_dir` (`Optional[str]`, optional): Directory to save output. Defaults to `None`.

- `temp_dir` (`Optional[str]`, optional): Directory for temporary files. Defaults to the system temp directory.

- `batch_size` (`int`, optional): Number of concurrent processes. Defaults to `3`.

- `format` (`str`, optional): Output format (`'markdown'`, `'html'`, `'json'`). Defaults to `'markdown'`.

- `json_schema` (`Optional[Type[BaseModel]]`, optional): Pydantic model for JSON output. Required if `format` is `'json'`.

- `clean_temp_files` (`Optional[bool]`, optional): Whether to clean up temporary files after processing. Defaults to `True`.
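
The `batch_size` parameter limits how much work runs at once. One common way to get that behaviour with `asyncio` is a semaphore. The sketch below shows the general pattern only; it is not taken from Vlense's source, and `process_file` is a hypothetical stand-in for the per-file model call.

```python
import asyncio


async def process_file(path):
    """Hypothetical placeholder for the per-file OCR request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"text extracted from {path}"


async def run_limited(paths, batch_size=3):
    """Process all paths with at most `batch_size` tasks in flight."""
    semaphore = asyncio.Semaphore(batch_size)

    async def worker(path):
        # Each worker waits here until one of the `batch_size` slots is free.
        async with semaphore:
            return path, await process_file(path)

    # gather() returns results in the same order as the input paths.
    results = await asyncio.gather(*(worker(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(run_limited(["a.jpg", "b.pdf", "c.png", "d.pdf"])))
```

A larger limit finishes sooner but sends more simultaneous requests to the model provider, so rate limits are usually the deciding factor.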

**Returns:**

- `Dict[str, VlenseResponse]`: Generated content.
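
Because `json_schema` is required when `format` is `'json'`, a structured-output call needs a Pydantic model. The sketch below uses only the parameters documented above. The `Invoice` model, its fields, and the `extract_invoices` helper are hypothetical. The attributes of `VlenseResponse` are not documented here, so the responses are returned as-is rather than unpacked.

```python
import asyncio
from typing import Optional

from pydantic import BaseModel


class Invoice(BaseModel):
    """Hypothetical schema describing the fields to extract."""

    vendor: str
    total: float
    invoice_number: Optional[str] = None


async def extract_invoices(paths):
    """Run OCR in JSON mode using the Invoice schema (illustrative helper)."""
    # Imported lazily so the schema can be defined without Vlense installed.
    from vlense import Vlense

    vlense = Vlense()
    return await vlense.ocr(
        file_path=paths,
        model="gemini/gemini-1.5-flash",
        format="json",
        json_schema=Invoice,
    )
```

With `GEMINI_API_KEY` set, this could be run as `asyncio.run(extract_invoices(["test.pdf"]))`, and the returned dictionary iterated with `.items()`.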

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contact

Author: Aditya Miskin  
Email: [adityamiskin98@gmail.com](mailto:adityamiskin98@gmail.com)  
Repository: [https://github.com/adityamiskin/vlense](https://github.com/adityamiskin/vlense)

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "vlense",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "vision-language-model, ocr, text-extraction, pdf-processing, image-processing",
    "author": null,
    "author_email": "Aditya Miskin <adityamiskin98@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/18/05/eda5e4d3eb1619b2257602ef2a6b7532c8c4ef2327307295d61df27e522a/vlense-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python package to extract text from images and PDFs using Vision Language Model (VLM).",
    "version": "0.1.4",
    "project_urls": {
        "Bug Tracker": "https://github.com/adityamiskin/vlense/issues",
        "Homepage": "https://github.com/adityamiskin/vlense",
        "Repository": "https://github.com/adityamiskin/vlense.git"
    },
    "split_keywords": [
        "vision-language-model",
        " ocr",
        " text-extraction",
        " pdf-processing",
        " image-processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5c3da9bcba69ba9809921fcff2085b85fa6a44a32670d89c91e737444e39e179",
                "md5": "b9fac5b165101fd4555ec60ad9979b50",
                "sha256": "2327dd1b94966c4936eb04255a7c161adec1f293773c40b330cc7a8a1a942124"
            },
            "downloads": -1,
            "filename": "vlense-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b9fac5b165101fd4555ec60ad9979b50",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 15088,
            "upload_time": "2024-11-06T10:51:14",
            "upload_time_iso_8601": "2024-11-06T10:51:14.689778Z",
            "url": "https://files.pythonhosted.org/packages/5c/3d/a9bcba69ba9809921fcff2085b85fa6a44a32670d89c91e737444e39e179/vlense-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1805eda5e4d3eb1619b2257602ef2a6b7532c8c4ef2327307295d61df27e522a",
                "md5": "b15860dcdda8f99549deb662e4c52fd2",
                "sha256": "a82eef08bd1769aa1c330310aaff7cbcdcb056511a2b7ed56fc0dd8585252707"
            },
            "downloads": -1,
            "filename": "vlense-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "b15860dcdda8f99549deb662e4c52fd2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 11984,
            "upload_time": "2024-11-06T10:51:15",
            "upload_time_iso_8601": "2024-11-06T10:51:15.660502Z",
            "url": "https://files.pythonhosted.org/packages/18/05/eda5e4d3eb1619b2257602ef2a6b7532c8c4ef2327307295d61df27e522a/vlense-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-06 10:51:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "adityamiskin",
    "github_project": "vlense",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vlense"
}
        