| Field | Value |
|---|---|
| Name | pdf-llm-tools |
| Version | 0.0.4 |
| home_page | None |
| Summary | A family of LLM-enhanced PDF utilities |
| upload_time | 2024-11-05 02:58:24 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
| keywords | llm, pdf |
| requirements | No requirements were recorded. |
# pdf-llm-tools [![PyPI](https://img.shields.io/pypi/v/pdf-llm-tools)](https://pypi.org/project/pdf-llm-tools/)
`pdf-llm-tools` is a family of AI PDF utilities:
- `pdfllm-titler` renames a PDF to `YEAR-AUTHOR-TITLE.pdf`, using metadata
parsed from its filename and contents.
- (todo) `pdfllm-toccer` adds a bookmark structure parsed from the PDF's
detected table of contents.
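The `YEAR-AUTHOR-TITLE.pdf` naming scheme can be illustrated with a small stdlib sketch. It assumes the metadata has already been extracted into a dict with hypothetical `year`, `author` and `title` keys; the helper names and the slug rules (ASCII folding, punctuation dropped, words joined with underscores so that hyphens only separate fields) are illustrative assumptions, not the package's actual normalization.

```python
import re
import unicodedata


def slug(text: str) -> str:
    """Fold to ASCII, drop punctuation, and join the remaining words with underscores."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    words = re.findall(r"[A-Za-z0-9]+", ascii_text)
    return "_".join(words)


def titler_filename(meta: dict) -> str:
    """Build YEAR-AUTHOR-TITLE.pdf from a hypothetical metadata dict."""
    parts = [str(meta["year"]), slug(meta["author"]), slug(meta["title"])]
    # Hyphens separate the three fields; underscores separate words inside a field.
    return "-".join(parts) + ".pdf"


print(titler_filename({"year": 2017, "author": "Vaswani", "title": "Attention Is All You Need"}))
# 2017-Vaswani-Attention_Is_All_You_Need.pdf
```

Keeping hyphens reserved for field boundaries makes the name easy to split back into its three parts later.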
We currently use poppler/[pdftotext](https://github.com/jalan/pdftotext) for
layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's
`gpt-4o-mini` is hardcoded as the LLM backend. The program requires an OpenAI
API key, supplied as a command-line option, through the `OPENAI_API_KEY`
environment variable, or by manual input.
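The three ways of supplying the key can be sketched as a simple fallback chain. The order (option, then `OPENAI_API_KEY`, then manual input) follows the Development notes, which say the option takes precedence over the environment variable. The function `resolve_api_key` and its parameters are hypothetical and not part of the package.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str],
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Hypothetical helper: option first, then OPENAI_API_KEY, then manual input."""
    if cli_key:
        return cli_key
    env_key = env.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    # Last resort: ask the user; getpass does not echo the key to the terminal.
    key = prompt("OpenAI API key: ").strip()
    if not key:
        raise SystemExit("An OpenAI API key is required.")
    return key
```

Passing `env` and `prompt` as parameters keeps the precedence logic testable without touching the real environment or blocking on input.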
## Installation
```
pip install pdf-llm-tools
```
## Usage
These utilities require every input PDF to have a correct OCR (text) layer. If a
PDF lacks one, run an OCR tool such as
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) on it first.
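A minimal sketch of that preprocessing step, calling the OCRmyPDF command-line tool from Python. It assumes `ocrmypdf` is installed and on `PATH`; the helper names are hypothetical. OCRmyPDF's `--skip-text` option OCRs only the pages that have no text yet and copies the rest through unchanged.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ocrmypdf_command(src: Path, dst: Path) -> list[str]:
    """Build an OCRmyPDF call that only OCRs pages lacking a text layer."""
    # --skip-text: pages that already contain text are passed through untouched.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def add_ocr_layer(src: Path, dst: Path) -> Path:
    """Run OCRmyPDF on src and write the OCR'd copy to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(ocrmypdf_command(src, dst), check=True)
    return dst
```

If a PDF already has a text layer that is wrong rather than missing, `--skip-text` will leave it as is; OCRmyPDF's `--force-ocr` option, which rasterizes and re-OCRs every page, is the relevant alternative in that case.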
### titler
```
pdfllm titler a.pdf b.pdf c.pdf
pdfllm titler --last-page 8 d.pdf
```
See `--help` for full details.
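For scripting, for example renaming every PDF in a folder in one go, the command lines above can be assembled from Python. This hypothetical wrapper uses only the arguments shown in the usage examples (the `titler` subcommand, `--last-page`, and positional PDF paths); it assumes `pdfllm` is on `PATH`, and `--help` remains the authority on what `--last-page` controls.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Iterable, Optional


def titler_command(pdfs: Iterable[Path], last_page: Optional[int] = None) -> list[str]:
    """Build a `pdfllm titler` invocation in the same shape as the usage examples."""
    cmd = ["pdfllm", "titler"]
    if last_page is not None:
        cmd += ["--last-page", str(last_page)]
    cmd += [str(p) for p in pdfs]
    return cmd


def title_directory(folder: Path, last_page: Optional[int] = None) -> None:
    """Hypothetical helper: pass every *.pdf in folder to a single titler call."""
    pdfs = sorted(folder.glob("*.pdf"))
    if pdfs:
        subprocess.run(titler_command(pdfs, last_page), check=True)
```

Passing all files to one invocation mirrors `pdfllm titler a.pdf b.pdf c.pdf` and avoids starting a new process per PDF.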
## Development
This project is made with [Hatch](https://hatch.pypa.io/dev/).
- Build: `hatch build`
- Test: `hatch run test:test_all [--openai-api-key KEY]`
  - The test system handles the API key the same way as the main program: the
    key must be given either as an option in the `hatch run` invocation (which
    takes precedence) or as the envvar `OPENAI_API_KEY`.
Raw data
{
"_id": null,
"home_page": null,
"name": "pdf-llm-tools",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "llm, pdf",
"author": null,
"author_email": "Jacob Fong <jacobcfong@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b5/0b/3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab/pdf_llm_tools-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A family of LLM-enhanced PDF utilities",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://github.com/jcfk/pdf-llm-tools",
"Repository": "https://github.com/jcfk/pdf-llm-tools"
},
"split_keywords": [
"llm",
" pdf"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b13d3e83adda0c93ff9234cd14a7b5e5952acab06414c2cf02907d0d1481b730",
"md5": "ee59127cd7ede3ea1d32168070c783eb",
"sha256": "2cc6e7d2c3c1fb9efbd2b881dc32cdf91096053c888eb87c2248f818247ee386"
},
"downloads": -1,
"filename": "pdf_llm_tools-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ee59127cd7ede3ea1d32168070c783eb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 7859,
"upload_time": "2024-11-05T02:58:22",
"upload_time_iso_8601": "2024-11-05T02:58:22.209455Z",
"url": "https://files.pythonhosted.org/packages/b1/3d/3e83adda0c93ff9234cd14a7b5e5952acab06414c2cf02907d0d1481b730/pdf_llm_tools-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b50b3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab",
"md5": "6fef27511a86b5b4be5605af0424d609",
"sha256": "ea04394f65ef33976f601b44903b12229b1df90324aeef9f4623afd644f51e87"
},
"downloads": -1,
"filename": "pdf_llm_tools-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "6fef27511a86b5b4be5605af0424d609",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 294555,
"upload_time": "2024-11-05T02:58:24",
"upload_time_iso_8601": "2024-11-05T02:58:24.796317Z",
"url": "https://files.pythonhosted.org/packages/b5/0b/3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab/pdf_llm_tools-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-05 02:58:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jcfk",
"github_project": "pdf-llm-tools",
"github_fetch_exception": true,
"lcname": "pdf-llm-tools"
}