| Field | Value |
| --- | --- |
| Name | pdf-llm-tools |
| Version | 0.0.3 |
| home_page | None |
| Summary | A family of LLM-enhanced PDF utilities |
| upload_time | 2024-07-25 01:10:25 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
| keywords | llm, pdf |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# pdf-llm-tools
`pdf-llm-tools` is a family of LLM-assisted PDF utilities:
- `pdfllm-titler` renames a PDF using metadata parsed from its filename and
contents. The new name has the form `YEAR-AUTHOR-TITLE.pdf`.
- (todo) `pdfllm-toccer` adds a bookmark structure parsed from the PDF's
detected table of contents.
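
The `YEAR-AUTHOR-TITLE.pdf` naming scheme can be illustrated with a minimal sketch. This is not the package's actual code: the helper names `build_title_filename` and `renamed_path` are hypothetical, and the sanitizing rule (runs of non-alphanumeric characters collapsed into a single hyphen) is an assumption made only so the example yields a safe filename.

```python
import re
from pathlib import Path


def build_title_filename(year, author, title):
    """Join metadata fields into a YEAR-AUTHOR-TITLE.pdf filename.

    Assumption: each field is sanitized by collapsing every run of
    non-alphanumeric characters into one hyphen. The real tool may
    format these fields differently.
    """
    def clean(part):
        return re.sub(r"[^A-Za-z0-9]+", "-", str(part)).strip("-")

    return f"{clean(year)}-{clean(author)}-{clean(title)}.pdf"


def renamed_path(pdf_path, year, author, title):
    """Return the target path next to the original file, without renaming it."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(build_title_filename(year, author, title))


if __name__ == "__main__":
    # Hypothetical metadata, as an LLM might return it for a scanned paper.
    print(renamed_path("scan0001.pdf", 1999, "Doe", "An Example Title").name)
```

Keeping the path computation separate from the rename itself makes it possible to preview the result before calling `Path.rename`.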
We currently use poppler/[pdftotext](https://github.com/jalan/pdftotext) for
layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's
`gpt-3.5-turbo-1106` is hardcoded as the LLM backend. The program requires an
OpenAI API key, supplied via a command-line option, an environment variable, or
manual input.
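
The three ways of supplying the API key could be resolved along the following lines. This is a sketch under stated assumptions, not the package's implementation: the function name `resolve_api_key` is hypothetical, the environment variable name `OPENAI_API_KEY` is assumed (it is the conventional name, but check `--help` for the one the tools actually read), and the precedence order of option, then environment, then prompt is also an assumption.

```python
import os
from getpass import getpass


def resolve_api_key(option_value=None, env=None, prompt=getpass):
    """Return an API key from the first source that provides one.

    Assumed precedence: explicit option, then environment variable,
    then an interactive prompt. `env` and `prompt` are injectable so
    the function can be exercised without touching the real
    environment or terminal.
    """
    env = os.environ if env is None else env

    if option_value:
        return option_value

    # Assumed variable name; the real tool may use a different one.
    env_value = env.get("OPENAI_API_KEY")
    if env_value:
        return env_value

    # getpass hides the typed key so it does not echo to the terminal.
    return prompt("OpenAI API key: ")
```

Prompting last means non-interactive runs, such as scripts or cron jobs, never block as long as the option or the environment variable is set.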
## Installation
```
pip install pdf-llm-tools
```
## Usage
These utilities require every input PDF to have a correct OCR (text) layer. If
one is missing, first run a tool such as
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF).
### pdfllm-titler
```
pdfllm-titler a.pdf b.pdf c.pdf
pdfllm-titler --last-page 8 d.pdf
```
See `--help` for full details.
## Development
This project is made with [Hatch](https://hatch.pypa.io/dev/).
- Build: `hatch build`
- Test: `hatch run test:test`
Raw data
{
"_id": null,
"home_page": null,
"name": "pdf-llm-tools",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "llm, pdf",
"author": null,
"author_email": "Jacob Fong <jacobcfong@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e0/18/d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e/pdf_llm_tools-0.0.3.tar.gz",
"platform": null,
"description": "# pdf-llm-tools\n\n`pdf-llm-tools` is a family of AI pdf utilities:\n\n- `pdfllm-titler` renames a pdf with metadata parsed from the filename and\n contents. In particular it renames it as `YEAR-AUTHOR-TITLE.pdf`.\n- (todo) `pdfllm-toccer` adds a bookmark structure parsed from the detected\n contents table of the pdf.\n\nWe currently use poppler/[pdftotext](https://github.com/jalan/pdftotext) for\nlayout-preserving text extraction and PyMuPDF to update outlines. OpenAI's\n`gpt-3.5-turbo-1106` is hardcoded as the LLM backend. The program requires an\nOpenAI API key via option, envvar, or manual input.\n\n## Installation\n\n```\npip install pdf-llm-tools\n```\n\n## Usage\n\nThese utilities require all PDFs to have a correct OCR layer. Run something like\n[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) if needed.\n\n### pdfllm-titler\n\n```\npdfllm-titler a.pdf b.pdf c.pdf\npdfllm-titler --last-page 8 d.pdf\n```\n\nSee `--help` for full details.\n\n## Development\n\nThis project is made with [Hatch](https://hatch.pypa.io/dev/).\n\n- Build: `hatch build`\n- Test: `hatch run test:test`\n",
"bugtrack_url": null,
"license": null,
"summary": "A family of LLM-enhanced PDF utilities",
"version": "0.0.3",
"project_urls": {
"Homepage": "https://github.com/jcfk/pdf-llm-tools",
"Repository": "https://github.com/jcfk/pdf-llm-tools"
},
"split_keywords": [
"llm",
" pdf"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9a23466c958d468167268f705cdeefb9738566e0e85fb61ed4c98937568c0c5e",
"md5": "1450e3555bf84c4147ffb96dd0ebed8a",
"sha256": "180e39bdbb4286667b37734c12b6da09db21a235dae9c7a83a9c2305eeb68b53"
},
"downloads": -1,
"filename": "pdf_llm_tools-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1450e3555bf84c4147ffb96dd0ebed8a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5829,
"upload_time": "2024-07-25T01:10:23",
"upload_time_iso_8601": "2024-07-25T01:10:23.634576Z",
"url": "https://files.pythonhosted.org/packages/9a/23/466c958d468167268f705cdeefb9738566e0e85fb61ed4c98937568c0c5e/pdf_llm_tools-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e018d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e",
"md5": "2482c0fcdc945e4208a630a8f837ca52",
"sha256": "d2fe6b79a015c2403428b53a1667dd2e715e70688dbe2df5c3440ec0cdf85a47"
},
"downloads": -1,
"filename": "pdf_llm_tools-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "2482c0fcdc945e4208a630a8f837ca52",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 292409,
"upload_time": "2024-07-25T01:10:25",
"upload_time_iso_8601": "2024-07-25T01:10:25.566604Z",
"url": "https://files.pythonhosted.org/packages/e0/18/d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e/pdf_llm_tools-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-25 01:10:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jcfk",
"github_project": "pdf-llm-tools",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pdf-llm-tools"
}