pdf-llm-tools


- Name: pdf-llm-tools
- Version: 0.0.4
- Summary: A family of LLM-enhanced PDF utilities
- Upload time: 2024-11-05 02:58:24
- Requires Python: >=3.8
- Keywords: llm, pdf

# pdf-llm-tools [![PyPI](https://img.shields.io/pypi/v/pdf-llm-tools)](https://pypi.org/project/pdf-llm-tools/)

`pdf-llm-tools` is a family of AI PDF utilities:

- `pdfllm-titler` renames a PDF using metadata parsed from its filename and
  contents. The new name has the form `YEAR-AUTHOR-TITLE.pdf`.
- (todo) `pdfllm-toccer` adds a bookmark structure parsed from the detected
  contents table of the PDF.
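
As a rough illustration of the `YEAR-AUTHOR-TITLE.pdf` naming scheme, the sketch below assembles such a name from metadata that has already been extracted. The helper names (`slugify`, `build_filename`) and the normalization rules (lowercasing, ASCII folding, underscores inside each field) are assumptions made for this example and may differ from what `pdfllm-titler` actually does.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical helper: fold to ASCII, lowercase, join words with underscores."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[A-Za-z0-9]+", ascii_text)
    return "_".join(word.lower() for word in words)


def build_filename(year: int, author: str, title: str) -> str:
    """Hypothetical helper: build a YEAR-AUTHOR-TITLE.pdf name.

    Underscores are used inside each field (an assumption) so that the
    hyphens separating the three fields stay unambiguous.
    """
    return f"{year}-{slugify(author)}-{slugify(title)}.pdf"


print(build_filename(2017, "Vaswani", "Attention Is All You Need"))
```

Keeping hyphens exclusively as field separators means such a name can later be split back into its three parts with a plain `split("-", 2)`.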

We currently use poppler/[pdftotext](https://github.com/jalan/pdftotext) for
layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's
`gpt-4o-mini` is hardcoded as the LLM backend. The program requires an OpenAI
API key via option, envvar, or manual input.
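
The three key sources could be tried in sequence, as in the minimal sketch below. The function name `resolve_api_key` and the fallback order (option, then `OPENAI_API_KEY`, then an interactive prompt) are assumptions; the Development section only states that the option takes precedence over the envvar.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Optional


def resolve_api_key(
    option_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Hypothetical helper: pick the API key from the first available source."""
    # 1. An explicit command-line option wins.
    if option_value:
        return option_value
    # 2. Otherwise fall back to the environment variable.
    env_value = environ.get("OPENAI_API_KEY")
    if env_value:
        return env_value
    # 3. As a last resort, ask the user (assumed to come last).
    key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("no OpenAI API key provided")
    return key
```

Passing `environ` and `prompt` as parameters is a design choice of this sketch: it lets the lookup be exercised without touching the real environment or terminal.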

## Installation

```
pip install pdf-llm-tools
```

## Usage

These utilities require all PDFs to have a correct OCR layer. Run something like
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) if needed.
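
One way to spot PDFs that probably lack a text layer is to look at how much text each page yields on extraction. The heuristic below is only a sketch: the function name, the 50-character threshold, and the majority rule are all assumptions, and it can detect a missing layer but cannot tell whether an existing OCR layer is correct.

```python
from typing import Iterable


def looks_like_missing_text_layer(
    page_texts: Iterable[str], min_chars_per_page: int = 50
) -> bool:
    """Hypothetical heuristic: True if most pages yield almost no text."""
    pages = list(page_texts)
    if not pages:
        # No pages extracted at all: treat as needing OCR.
        return True
    # Count pages whose extracted text is shorter than the threshold.
    sparse = sum(1 for text in pages if len(text.strip()) < min_chars_per_page)
    return sparse > len(pages) / 2


print(looks_like_missing_text_layer(["", "  ", ""]))
print(looks_like_missing_text_layer(["x" * 200, "y" * 300]))
```

The per-page strings could come from any extractor, for example by iterating over a `pdftotext.PDF` object; files flagged by the check would then be candidates for OCRmyPDF.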

### titler

```
pdfllm titler a.pdf b.pdf c.pdf
pdfllm titler --last-page 8 d.pdf
```

See `--help` for full details.

## Development

This project is made with [Hatch](https://hatch.pypa.io/dev/).

- Build: `hatch build`
- Test: `hatch run test:test_all [--openai-api-key KEY]`
  - The test system handles the API key the same way as the main program. The
    key must be given either as an option in the `hatch run` invocation (which
    takes precedence) or as the envvar `OPENAI_API_KEY`.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pdf-llm-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "llm, pdf",
    "author": null,
    "author_email": "Jacob Fong <jacobcfong@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b5/0b/3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab/pdf_llm_tools-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A family of LLM-enhanced PDF utilities",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://github.com/jcfk/pdf-llm-tools",
        "Repository": "https://github.com/jcfk/pdf-llm-tools"
    },
    "split_keywords": [
        "llm",
        " pdf"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b13d3e83adda0c93ff9234cd14a7b5e5952acab06414c2cf02907d0d1481b730",
                "md5": "ee59127cd7ede3ea1d32168070c783eb",
                "sha256": "2cc6e7d2c3c1fb9efbd2b881dc32cdf91096053c888eb87c2248f818247ee386"
            },
            "downloads": -1,
            "filename": "pdf_llm_tools-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ee59127cd7ede3ea1d32168070c783eb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 7859,
            "upload_time": "2024-11-05T02:58:22",
            "upload_time_iso_8601": "2024-11-05T02:58:22.209455Z",
            "url": "https://files.pythonhosted.org/packages/b1/3d/3e83adda0c93ff9234cd14a7b5e5952acab06414c2cf02907d0d1481b730/pdf_llm_tools-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b50b3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab",
                "md5": "6fef27511a86b5b4be5605af0424d609",
                "sha256": "ea04394f65ef33976f601b44903b12229b1df90324aeef9f4623afd644f51e87"
            },
            "downloads": -1,
            "filename": "pdf_llm_tools-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "6fef27511a86b5b4be5605af0424d609",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 294555,
            "upload_time": "2024-11-05T02:58:24",
            "upload_time_iso_8601": "2024-11-05T02:58:24.796317Z",
            "url": "https://files.pythonhosted.org/packages/b5/0b/3db627874c4979144a716584ffeffb89f126813a913c48a7e753a5f2d5ab/pdf_llm_tools-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-05 02:58:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jcfk",
    "github_project": "pdf-llm-tools",
    "github_fetch_exception": true,
    "lcname": "pdf-llm-tools"
}
        