pdf-llm-tools


| Field | Value |
| --- | --- |
| Name | pdf-llm-tools |
| Version | 0.0.3 |
| Summary | A family of LLM-enhanced PDF utilities |
| Upload time | 2024-07-25 01:10:25 |
| Requires Python | >=3.8 |
| Keywords | llm, pdf |
| Home page | None |
| Maintainer | None |
| Docs URL | None |
| Author | None |
| License | None |
| Requirements | No requirements were recorded. |

# pdf-llm-tools

`pdf-llm-tools` is a family of LLM-enhanced PDF utilities:

- `pdfllm-titler` renames a PDF using metadata parsed from its filename and
  contents. Specifically, it renames the file to `YEAR-AUTHOR-TITLE.pdf`.
- (todo) `pdfllm-toccer` adds a bookmark structure parsed from the PDF's
  detected table of contents.
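
To make the `YEAR-AUTHOR-TITLE.pdf` scheme concrete, here is a minimal sketch of how such a filename could be assembled once the metadata is known. The helpers `slugify` and `build_filename`, and the lowercase, hyphen-separated normalization, are assumptions for illustration only; the actual tool may format names differently.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into single hyphens."""
    # Decompose accented characters and drop the non-ASCII remainder.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_filename(year: int, author: str, title: str) -> str:
    """Return a name following the YEAR-AUTHOR-TITLE.pdf pattern (hypothetical formatting)."""
    return f"{year}-{slugify(author)}-{slugify(title)}.pdf"


print(build_filename(1948, "Shannon", "A Mathematical Theory of Communication"))
# prints "1948-shannon-a-mathematical-theory-of-communication.pdf"
```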

We currently use poppler (through [pdftotext](https://github.com/jalan/pdftotext)) for
layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's
`gpt-3.5-turbo-1106` is hardcoded as the LLM backend. The program requires an
OpenAI API key, supplied via a command-line option, an environment variable, or
manual input.
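
The three ways of supplying the API key could be resolved roughly as follows. This is a sketch, not the tool's actual code: the flag name `--openai-api-key`, the variable name `OPENAI_API_KEY`, the function `resolve_api_key`, and the precedence order (option, then environment, then prompt) are all assumptions.

```python
import argparse
import getpass
import os
from typing import Mapping, Sequence


def resolve_api_key(argv: Sequence[str], env: Mapping[str, str] = os.environ) -> str:
    """Return an API key from a CLI option, else the environment, else a prompt."""
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical flag name; check `--help` for the real one.
    parser.add_argument("--openai-api-key", default=None)
    args, _unknown = parser.parse_known_args(list(argv))

    if args.openai_api_key:
        return args.openai_api_key
    # Assumed environment variable name.
    if env.get("OPENAI_API_KEY"):
        return env["OPENAI_API_KEY"]
    # Fall back to manual input without echoing the key to the terminal.
    return getpass.getpass("OpenAI API key: ")
```

Passing `env` explicitly keeps the function easy to exercise without touching the real environment, for example `resolve_api_key([], env={"OPENAI_API_KEY": "sk-example"})`.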

## Installation

```
pip install pdf-llm-tools
```

## Usage

These utilities require every PDF to have an accurate OCR text layer. If a file
lacks one, run a tool such as [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) first.
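
One rough way to decide whether a file needs OCR is to look at how much text was extracted from it. The sketch below is a heuristic under stated assumptions: it takes page texts that have already been extracted by some other means, and the function name `looks_like_text_layer` and the threshold of 50 alphanumeric characters per page are arbitrary choices, not part of this project.

```python
from typing import Iterable


def looks_like_text_layer(pages: Iterable[str], min_chars_per_page: int = 50) -> bool:
    """Heuristic: True if the average alphanumeric character count per page meets the threshold."""
    page_list = list(pages)
    if not page_list:
        return False
    # Count only letters and digits so whitespace-only pages do not pass.
    alnum_count = sum(ch.isalnum() for page in page_list for ch in page)
    return alnum_count / len(page_list) >= min_chars_per_page
```

A file that fails this check is a candidate for an OCR pass before running the utilities. The check cannot tell whether an existing text layer is accurate, only whether one appears to be present.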

### pdfllm-titler

```
pdfllm-titler a.pdf b.pdf c.pdf
pdfllm-titler --last-page 8 d.pdf
```

Run `pdfllm-titler --help` for full details.

## Development

This project is made with [Hatch](https://hatch.pypa.io/dev/).

- Build: `hatch build`
- Test: `hatch run test:test`

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pdf-llm-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "llm, pdf",
    "author": null,
    "author_email": "Jacob Fong <jacobcfong@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e0/18/d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e/pdf_llm_tools-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A family of LLM-enhanced PDF utilities",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/jcfk/pdf-llm-tools",
        "Repository": "https://github.com/jcfk/pdf-llm-tools"
    },
    "split_keywords": [
        "llm",
        " pdf"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9a23466c958d468167268f705cdeefb9738566e0e85fb61ed4c98937568c0c5e",
                "md5": "1450e3555bf84c4147ffb96dd0ebed8a",
                "sha256": "180e39bdbb4286667b37734c12b6da09db21a235dae9c7a83a9c2305eeb68b53"
            },
            "downloads": -1,
            "filename": "pdf_llm_tools-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1450e3555bf84c4147ffb96dd0ebed8a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 5829,
            "upload_time": "2024-07-25T01:10:23",
            "upload_time_iso_8601": "2024-07-25T01:10:23.634576Z",
            "url": "https://files.pythonhosted.org/packages/9a/23/466c958d468167268f705cdeefb9738566e0e85fb61ed4c98937568c0c5e/pdf_llm_tools-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e018d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e",
                "md5": "2482c0fcdc945e4208a630a8f837ca52",
                "sha256": "d2fe6b79a015c2403428b53a1667dd2e715e70688dbe2df5c3440ec0cdf85a47"
            },
            "downloads": -1,
            "filename": "pdf_llm_tools-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "2482c0fcdc945e4208a630a8f837ca52",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 292409,
            "upload_time": "2024-07-25T01:10:25",
            "upload_time_iso_8601": "2024-07-25T01:10:25.566604Z",
            "url": "https://files.pythonhosted.org/packages/e0/18/d8a94585cd5a70b2120d832f18283fe578d1184af3f15a3c992b4eca328e/pdf_llm_tools-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-25 01:10:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jcfk",
    "github_project": "pdf-llm-tools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pdf-llm-tools"
}
        