pdf-scout

Name	pdf-scout
Version	0.0.6
home_page	https://github.com/hueyy/pdf_scout
Summary	automatically create bookmarks in a PDF file
upload_time	2023-01-03 07:00:41
docs_url	None
author	Huey
requires_python	>=3.10,<4.0
license	EUPL-1.2
keywords	pdf bookmark outline
requirements	No requirements were recorded.

# pdf_scout


![PyPI](https://img.shields.io/pypi/v/pdf_scout)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pdf_scout)
![PyPI - License](https://img.shields.io/pypi/l/pdf_scout)

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

You can install it for your user account via pip:

```bash
# Install for the current user
pip install --user pdf_scout

# Generate bookmarks for a PDF
pdf_scout ./my_document.pdf

# Uninstall
pip uninstall pdf_scout
```

![screenshot](./assets/screenshot.png)

This project is a work in progress and will likely only generate suitable bookmarks for documents that conform to the following requirements:

* Single column of text (not multiple columns)
* Font size of header text > font size of body text
* Header text is justified or left-aligned
* Paragraph spacing for headers > body text paragraph spacing
* Consistent left margins on every page
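
To make these requirements concrete, here is a minimal sketch of how a heading heuristic could use font size and left margin. It is an illustration only, not `pdf_scout`'s actual implementation: the `Line` type, the `guess_headings` function, and the `margin_tolerance` parameter are all hypothetical, and paragraph spacing is ignored for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    font_size: float  # in points
    left: float       # x-coordinate of the line's left edge, in points


def guess_headings(lines: list[Line], margin_tolerance: float = 1.0) -> list[Line]:
    """Return the lines that look like headings under the assumptions below."""
    if not lines:
        return []
    # Assumption: the most common font size belongs to the body text.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    # Assumption: the smallest left offset is the (consistent) left margin.
    left_margin = min(line.left for line in lines)
    return [
        line
        for line in lines
        if line.font_size > body_size
        and abs(line.left - left_margin) <= margin_tolerance
    ]


sample = [
    Line("Introduction", 14.0, 72.0),
    Line("This is body text.", 11.0, 72.0),
    Line("More body text.", 11.0, 72.0),
    Line("Even more body text.", 11.0, 72.0),
    Line("A centred title", 14.0, 200.0),
]
print([line.text for line in guess_headings(sample)])  # ['Introduction']
```

The centred line is rejected even though it is large, which shows why a heuristic of this kind would depend on left-aligned or justified headers and consistent left margins.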

## Supported document types

`pdf_scout` has been tested on and expressly supports the following classes of documents:

- Singapore State Court and Supreme Court Judgments (unreported)
- Singapore Law Reports
- [OpenDoc](https://www.opendoc.gov.sg/)-generated PDFs, such as the [State Court Practice Directions 2021](https://epd-statecourts-2021.opendoc.gov.sg/) and the [Supreme Court Practice Directions 2021](https://epd-supcourt-2021.opendoc.gov.sg/)

It may support other types of documents as well. If a particular class of document isn't supported or does not work well, please open an issue and I will consider adding support for it.

## Development

This project manages its dependencies using [poetry](https://python-poetry.org) and supports only Python 3.10 or later (`>=3.10,<4.0`). After installing poetry and entering the project folder, run the following to install the dependencies:

```bash
poetry install
```

To open a virtualenv in the project folder with the dependencies, run:

```bash
poetry shell
```

To run the tool directly from source, run:

```bash
poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>
```

### Tests

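The test suite uses snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the `/pdf` folder manually from the relevant sources (you may want to consider using [Clerkent](https://clerkent.huey.xyz) to download the unreported versions of judgments). The first command below runs the tests; the second regenerates the snapshots after an intentional change in output:

```bash
# Run the snapshot tests
poetry run pytest

# Update the stored snapshots
poetry run pytest --snapshot-update
```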

### Static type-checking

```bash
poetry run mypy pdf_scout/app.py
```

### Tips

- Processing a large PDF can take some time, so to iterate faster when debugging certain behaviour, extract the problematic part of the PDF as a separate file
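
One way to follow the tip above is to copy a page range into a new file. The sketch below assumes the third-party `pypdf` library is installed (it is not stated to be a dependency of this project); the helper names `page_indices` and `extract_pages` are hypothetical.

```python
def page_indices(first: int, last: int, total: int) -> range:
    """Convert a 1-based inclusive page range to 0-based indices, clamped to the document length."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return range(first - 1, min(last, total))


def extract_pages(src: str, dst: str, first: int, last: int) -> None:
    """Copy pages first..last (1-based, inclusive) of src into a new PDF at dst."""
    # Assumption: pypdf is available. It is imported here so that the
    # range helper above can be used and tested without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in page_indices(first, last, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as output:
        writer.write(output)
```

For example, `extract_pages("judgment.pdf", "excerpt.pdf", 40, 45)` (file names are placeholders) would write a six-page excerpt, which can then be passed to `pdf_scout ./excerpt.pdf` for a quicker debugging loop.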

Raw data

{
    "_id": null,
    "home_page": "https://github.com/hueyy/pdf_scout",
    "name": "pdf-scout",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10,<4.0",
    "maintainer_email": "",
    "keywords": "pdf,bookmark,outline",
    "author": "Huey",
    "author_email": "hello@huey.xyz",
    "download_url": "https://files.pythonhosted.org/packages/7a/08/352acbf5c5dd59db3c4beef7da258788b4e7e9614886b3edfcbd6b8d84a0/pdf_scout-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "EUPL-1.2",
    "summary": "automatically create bookmarks in a PDF file",
    "version": "0.0.6",
    "split_keywords": [
        "pdf",
        "bookmark",
        "outline"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8aee8e0b7ec8cce767959b283f60ab98047bd9932a6331aafd74b36e64407fe0",
                "md5": "c3c66cb5c05237ac12b5927075eabab5",
                "sha256": "5a21a3ddca70215016f0c5cf1f670e3913ba5fbc92552ed913f14b20853f706d"
            },
            "downloads": -1,
            "filename": "pdf_scout-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c3c66cb5c05237ac12b5927075eabab5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10,<4.0",
            "size": 11665,
            "upload_time": "2023-01-03T07:00:40",
            "upload_time_iso_8601": "2023-01-03T07:00:40.273865Z",
            "url": "https://files.pythonhosted.org/packages/8a/ee/8e0b7ec8cce767959b283f60ab98047bd9932a6331aafd74b36e64407fe0/pdf_scout-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7a08352acbf5c5dd59db3c4beef7da258788b4e7e9614886b3edfcbd6b8d84a0",
                "md5": "e543ddfa6f4b39997d88e011aeba88ce",
                "sha256": "87908911f26ca52c3e030d4c76c3c4273d0ed51c01d1b4271af4d302a917f331"
            },
            "downloads": -1,
            "filename": "pdf_scout-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "e543ddfa6f4b39997d88e011aeba88ce",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10,<4.0",
            "size": 10473,
            "upload_time": "2023-01-03T07:00:41",
            "upload_time_iso_8601": "2023-01-03T07:00:41.538647Z",
            "url": "https://files.pythonhosted.org/packages/7a/08/352acbf5c5dd59db3c4beef7da258788b4e7e9614886b3edfcbd6b8d84a0/pdf_scout-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-03 07:00:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "hueyy",
    "github_project": "pdf_scout",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pdf-scout"
}
