# pdf_scout
![PyPI](https://img.shields.io/pypi/v/pdf_scout)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pdf_scout)
![PyPI - License](https://img.shields.io/pypi/l/pdf_scout)

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

You can install it for the current user with pip, run it on a PDF, and uninstall it later:

```
# install for the current user
pip install --user pdf_scout
# generate bookmarks for a document
pdf_scout ./my_document.pdf

# remove the tool
pip uninstall pdf_scout
```
![screenshot](./assets/screenshot.png)

This project is a work in progress and will likely only generate suitable bookmarks for documents that meet the following requirements:
* Single column of text (not multiple columns)
* Font size of header text > font size of body text
* Header text is justified or left-aligned
* Paragraph spacing for headers > body text paragraph spacing
* Consistent left margins on every page
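
These requirements describe the kind of layout heuristics a tool like this can rely on. As a rough illustration only (not pdf_scout's actual implementation), the sketch below assumes each line of text has already been extracted along with its font size and left edge; the `Line` and `Bookmark` types and both functions are hypothetical. It treats the most common font size as body text, flags larger lines that start at the most common left margin as headers, ranks the distinct header sizes into levels, and nests the result into a bookmark tree. For brevity it ignores the paragraph-spacing requirement.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Line:
    """A line of text as a PDF extractor might report it (hypothetical shape)."""
    text: str
    size: float  # font size in points
    x0: float    # left edge of the line in points


@dataclass
class Bookmark:
    """One node of a bookmark outline (hypothetical type)."""
    title: str
    children: list["Bookmark"] = field(default_factory=list)


def find_headers(lines: list[Line], tolerance: float = 1.0) -> list[tuple[int, str]]:
    """Return (level, text) pairs for lines that look like headers.

    Level 1 is the largest header font size, level 2 the next largest, and so on.
    """
    if not lines:
        return []
    # Assume body text uses the most common font size and the most common left margin.
    body_size = Counter(line.size for line in lines).most_common(1)[0][0]
    margin = Counter(round(line.x0, 1) for line in lines).most_common(1)[0][0]
    candidates = [
        line
        for line in lines
        # Header: bigger than body text and starting at the left margin.
        if line.size > body_size and abs(line.x0 - margin) <= tolerance
    ]
    # Rank distinct header sizes: biggest size -> level 1.
    sizes = sorted({line.size for line in candidates}, reverse=True)
    level_of = {size: rank for rank, size in enumerate(sizes, start=1)}
    return [(level_of[line.size], line.text) for line in candidates]


def build_outline(headers: list[tuple[int, str]]) -> list[Bookmark]:
    """Nest flat (level, title) pairs into a tree of bookmarks."""
    root: list[Bookmark] = []
    stack: list[tuple[int, Bookmark]] = []  # path of (level, bookmark) from the root
    for level, title in headers:
        node = Bookmark(title)
        # Pop until the top of the stack is shallower than this header (its parent).
        while stack and stack[-1][0] >= level:
            stack.pop()
        (stack[-1][1].children if stack else root).append(node)
        stack.append((level, node))
    return root
```

An indented line in a larger font (for example, a block quote) is rejected by the margin check, which is one reason the tool expects consistent left margins.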
## Supported document types
`pdf_scout` has been tested on and expressly supports the following classes of documents:
- Singapore State Court and Supreme Court Judgments (unreported)
- Singapore Law Reports
- [OpenDoc](https://www.opendoc.gov.sg/)-generated PDFs, such as the [State Court Practice Directions 2021](https://epd-statecourts-2021.opendoc.gov.sg/) and the [Supreme Court Practice Directions 2021](https://epd-supcourt-2021.opendoc.gov.sg/)

It may support other types of documents as well. If a particular class of document is not supported or does not work well, please open an issue and I will consider adding support for it.
## Development
This project manages its dependencies using [poetry](https://python-poetry.org) and supports Python 3.10 or later (below 4.0). After installing poetry and entering the project folder, run the following to install the dependencies:
```bash
poetry install
```
To spawn a shell inside the project's virtualenv, with the dependencies available, run:
```bash
poetry shell
```
To run the app directly on a PDF without installing the package, run:
```bash
poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>
```
### Tests
The project uses snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the `/pdf` folder manually using the relevant sources (you may want to consider using [Clerkent](https://clerkent.huey.xyz) to download the unreported versions of judgments):
```bash
# run the test suite against the stored snapshots
poetry run pytest
# regenerate the stored snapshots after an intended change in output
poetry run pytest --snapshot-update
```
### Static type-checking
```bash
poetry run mypy pdf_scout/app.py
```
### Tips
- Processing a large PDF can take some time. To iterate faster when debugging a particular behaviour, extract the problematic pages into a separate PDF and run the tool on that file.