pdf.tocgen


- Name: pdf.tocgen
- Version: 1.3.4
- Home page: https://krasjet.com/voice/pdf.tocgen/
- Summary: Automatically generate table of contents for pdf files
- Upload time: 2023-11-26 04:00:16
- Author: krasjet
- Requires Python: >=3.7,<4.0
- License: GPL-3.0-or-later
- Keywords: pdf, cli

[pdf.tocgen][tocgen]
==========

[![PyPI](https://img.shields.io/pypi/v/pdf.tocgen)](https://pypi.org/project/pdf.tocgen/)
[![build](https://github.com/Krasjet/pdf.tocgen/workflows/build/badge.svg?branch=master)](https://github.com/Krasjet/pdf.tocgen/actions?query=workflow%3Abuild)

```
                          in.pdf
                            |
                            |
     +----------------------+--------------------+
     |                      |                    |
     V                      V                    V
+----------+          +-----------+         +----------+
|          |  recipe  |           |   ToC   |          |
| pdfxmeta +--------->| pdftocgen +-------->| pdftocio +---> out.pdf
|          |          |           |         |          |
+----------+          +-----------+         +----------+
```

[pdf.tocgen][tocgen] is a set of command-line tools for automatically
extracting and generating the table of contents (ToC) of a PDF file. It uses
the embedded font attributes and position of headings to deduce the basic
outline of a PDF file.

It works best for PDF files produced from a TeX document using `pdftex` (and
its friends `pdflatex`, `pdfxetex`, etc.), but it's designed to work with any
**software-generated** PDF files (i.e. you shouldn't expect it to work with
scanned PDFs). Some examples include `troff`/`groff`, Adobe InDesign, Microsoft
Word, and probably more.

Please see the [**homepage**][tocgen] for a detailed introduction.

Installation
------------

pdf.tocgen is written in Python 3. It is known to work with Python 3.7 to 3.11
on Linux, Windows, and macOS (on BSDs, you probably need to build PyMuPDF
yourself). Use

```sh
$ pip install -U pdf.tocgen
```
to install the latest version systemwide. Alternatively, use [pipx][pipx] or

```sh
$ pip install -U --user pdf.tocgen
```
to install it for the current user. I would recommend the latter approach to
avoid messing up the package manager on your system.

If you are using an Arch-based Linux distro, the package is also available on
[AUR][aur]. It can be installed using any AUR helper, for example [`yay`][yay]:

```console
$ yay -S pdf.tocgen
```

[pipx]: https://pipxproject.github.io/pipx/
[aur]: https://aur.archlinux.org/packages/pdf.tocgen/
[yay]: https://github.com/Jguer/yay

Workflow
--------

The design of pdf.tocgen is influenced by the [Unix philosophy][unix]. I
intentionally split pdf.tocgen into three separate programs. They work
together, but each of them is useful on its own.

1. `pdfxmeta`: extract the metadata (font attributes, positions) of headings to
    build a **recipe** file.
2. `pdftocgen`: generate a table of contents from the recipe.
3. `pdftocio`: import the table of contents to the PDF document.

You should read [the example][ex] on the homepage for a proper introduction,
but the basic workflow goes like this.

First, use `pdfxmeta` to search for the metadata of headings, and generate
**heading filters** using the automatic setting:

```sh
$ pdfxmeta -p page -a 1 in.pdf "Section" >> recipe.toml
$ pdfxmeta -p page -a 2 in.pdf "Subsection" >> recipe.toml
```
Note that `page` needs to be replaced by the page number of the search keyword.

The output `recipe.toml` file will contain several heading filters, each of
which specifies the attributes that a heading at a particular level should have.

An example recipe file would look like this:

```toml
[[heading]]
level = 1
greedy = true
font.name = "Times-Bold"
font.size = 19.92530059814453

[[heading]]
level = 2
greedy = true
font.name = "Times-Bold"
font.size = 11.9552001953125
```
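
To make the idea of a heading filter concrete, here is a minimal sketch of how such a filter could be matched against the font attributes of a piece of text. It is not pdf.tocgen's actual implementation: the `HeadingFilter` class and the `matches` and `heading_level` functions are hypothetical names, the `greedy` key is left out, and the `1e-5` size tolerance is taken from the commented `font.size_tolerance` hint that `pdfxmeta -a` prints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeadingFilter:
    """Simplified, hypothetical model of one [[heading]] entry in a recipe."""
    level: int
    font_name: Optional[str] = None
    font_size: Optional[float] = None
    size_tolerance: float = 1e-5  # assumption: same value as the commented hint


def matches(flt: HeadingFilter, font_name: str, font_size: float) -> bool:
    """Return True if the given font attributes satisfy every set field."""
    if flt.font_name is not None and flt.font_name != font_name:
        return False
    if flt.font_size is not None:
        if abs(flt.font_size - font_size) > flt.size_tolerance:
            return False
    return True


def heading_level(filters: List[HeadingFilter],
                  font_name: str, font_size: float) -> Optional[int]:
    """Return the level of the first matching filter, or None for body text."""
    for flt in filters:
        if matches(flt, font_name, font_size):
            return flt.level
    return None


# The two filters from the example recipe above.
recipe = [
    HeadingFilter(level=1, font_name="Times-Bold", font_size=19.92530059814453),
    HeadingFilter(level=2, font_name="Times-Bold", font_size=11.9552001953125),
]

print(heading_level(recipe, "Times-Bold", 11.9552001953125))    # a subsection
print(heading_level(recipe, "Times-Roman", 9.962599754333496))  # body text
```

Attributes that a filter leaves unset are simply not checked, which is why a recipe with only `font.name` and `font.size` can still be selective enough.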

Then pass the recipe to `pdftocgen` to generate a table of contents:

```console
$ pdftocgen in.pdf < recipe.toml
"Preface" 5
    "Bottom-up Design" 5
    "Plan of the Book" 7
    "Examples" 9
    "Acknowledgements" 9
"Contents" 11
"The Extensible Language" 14
    "1.1 Design by Evolution" 14
    "1.2 Programming Bottom-Up" 16
    "1.3 Extensible Software" 18
    "1.4 Extending Lisp" 19
    "1.5 Why Lisp (or When)" 21
"Functions" 22
    "2.1 Functions as Data" 22
    "2.2 Defining Functions" 23
    "2.3 Functional Arguments" 26
    "2.4 Functions as Properties" 28
    "2.5 Scope" 29
    "2.6 Closures" 30
    "2.7 Local Functions" 34
    "2.8 Tail-Recursion" 35
    "2.9 Compilation" 37
    "2.10 Functions from Lists" 40
"Functional Programming" 41
    "3.1 Functional Design" 41
    "3.2 Imperative Outside-In" 46
    "3.3 Functional Interfaces" 48
    "3.4 Interactive Programming" 50
[--snip--]
```
which can be directly imported to the PDF file using `pdftocio`:

```sh
$ pdftocgen in.pdf < recipe.toml | pdftocio -o out.pdf in.pdf
```

Or if you want to edit the table of contents before importing it,

```sh
$ pdftocgen in.pdf < recipe.toml > toc
$ vim toc # edit
$ pdftocio in.pdf < toc
```
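
If you edit the `toc` file by hand, a small script can catch indentation slips before you import it. The sketch below is an assumption-laden reading of the format shown in this README: one entry per line, four spaces of indentation per level, a double-quoted title, a page number, and an optional third number (the vertical position that the `-v` flag emits). The names `Entry`, `parse_toc`, and `check_outline` are hypothetical, and the checks are only my guess at what a sane outline looks like, not rules documented by `pdftocio`.

```python
import shlex
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    level: int
    title: str
    page: int
    vpos: Optional[float] = None


def parse_toc(text: str, indent_width: int = 4) -> List[Entry]:
    """Parse ToC text into entries; the indent width is read off the samples."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        body = line.lstrip(" ")
        level = (len(line) - len(body)) // indent_width + 1
        fields = shlex.split(body)  # handles the double-quoted title
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        vpos = float(fields[2]) if len(fields) == 3 else None
        entries.append(Entry(level, fields[0], int(fields[1]), vpos))
    return entries


def check_outline(entries: List[Entry]) -> List[str]:
    """Report entries nested more than one level deeper than their predecessor."""
    problems = []
    prev_level = 0
    for entry in entries:
        if entry.level > prev_level + 1:
            problems.append(
                f"{entry.title!r}: level jumps from {prev_level} to {entry.level}"
            )
        prev_level = entry.level
    return problems


edited = "\n".join([
    '"Preface" 5',
    '        "Plan of the Book" 7',  # indented one level too deep
    '"Contents" 11',
])

for problem in check_outline(parse_toc(edited)):
    print(problem)
```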

Each of the three programs has some extra functionality. Use the `-h` option
to see all the options you can pass in.

Command examples
----------------

Because of the modular design, each program is useful on its own, despite
being part of the pipeline. This section provides some more examples of how
you could use them. Feel free to come up with more.

### `pdftocio`

`pdftocio` demonstrates this point best: it can do a lot on its own.

To print the existing table of contents of a PDF to `stdout`:

```console
$ pdftocio doc.pdf
"Level 1 heading 1" 1
    "Level 2 heading 1" 1
        "Level 3 heading 1" 2
        "Level 3 heading 2" 3
    "Level 2 heading 2" 4
"Level 1 heading 2" 5
```

To write the existing table of contents of a PDF to a file named `toc`:

```console
$ pdftocio doc.pdf > toc
```

To write a `toc` file back to `doc.pdf`:

```console
$ pdftocio doc.pdf < toc
```

To specify the name of the output PDF:

```console
$ pdftocio -o out.pdf doc.pdf < toc
```

To copy the table of contents from `doc1.pdf` to `doc2.pdf`:

```console
$ pdftocio -v doc1.pdf | pdftocio doc2.pdf
```

Note that the `-v` flag helps preserve the vertical
positions of headings during the copy.

To print the table of contents for reading:

```console
$ pdftocio -H doc.pdf
Level 1 heading 1 ··· 1
    Level 2 heading 1 ··· 1
        Level 3 heading 1 ··· 2
        Level 3 heading 2 ··· 3
    Level 2 heading 2 ··· 4
Level 1 heading 2 ··· 5
```
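
The relationship between the two output styles can be sketched in a few lines of Python. This is only an illustration of the format, not how `pdftocio` renders it: `render_readable` is a hypothetical helper, and the four-space indent and the ` ··· ` separator are read off the sample output above.

```python
from typing import Iterable, Tuple


def render_readable(entries: Iterable[Tuple[int, str, int]],
                    indent: str = "    ", sep: str = " ··· ") -> str:
    """Render (level, title, page) tuples in the style of the -H output."""
    lines = []
    for level, title, page in entries:
        # One indent unit per level below the top level; no quotes around titles.
        lines.append(f"{indent * (level - 1)}{title}{sep}{page}")
    return "\n".join(lines)


outline = [
    (1, "Level 1 heading 1", 1),
    (2, "Level 2 heading 1", 1),
    (3, "Level 3 heading 1", 2),
]
print(render_readable(outline))
```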

### `pdftocgen`

If you already have a recipe `rcp.toml` for `doc.pdf`, you can apply it and
print the outline to `stdout` with:

```console
$ pdftocgen doc.pdf < rcp.toml
"Level 1 heading 1" 1
    "Level 2 heading 1" 1
        "Level 3 heading 1" 2
        "Level 3 heading 2" 3
    "Level 2 heading 2" 4
"Level 1 heading 2" 5
```

To output the table of contents to a file called `toc`:

```console
$ pdftocgen doc.pdf < rcp.toml > toc
```

To import the generated table of contents to the PDF file, and output
to `doc_out.pdf`:

```console
$ pdftocgen doc.pdf < rcp.toml | pdftocio -o doc_out.pdf doc.pdf
```

To print the generated table of contents for reading:

```console
$ pdftocgen -H doc.pdf < rcp.toml
Level 1 heading 1 ··· 1
    Level 2 heading 1 ··· 1
        Level 3 heading 1 ··· 2
        Level 3 heading 2 ··· 3
    Level 2 heading 2 ··· 4
Level 1 heading 2 ··· 5
```

If you want to include the vertical position on the page for each heading, use
the `-v` flag:

```console
$ pdftocgen -v doc.pdf < rcp.toml
"Level 1 heading 1" 1 306.947998046875
    "Level 2 heading 1" 1 586.3488159179688
        "Level 3 heading 1" 2 586.5888061523438
        "Level 3 heading 2" 3 155.66879272460938
    "Level 2 heading 2" 4 435.8687744140625
"Level 1 heading 2" 5 380.78875732421875
```

`pdftocio` can understand the vertical position in the output to generate table
of contents entries that link to the exact position of the heading, instead of
the top of the page.

```console
$ pdftocgen -v doc.pdf < rcp.toml | pdftocio doc.pdf
```

Note that the default output of `pdftocio` here is `doc_out.pdf`.
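
As a rough illustration of that naming convention, the sketch below derives an output name by appending `_out` to the input file's stem. The rule is inferred from the single `doc.pdf` to `doc_out.pdf` example above, and `default_output` is a hypothetical helper, not part of pdf.tocgen.

```python
from pathlib import Path


def default_output(input_pdf: str) -> str:
    """Guess the default output name: '<stem>_out<suffix>' next to the input."""
    path = Path(input_pdf)
    return str(path.with_name(f"{path.stem}_out{path.suffix}"))


print(default_output("doc.pdf"))
```

Use `-o` whenever you need a specific output name rather than relying on the default.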

### `pdfxmeta`

To search for `Anaphoric` in the entire PDF:

```console
$ pdfxmeta onlisp.pdf "Anaphoric"
14. Anaphoric Macros:
    font.name = "Times-Bold"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = true
    bbox.left = 308.6400146484375
    bbox.top = 307.1490478515625
    bbox.right = 404.33282470703125
    bbox.bottom = 320.9472351074219
[--snip--]
```

To output the result as a heading filter with the automatic settings:

```console
$ pdfxmeta -a 1 onlisp.pdf "Anaphoric"
[[heading]]
# 14. Anaphoric Macros
level = 1
greedy = true
font.name = "Times-Bold"
font.size = 9.962599754333496
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = true
# font.monospace = false
# font.bold = true
# bbox.left = 308.6400146484375
# bbox.top = 307.1490478515625
# bbox.right = 404.33282470703125
# bbox.bottom = 320.9472351074219
# bbox.tolerance = 1e-5
[--snip--]
```
which can be written directly to a recipe file:

```console
$ pdfxmeta -a 1 onlisp.pdf "Anaphoric" >> recipe.toml
```

To search for `Anaphoric` case-insensitively in the entire PDF:

```console
$ pdfxmeta -i onlisp.pdf "Anaphoric"
to compile-time. Chapter 14 introduces anaphoric macros, which allow you to:
    font.name = "Times-Roman"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = false
    bbox.left = 138.60000610351562
    bbox.top = 295.6583557128906
    bbox.right = 459.0260009765625
    bbox.bottom = 308.948486328125
[--snip--]
```

To use a regular expression that matches both `Anaphoric` and `anaphoric` in
the entire PDF:

```console
$ pdfxmeta onlisp.pdf "[Aa]naphoric"
to compile-time. Chapter 14 introduces anaphoric macros, which allow you to:
    font.name = "Times-Roman"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = false
    bbox.left = 138.60000610351562
    bbox.top = 295.6583557128906
    bbox.right = 459.0260009765625
    bbox.bottom = 308.948486328125
[--snip--]
```
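
The difference between a plain pattern, a character class, and a case-insensitive search can be tried out in isolation. The sketch below assumes that `pdfxmeta` patterns behave like Python `re` patterns; the README only says "regular expression", so treat that as an assumption. The sample lines and the `hits` helper are made up for illustration.

```python
import re

# Made-up sample lines standing in for text found in a PDF.
lines = [
    "14. Anaphoric Macros",
    "Chapter 14 introduces anaphoric macros",
    "Macro-Defining Macros",
    "ANAPHORIC VARIANTS",
]


def hits(pattern: "re.Pattern") -> list:
    """Return the indices of the sample lines that the pattern matches."""
    return [i for i, line in enumerate(lines) if pattern.search(line)]


exact = re.compile("Anaphoric")                      # capitalized form only
either = re.compile("[Aa]naphoric")                  # first letter in either case
insensitive = re.compile("Anaphoric", re.IGNORECASE)  # any mix of cases

print(hits(exact))
print(hits(either))
print(hits(insensitive))
```

The character class only relaxes the first letter, so an all-caps occurrence is found by the case-insensitive pattern but not by `[Aa]naphoric`.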

To search only on page 203:

```console
$ pdfxmeta -p 203 onlisp.pdf "anaphoric"
anaphoric if, called:
    font.name = "Times-Roman"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = false
    bbox.left = 138.60000610351562
    bbox.top = 283.17822265625
    bbox.right = 214.81094360351562
    bbox.bottom = 296.4683532714844
[--snip--]
```

To dump all of page 203:

```console
$ pdfxmeta -p 203 onlisp.pdf
190:
    font.name = "Times-Roman"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = false
    bbox.left = 138.60000610351562
    bbox.top = 126.09941101074219
    bbox.right = 153.54388427734375
    bbox.bottom = 139.38951110839844
[--snip--]
```

To dump the entire PDF document:

```console
$ pdfxmeta onlisp.pdf
i:
    font.name = "Times-Roman"
    font.size = 9.962599754333496
    font.color = 0x000000
    font.superscript = false
    font.italic = false
    font.serif = true
    font.monospace = false
    font.bold = false
    bbox.left = 458.0400085449219
    bbox.top = 126.09941101074219
    bbox.right = 460.8096008300781
    bbox.bottom = 139.38951110839844
[--snip--]
```

Development
-----------

If you want to modify the source code or contribute anything, first install
[`poetry`][poetry], which is a dependency and package manager for Python used
by pdf.tocgen. Then run

```sh
$ poetry install
```
in the root directory of this repository to set up development dependencies.

If you want to test the development version of pdf.tocgen, use the `poetry run` command:

```sh
$ poetry run pdfxmeta in.pdf "pattern"
```
Alternatively, you could also use the

```sh
$ poetry shell
```
command to open up a virtual environment and run the development version
directly:

```sh
(pdf.tocgen) $ pdfxmeta in.pdf "pattern"
```

Before you send a patch or pull request, make sure the unit tests pass by
running:

```sh
$ make test
```

GUI front end
-------------

If you are an Emacs user, you could install Daniel Nicolai's [toc-mode][tocmode]
package as a GUI front end for pdf.tocgen, though it offers much more
functionality, such as extracting a (printed) table of contents from a PDF
file. Note that it uses pdf.tocgen under the hood, so you still need to install
pdf.tocgen before using toc-mode.

License
-------

pdf.tocgen itself is free software. The source code of pdf.tocgen is licensed
under the GNU GPLv3 license. However, the recipes in the `recipes` directory are
separately licensed under the [CC BY-NC-SA 4.0 License][cc] to prevent any
commercial usage, and are thus not included in the distribution.

pdf.tocgen is based on [PyMuPDF][pymupdf], licensed under the GNU GPLv3
license, which is in turn based on [MuPDF][mupdf], licensed under the GNU AGPLv3
license. A copy of the AGPLv3 license is included in the repository.

If you want to make any derivatives based on this project, please follow the
terms of the GNU GPLv3 license.

[tocgen]: https://krasjet.com/voice/pdf.tocgen/
[unix]: https://en.wikipedia.org/wiki/Unix_philosophy
[ex]: https://krasjet.com/voice/pdf.tocgen/#a-worked-example
[poetry]: https://python-poetry.org/
[pymupdf]: https://github.com/pymupdf/PyMuPDF
[mupdf]: https://mupdf.com/docs/index.html
[cc]: https://creativecommons.org/licenses/by-nc-sa/4.0/
[tocmode]: https://github.com/dalanicolai/toc-mode

            
