- Name: pdfsh
- Version: 2024.4
- Summary: minimal shell to investigate PDF files
- Upload time: 2024-10-15 11:05:25
- Requires Python: >=3.8
- Keywords: pdf
- Requirements: none recorded

# pdfsh

[![CircleCI](https://dl.circleci.com/status-badge/img/gh/metebalci/pdfsh/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/metebalci/pdfsh/tree/main)

`pdfsh` is a utility to investigate the PDF file structure in a shell-like interface. It allows one to "mount" a PDF file and use a simple shell-like interface to navigate inside the PDF file structurally.

Technically, `pdfsh` is a PDF processor (a PDF reader), but it is not a viewer: it does not render page contents.

In `pdfsh`, similar to a file system, the PDF file is represented as a tree. All the nodes of the tree are PDF objects.
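To make the tree idea concrete, here is a minimal sketch, not taken from the `pdfsh` source, that models parsed PDF objects as nested Python dicts and lists and resolves a `/`-separated path against them. The `resolve` function, the sample `doc`, and the node names in it are hypothetical and only for illustration; the names `pdfsh` actually shows may differ.

```python
from typing import Any


def resolve(root: Any, path: str) -> Any:
    """Walk a nested dict/list structure following a '/'-separated path."""
    node = root
    # Empty segments are skipped, so "/" alone resolves to the root itself.
    for segment in (s for s in path.split("/") if s):
        if isinstance(node, dict):
            node = node[segment]
        elif isinstance(node, list):
            node = node[int(segment)]
        else:
            raise KeyError(f"{segment!r}: parent is not a container")
    return node


# A toy stand-in for a parsed document: dictionaries and arrays nest freely.
doc = {
    "trailer": {
        "Root": {
            "Type": "Catalog",
            "Pages": {"Type": "Pages", "Kids": [{"Type": "Page"}]},
        }
    }
}

print(resolve(doc, "/trailer/Root/Pages/Kids/0/Type"))  # Page
```

Dictionaries are indexed by key and arrays by integer position, which is why a single path syntax can address any object in the tree.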

`pdfsh` has its own ISO 32000-2:2020 PDF-2.0 parser.

`pdfsh` uses the CCITT and LZW filter implementations and the PNG predictor implementation from [pdfminer.six](https://github.com/pdfminer/pdfminer.six). To minimize dependencies, I added these implementations directly to the pdfsh code, so there is no dependency on pdfminer.six.
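For readers unfamiliar with the PNG predictor: it undoes a per-row filter that was applied to the data before it was compressed. The sketch below is a simplified illustration written for this README, not the `apply_png_predictor` code from pdfminer.six. It assumes 8-bit components and handles only the None, Sub, and Up filter types; the function name and parameters are hypothetical.

```python
def undo_png_predictor(data: bytes, columns: int, bytes_per_pixel: int = 1) -> bytes:
    """Reverse PNG row filters None (0), Sub (1) and Up (2)."""
    row_len = columns * bytes_per_pixel
    out = bytearray()
    prev = bytes(row_len)  # the row above the first row counts as all zeros
    # Each encoded row is one filter-type byte followed by row_len data bytes.
    for start in range(0, len(data), row_len + 1):
        filter_type = data[start]
        row = bytearray(data[start + 1 : start + 1 + row_len])
        if filter_type == 1:  # Sub: add the byte one pixel to the left
            for i in range(bytes_per_pixel, len(row)):
                row[i] = (row[i] + row[i - bytes_per_pixel]) & 0xFF
        elif filter_type == 2:  # Up: add the byte directly above
            for i in range(len(row)):
                row[i] = (row[i] + prev[i]) & 0xFF
        elif filter_type != 0:
            raise ValueError(f"filter type {filter_type} not handled in this sketch")
        out += row
        prev = bytes(row)
    return bytes(out)


# Three rows of three bytes: unfiltered, Up-filtered, Sub-filtered.
encoded = bytes([0, 1, 2, 3, 2, 1, 1, 1, 1, 5, 1, 1])
print(list(undo_png_predictor(encoded, columns=3)))  # [1, 2, 3, 2, 3, 4, 5, 6, 7]
```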

`pdfsh` assumes it is run in an ANSI-capable terminal, as it uses ANSI terminal features and colors. If you observe strange behavior, make sure the terminal emulator it runs in is ANSI compatible.
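If you are unsure about your terminal, a quick check along the following lines may help. This is only a heuristic sketch and not part of `pdfsh`; the helper names are hypothetical, and a missing `TERM` variable (common on Windows) does not necessarily mean ANSI sequences are unsupported.

```python
import os
import sys

RESET = "\x1b[0m"


def colorize(text: str, sgr_code: int) -> str:
    """Wrap text in an ANSI SGR sequence, e.g. 31 for red, 32 for green."""
    return f"\x1b[{sgr_code}m{text}{RESET}"


def looks_ansi_capable(is_tty: bool, term: str) -> bool:
    """Heuristic only: output is a TTY and TERM is set to something but 'dumb'."""
    return is_tty and term not in ("", "dumb")


if looks_ansi_capable(sys.stdout.isatty(), os.environ.get("TERM", "")):
    print(colorize("ANSI colors appear to work", 32))
else:
    print("This terminal may not support ANSI escape sequences")
```

If the first message is printed in green, the terminal interprets ANSI color sequences.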

## Usage

```
pip install pdfsh
```

This installs a `pdfsh` executable into the path.

When `pdfsh` is run as `pdfsh <pdf_file>`, the shell interface is loaded with the document at the root of the structural tree. The root node has no name and is represented by a single `/`.

The `pdfsh` shell interface has commands like `ls`, `cd`, and `cat`. Paths can be autocompleted.

`pdfsh` has a simple prompt: `<filename>:<current_node> $`. The current node is given as a path separated by `/` like a UNIX filesystem path.
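As an illustration of the prompt format, the sketch below builds such a string from a file name and a list of path segments. It is not the `pdfsh` implementation; `format_prompt` and the node names `trailer` and `Root` are assumptions made for the example.

```python
from typing import List


def format_prompt(filename: str, path_segments: List[str]) -> str:
    """Build '<filename>:<current_node> $' with a '/'-separated node path."""
    current_node = "/" + "/".join(path_segments)
    return f"{filename}:{current_node} $"


print(format_prompt("example.pdf", []))                   # example.pdf:/ $
print(format_prompt("example.pdf", ["trailer", "Root"]))  # example.pdf:/trailer/Root $
```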

## Tutorial

For an introduction to PDF and a tutorial using `pdfsh`, please see my blog post [A Minimum Complete Tutorial of Portable Document Format (PDF) with pdfsh](https://metebalci.com/blog/a-minimum-complete-tutorial-of-pdf-with-pdfsh/).

## Notes

`pdfsh` supports cross-reference tables, cross-reference streams, and hybrid-reference files. However, because `pdfsh` constructs the cross-reference table eagerly, only one of the two (the cross-reference table or the cross-reference stream) is read in any given update section. As a result, an object that is visible in the cross-reference table but not in the cross-reference stream cannot be found. For more information, see ISO 32000-2:2020, section 7.5.8.4, "Compatibility with applications that do not support compressed reference streams".
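To see which kind of cross-reference section a file ends with, a rough check like the one below can be used. It is a sketch under simplifying assumptions, not the `pdfsh` parser: it only looks at the last `startxref` offset and at the first bytes found there, and it ignores the `/XRefStm` trailer entry that hybrid-reference files use. The function name `xref_kind` is hypothetical.

```python
import re


def xref_kind(pdf: bytes) -> str:
    """Return 'table' or 'stream' for the last cross-reference section."""
    # A PDF file ends with: startxref, the byte offset, then %%EOF.
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if not matches:
        raise ValueError("no startxref found")
    offset = int(matches[-1])
    section = pdf[offset : offset + 32]
    # A classic cross-reference table begins with the keyword "xref".
    if section.startswith(b"xref"):
        return "table"
    # A cross-reference stream is an indirect object: "<num> <gen> obj".
    if re.match(rb"\d+\s+\d+\s+obj", section):
        return "stream"
    raise ValueError("unrecognized cross-reference section")


# Tiny synthetic inputs, just enough to exercise both branches.
header = b"%PDF-2.0\n"
tail = b"startxref\n" + str(len(header)).encode() + b"\n%%EOF\n"
with_table = header + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n" + tail
with_stream = header + b"1 0 obj\n<< /Type /XRef >>\nstream\nendstream\nendobj\n" + tail

print(xref_kind(with_table))   # table
print(xref_kind(with_stream))  # stream
```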

## Changes

Version numbers are in `<year>.<positive_integer>` format. The `<positive_integer>` increases monotonically within a year and resets to `1` at the start of a new year.
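Because the second component is an integer rather than a decimal fraction, versions should be compared as pairs of integers, not as strings or floats (`2024.10` is newer than `2024.4`). The following sketch shows one way to do that; `parse_version` and `next_version` are hypothetical helpers, not part of `pdfsh`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split '<year>.<positive_integer>' into comparable integers."""
    year, number = version.split(".")
    return int(year), int(number)


def next_version(current: str, release_year: int) -> str:
    """Bump the counter within a year; restart at 1 in a new year."""
    year, number = parse_version(current)
    if release_year > year:
        return f"{release_year}.1"
    return f"{year}.{number + 1}"


print(next_version("2024.4", 2024))  # 2024.5
print(next_version("2024.4", 2025))  # 2025.1
print(parse_version("2024.10") > parse_version("2024.4"))  # True
```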

### 2024.4
- cross-reference streams support
- object streams support
- `--version` option added
- migrated from setup.py to pyproject.toml 

### 2024.3 is skipped

### 2024.2
- first public release

### 2024.1
- initial test release, not for public use

## External Licenses

### pdfminer.six

[pdfminer.six](https://github.com/pdfminer/pdfminer.six): [Copyright (c) 2004-2016  Yusuke Shinyama \<yusuke at shinyama dot jp\>](LICENSE.pdfminer.six)

- [ccitt.py](pdfminer/ccitt.py) and [lzw.py](pdfminer/lzw.py) are part of pdfminer.six
- [utils.py](pdfminer/utils.py) contains one function (`apply_png_predictor`) taken from the file of the same name (utils.py) in pdfminer.six.

## License

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pdfsh",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "pdf",
    "author": null,
    "author_email": "Mete Balci <metebalci@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/08/2d/428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502/pdfsh-2024.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "minimal shell to investigate PDF files",
    "version": "2024.4",
    "project_urls": {
        "Changelog": "https://github.com/metebalci/pdfsh/blob/master/README.md",
        "Documentation": "https://github.com/metebalci/pdfsh",
        "Homepage": "https://github.com/metebalci/pdfsh",
        "Issues": "https://github.com/metebalci/pdfsh/issues",
        "Repository": "https://github.com/metebalci/pdfsh.git"
    },
    "split_keywords": [
        "pdf"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2bd9969ba2c56f0c72bab030a5758362f16592165789b61a41af76c38b3c2778",
                "md5": "8518424b75f2e78924bf3916044fc11c",
                "sha256": "b0db747a4d275cdd204638350ea6aa767711214e972a76325bfd09d65426f101"
            },
            "downloads": -1,
            "filename": "pdfsh-2024.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8518424b75f2e78924bf3916044fc11c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 50325,
            "upload_time": "2024-10-15T11:05:23",
            "upload_time_iso_8601": "2024-10-15T11:05:23.598078Z",
            "url": "https://files.pythonhosted.org/packages/2b/d9/969ba2c56f0c72bab030a5758362f16592165789b61a41af76c38b3c2778/pdfsh-2024.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "082d428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502",
                "md5": "c9ab96d7bcd8c795ba98bf560f6727f6",
                "sha256": "9ce9ef507e3e05d377df0e1a75c4c1be2fc23a70fc1a02bb9f50d6169c9291fa"
            },
            "downloads": -1,
            "filename": "pdfsh-2024.4.tar.gz",
            "has_sig": false,
            "md5_digest": "c9ab96d7bcd8c795ba98bf560f6727f6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 48069,
            "upload_time": "2024-10-15T11:05:25",
            "upload_time_iso_8601": "2024-10-15T11:05:25.753070Z",
            "url": "https://files.pythonhosted.org/packages/08/2d/428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502/pdfsh-2024.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-15 11:05:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "metebalci",
    "github_project": "pdfsh",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "circle": true,
    "lcname": "pdfsh"
}
        