| Field | Value |
| --- | --- |
| Name | pdfsh |
| Version | 2024.4 |
| home_page | None |
| Summary | minimal shell to investigate PDF files |
| upload_time | 2024-10-15 11:05:25 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
| keywords | pdf |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# pdfsh
[![CircleCI](https://dl.circleci.com/status-badge/img/gh/metebalci/pdfsh/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/metebalci/pdfsh/tree/main)
`pdfsh` is a utility for investigating the structure of PDF files through a shell-like interface. It lets you "mount" a PDF file and navigate inside it structurally, much as you would navigate a file system.
Technically, `pdfsh` is a PDF processor, specifically a PDF reader, but it is not a viewer: it does not render page contents.
As in a file system, `pdfsh` represents the PDF file as a tree, and every node of the tree is a PDF object.
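To make the file-system analogy concrete, here is a minimal sketch, not `pdfsh`'s actual data model, that models PDF dictionaries as Python dicts and PDF arrays as Python lists, then resolves a `/`-separated path against them. The `resolve` helper and the sample `doc` tree are hypothetical.

```python
from typing import Any

# Hypothetical, simplified document tree: PDF dictionaries become dicts,
# PDF arrays become lists, and everything else is a leaf value.
doc: dict = {
    "trailer": {
        "Size": 6,
        "Root": {
            "Type": "/Catalog",
            "Pages": {"Type": "/Pages", "Count": 1, "Kids": [{"Type": "/Page"}]},
        },
    },
}


def resolve(tree: Any, path: str) -> Any:
    """Follow a '/'-separated path from the root node; '/' alone is the root."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if isinstance(node, dict):
            node = node[part]        # dictionary entry, looked up by key
        elif isinstance(node, list):
            node = node[int(part)]   # array element, looked up by index
        else:
            raise KeyError(f"{part!r}: parent node is not a container")
    return node
```

For example, `resolve(doc, "/trailer/Root/Pages/Count")` returns `1`, and array elements are addressed by index, as in `/trailer/Root/Pages/Kids/0`.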
`pdfsh` has its own ISO 32000-2:2020 PDF-2.0 parser.
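The internals of that parser are not described here. As a rough illustration of what parsing PDF object syntax involves, the following hypothetical sketch handles only a small subset: dictionaries, arrays, names, numbers, booleans, `null` and `n g R` indirect references. It does not handle strings, streams, comments or `#xx` name escapes, and it is not `pdfsh`'s parser.

```python
import re
from typing import Any, List, NamedTuple, Tuple


class Ref(NamedTuple):
    """An indirect reference 'num gen R'."""
    num: int
    gen: int


class Name(str):
    """A PDF name, stored without its leading '/'."""


# Reals are tried before integers so that "612.0" is not split into "612" and ".0".
TOKEN_RE = re.compile(
    r"\s*(<<|>>|\[|\]|/[^\s/<>\[\]()%{}]*|[+-]?(?:\d+\.\d*|\.\d+)|[+-]?\d+|true|false|null|R)"
)


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens: List[str], i: int = 0) -> Tuple[Any, int]:
    """Parse one object starting at tokens[i]; return (value, next index)."""
    tok = tokens[i]
    if tok == "<<":
        d, i = {}, i + 1
        while tokens[i] != ">>":
            key = tokens[i]
            if not key.startswith("/"):
                raise ValueError(f"dictionary key must be a name: {key!r}")
            d[key[1:]], i = parse(tokens, i + 1)
        return d, i + 1
    if tok == "[":
        arr, i = [], i + 1
        while tokens[i] != "]":
            item, i = parse(tokens, i)
            arr.append(item)
        return arr, i + 1
    if tok.startswith("/"):
        return Name(tok[1:]), i + 1
    if tok in ("true", "false"):
        return tok == "true", i + 1
    if tok == "null":
        return None, i + 1
    if "." in tok:
        return float(tok), i + 1
    if tok.lstrip("+-").isdigit():
        # Two integers followed by 'R' form an indirect reference.
        if i + 2 < len(tokens) and tokens[i + 2] == "R" and tokens[i + 1].isdigit():
            return Ref(int(tok), int(tokens[i + 1])), i + 3
        return int(tok), i + 1
    raise ValueError(f"unexpected token {tok!r}")
```

For instance, `parse(tokenize("<< /Type /Catalog /Pages 2 0 R >>"))[0]` yields a dict whose `Pages` value is `Ref(num=2, gen=0)`.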
`pdfsh` uses the CCITT and LZW filter implementations and the PNG predictor implementation from [pdfminer.six](https://github.com/pdfminer/pdfminer.six). To avoid depending on pdfminer.six, I copied these implementations directly into the pdfsh code.
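For readers unfamiliar with the PNG predictor: when a stream's `/DecodeParms` has `/Predictor` 10 or greater, each row of the decoded data is prefixed with a one-byte PNG filter type that must be undone. The following is an independent minimal sketch of that step, not pdfminer.six's `apply_png_predictor`; the function name `png_unpredict` and its parameter names are hypothetical, and the input is assumed to be already inflated.

```python
def _paeth(left: int, up: int, up_left: int) -> int:
    """PNG Paeth predictor: pick the neighbour closest to left + up - up_left."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return up
    return up_left


def png_unpredict(data: bytes, columns: int, colors: int = 1, bits_per_component: int = 8) -> bytes:
    """Undo PNG row filters (None, Sub, Up, Average, Paeth) row by row."""
    bpp = max(1, (colors * bits_per_component + 7) // 8)        # bytes per pixel, rounded up
    row_len = (columns * colors * bits_per_component + 7) // 8  # row bytes without filter byte
    out = bytearray()
    prev = bytearray(row_len)  # the row "above" the first row is all zeros
    for start in range(0, len(data), row_len + 1):
        ftype = data[start]
        row = bytearray(data[start + 1:start + 1 + row_len])
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0        # already decoded
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = left
            elif ftype == 2:
                pred = up
            elif ftype == 3:
                pred = (left + up) // 2
            elif ftype == 4:
                pred = _paeth(left, up, up_left)
            else:
                raise ValueError(f"unknown PNG filter type {ftype}")
            row[i] = (row[i] + pred) & 0xFF  # arithmetic is modulo 256
        out += row
        prev = row
    return bytes(out)
```

For example, with `columns=3`, the rows `[1, 10, 10, 10]` (Sub) and `[2, 1, 1, 1]` (Up) decode to `10 20 30` and `11 21 31`.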
`pdfsh` assumes it runs in an ANSI-capable terminal, because it uses ANSI terminal features and colors. If you observe strange behavior, make sure the terminal emulator you are using is ANSI compatible.
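One quick way to check a terminal is to print an ANSI SGR (Select Graphic Rendition) escape sequence and see whether it renders as color rather than as literal escape characters. The sketch below is only a heuristic and not a check that `pdfsh` performs. The helper names are hypothetical, and the `TERM` test in particular is an assumption that does not hold everywhere (for example, some Windows terminals support ANSI without setting `TERM`).

```python
import os
import sys

# SGR parameters: 1 = bold, 31 = red foreground, 0 = reset all attributes.
RED_BOLD = "\x1b[1;31m"
RESET = "\x1b[0m"


def ansi_sample(text: str) -> str:
    """Wrap text in bold-red SGR codes followed by a reset."""
    return f"{RED_BOLD}{text}{RESET}"


def looks_ansi_capable(stream=sys.stdout) -> bool:
    """Heuristic only: an interactive TTY whose TERM is set and is not 'dumb'."""
    return stream.isatty() and os.environ.get("TERM", "dumb") != "dumb"


if __name__ == "__main__":
    print("heuristic says ANSI capable:", looks_ansi_capable())
    print(ansi_sample("If this line is bold red, ANSI colors work."))
```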
## Usage
```
pip install pdfsh
```
which installs a `pdfsh` executable into the path.
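To confirm the installation from Python, a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8, and `shutil.which`) can report the installed distribution version and where the `pdfsh` executable was found. The `install_status` helper is hypothetical and not part of pdfsh.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def install_status(dist: str = "pdfsh", exe: str = "pdfsh") -> Tuple[Optional[str], Optional[str]]:
    """Return (installed distribution version, path of executable); None when missing."""
    try:
        dist_version: Optional[str] = version(dist)
    except PackageNotFoundError:
        dist_version = None
    return dist_version, shutil.which(exe)  # which() searches the PATH


if __name__ == "__main__":
    ver, path = install_status()
    print(f"pdfsh version: {ver or 'not installed'}; executable: {path or 'not on PATH'}")
```

From a shell, running `pdfsh --version` is the direct check; the `--version` option was added in 2024.4.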
When run as `pdfsh <pdf_file>`, `pdfsh` loads the document and starts the shell at the root of the structural tree. The root node has no name and is represented by a single `/`.
The `pdfsh` shell provides commands such as `ls`, `cd` and `cat`, and paths can be autocompleted.
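How `pdfsh` implements completion is not described here. One common approach in a Python shell is the standard library `readline` completer protocol, in which `complete(text, state)` is called with `state = 0, 1, 2, ...` until it returns `None`. The sketch below assumes that approach; `make_completer`, `install` and the hard-coded `CHILDREN` table, which stands in for the PDF tree, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical child table: "" is the current node, "/" the root, "Root/" a relative path.
CHILDREN: Dict[str, List[str]] = {"": ["Info", "Root", "Size"], "Root/": ["Pages", "Type"]}


def make_completer(children: Callable[[str], Iterable[str]]) -> Callable[[str, int], Optional[str]]:
    """Build a readline-style completer over a function listing a node's children."""
    def complete(text: str, state: int) -> Optional[str]:
        # Split "a/b/pa" into the parent prefix "a/b/" and the partial name "pa".
        parent, _, partial = text.rpartition("/")
        prefix = parent + "/" if "/" in text else ""
        matches = sorted(prefix + name for name in children(prefix) if name.startswith(partial))
        return matches[state] if state < len(matches) else None
    return complete


complete = make_completer(lambda parent: CHILDREN.get(parent, []))


def install(completer: Callable[[str, int], Optional[str]]) -> None:
    """Register with readline (not available on every platform, e.g. stock Windows Python)."""
    import readline
    readline.set_completer(completer)
    readline.set_completer_delims(" \t\n")  # keep '/' inside the word being completed
    readline.parse_and_bind("tab: complete")
```

Calling `install(complete)` before an `input()` loop enables Tab completion; Python builds linked against libedit (common on macOS) use a different binding string.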
`pdfsh` has a simple prompt, `<filename>:<current_node> $`, where the current node is shown as a `/`-separated path, like a UNIX file system path.
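A hypothetical sketch of how a `cd` argument could be resolved against the current node and turned into that prompt, using the standard library `posixpath` module for UNIX-style normalization of `.`, `..` and repeated slashes. The names `change_node` and `prompt` are illustrative and not pdfsh functions; the current node is assumed to always be an absolute path.

```python
import posixpath


def change_node(current: str, target: str) -> str:
    """Return the new current node for `cd target`, starting from `current`."""
    joined = target if target.startswith("/") else posixpath.join(current, target)
    # normpath() keeps exactly two leading slashes (allowed by POSIX), so collapse them.
    return "/" + posixpath.normpath(joined).lstrip("/")


def prompt(filename: str, current: str) -> str:
    """Format the shell prompt as <filename>:<current_node> $ (plus a trailing space)."""
    return f"{filename}:{current} $ "
```

With this sketch, `cd ..` from `/trailer/Root` leads to `/trailer`, and `..` at the root stays at `/`.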
## Tutorial
For an introduction to PDF and a tutorial using `pdfsh`, please see my blog post [A Minimum Complete Tutorial of Portable Document Format (PDF) with pdfsh](https://metebalci.com/blog/a-minimum-complete-tutorial-of-pdf-with-pdfsh/).
## Notes
`pdfsh` supports cross-reference tables, cross-reference streams and hybrid-reference files. However, because `pdfsh` constructs the cross-reference table eagerly, it reads only one of the two (the cross-reference table or the cross-reference stream) in a given update section. Consequently, an object that is listed in the cross-reference table but not in the cross-reference stream cannot be found. For more information, see ISO 32000-2:2020, section 7.5.8.4, "Compatibility with applications that do not support compressed reference streams".
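A hypothetical sketch of the two cross-reference formats, not pdfsh's code. A classic table is text with `start count` subsection headers followed by `offset generation n|f` entries. A cross-reference stream holds fixed-width binary rows whose field widths come from `/W` and whose object numbers come from `/Index`, and its type 2 rows point into an object stream. The stream data is assumed to be already decompressed and un-predicted, and the `Entry` tuple encoding is illustrative.

```python
from typing import Dict, Iterator, Sequence, Tuple

# Hypothetical entry encoding shared by both formats:
#   ("f", next_free_object, generation)   free object
#   ("n", byte_offset, generation)         uncompressed object
#   ("c", object_stream_number, index)     object stored in an object stream
Entry = Tuple[str, int, int]
KINDS = {0: "f", 1: "n", 2: "c"}


def parse_xref_table(body: str) -> Dict[int, Entry]:
    """Parse the lines after the 'xref' keyword: 'start count' headers plus entries."""
    lines = [line.split() for line in body.splitlines() if line.strip()]
    entries: Dict[int, Entry] = {}
    i = 0
    while i < len(lines):
        start, count = int(lines[i][0]), int(lines[i][1])
        for k in range(count):
            offset, gen, kind = lines[i + 1 + k]
            entries[start + k] = (kind, int(offset), int(gen))
        i += 1 + count
    return entries


def parse_xref_stream(data: bytes, w: Sequence[int], index: Sequence[int]) -> Dict[int, Entry]:
    """Decode unfiltered cross-reference stream data using its /W and /Index arrays."""
    w0, w1, w2 = w
    row_len = w0 + w1 + w2

    def field(row: bytes, pos: int, width: int, default: int) -> int:
        # A width of 0 means the field is absent and its default value applies.
        return int.from_bytes(row[pos:pos + width], "big") if width else default

    rows: Iterator[bytes] = (data[p:p + row_len] for p in range(0, len(data), row_len))
    entries: Dict[int, Entry] = {}
    for start, count in zip(index[0::2], index[1::2]):
        for num in range(start, start + count):
            row = next(rows)
            kind = field(row, 0, w0, 1)  # the type defaults to 1 when W[0] is 0
            if kind in KINDS:            # other types are skipped in this sketch
                entries[num] = (KINDS[kind], field(row, w0, w1, 0), field(row, w0 + w1, w2, 0))
    return entries
```

In a hybrid section where object 5 lives in object stream 4, only the stream lists it: `parse_xref_stream(bytes([2, 0, 4, 0]), w=(1, 2, 1), index=(5, 1))` gives `{5: ("c", 4, 0)}`, while a table-only read has no entry for object 5.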
## Changes
Version numbers follow the `<year>.<positive_integer>` format. `<positive_integer>` increases monotonically within a year and resets to `1` at the start of a new year.
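Because the second component is an integer that can grow past 9 within a year, versions in this scheme should be compared numerically rather than as strings, since `"2024.10" < "2024.4"` as strings. A minimal sketch, assuming a four-digit year; `parse_version` is a hypothetical helper.

```python
import re
from typing import Tuple

VERSION_RE = re.compile(r"(\d{4})\.([1-9]\d*)")  # <year>.<positive_integer>


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '2024.4' into (2024, 4) so tuples compare year first, then release."""
    m = VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a <year>.<positive_integer> version: {text!r}")
    return int(m.group(1)), int(m.group(2))
```

Sorting with `key=parse_version` then orders `2024.4` before `2024.10` before `2025.1`.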
### 2024.4
- cross-reference streams support
- object streams support
- `--version` option added
- migrated from setup.py to pyproject.toml
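Object streams, which 2024.4 adds support for, are streams of `/Type /ObjStm` whose decoded data begins with `/N` pairs of integers (an object number and a byte offset relative to `/First`), followed by the objects themselves. A hypothetical sketch of splitting such a stream, not pdfsh's code, assuming the data is already decompressed and the offsets appear in increasing order:

```python
from typing import Dict


def split_object_stream(data: bytes, n: int, first: int) -> Dict[int, bytes]:
    """Split decoded object-stream data into {object number: object source bytes}.

    The header before `first` (the /First value) holds n pairs "objnum offset";
    each offset is relative to `first`, where the object data begins.
    """
    header = data[:first].split()
    pairs = [(int(header[2 * k]), int(header[2 * k + 1])) for k in range(n)]
    objects: Dict[int, bytes] = {}
    for k, (num, offset) in enumerate(pairs):
        # Each object runs until the next object's offset, the last one to the end.
        end = pairs[k + 1][1] if k + 1 < n else len(data) - first
        objects[num] = data[first + offset:first + end].strip()
    return objects
```

Each returned byte string can then be handed to an object parser; objects inside an object stream have no `obj`/`endobj` wrapper.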
### 2024.3 is skipped
### 2024.2
- first public release
### 2024.1
- initial test release, not for public use
## External Licenses
### pdfminer.six
[pdfminer.six](https://github.com/pdfminer/pdfminer.six): [Copyright (c) 2004-2016 Yusuke Shinyama \<yusuke at shinyama dot jp\>](LICENSE.pdfminer.six)
- [ccitt.py](pdfminer/ccitt.py) and [lzw.py](pdfminer/lzw.py) are taken from pdfminer.six.
- [utils.py](pdfminer/utils.py) contains one function, `apply_png_predictor`, taken from the source file of the same name (utils.py) in pdfminer.six.
# License
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Raw data
{
"_id": null,
"home_page": null,
"name": "pdfsh",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "pdf",
"author": null,
"author_email": "Mete Balci <metebalci@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/08/2d/428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502/pdfsh-2024.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "minimal shell to investigate PDF files",
"version": "2024.4",
"project_urls": {
"Changelog": "https://github.com/metebalci/pdfsh/blob/master/README.md",
"Documentation": "https://github.com/metebalci/pdfsh",
"Homepage": "https://github.com/metebalci/pdfsh",
"Issues": "https://github.com/metebalci/pdfsh/issues",
"Repository": "https://github.com/metebalci/pdfsh.git"
},
"split_keywords": [
"pdf"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2bd9969ba2c56f0c72bab030a5758362f16592165789b61a41af76c38b3c2778",
"md5": "8518424b75f2e78924bf3916044fc11c",
"sha256": "b0db747a4d275cdd204638350ea6aa767711214e972a76325bfd09d65426f101"
},
"downloads": -1,
"filename": "pdfsh-2024.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8518424b75f2e78924bf3916044fc11c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 50325,
"upload_time": "2024-10-15T11:05:23",
"upload_time_iso_8601": "2024-10-15T11:05:23.598078Z",
"url": "https://files.pythonhosted.org/packages/2b/d9/969ba2c56f0c72bab030a5758362f16592165789b61a41af76c38b3c2778/pdfsh-2024.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "082d428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502",
"md5": "c9ab96d7bcd8c795ba98bf560f6727f6",
"sha256": "9ce9ef507e3e05d377df0e1a75c4c1be2fc23a70fc1a02bb9f50d6169c9291fa"
},
"downloads": -1,
"filename": "pdfsh-2024.4.tar.gz",
"has_sig": false,
"md5_digest": "c9ab96d7bcd8c795ba98bf560f6727f6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 48069,
"upload_time": "2024-10-15T11:05:25",
"upload_time_iso_8601": "2024-10-15T11:05:25.753070Z",
"url": "https://files.pythonhosted.org/packages/08/2d/428d0e0f2b75caabee63bfaceb37307551fdc0f0d22936464fff448a5502/pdfsh-2024.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-15 11:05:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "metebalci",
"github_project": "pdfsh",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"circle": true,
"lcname": "pdfsh"
}