| Field | Value |
|---|---|
| Name | pdfimageextractor |
| Version | 0.1.1 |
| home_page | None |
| Summary | Extract high-quality images from PDF files while preserving metadata |
| upload_time | 2025-02-19 02:55:15 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | extraction, images, pdf |
| bugtrack_url | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# PDF Image Extractor
A tool to extract high-quality images from PDF files while preserving metadata and positioning information.
## Features
- Extracts images in their original quality without recompression
- Preserves image metadata including DPI and positioning
- Detects and skips duplicate images
- Generates a detailed JSON metadata file
- Sorts images by their position on the page
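The README does not say how duplicates are detected or how "position" is ordered. As a rough illustration only, here is one common approach: hash each image's raw bytes to skip byte-identical copies, then sort the survivors into reading order. The `PageImage` record and `dedupe_and_sort` function are hypothetical, this is not necessarily how pdfimageextractor works, and it assumes `top` is measured downward from the top edge of the page.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageImage:
    # Hypothetical record: raw encoded bytes plus where the image sits on its page.
    data: bytes
    page: int
    top: float   # assumed to grow downward from the top edge of the page
    left: float


def dedupe_and_sort(images: Iterable[PageImage]) -> List[PageImage]:
    """Drop byte-identical repeats, then order by page, top-to-bottom, left-to-right."""
    seen = set()
    unique: List[PageImage] = []
    for img in images:
        digest = hashlib.sha256(img.data).hexdigest()
        if digest in seen:
            continue  # an identical image was already kept; skip this copy
        seen.add(digest)
        unique.append(img)
    return sorted(unique, key=lambda i: (i.page, i.top, i.left))
```

Hashing exact bytes only catches identical copies (for example a logo repeated on every page); visually similar but re-encoded images would need a perceptual comparison instead.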
## Installation
```bash
pip install pdfimageextractor
```
## Usage
```bash
pdfextractimages <PDF_FILE> [OUTPUT_FOLDER]
```
Arguments:
- `PDF_FILE`: Path to the PDF file to process
- `OUTPUT_FOLDER`: Optional directory in which to save the extracted images (defaults to `PDF_FILE_images`)
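To process many PDFs at once, the command can be driven from a script. This is a minimal sketch that relies only on the two documented positional arguments; the `build_command` and `extract_all` helpers and the one-folder-per-PDF layout under `out_root` are illustrative choices, not part of the package. Because the README does not say whether the tool creates missing directories, the sketch creates each output folder itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(pdf_file: Path, output_folder: Optional[Path] = None) -> List[str]:
    """Assemble argv for one run; OUTPUT_FOLDER is optional, as in the usage line."""
    cmd = ["pdfextractimages", str(pdf_file)]
    if output_folder is not None:
        cmd.append(str(output_folder))
    return cmd


def extract_all(pdf_dir: Path, out_root: Path) -> None:
    """Run the CLI once per PDF in pdf_dir, one output folder per file (hypothetical layout)."""
    if shutil.which("pdfextractimages") is None:
        raise RuntimeError("pdfextractimages not on PATH; run: pip install pdfimageextractor")
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = Path(out_root) / pdf.stem
        target.mkdir(parents=True, exist_ok=True)  # harmless if the tool also creates it
        subprocess.run(build_command(pdf, target), check=True)
```

For example, `extract_all(Path("papers"), Path("extracted"))` would write the images for `papers/report.pdf` into `extracted/report`. `check=True` makes a failing run raise `CalledProcessError` instead of silently continuing.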
## Output
The tool creates:
- Original-quality images extracted from the PDF
- An `image_metadata.json` file containing detailed information about each image
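The README does not document the schema of `image_metadata.json`, so a safe first step is to load it and inspect its shape before relying on particular keys. This sketch assumes the file is written inside the output folder next to the images (an assumption, not stated above); `load_metadata` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_metadata(output_folder: Union[str, Path]) -> Any:
    """Read image_metadata.json, assumed to live in the output folder."""
    path = Path(output_folder) / "image_metadata.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(metadata: Any) -> str:
    """Describe the top-level structure without assuming any field names."""
    if isinstance(metadata, list):
        return f"list of {len(metadata)} entries"
    if isinstance(metadata, dict):
        return "object with keys: " + ", ".join(sorted(metadata))
    return type(metadata).__name__
```

Once `summarize` shows whether the file is a list of per-image records or an object keyed by filename, the DPI and position fields mentioned under Features can be read directly.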
Raw data
{
"_id": null,
"home_page": null,
"name": "pdfimageextractor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "extraction, images, pdf",
"author": null,
"author_email": "Neal Caren <neal.caren@unc.edu>",
"download_url": "https://files.pythonhosted.org/packages/b7/e9/086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24/pdfimageextractor-0.1.1.tar.gz",
"platform": null,
"description": "# PDF Image Extractor\n\nA tool to extract high-quality images from PDF files while preserving metadata and positioning information.\n\n## Features\n\n- Extracts images in their original quality without recompression\n- Preserves image metadata including DPI and positioning\n- Detects and skips duplicate images\n- Generates detailed JSON metadata file\n- Sorts images by their position on the page\n\n## Installation\n\n```bash\npip install pdfimageextractor\n```\n\n## Usage\n\n```bash\npdfextractimages <PDF_FILE> [OUTPUT_FOLDER]\n```\n\nArguments:\n- `PDF_FILE`: Path to the PDF file to process\n- `OUTPUT_FOLDER`: Optional directory to save extracted images (defaults to PDF_FILE_images)\n\n## Output\n\nThe tool creates:\n- Original quality images extracted from the PDF\n- A `image_metadata.json` file containing detailed information about each image\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Extract high-quality images from PDF files while preserving metadata",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/nealcaren/pdfimageextractor",
"Repository": "https://github.com/nealcaren/pdfimageextractor.git"
},
"split_keywords": [
"extraction",
" images",
" pdf"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3382fae1a95de75d0788edeb6656dfcd3eb2099b08df4b80e0260fae1814876e",
"md5": "9cff66e2ac5842f6925b4febee36b416",
"sha256": "2e1489e7950a62f5532c3457edadb618c4cc2962237197063dabfd63de58e50b"
},
"downloads": -1,
"filename": "pdfimageextractor-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9cff66e2ac5842f6925b4febee36b416",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 4910,
"upload_time": "2025-02-19T02:55:13",
"upload_time_iso_8601": "2025-02-19T02:55:13.587913Z",
"url": "https://files.pythonhosted.org/packages/33/82/fae1a95de75d0788edeb6656dfcd3eb2099b08df4b80e0260fae1814876e/pdfimageextractor-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b7e9086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24",
"md5": "6a23d5216e91225654f46c4718fb3550",
"sha256": "6a8b6c5aeff9e0c1cb8f9d198928b0716f0a5e6a1021ee63c6106a66a51f15f1"
},
"downloads": -1,
"filename": "pdfimageextractor-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "6a23d5216e91225654f46c4718fb3550",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 6217,
"upload_time": "2025-02-19T02:55:15",
"upload_time_iso_8601": "2025-02-19T02:55:15.262192Z",
"url": "https://files.pythonhosted.org/packages/b7/e9/086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24/pdfimageextractor-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-19 02:55:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nealcaren",
"github_project": "pdfimageextractor",
"github_not_found": true,
"lcname": "pdfimageextractor"
}
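The `digests` entries above can be used to check that a downloaded file is intact. A minimal sketch using the standard library's `hashlib`, streaming the file in chunks so large archives are not read into memory at once; `sha256_of` and `matches_digest` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

For the wheel, compare `pdfimageextractor-0.1.1-py3-none-any.whl` against `2e1489e7950a62f5532c3457edadb618c4cc2962237197063dabfd63de58e50b`; for the sdist, use `6a8b6c5aeff9e0c1cb8f9d198928b0716f0a5e6a1021ee63c6106a66a51f15f1`. pip can enforce the same check at install time via its hash-checking mode (`--hash=sha256:...` entries in a requirements file).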