pdfimageextractor

- Name: pdfimageextractor
- Version: 0.1.1
- Summary: Extract high-quality images from PDF files while preserving metadata
- Upload time: 2025-02-19 02:55:15
- Requires Python: >=3.8
- License: MIT
- Keywords: extraction, images, pdf

# PDF Image Extractor

A tool to extract high-quality images from PDF files while preserving metadata and positioning information.

## Features

- Extracts images in their original quality without recompression
- Preserves image metadata including DPI and positioning
- Detects and skips duplicate images
- Generates a detailed JSON metadata file
- Sorts images by their position on the page
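The duplicate detection and position sorting listed above could work roughly as follows. This is a minimal sketch, not the package's actual implementation: the `ImageRecord` type, its fields, the choice of SHA-256 over the raw image bytes, and the assumption that `y` is measured from the top of the page are all hypothetical and used only for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ImageRecord:
    """Hypothetical record for one extracted image (not the package's own type)."""
    page: int    # 1-based page number
    x: float     # left edge of the image on the page
    y: float     # top edge of the image, assumed to grow downward
    data: bytes  # raw image bytes as stored in the PDF


def dedupe_and_sort(records: List[ImageRecord]) -> List[ImageRecord]:
    """Drop byte-identical images, then order by page, top to bottom, left to right."""
    seen = set()
    unique = []
    for rec in records:
        digest = hashlib.sha256(rec.data).hexdigest()
        if digest in seen:
            continue  # identical bytes were already kept once
        seen.add(digest)
        unique.append(rec)
    return sorted(unique, key=lambda r: (r.page, r.y, r.x))
```

Hashing the raw bytes only catches exact duplicates; two visually identical images that were encoded differently would both be kept under this approach.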

## Installation

```bash
pip install pdfimageextractor
```

## Usage

```bash
pdfextractimages <PDF_FILE> [OUTPUT_FOLDER]
```

Arguments:
- `PDF_FILE`: Path to the PDF file to process
- `OUTPUT_FOLDER`: Optional directory in which to save the extracted images (defaults to `PDF_FILE_images`)
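To run the extractor from a script rather than a shell, the command above can be wrapped with the standard library. The sketch below assumes `pdfextractimages` is on the `PATH` after installation; the helper names `build_command` and `extract_images` are hypothetical and not part of the package.

```python
import subprocess
from typing import List, Optional


def build_command(pdf_file: str, output_folder: Optional[str] = None) -> List[str]:
    """Build the argument list for the documented command line."""
    cmd = ["pdfextractimages", pdf_file]
    if output_folder is not None:
        cmd.append(output_folder)  # optional second positional argument
    return cmd


def extract_images(pdf_file: str, output_folder: Optional[str] = None) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(pdf_file, output_folder), check=True)
```

Calling `extract_images("report.pdf")` would then be equivalent to running `pdfextractimages report.pdf`, leaving the output folder at the tool's default.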

## Output

The tool creates:
- Original-quality images extracted from the PDF
- An `image_metadata.json` file containing detailed information about each image
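The metadata file can be read back with the standard `json` module. The sketch below assumes that `image_metadata.json` is written into the output folder alongside the images, which the description above does not state explicitly. It makes no assumptions about the keys inside the file, since the schema is not documented here; `load_image_metadata` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any


def load_image_metadata(output_folder: str) -> Any:
    """Parse image_metadata.json from the given folder and return it as-is."""
    # Assumption: the tool writes the metadata file into the output folder.
    path = Path(output_folder) / "image_metadata.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Inspecting the returned object, for example with `print(json.dumps(metadata, indent=2))`, is a reasonable first step to discover which fields the tool records.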

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pdfimageextractor",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "extraction, images, pdf",
    "author": null,
    "author_email": "Neal Caren <neal.caren@unc.edu>",
    "download_url": "https://files.pythonhosted.org/packages/b7/e9/086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24/pdfimageextractor-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Extract high-quality images from PDF files while preserving metadata",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/nealcaren/pdfimageextractor",
        "Repository": "https://github.com/nealcaren/pdfimageextractor.git"
    },
    "split_keywords": [
        "extraction",
        " images",
        " pdf"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3382fae1a95de75d0788edeb6656dfcd3eb2099b08df4b80e0260fae1814876e",
                "md5": "9cff66e2ac5842f6925b4febee36b416",
                "sha256": "2e1489e7950a62f5532c3457edadb618c4cc2962237197063dabfd63de58e50b"
            },
            "downloads": -1,
            "filename": "pdfimageextractor-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9cff66e2ac5842f6925b4febee36b416",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 4910,
            "upload_time": "2025-02-19T02:55:13",
            "upload_time_iso_8601": "2025-02-19T02:55:13.587913Z",
            "url": "https://files.pythonhosted.org/packages/33/82/fae1a95de75d0788edeb6656dfcd3eb2099b08df4b80e0260fae1814876e/pdfimageextractor-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b7e9086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24",
                "md5": "6a23d5216e91225654f46c4718fb3550",
                "sha256": "6a8b6c5aeff9e0c1cb8f9d198928b0716f0a5e6a1021ee63c6106a66a51f15f1"
            },
            "downloads": -1,
            "filename": "pdfimageextractor-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6a23d5216e91225654f46c4718fb3550",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6217,
            "upload_time": "2025-02-19T02:55:15",
            "upload_time_iso_8601": "2025-02-19T02:55:15.262192Z",
            "url": "https://files.pythonhosted.org/packages/b7/e9/086a4ace56c338b975ffa1ebab0269a5ea4819c85c3549050c95bb547a24/pdfimageextractor-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-19 02:55:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nealcaren",
    "github_project": "pdfimageextractor",
    "github_not_found": true,
    "lcname": "pdfimageextractor"
}
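
The `sha256` digests in the raw data above can be used to check a downloaded wheel or sdist before installing it. A minimal sketch, assuming the file has already been downloaded to a local path; `sha256_matches` is a hypothetical helper, not something the package provides.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest equals the expected hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()
```

For the sdist listed above, the expected value would be the `sha256` entry under `pdfimageextractor-0.1.1.tar.gz`.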
        