image-to-pdf

Name	image-to-pdf
Version	0.1.0
home_page	None
Summary	Convert image files to PDF.
upload_time	2024-06-10 14:53:55
maintainer	None
docs_url	None
author	Jeffrey A. Clark, stefan6419846 (standalone conversion package)
requires_python	<4,>=3.8
license	HPND
keywords	image pdf conversion
requirements	No requirements were recorded.

# Image to PDF

Convert images to PDF files, utilizing *Pillow*, while attempting to handle alpha channels in a suitable manner.

## About

I have long been looking for a good and reliable way to convert images to PDF files that supports images with alpha channels and avoids undesired data loss.

My usual approach for converting images to PDF has been the implementation from *Pillow*, as it would at least warn about data loss when handling alpha channels. Version 9.5.0 introduced the ability to convert RGBA images to PDF files as well by using JPEG2000/JPX images: [#6925](https://github.com/python-pillow/Pillow/pull/6925). So why do we need another library when *Pillow* apparently solved this problem some time ago?

During some tests, it turned out that neither *pdf.js* nor *Poppler*/*Cairo* would correctly render these PDF files: [#8074](https://github.com/python-pillow/Pillow/issues/8074). Since *pdf.js* is my default PDF viewer in the web browser and *pdftocairo* is my usual way to convert PDF files to thumbnails etc., this did not look right. Although *pdf.js* has fixed the bug in the meantime, it seems that JPX files with alpha channels are not handled correctly by all viewers, which restricts the possible user base.

While I was reporting these issues to *Pillow*, a pull request came up which would replace the alpha handling with SMasks: [#8097](https://github.com/python-pillow/Pillow/pull/8097). Unfortunately, it has been closed in the meantime since *pdf.js* fixed the bug. As this approach proved to be the most stable in my opinion, I decided to create a separate package for it, while incorporating some basic feedback from user *sl2c* to use `/FlateDecode` instead of `/DCTDecode` for RGBA images. During testing, it turned out that *pypdf* is able to recover the original image representation, even when the PDF has a white image on a white background. This is meant to ensure that ideally no data gets lost. Most of my usual RGBA images are indeed regular screenshots where removing the transparency altogether would not hurt, but detecting such cases generally requires more complex approaches.
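
To illustrate the SMask idea, here is a minimal sketch using only the standard library. It is not this package's implementation: the function names `encode_rgba` and `decode_rgba` are hypothetical, and the dictionaries only mimic the relevant PDF image keys. It assumes 8-bit interleaved RGBA input and shows that storing the colour samples and the alpha channel as two separate, losslessly compressed streams allows the original pixels to be recovered.

```python
import zlib


def encode_rgba(rgba: bytes, width: int, height: int) -> dict:
    """Split interleaved 8-bit RGBA samples into a colour image and a soft mask.

    Hypothetical helper for illustration only. In a real PDF, the colour image
    dictionary would additionally carry an /SMask entry referencing the mask object.
    """
    if len(rgba) != width * height * 4:
        raise ValueError("expected width * height * 4 bytes of RGBA data")
    # Every fourth byte is alpha; all other bytes are colour samples.
    alpha = rgba[3::4]
    rgb = bytes(b for i, b in enumerate(rgba) if i % 4 != 3)
    common = {
        "/Type": "/XObject",
        "/Subtype": "/Image",
        "/Width": width,
        "/Height": height,
        "/BitsPerComponent": 8,
        "/Filter": "/FlateDecode",  # lossless, unlike /DCTDecode (JPEG)
    }
    return {
        "image": ({**common, "/ColorSpace": "/DeviceRGB"}, zlib.compress(rgb)),
        "smask": ({**common, "/ColorSpace": "/DeviceGray"}, zlib.compress(alpha)),
    }


def decode_rgba(encoded: dict) -> bytes:
    """Recombine the colour stream and the soft mask into interleaved RGBA."""
    rgb = zlib.decompress(encoded["image"][1])
    alpha = zlib.decompress(encoded["smask"][1])
    out = bytearray()
    for i, a in enumerate(alpha):
        out += rgb[3 * i:3 * i + 3]
        out.append(a)
    return bytes(out)
```

Because both streams are compressed losslessly, a white pixel keeps its colour and its alpha value independently, which is why a white image on a white background can still be recovered.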

Further alternatives I considered:

* Using `image.paste` through *Pillow* to copy the possibly transparent image onto a white background. This has the side effect of making all white areas of the original image unreadable if the background was transparent.
* Using `pdfrwx`: I have not been able to recover the original image with its transparency in all cases. Additionally, this repository is neither packaged nor does it have any unit or integration tests. It depends on both NumPy and SciPy as well as *potrace* (GPL-2.0-or-later), which makes it quite heavy and possibly introduces viral licensing effects. *pdfrw*, another dependency, has been unmaintained for quite some time.
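
The data loss of the first alternative can be shown with the underlying compositing arithmetic. The following is a minimal sketch of standard alpha blending over a white background, not *Pillow*'s actual code; `flatten_on_white` is a hypothetical helper and the rounding may differ from what *Pillow* does internally.

```python
def flatten_on_white(pixel):
    """Composite one 8-bit RGBA pixel onto an opaque white background.

    Hypothetical helper modelling standard alpha blending:
    result = colour * alpha + white * (1 - alpha), with alpha scaled to 0..255.
    """
    r, g, b, a = pixel

    def blend(channel):
        # Integer arithmetic with rounding to the nearest value.
        return (channel * a + 255 * (255 - a) + 127) // 255

    return (blend(r), blend(g), blend(b))


white_text = (255, 255, 255, 255)      # opaque white content
clear_background = (0, 0, 0, 0)        # fully transparent background
black_text = (0, 0, 0, 255)            # opaque black content

# White content and transparent background collapse to the same colour,
# while black content stays distinguishable.
print(flatten_on_white(white_text) == flatten_on_white(clear_background))
print(flatten_on_white(black_text))
```

After flattening, opaque white content and the transparent background are indistinguishable, so the white areas cannot be recovered from the resulting PDF.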

## Installation

You can install this package from PyPI:

```bash
python -m pip install image-to-pdf
```

## Usage

The main entry point is `image_to_pdf.save()`, which converts one image to a PDF file by default. For all other functionality and some examples, I recommend having a look at the tests.

Additionally, the general API is compatible with the original `PIL.PdfImagePlugin`. Thus, you can configure this package to override the original `PdfImagePlugin` functionality. See `IntegrationTestCase` for an example.

## License

This package is subject to the terms of the HPND license.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "image-to-pdf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": null,
    "keywords": "image, pdf, conversion",
    "author": "Jeffrey A. Clark, stefan6419846 (standalone conversion package)",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/f4/c6/f9d329126895c558c3b62087c11cea0eb836f8efd3979bc09a5726c3e4b7/image_to_pdf-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "HPND",
    "summary": "Convert image files to PDF.",
    "version": "0.1.0",
    "project_urls": {
        "Changelog": "https://github.com/stefan6419846/image-to-pdf/blob/main/CHANGELOG.md",
        "Homepage": "https://github.com/stefan6419846/image-to-pdf",
        "Issues": "https://github.com/stefan6419846/image-to-pdf/issues",
        "Repository": "https://github.com/stefan6419846/image-to-pdf"
    },
    "split_keywords": [
        "image",
        " pdf",
        " conversion"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c9ecd1a092867fab5fc92e19af2a211eeb8fa65c2a1e77e15633dd79464ec52c",
                "md5": "6caaf4139f34c44b6938625f8f0df2c3",
                "sha256": "c94935b610c4914e434f0820ed448d6de0860c4b061c0b041d71b3b1f1129db4"
            },
            "downloads": -1,
            "filename": "image_to_pdf-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6caaf4139f34c44b6938625f8f0df2c3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 8239,
            "upload_time": "2024-06-10T14:53:53",
            "upload_time_iso_8601": "2024-06-10T14:53:53.826603Z",
            "url": "https://files.pythonhosted.org/packages/c9/ec/d1a092867fab5fc92e19af2a211eeb8fa65c2a1e77e15633dd79464ec52c/image_to_pdf-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f4c6f9d329126895c558c3b62087c11cea0eb836f8efd3979bc09a5726c3e4b7",
                "md5": "ce3fea6f0c40ff212a030d4d37c79acc",
                "sha256": "0eb11691e6cec3b2e4d4557d6fd0e0d7ee88e4d60915eaeace466c684c148f7e"
            },
            "downloads": -1,
            "filename": "image_to_pdf-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ce3fea6f0c40ff212a030d4d37c79acc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 117497,
            "upload_time": "2024-06-10T14:53:55",
            "upload_time_iso_8601": "2024-06-10T14:53:55.136189Z",
            "url": "https://files.pythonhosted.org/packages/f4/c6/f9d329126895c558c3b62087c11cea0eb836f8efd3979bc09a5726c3e4b7/image_to_pdf-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-10 14:53:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "stefan6419846",
    "github_project": "image-to-pdf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "image-to-pdf"
}
