:Name: archive-pdf-tools
:Version: 1.5.4
:Home page: https://github.com/internetarchive/archive-pdf-tools
:Summary: Internet Archive PDF compression tools
:Upload time: 2023-06-27 13:17:39
:Author: Merlijn Boris Wolf Wajer
:Requires Python: >=3.6
:License: AGPL-3.0
:Keywords: pdf, mrc, hocr, internet archive

Internet Archive PDF tools
##########################

:authors: - Merlijn Wajer <merlijn@archive.org>
:date: 2021-11-14 18:00

This repository contains a library to perform MRC (Mixed Raster Content)
compression on images [*]_, which offers high, lossy compression of images, in
particular images with text.

Additionally, the library can generate MRC-compressed PDF files with hOCR [*]_
text layers mixed into the PDF, which makes searching and copy-pasting of the
PDF possible. PDFs generated by `bin/recode_pdf` should be `PDF/A 3b` and
`PDF/UA` compatible.

Some of the tooling also supports specific Internet Archive file formats (such
as the "scandata.xml" files), but the tooling should work fine without those
files, too.

While the code is already being used internally to create PDFs at the Internet
Archive, the code still needs more documentation and cleaning up, so don't
expect this to be super well documented just yet.


Features
========

* Reliable: has produced over 6 million PDFs in 2021 alone (each with many
  hundreds of pages)
* Fast and robust compression: Competes directly with the proprietary software
  offerings when it comes to speed and compressibility (often outperforming in
  both)
* MRC compression of images, leading to anywhere from 3-15x compression ratios,
  depending on the quality setting provided.
* Creates PDF from a directory of images
* Improved compression based on OCR results (hOCR files)
* Hidden text layer insertion based on hOCR files, which makes a PDF searchable
  and the text copy-pasteable.
* PDF/A 3b compatible.
* Basic PDF/UA support (accessibility features)
* Creation of 1 bit (black and white) PDFs


Dependencies
============

* Python 3.x
* Python packages (also see `requirements.txt`):
    - PyMuPDF
    - lxml
    - scikit-image
    - Pillow
    - roman
    - `archive-hocr-tools <https://github.com/internetarchive/archive-hocr-tools>`_


One of:

* `Kakadu JPEG2000 binaries <https://kakadusoftware.com/>`_
* Open source OpenJPEG2000 tools (opj_compress and opj_decompress)
* `Grok <https://github.com/GrokImageCompression/grok/>`_ (grk_compress and grk_decompress)
* `jpegoptim <https://github.com/tjko/jpegoptim>`_ (when using JPEG instead of JPEG2000)

For JBIG2 compression:

* `jbig2enc <https://github.com/agl/jbig2enc>`_ for JBIG2 compression (and PyMuPDF 1.19.0 or higher)
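
As a rough way to see which of the external tools above are present on a machine, the following sketch looks up their executables on ``PATH``. It is not part of ``archive-pdf-tools``. The ``opj_*`` and ``grk_*`` names are taken from the list above, while ``kdu_compress``, ``kdu_expand`` and ``jbig2`` are assumed executable names for Kakadu and jbig2enc, and ``available_tools`` is a hypothetical helper.

```python
import shutil

# Executable names per tool. The opj_* and grk_* names come from the list
# above; the Kakadu and jbig2enc names are assumptions of this sketch.
EXTERNAL_TOOLS = {
    "kakadu": ("kdu_compress", "kdu_expand"),
    "openjpeg": ("opj_compress", "opj_decompress"),
    "grok": ("grk_compress", "grk_decompress"),
    "jpegoptim": ("jpegoptim",),
    "jbig2enc": ("jbig2",),
}


def available_tools(which=shutil.which):
    """Return the tool names whose executables are all found on PATH."""
    return [
        name
        for name, executables in EXTERNAL_TOOLS.items()
        if all(which(exe) is not None for exe in executables)
    ]


if __name__ == "__main__":
    print("Found:", ", ".join(available_tools()) or "none")
```

The ``which`` parameter exists only so the lookup can be replaced in tests; by default it uses ``shutil.which``.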


Installation
============

First install dependencies. For example, in Ubuntu::


    sudo apt install libleptonica-dev libopenjp2-tools libxml2-dev libxslt-dev python3-dev python3-pip

    sudo apt install automake libtool
    git clone https://github.com/agl/jbig2enc
    cd jbig2enc
    ./autogen.sh
    ./configure && make
    sudo make install


Because `archive-pdf-tools` is on the `Python Package Index <https://pypi.org/project/archive-pdf-tools/>`_ (PyPI), you can use `pip` (the Python 3 version is often called `pip3`) to install the latest version::


    # Latest version
    pip3 install archive-pdf-tools

    # Specific version
    pip3 install archive-pdf-tools==1.4.14


Alternatively, if you want a specific commit or unreleased version, check out the master branch or a `tagged release <https://github.com/internetarchive/archive-pdf-tools/tags>`_ and use `pip` to install::


    git clone https://github.com/internetarchive/archive-pdf-tools.git
    cd archive-pdf-tools
    pip3 install .


Finally, if you've downloaded a wheel to test a specific commit, you can also install it using `pip`::


    pip3 install --force-reinstall -U --no-deps ./archive_pdf_tools-${version}.whl


To see if `archive-pdf-tools` is installed correctly for your user, run::


    recode_pdf --version



Not well tested features
========================

* "Recoding" an existing PDF (extracting its images and creating a new PDF from
  them) is not well tested. This works OK if every PDF page contains just a
  single image.


Known issues
============

* Using ``--image-mode 0`` and ``--image-mode 1`` is currently broken, so only
  MRC or no images are supported.
* It is not possible to recode/compress a PDF without hOCR files. This will be
  addressed in the future, since it should not be a problem to generate a PDF
  lacking hOCR data.


Planned features
================

* Addition of a second set of fonts in the PDFs, so that hidden selected text
  also renders the original glyphs.
* Better background generation (text shade removal from the background)
* Better compression parameter selection; I have not toyed around that much with
  the Kakadu and Grok/OpenJPEG parameters.


MRC
===

The goal of Mixed Raster Content compression is to decompose the image into a
background, foreground and mask. The background should contain components that
are not of particular interest, whereas the foreground would contain all
glyphs/text on a page, as well as the lines and edges of various drawings or
images. The mask is a 1-bit image which has the value '1' when a pixel is part
of the foreground.
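
To make the three layers concrete, here is a toy decomposition of a tiny grayscale image. It is only a sketch: it uses a fixed threshold to build the mask, which is an assumption made for illustration and not the binarisation this library performs. The function name ``decompose`` and the fill strategy (each layer's mean value in its "don't care" regions) are likewise hypothetical.

```python
def decompose(image, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into mask, foreground, background."""
    # Mask: 1 where the pixel is dark enough to count as text/lines.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    fg_pixels = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if m]
    bg_pixels = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if not m]

    # Regions a layer does not need are filled with that layer's mean value,
    # which keeps each layer smooth and therefore easy to compress.
    fg_fill = sum(fg_pixels) // len(fg_pixels) if fg_pixels else 0
    bg_fill = sum(bg_pixels) // len(bg_pixels) if bg_pixels else 255

    foreground = [[px if m else fg_fill for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[bg_fill if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background


page = [
    [250, 250, 250, 250],
    [250,  10,  20, 250],
    [250, 250, 250, 240],
]
mask, foreground, background = decompose(page)
print(mask)
```

In this toy case the two dark pixels end up in the mask and foreground, and the background has those two positions painted over with the paper colour.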

This decomposition can then be used to compress the different components
individually, applying much higher compression to specific components, usually
the background, which can be downscaled as well. The foreground can be
compressed quite heavily as well, since it mostly just needs to contain the approximate
colours of the text and other lines - any artifacts introduced during the
foreground compression (e.g. ugly artifacts around text borders) are removed by
overlaying the mask component of the image, which is losslessly compressed
(typically using either JBIG2 or CCITT).

In a PDF, this usually means the background image is inserted into a page,
followed by the foreground image, which uses the mask as its alpha layer.
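
The layering can be sketched on plain lists of pixel values. This is a simplified illustration, not how a PDF viewer or this library composites images: ``compose`` is a hypothetical helper that picks the foreground pixel where the mask is 1 and the background pixel elsewhere. It also shows why foreground artifacts outside the mask do not matter.

```python
def compose(mask, foreground, background):
    """Overlay the foreground on the background wherever the mask is 1."""
    return [
        [fg if m else bg for m, fg, bg in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


mask = [[0, 1, 1, 0]]
background = [[250, 249, 249, 250]]
clean_fg = [[15, 10, 20, 15]]
noisy_fg = [[90, 10, 20, 3]]  # compression artifacts, but only outside the mask

# The mask hides everything the foreground contains outside the text pixels,
# so both foregrounds produce the same page.
assert compose(mask, clean_fg, background) == compose(mask, noisy_fg, background)
print(compose(mask, noisy_fg, background))  # [[250, 10, 20, 250]]
```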

Usage
-----

Creating a PDF from a set of images is pretty straightforward::


    recode_pdf --from-imagestack 'sim_english-illustrated-magazine_1884-12_2_15_jp2/*' \
        --hocr-file sim_english-illustrated-magazine_1884-12_2_15_hocr.html \
        --dpi 400 --bg-downsample 3 \
        -m 2 -t 10 --mask-compression jbig2 \
        -o /tmp/example.pdf
    [...]
    Processed 9 pages at 1.16 seconds/page
    Compression ratio: 7.144962



Or, to scan a document, OCR it with Tesseract and save the result as a compressed PDF
(JPEG2000 compression with OpenJPEG, background downsampled by a factor of three), with
a text layer::

    scanimage --resolution 300 --mode Color --format tiff \
        | tee /tmp/scan.tiff \
        | tesseract - - hocr > /tmp/scan.hocr
    recode_pdf -v -J openjpeg --bg-downsample 3 \
        --from-imagestack /tmp/scan.tiff --hocr-file /tmp/scan.hocr \
        -o /tmp/scan.pdf
    [...]
    Processed 1 pages at 11.40 seconds/page
    Compression ratio: 249.876613
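
The summary lines printed at the end of a run can be picked out of captured output with a small parser. This sketch assumes the output format is exactly as in the two examples above, which may differ between versions; ``parse_summary`` is a hypothetical helper.

```python
import re

# Pattern modelled on the two sample outputs shown in this README.
SUMMARY_RE = re.compile(
    r"Processed (?P<pages>\d+) pages at (?P<spp>[\d.]+) seconds/page\s+"
    r"Compression ratio: (?P<ratio>[\d.]+)"
)


def parse_summary(output):
    """Return pages, seconds per page and compression ratio, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return {
        "pages": int(match.group("pages")),
        "seconds_per_page": float(match.group("spp")),
        "compression_ratio": float(match.group("ratio")),
    }


sample = """[...]
Processed 9 pages at 1.16 seconds/page
Compression ratio: 7.144962
"""
print(parse_summary(sample))
```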


Examining the results
---------------------

``mrcview`` (tools/mrcview) is shipped with the package and can be used to turn an
MRC-compressed PDF into a PDF with each layer on a separate page. This is the
easiest way to inspect the resulting compression. Run it like so::

    mrcview /tmp/compressed.pdf /tmp/mrc.pdf

There is also ``maskview``, which just renders the masks of a PDF to another PDF.

Alternatively, you can use ``pdfimages`` to extract the image layers of a
specific page and then view them with your favourite image viewer::

    # pdfimages numbers pages starting at 1
    pageno=1; pdfimages -f $pageno -l $pageno -png path_to_pdf extracted_image_base
    feh extracted_image_base*.png

`tools/pdfimagesmrc` can be used to check how the size of the PDF
is broken down into the foreground, background, masks and text layer.

License
=======

License for all code (minus ``internetarchive/pdfrenderer.py``) is AGPL 3.0.

``internetarchive/pdfrenderer.py`` is Apache 2.0, which matches the Tesseract
license for that file.


.. [*] https://en.wikipedia.org/wiki/Mixed_raster_content
.. [*] http://kba.cloud/hocr-spec/1.2/


            
