pdfreader

name: pdfreader
version: 0.1.15
home_page: http://github.com/maxpmaxp/pdfreader
summary: Pythonic API for parsing PDF files
upload_time: 2024-05-03 19:19:38
maintainer: Maksym Polshcha
docs_url: None
author: Maksym Polshcha
requires_python: >=3.4
license: MIT Licence
keywords: pdf, pdfreader, pdfparser, adobe, parser, cmap
requirements: bitarray, pillow, pycryptodome, python-dateutil
Travis-CI: none
coveralls test coverage: none

=========
pdfreader
=========
:Info: See `the tutorials & documentation <https://pdfreader.readthedocs.io>`_ for more information.
:Author & Maintainer: Maksym Polshcha <maxp@sterch.net>

See `GitHub <https://github.com/maxpmaxp/pdfreader>`_ for the latest source.

About
=====

*pdfreader* is a Pythonic API for:
    * extracting text, images, and other data from PDF documents (plain or protected)
    * accessing different objects within PDF documents


*pdfreader* is **NOT** a tool (maybe one day it will become one!):
    * to create or update PDF files
    * to split PDF files into pages or other pieces
    * to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.

See `Tutorials & Documentation <https://pdfreader.readthedocs.io>`_.
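
As a minimal sketch of the object access described above, the snippet below counts the pages of a document. It assumes the ``PDFDocument`` class and its ``pages()`` generator as shown in the project tutorial; ``count_pages`` and ``open_and_count`` are hypothetical helper names, and ``password`` is only needed for protected files::

  def count_pages(doc):
      """Count pages by walking the page generator of a document-like object."""
      return sum(1 for _ in doc.pages())


  def open_and_count(path, password=None):
      # Assumption: pdfreader is installed. It is imported here so that
      # count_pages() works with any object exposing a pages() method.
      from pdfreader import PDFDocument

      with open(path, "rb") as fd:
          # Only pass a password when one is given (protected documents).
          kwargs = {} if password is None else {"password": password}
          return count_pages(PDFDocument(fd, **kwargs))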

Features
========

* Extracts text (plain text and formatted text objects)
* Extracts PDF form data (plain strings and formatted text objects)
* Supports all PDF encodings, CMaps, and predefined CMaps
* Extracts images and image masks as `Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>`_
* Supports encrypted and password-protected PDF documents
* Lets you browse any document object or resource and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
* Follows the `PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>`_
* Lazy object access allows huge PDF documents to be processed quite fast
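
As a rough sketch of the text extraction listed above, the helper below renders one page and joins its strings. It assumes the ``SimplePDFViewer`` class with ``navigate()``, ``render()`` and ``canvas.strings`` as described in the project tutorial; ``page_text`` and ``read_first_page`` are hypothetical names introduced for illustration::

  def page_text(viewer, page_number):
      """Render one page (1-based) and return its text strings joined together."""
      viewer.navigate(page_number)
      viewer.render()
      return "".join(viewer.canvas.strings)


  def read_first_page(path, password=None):
      # Assumption: pdfreader is installed. It is imported here so that
      # page_text() stays usable with any viewer-like object.
      from pdfreader import SimplePDFViewer

      with open(path, "rb") as fd:
          # Only pass a password when one is given (protected documents).
          kwargs = {} if password is None else {"password": password}
          return page_text(SimplePDFViewer(fd, **kwargs), 1)

Joining with an empty string keeps the strings exactly as the viewer reports them; whether spaces need to be inserted between them depends on the document.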

Installation
============

*pdfreader* can be installed with `pip <http://pypi.python.org/pypi/pip>`_::

  $ python -m pip install pdfreader

Or ``easy_install`` from
`setuptools <http://pypi.python.org/pypi/setuptools>`_::

  $ python -m easy_install pdfreader

You can also download the project source and do::

  $ python setup.py install
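
To confirm that the installation succeeded, a small standard-library check such as the one below can be used. It is a sketch that assumes Python 3.8 or newer (for ``importlib.metadata``); ``installed_version`` is a hypothetical helper name::

  from importlib.metadata import PackageNotFoundError, version


  def installed_version(distribution):
      """Return the installed version string, or None if it is not installed."""
      try:
          return version(distribution)
      except PackageNotFoundError:
          return None


  # Prints the installed pdfreader version, or None if it is missing.
  print(installed_version("pdfreader"))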


Tutorial and Documentation
===========================

`Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>`_


Support, Bugs & Feature Requests
============================================

*pdfreader* uses `GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>`_ to keep track of bugs,
feature requests, etc.


Related Projects
================

* `pdfminer <https://github.com/euske/pdfminer>`_
* `PyPDF2 <https://github.com/py-pdf/PyPDF2>`_
* `xpdf <http://www.foolabs.com/xpdf/>`_
* `pdfbox <http://pdfbox.apache.org/>`_
* `mupdf <http://mupdf.com/>`_


References
==========

* `Document management - Portable document format - PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_
* `Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>`_
* `PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>`_
* `Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>`_
* `Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>`_


Donation
========
If this project is helpful, you can treat me to coffee :-)

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support&currency_code=USD&source=url

            

Raw data

{
    "_id": null,
    "home_page": "http://github.com/maxpmaxp/pdfreader",
    "name": "pdfreader",
    "maintainer": "Maksym Polshcha",
    "docs_url": null,
    "requires_python": ">=3.4",
    "maintainer_email": "maxp@sterch.net",
    "keywords": "pdf, pdfreader, pdfparser, adobe, parser, cmap",
    "author": "Maksym Polshcha",
    "author_email": "maxp@sterch.net",
    "download_url": "https://files.pythonhosted.org/packages/d1/a7/30d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf/pdfreader-0.1.15.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT Licence",
    "summary": "Pythonic API for parsing PDF files",
    "version": "0.1.15",
    "project_urls": {
        "Homepage": "http://github.com/maxpmaxp/pdfreader"
    },
    "split_keywords": [
        "pdf",
        " pdfreader",
        " pdfparser",
        " adobe",
        " parser",
        " cmap"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4966b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342",
                "md5": "b3f22d940a73cf4bf1b5c34e5680c544",
                "sha256": "bfd0d29c0d70a81b767b42b6959dd588a8290086c8c72a828739c1e2bda07eba"
            },
            "downloads": -1,
            "filename": "pdfreader-0.1.15-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b3f22d940a73cf4bf1b5c34e5680c544",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.4",
            "size": 135646,
            "upload_time": "2024-05-03T19:19:35",
            "upload_time_iso_8601": "2024-05-03T19:19:35.921770Z",
            "url": "https://files.pythonhosted.org/packages/49/66/b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342/pdfreader-0.1.15-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1a730d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf",
                "md5": "16934adfd2b9d1bdc86b965c40b6eb44",
                "sha256": "2ee1252cc5f21a2f8cadb458decd85c1313271abb5bac1e4363a3e0e17e2dd87"
            },
            "downloads": -1,
            "filename": "pdfreader-0.1.15.tar.gz",
            "has_sig": false,
            "md5_digest": "16934adfd2b9d1bdc86b965c40b6eb44",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4",
            "size": 2655995,
            "upload_time": "2024-05-03T19:19:38",
            "upload_time_iso_8601": "2024-05-03T19:19:38.279340Z",
            "url": "https://files.pythonhosted.org/packages/d1/a7/30d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf/pdfreader-0.1.15.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-03 19:19:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "maxpmaxp",
    "github_project": "pdfreader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "bitarray",
            "specs": [
                [
                    ">=",
                    "2.9.2"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    ">=",
                    "7.1.0"
                ]
            ]
        },
        {
            "name": "pycryptodome",
            "specs": [
                [
                    "==",
                    "3.19.1"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.8.1"
                ]
            ]
        }
    ],
    "lcname": "pdfreader"
}
        