=========
pdfreader
=========
:Info: See `the tutorials & documentation <https://pdfreader.readthedocs.io>`_ for more information.
:Author & Maintainer: Maksym Polshcha <maxp@sterch.net>

See `GitHub <https://github.com/maxpmaxp/pdfreader>`_ for the latest source.


About
=====

*pdfreader* is a Pythonic API for:

* extracting texts, images and other data from PDF documents (plain or protected)
* accessing different objects within PDF documents


*pdfreader* is **NOT** a tool (though maybe one day it will become one!):

* to create or update PDF files
* to split PDF files into pages or other pieces
* to convert PDFs to other formats

Nevertheless, it can be used as part of such tools.

See `Tutorials & Documentation <https://pdfreader.readthedocs.io>`_.

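
The two kinds of access listed above map onto two entry points described in the tutorials: ``SimplePDFViewer`` for rendered page content such as text, and ``PDFDocument`` for browsing the document's objects. The following is a minimal sketch, assuming those classes and the ``password`` keyword behave as documented; the helper names ``page_texts`` and ``page_count`` and the file name ``example.pdf`` are hypothetical.

.. code-block:: python

    def page_texts(path, password=None):
        """Return the plain text of each page of the PDF at `path` as a list of str."""
        # Imported here so that defining this helper does not require pdfreader.
        from pdfreader import SimplePDFViewer

        with open(path, "rb") as fd:
            # Supply a password only for encrypted or password-protected documents.
            if password:
                viewer = SimplePDFViewer(fd, password=password)
            else:
                viewer = SimplePDFViewer(fd)
            # Iterating the viewer renders pages one by one and yields their canvases;
            # canvas.strings holds the decoded text strings of the page.
            return ["".join(canvas.strings) for canvas in viewer]


    def page_count(path):
        """Count pages by walking the document's page objects with PDFDocument."""
        from pdfreader import PDFDocument

        with open(path, "rb") as fd:
            doc = PDFDocument(fd)
            # doc.pages() is a generator, so page objects are resolved as they are consumed.
            return sum(1 for _ in doc.pages())

For example, ``page_texts("example.pdf")[0]`` would return the text of the first page. Joining ``canvas.strings`` discards layout; the tutorials describe the other canvas attributes (images, forms and so on) for richer extraction.
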

Features
========

* Extracts texts (plain text and formatted text objects)
* Extracts PDF form data (plain strings and formatted text objects)
* Supports all PDF encodings, CMaps and predefined CMaps
* Extracts images and image masks as `Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>`_
* Supports encrypted and password-protected PDF documents
* Allows browsing any document objects and resources and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
* Follows the `PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>`_
* Lazy object access allows processing huge PDF documents quite fast

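
CMap support concerns the tables a PDF font uses to turn character codes into Unicode text (see the Adobe CMap specification under References). As a simplified illustration only, not pdfreader's implementation or API, the sketch below uses the standard library to parse the ``bfchar`` sections and the single-destination form of ``bfrange`` in a ToUnicode CMap, then decodes 2-byte codes. The function names and the sample CMap are hypothetical; real CMaps (codespace ranges, array-form ``bfrange``, variable-width codes) need more handling.

.. code-block:: python

    import re

    # Hypothetical helpers for illustration; they cover only a subset of the CMap syntax.
    BFCHAR_LINE = re.compile(
        r"^[ \t]*<([0-9A-Fa-f]+)>[ \t]*<([0-9A-Fa-f]+)>[ \t]*$", re.M)
    BFRANGE_LINE = re.compile(
        r"^[ \t]*<([0-9A-Fa-f]+)>[ \t]*<([0-9A-Fa-f]+)>[ \t]*<([0-9A-Fa-f]+)>[ \t]*$", re.M)


    def parse_tounicode(cmap_text):
        """Return a {character code: Unicode string} dict from a ToUnicode CMap."""
        mapping = {}
        for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
            for src, dst in BFCHAR_LINE.findall(block):
                # Destinations are UTF-16BE hex strings and may encode several characters.
                mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
        for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
            # Only "<lo> <hi> <start>" lines match; array destinations are skipped.
            for lo, hi, start in BFRANGE_LINE.findall(block):
                first = int(start, 16)
                for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                    mapping[code] = chr(first + offset)
        return mapping


    def decode(data, mapping, code_width=2):
        """Decode fixed-width big-endian character codes; unmapped codes become U+FFFD."""
        chars = []
        for i in range(0, len(data), code_width):
            code = int.from_bytes(data[i:i + code_width], "big")
            chars.append(mapping.get(code, "\ufffd"))
        return "".join(chars)


    SAMPLE_CMAP = """
    2 beginbfchar
    <0001> <0048>
    <0002> <0069>
    endbfchar
    1 beginbfrange
    <0003> <0005> <0041>
    endbfrange
    """

    codes = parse_tounicode(SAMPLE_CMAP)
    print(decode(b"\x00\x01\x00\x02\x00\x03\x00\x05", codes))  # HiAC

Mapping codes to Unicode through the font's CMap, rather than treating string bytes as text directly, is what lets text in fonts with custom encodings come out readable.
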

Installation
============

*pdfreader* can be installed with `pip <http://pypi.python.org/pypi/pip>`_::

    $ python -m pip install pdfreader

Or with ``easy_install`` from
`setuptools <http://pypi.python.org/pypi/setuptools>`_ (deprecated in setuptools)::

    $ python -m easy_install pdfreader

You can also download the project source and run::

    $ python setup.py install

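
To confirm which release ended up installed, one option is the standard library's ``importlib.metadata`` module (available in Python 3.8 and later). The helper name ``installed_version`` is hypothetical and is not part of *pdfreader*.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version


    def installed_version(dist_name="pdfreader"):
        """Return the installed version string of a distribution, or None if it is missing."""
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None


    if __name__ == "__main__":
        # Prints a version string, or None when the package is not installed.
        print(installed_version())

Querying the installed distribution metadata works the same way regardless of which of the installation methods was used.
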

Tutorial and Documentation
===========================

`Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>`_


Support, Bugs & Feature Requests
============================================

*pdfreader* uses `GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>`_ to keep track of bugs,
feature requests, etc.


Related Projects
================

* `pdfminer <https://github.com/euske/pdfminer>`_
* `PyPDF2 <https://github.com/py-pdf/PyPDF2>`_
* `xpdf <http://www.foolabs.com/xpdf/>`_
* `pdfbox <http://pdfbox.apache.org/>`_
* `mupdf <http://mupdf.com/>`_


References
==========

* `Document management - Portable document format - Part 1: PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_
* `Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>`_
* `PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>`_
* `Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>`_
* `Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>`_


Donation
========

If this project is helpful, you can treat me to coffee :-)

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support&currency_code=USD&source=url

Raw data
{
"_id": null,
"home_page": "http://github.com/maxpmaxp/pdfreader",
"name": "pdfreader",
"maintainer": "Maksym Polshcha",
"docs_url": null,
"requires_python": ">=3.4",
"maintainer_email": "maxp@sterch.net",
"keywords": "pdf, pdfreader, pdfparser, adobe, parser, cmap",
"author": "Maksym Polshcha",
"author_email": "maxp@sterch.net",
"download_url": "https://files.pythonhosted.org/packages/d1/a7/30d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf/pdfreader-0.1.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT Licence",
"summary": "Pythonic API for parsing PDF files",
"version": "0.1.15",
"project_urls": {
"Homepage": "http://github.com/maxpmaxp/pdfreader"
},
"split_keywords": [
"pdf",
" pdfreader",
" pdfparser",
" adobe",
" parser",
" cmap"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4966b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342",
"md5": "b3f22d940a73cf4bf1b5c34e5680c544",
"sha256": "bfd0d29c0d70a81b767b42b6959dd588a8290086c8c72a828739c1e2bda07eba"
},
"downloads": -1,
"filename": "pdfreader-0.1.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b3f22d940a73cf4bf1b5c34e5680c544",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.4",
"size": 135646,
"upload_time": "2024-05-03T19:19:35",
"upload_time_iso_8601": "2024-05-03T19:19:35.921770Z",
"url": "https://files.pythonhosted.org/packages/49/66/b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342/pdfreader-0.1.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d1a730d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf",
"md5": "16934adfd2b9d1bdc86b965c40b6eb44",
"sha256": "2ee1252cc5f21a2f8cadb458decd85c1313271abb5bac1e4363a3e0e17e2dd87"
},
"downloads": -1,
"filename": "pdfreader-0.1.15.tar.gz",
"has_sig": false,
"md5_digest": "16934adfd2b9d1bdc86b965c40b6eb44",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4",
"size": 2655995,
"upload_time": "2024-05-03T19:19:38",
"upload_time_iso_8601": "2024-05-03T19:19:38.279340Z",
"url": "https://files.pythonhosted.org/packages/d1/a7/30d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf/pdfreader-0.1.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-03 19:19:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "maxpmaxp",
"github_project": "pdfreader",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "bitarray",
"specs": [
[
">=",
"2.9.2"
]
]
},
{
"name": "pillow",
"specs": [
[
">=",
"7.1.0"
]
]
},
{
"name": "pycryptodome",
"specs": [
[
"==",
"3.19.1"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.8.1"
]
]
}
],
"lcname": "pdfreader"
}