pdf-essentials


| Field | Value |
|---|---|
| Name | pdf-essentials |
| Version | 0.0.3 |
| Home page | None |
| Summary | An easy-to-use Python library for annotating, manipulating and processing PDF files |
| Upload time | 2024-08-17 11:20:35 |
| Maintainer | None |
| Docs URL | None |
| Author | Oren Halvani |
| Requires Python | >=3.9 |
| License | None |
| Keywords | pdf-annotation, pdf-converter, pdf-crop-margins, pdf-highlighting, pdf-manipulation, pdf-manipulation-utitilies, pdf-merger, pdf-redaction |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

<div align="center">
  <p>
    <a href="#"><img src="https://github.com/Halvani/pdf-essentials/blob/main/images/redaction.jpg" alt="PDF-Essentials logo"/></a>
  </p>
</div>

# PDF-Essentials
An easy-to-use Python library for annotating, manipulating and processing PDF files.


## Description
PDF-Essentials was born out of the idea to bundle both common and advanced PDF manipulation and processing functions into a single library. It builds on [pikepdf](https://github.com/pikepdf/pikepdf), [PyMuPDF (fitz)](https://github.com/pymupdf/PyMuPDF), [PyPDF2/pypdf](https://github.com/py-pdf/pypdf) and [reportlab](https://pypi.org/project/reportlab/) and allows you to perform a variety of PDF editing tasks, including redacting, annotating, extracting, cropping, deleting, splitting and more, without having to deal with the details of low-level PDF libraries.


## Installation
The easiest way to install PDF-Essentials is with pip, either from PyPI or directly from this repository:

- ```pip install pdf-essentials```
- ```pip install git+https://github.com/Halvani/pdf-essentials.git```

The latter will pull and install the latest commit from this repository as well as the required Python dependencies. Note that the repo is updated regularly, while PyPI packages are released less frequently (primarily after major bug fixes, refactoring, etc.).
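
To check which release actually ended up in your environment, a minimal sketch using only the standard library could look like this. It assumes the distribution name `pdf-essentials` from the pip command above; the helper name `installed_version` is made up for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pdf-essentials") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "pdf-essentials is not installed")
```

Comparing the printed version with the latest tag in the repository is a quick way to see whether the PyPI release lags behind.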


## Features
PDF-Essentials offers both standard and advanced functions for processing PDF files. The following functions are currently covered:

- **Numbering**: Insert page numbers at one of three positions at the bottom of the page: left, center or right

- **Extraction**: Extract specified pages from a PDF and save the result to a new PDF file

- **Range splitting**: Split a PDF file into several parts using a list of page ranges and save the parts in the specified output folder

- **Merging**: Merge a list of PDF files into a single PDF file in the specified order

- **Deletion**: Delete specified pages from a PDF file

- **Rotation**: Rotate pages in a PDF file based on global or page-wise rotation setting(s)

- **Reordering**: Arrange the pages in a PDF file in a specific order

- **Metadata removal**: Strip all metadata from a PDF file and save the modified content in a new PDF file

- **Cropping**: Crop pages of a PDF based on the given (individual/global) margins and save the result to a new PDF file

- **Redaction**: Anonymize and rasterize words/substrings in a PDF file by redacting them with an overlay color ([RGB](https://en.wikipedia.org/wiki/RGB_color_model), [HTML color codes](https://htmlcolorcodes.com/), [Web colors](https://en.wikipedia.org/wiki/Web_colors))

- **Copy enabling**: Remove restrictions in a PDF file that prevent text from being copied

- **Form field protection**: Take a PDF with filled form fields and make it read-only

- **PDF/A Conversion/Validation**: Convert a PDF to PDF/A format (requires [Ghostscript](https://ghostscript.com/releases/gsdnld.html)) or validate whether a given PDF complies with PDF/A

- **Highlighting**: Highlight occurrences of substrings, whole words or regex patterns in a PDF file and add comments to the highlighted text along with a background color
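
To illustrate what "a list of page ranges" in the range splitting feature could look like in practice, here is a minimal sketch that turns range strings into zero-based page indices. The string format (`"1-3"`, `"5"`), the one-based numbering and the helper name `parse_page_ranges` are assumptions made for this example; they are not taken from the PDF-Essentials API.

```python
from typing import List


def parse_page_ranges(ranges: List[str], page_count: int) -> List[List[int]]:
    """Convert one-based range strings such as "1-3" or "5" into lists of
    zero-based page indices, one list per output part."""
    parts = []
    for spec in ranges:
        start_text, sep, end_text = spec.partition("-")
        start = int(start_text)
        # A spec without "-" denotes a single page.
        end = int(end_text) if sep else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(
                f"invalid page range {spec!r} for a {page_count}-page document"
            )
        parts.append(list(range(start - 1, end)))
    return parts
```

For example, `parse_page_ranges(["1-3", "5"], page_count=6)` yields `[[0, 1, 2], [4]]`, that is, one index list per part to be written to the output folder.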


## Limitations / Design Considerations
PDF-Essentials currently has the following limitations:

- *Form field protection* does not currently work for all tested PDFs and is still being investigated. Since this is a known problem, there is no need to open an issue for it.

- Currently, all input/output operations are based on the file system. An additional in-memory mechanism is planned so that the functions can operate directly on byte streams, which in turn will allow operations to be pipelined. However, this mechanism is tricky and requires a well-thought-out design, which takes time. So please be patient 🙏
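
Until such an in-memory mechanism exists, path-based functions can be wrapped so that callers still work with bytes. The following is a minimal sketch of that workaround using a temporary directory; `process` stands for any function that takes an input path and an output path, which is an assumed signature and not a documented PDF-Essentials one.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_on_bytes(process: Callable[[str, str], None], data: bytes) -> bytes:
    """Run a path-based function on in-memory data via a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.pdf"
        dst = Path(tmp) / "output.pdf"
        src.write_bytes(data)
        # Hypothetical call shape: process(input_path, output_path)
        process(str(src), str(dst))
        return dst.read_bytes()
```

Because the wrapper takes and returns bytes, several steps can be chained by feeding the result of one call into the next, at the cost of one round trip through the file system per step.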


## Disclaimer
Although this project has been carried out with great care, no liability is accepted for the completeness and accuracy of the underlying data. Integrating PDF-Essentials into production systems is at your own risk!

Furthermore, please note that this project is still in its initial phase. The code structure may therefore change over time.


## License
The PDF-Essentials package is released under the Apache-2.0 license. See <a href="https://github.com/Halvani/pdf-essentials/blob/main/LICENSE">LICENSE</a> for further details.


## Last Remarks
As is usual with open source projects, we developers do not earn any money with what we do, but are primarily interested in giving something back to the community with fun, passion and joy. Nevertheless, we would be very happy if you rewarded all the time that has gone into the project with just a small star 🤗
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pdf-essentials",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pdf-annotation, pdf-converter, pdf-crop-margins, pdf-highlighting, pdf-manipulation, pdf-manipulation-utitilies, pdf-merger, pdf-redaction",
    "author": "Oren Halvani",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/8f/25/ba1aa9098524586b9c6551fe4917e2475e000bd8e8d91b22f8d992ceed69/pdf_essentials-0.0.3.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n  <p>\n    <a href=\"#\"><img src=\"https://github.com/Halvani/pdf-essentials/blob/main/images/redaction.jpg\" alt=\"PDF-Essentials logo\"/></a>\n  </p>\n</div>\n\n# PDF-Essentials\nAn easy-to-use Python library for annotating, manipulating and processing PDF files.\n\n\n## Description\nPDF-Essentials was born out of the idea to bundle both common and advanced PDF manipulation and processing functions into a single library. It builds on [pikepdf](https://github.com/pikepdf/pikepdf), [fitz](https://github.com/pymupdf/PyMuPDF), [PyPDF2/pypdf](https://github.com/py-pdf/pypdf) and [reportlab](https://pypi.org/project/reportlab/) and allows you to perform a variety of PDF editing tasks, including redacting, annotating, extracting, cropping, deleting, splitting and others, without having to deal with the details required when using low-level PDF libraries.\n\n\n## Installation\nThe easiest way to install PDF-Essentials is to use pip, where you can choose between PyPI and this repository: \n\n- ```pip install pdf-essentials```\n- ```pip install git+https://github.com/Halvani/pdf-essentials.git```\n\nThe latter will pull and install the latest commit from this repository as well as the required Python dependencies. Note that the repo is updated regulary, while PyPi-packages are less frequently released (primarily after mayor bugfixing, refactoring, etc.).\n\n\n## Features\nPDF-Essentials offers both standard and advanced functions for processing PDF files. 
The following functions are currently covered:\n\n- **Numbering**: Insert page numbers at three possible positions at the bottom: left, center or right\n\n- **Extraction**: Extract specified pages from a PDF and save the result to a new PDF file\n\n- **Range splitting**: Split a PDF file into several parts using a list of page ranges and save the parts in the specified output folder\n\n- **Merging**: merge a list of PDF files into a single PDF file in the specified order\n\n- **Deletion**: delete specified pages from a PDF file\n\n- **Rotation**: rotate pages in a PDF file based on global or page-wise rotation setting(s)\n\n- **Reordering**: arrange the pages in a PDF file in a specific order\n\n- **Metadata removal**: strip all metadata from a PDF file and save the modified content in a new PDF file\n\n- **Cropping**: crop pages of a PDF based on the given (individual/global) margins and saves the result to a new PDF file\n\n- **Redaction**: anonymize and rasterize words/substrings in a PDF file by redacting them with an overlay color ([RGB](https://en.wikipedia.org/wiki/RGB_color_model), [HTML color codes](https://htmlcolorcodes.com/), [Web colors](https://en.wikipedia.org/wiki/Web_colors))\n\n- **Copy enabling**: remove restrictions in a PDF file that prevent text from being copied\n\n- **Form field protection**: take a PDF with filled form fields and make it read-only\n\n- **PDF/A Conversion/Validation**: convert PDF to PDF/A format (requires [ghostscript](https://ghostscript.com/releases/gsdnld.html)) or validate if a given PDF complies with PDF/A\n\n- **Highlighting**: highlights occurrences of substrings, whole words or regex patterns in a PDF file and adds comments to the highlighted text along with a background color\n\n\n## Limitations / Design Considerations:\nPDF-Essentials comes with several limitations:\n\n- *Form field protection* does not currently work for all tested PDFs and is still being investigated. 
Therefore, you do not need to open an issue in this regard.\n\n- Currently, all input/output operations are based on the file system. In the future, it is planned to integrate an additional in-memory mechanism so that the functions can operate directly on byte streams, which in turn will enable reasonable pipelining. However, this mechanism is a little tricky and requires a well thought-out design, which again demands time. So please be patient \ud83d\ude4f\n\n\n## Disclaimer\nAlthough this project has been carried out with great care, no liability is accepted for the completeness and accuracy of all the underlying data. The use of PDF-Essentials for integration into production systems is at your own risk!\n\nFurthermore, please note that this project is still in its initial phase. The code structure may therefore change over time.\n\n\n## License\nThe PDF-Essentials package is released under the Apache-2.0 license. See <a href=\"https://github.com/Halvani/pdf-essentials/blob/main/LICENSE\">LICENSE</a> for further details.\n\n\n## Last Remarks\nAs is usual with open source projects, we developers do not earn any money with what we do, but are primarily interested in giving something back to the community with fun, passion and joy. Nevertheless, we would be very happy if you rewarded all the time that has gone into the project with just a small star \ud83e\udd17",
    "bugtrack_url": null,
    "license": null,
    "summary": "An easy-to-use Python library for annotating, manipulating and processing PDF files",
    "version": "0.0.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/Halvani/pdf-essentials/issues",
        "Homepage": "https://github.com/Halvani/pdf-essentials"
    },
    "split_keywords": [
        "pdf-annotation",
        " pdf-converter",
        " pdf-crop-margins",
        " pdf-highlighting",
        " pdf-manipulation",
        " pdf-manipulation-utitilies",
        " pdf-merger",
        " pdf-redaction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3da94828ed6c57ec5e8295e551fcf3e2e713608e41113c69e7b6cf1629acc3d1",
                "md5": "46dd974acc085ec32d5efbcbd673f3b6",
                "sha256": "ae4de36fe0fe3602b8da7e4e86f83485ba4ade37cc720ff2ccdc2f2440b6ae96"
            },
            "downloads": -1,
            "filename": "pdf_essentials-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "46dd974acc085ec32d5efbcbd673f3b6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18115,
            "upload_time": "2024-08-17T11:20:33",
            "upload_time_iso_8601": "2024-08-17T11:20:33.937042Z",
            "url": "https://files.pythonhosted.org/packages/3d/a9/4828ed6c57ec5e8295e551fcf3e2e713608e41113c69e7b6cf1629acc3d1/pdf_essentials-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8f25ba1aa9098524586b9c6551fe4917e2475e000bd8e8d91b22f8d992ceed69",
                "md5": "4938c1f3def1780cbb63d4ef5c0a66e8",
                "sha256": "a931795d17be3c75f27ad72432f8fb6ff63524bbfd1cdf06d24b6bd539ee6208"
            },
            "downloads": -1,
            "filename": "pdf_essentials-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "4938c1f3def1780cbb63d4ef5c0a66e8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 1104120,
            "upload_time": "2024-08-17T11:20:35",
            "upload_time_iso_8601": "2024-08-17T11:20:35.534557Z",
            "url": "https://files.pythonhosted.org/packages/8f/25/ba1aa9098524586b9c6551fe4917e2475e000bd8e8d91b22f8d992ceed69/pdf_essentials-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-17 11:20:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Halvani",
    "github_project": "pdf-essentials",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pdf-essentials"
}
        