file-crusher


| Field | Value |
| --- | --- |
| Name | file-crusher |
| Version | 0.2.1 |
| Home page | https://github.com/pilip-d/FileCrusher.git |
| Summary | Compresses PDFs with PNG compression. |
| Upload time | 2024-05-20 10:47:21 |
| Maintainer | None |
| Docs URL | None |
| Author | Philip Dell |
| Requires Python | <4,>=3.6 |
| License | None |
| Keywords | compression, pdf, png, goodnotes, file, crusher, crunch, compress, size, limit |
| Requirements | img2pdf, pillow, PyMuPDF, pytesseract |

# File Crusher
Compresses PDFs with PNG compression.

Tired of bumping against upload size limits? This tool compresses PDFs and PNGs by combining several established compression tools in one.
It can be slow, but it shrinks files substantially and helps you get under the relentless 5 MB upload limit.

It works by splitting a PDF into PNGs and compressing them with advpng, pngcrush and pngquant. It then combines them back into a PDF and applies a round of lossless PDF compression. Optionally, it can apply OCR (Optical Character Recognition) to make a scanned PDF searchable.
Additionally, it exposes its internal processors, so you can use it as a PNG compressor and file converter.
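
The pipeline described above can be sketched roughly as follows. This is not File Crusher's actual implementation: the function names, the tool flags, the order in which the tools run and the default DPI are all assumptions made for illustration, and the final lossless PDF pass and the OCR step are left out. The sketch relies on PyMuPDF and img2pdf (both listed as requirements) and skips any external tool that is not installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def png_tool_commands(png: Path) -> List[List[str]]:
    """Return one in-place command line per external PNG tool (assumed flags)."""
    p = str(png)
    return [
        # Lossy palette quantization, overwriting the input file.
        ["pngquant", "--force", "--ext", ".png", p],
        # Lossless re-encoding, overwriting the input file.
        ["pngcrush", "-ow", "-reduce", p],
        # Recompress the PNG data at advpng's highest level.
        ["advpng", "-z", "-4", p],
    ]


def compress_png(png: Path) -> None:
    """Run every tool that is found on PATH; silently skip the others."""
    for cmd in png_tool_commands(png):
        if shutil.which(cmd[0]):
            subprocess.run(cmd, check=True)


def crush_pdf(src: Path, dst: Path, workdir: Path, dpi: int = 200) -> None:
    """Render each page to PNG, compress it, and rebuild a PDF from the images."""
    # Third-party imports stay local so the helpers above work without them.
    import fitz  # PyMuPDF
    import img2pdf

    pages = []
    with fitz.open(str(src)) as doc:
        for number, page in enumerate(doc):
            png = workdir / f"page_{number:04d}.png"
            page.get_pixmap(dpi=dpi).save(str(png))
            compress_png(png)
            pages.append(str(png))
    dst.write_bytes(img2pdf.convert(pages))
```

Rendering pages to images discards the PDF's text layer, which is why an OCR pass is needed afterwards if the result should remain searchable.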

## Installation

### 1. Install the Python Library

```bash
pip install file-crusher
```

### 2. Install the Compression Tools

#### Windows
The compression tools are already bundled in the `compressor_lib` directory.

#### Linux (Ubuntu)

```bash
sudo apt install pngquant -y && sudo apt install advancecomp -y && sudo apt install pngcrush -y
```
Also install Wine for cpdfsqueeze:
```bash
sudo apt install wine -y
```
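
To confirm that the tools ended up on your `PATH`, a small check like the one below may help. It is only a sketch: the helper name `missing_tools` is hypothetical, and the executable names are assumed to be `pngquant`, `advpng` (shipped by the `advancecomp` package), `pngcrush` and `wine`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names assumed to be provided by the packages installed above.
REQUIRED_TOOLS = ("pngquant", "advpng", "pngcrush", "wine")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing: " + ", ".join(missing) if missing else "all tools found")
```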

### 3. Optionally Install Tesseract for OCR

#### For Windows via GUI
Download and install [Tesseract](https://github.com/UB-Mannheim/tesseract/wiki).
During setup, select any additional languages you want (for example, German under "Additional Language Data").

#### Linux
```bash
sudo apt install tesseract-ocr -y
```

Add additional language packs:
```bash
sudo apt install tesseract-ocr-<language-shortform> -y
```

Example for German:
```bash
sudo apt install tesseract-ocr-deu -y
```
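
To see which language packs Tesseract actually picked up, you can parse the output of `tesseract --list-langs`. The sketch below assumes the usual output shape (a header line starting with "List of available languages", then one language code per line); the function names are hypothetical.

```python
import subprocess
from typing import List


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line and keep the language codes.
    return sorted(l for l in lines if not l.lower().startswith("list of"))


def installed_langs() -> List[str]:
    """Ask the local Tesseract binary for its installed languages."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract versions print the list to stderr instead of stdout.
    return parse_langs(result.stdout + result.stderr)
```

If `"deu"` is missing from the result of `installed_langs()`, the German pack was not installed correctly.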

## Usage

### CLI Usage
```bash
# for pdfs
python3 -m file_crusher input.pdf output.pdf --pdfcompressor
# or for pngs
python3 -m file_crusher input.png output.png --pngcompressor
# for other processors see
python3 -m file_crusher --help
```
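
To compress a whole folder, the CLI call above can be wrapped in a short script. This is a sketch that uses only the two flags shown above; the helper names `crush_command` and `crush_folder` are hypothetical, and `file_crusher` is assumed to be installed for the interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Map file extensions to the CLI flags documented above.
FLAGS = {".pdf": "--pdfcompressor", ".png": "--pngcompressor"}


def crush_command(src: Path, dst: Path) -> List[str]:
    """Build the CLI call for one file, choosing the flag by extension."""
    flag = FLAGS[src.suffix.lower()]
    return [sys.executable, "-m", "file_crusher", str(src), str(dst), flag]


def crush_folder(folder: Path, out: Path) -> None:
    """Compress every PDF and PNG in `folder` into `out`."""
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(folder.iterdir()):
        if src.suffix.lower() in FLAGS:
            subprocess.run(crush_command(src, out / src.name), check=True)
```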

### Python Usage
```python3
from file_crusher import PNGCompressor, PDFCompressor

compressor = PNGCompressor()
compressor.process_file("input.png", "output.png")

# extreme mode
compressor = PNGCompressor(0)
compressor.process_file("input.png", "output.png")

# fast mode
compressor = PNGCompressor(5)
compressor.process_file("input.png", "output.png")

# PDFs work the same way; further options such as default_pdf_dpi are available
compressor = PDFCompressor(default_pdf_dpi=200)
compressor.process_file("input.pdf", "output.pdf")
```
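
Because the slower modes take longer, one possible pattern is to try a fast setting first and fall back to a stronger one only if the result is still too large. The helper below is a hypothetical sketch, not part of the library: it only assumes objects with the `process_file(input, output)` method shown above, and it treats the 5 MB limit as 5 * 1024 * 1024 bytes.

```python
import os
from typing import Any, Iterable


def compress_under_limit(
    src: str,
    dst: str,
    compressors: Iterable[Any],
    limit_bytes: int = 5 * 1024 * 1024,
) -> bool:
    """Try each compressor in order; stop as soon as the output fits the limit."""
    for compressor in compressors:
        compressor.process_file(src, dst)
        if os.path.getsize(dst) <= limit_bytes:
            return True
    return False


# Usage with the modes shown above (fast first, then extreme):
#   from file_crusher import PNGCompressor
#   ok = compress_under_limit("input.png", "output.png",
#                             [PNGCompressor(5), PNGCompressor(0)])
```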

## Disclaimer

Note that lossy compression results in a loss of quality or data.
Always check the output file to make sure it meets your requirements.
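
A quick automated sanity check can complement a visual inspection. The sketch below (helper names are hypothetical) reports how much smaller the output is and verifies that it still starts with the expected PDF or PNG signature. It cannot judge visual quality, so still open the file yourself.

```python
import os

PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def size_reduction(original: str, compressed: str) -> float:
    """Return the fraction of bytes saved (0.75 means 75 % smaller)."""
    before = os.path.getsize(original)
    after = os.path.getsize(compressed)
    return 1 - after / before if before else 0.0


def has_magic(path: str, magic: bytes) -> bool:
    """Check that the file starts with the given signature bytes."""
    with open(path, "rb") as f:
        return f.read(len(magic)) == magic
```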

If you run into problems while using the library or have suggestions for improving it, please create an issue: [https://github.com/pIlIp-d/FileCrusher/issues](https://github.com/pIlIp-d/FileCrusher/issues)

            

## Raw data

{
    "_id": null,
    "home_page": "https://github.com/pilip-d/FileCrusher.git",
    "name": "file-crusher",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.6",
    "maintainer_email": null,
    "keywords": "compression pdf png GoodNotes file crusher crunch compress size limit",
    "author": "Philip Dell",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/e3/11/ee2bbc78a74adddd44c9d6ed82ed892361a9ec17e1aff3388ab1001b7cc0/file_crusher-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Compresses PDFs with PNG compression.",
    "version": "0.2.1",
    "project_urls": {
        "Bug Reports": "https://github.com/pIlIp-d/FileCrusher/issues",
        "Homepage": "https://github.com/pilip-d/FileCrusher.git",
        "Source": "https://github.com/pilip-d/FileCrusher.git"
    },
    "split_keywords": [
        "compression",
        "pdf",
        "png",
        "goodnotes",
        "file",
        "crusher",
        "crunch",
        "compress",
        "size",
        "limit"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "144b0dc5a55d6ac93a5297dd61553e57b367bde2eb747e5d19ec74ad24146c06",
                "md5": "4b76dc06a6a673a31c75d5530b80bb43",
                "sha256": "292a821df7a8a7c40699a7af2caabd91aa67b53e61daf4d058c4d31e3ee7ead6"
            },
            "downloads": -1,
            "filename": "file_crusher-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4b76dc06a6a673a31c75d5530b80bb43",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.6",
            "size": 2057976,
            "upload_time": "2024-05-20T10:47:17",
            "upload_time_iso_8601": "2024-05-20T10:47:17.001188Z",
            "url": "https://files.pythonhosted.org/packages/14/4b/0dc5a55d6ac93a5297dd61553e57b367bde2eb747e5d19ec74ad24146c06/file_crusher-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e311ee2bbc78a74adddd44c9d6ed82ed892361a9ec17e1aff3388ab1001b7cc0",
                "md5": "9829aa24aa6ead80978ed2d508113e1f",
                "sha256": "ca451c2ccdcc5605ff10bcc38610477e1e7722f965dddc3eb68dc1b0a003f7a8"
            },
            "downloads": -1,
            "filename": "file_crusher-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9829aa24aa6ead80978ed2d508113e1f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.6",
            "size": 2046175,
            "upload_time": "2024-05-20T10:47:21",
            "upload_time_iso_8601": "2024-05-20T10:47:21.302333Z",
            "url": "https://files.pythonhosted.org/packages/e3/11/ee2bbc78a74adddd44c9d6ed82ed892361a9ec17e1aff3388ab1001b7cc0/file_crusher-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-20 10:47:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pilip-d",
    "github_project": "FileCrusher",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "img2pdf",
            "specs": [
                [
                    ">=",
                    "0.1.4"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    ">=",
                    "10.2.0"
                ]
            ]
        },
        {
            "name": "PyMuPDF",
            "specs": [
                [
                    ">=",
                    "1.22.0"
                ]
            ]
        },
        {
            "name": "pytesseract",
            "specs": [
                [
                    ">=",
                    "0.3.0"
                ]
            ]
        }
    ],
    "lcname": "file-crusher"
}
        