lemonpdf


| Field | Value |
| --- | --- |
| Name | lemonpdf |
| Version | 2.0.0 |
| Home page | None |
| Summary | Python 3 library to extract URLs from PDF files. |
| Upload time | 2024-08-07 15:05:00 |
| Maintainer | None |
| Docs URL | None |
| Author | zudefoque |
| Requires Python | >=3.7 |
| License | MIT license |
| Keywords | pdf, extractor, cli, tools |
| Requirements | packaging, pdf2image, pillow, PyMuPDF, PyMuPDFb, pytesseract |
| Travis CI | No |
| Coveralls test coverage | No |

# lemonpdf

![PyPI - Downloads](https://img.shields.io/pypi/dm/lemonpdf)
![PyPI - License](https://img.shields.io/pypi/l/lemonpdf)
![GitHub Tag](https://img.shields.io/github/v/tag/JuanBindez/lemonpdf?include_prereleases)
<a href="https://pypi.org/project/lemonpdf/"><img src="https://img.shields.io/pypi/v/lemonpdf" /></a>

### Python 3 library to extract URLs from PDF files.


### Install
    sudo apt install tesseract-ocr poppler-utils
    pip install lemonpdf
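
The two apt packages above provide system executables that the Python dependencies `pytesseract` and `pdf2image` call at runtime (`tesseract` and `pdftoppm`, respectively). A minimal sketch for checking that they are on `PATH` after installation follows; the helper name `missing_tools` is hypothetical and not part of lemonpdf.

```python
import shutil

# Executables provided by the apt packages above:
# tesseract-ocr -> "tesseract", poppler-utils -> "pdftoppm".
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found.")
```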

### Quickstart


### Command-line interface (CLI) usage

#### Get URLs

    lemonpdf -u file.pdf

#### Save the URL list to a text file

    lemonpdf -u file.pdf -o urls.txt -s

#### Get domains

    lemonpdf -d file.pdf

#### Save the domains to a text file

    lemonpdf -d file.pdf -o domains.txt -s
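
To illustrate how a domain list relates to a URL list, the sketch below derives host names from URLs with the standard library. It is an assumption-based illustration, not lemonpdf's implementation; `domains_from_urls` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def domains_from_urls(urls):
    """Return the unique host names found in `urls`, in first-seen order."""
    seen = []
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no network location
        if host and host not in seen:
            seen.append(host)
    return seen


urls = [
    "https://example.com/a",
    "https://example.com/b?x=1",
    "http://docs.example.org:8080/index.html",
]
print(domains_from_urls(urls))  # ['example.com', 'docs.example.org']
```

`urlsplit(...).hostname` lowercases the host and drops any port, so the second sample URL collapses into the first domain and the third loses its `:8080`.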

### Scripts

#### Get URLs and save them to a text file

```python
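from lemonpdf import Extractor

pdf_path = 'file.pdf'              # PDF to scan
output_txt_path = 'out_file.txt'   # text file that receives the URLs

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the extracted URLs to output_txt_path
urls = extractor.extract_urls(save=True)

print(urls)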
```
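
For intuition about what URL extraction involves once a PDF's text is available (from its text layer or from OCR), here is a minimal regex-based sketch. It is not lemonpdf's actual method; `find_urls`, `URL_PATTERN`, and the sample text are hypothetical, and the pattern is deliberately simple.

```python
import re

# Deliberately simple pattern: an http(s) scheme followed by
# characters that are not whitespace or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def find_urls(text):
    """Return URLs found in `text`, de-duplicated, in order of appearance."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


sample = "See https://example.com/docs, then http://example.org. Again: https://example.com/docs"
print(find_urls(sample))  # ['https://example.com/docs', 'http://example.org']
```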

#### Get domains and save them to a text file

```python
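from lemonpdf import Extractor

pdf_path = 'file.pdf'             # PDF to scan
output_txt_path = 'domains.txt'   # text file that receives the domains

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the extracted domains to output_txt_path
domains = extractor.extract_domains(save=True)

print(domains)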
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "lemonpdf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "PDF, Extractor, cli, tools",
    "author": "zudefoque",
    "author_email": "Juan Bindez <juanbindez780@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/c6/ab/511613ea8e3f8b638c2bfa0d7b8110cf78949d2ce10677b55670c60845de/lemonpdf-2.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Python3 library to get urls from PDF files.",
    "version": "2.0.0",
    "project_urls": {
        "Bug Reports": "https://github.com/juanbindez/lemonpdf/issues",
        "Homepage": "https://github.com/juanbindez/lemonpdf",
        "Read the Docs": "http://lemonpdf.readthedocs.io/"
    },
    "split_keywords": [
        "pdf",
        " extractor",
        " cli",
        " tools"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5422f99684bf557dfe3bf5cc6e5d855acacd3cbd696635d72bbafbf2492b161b",
                "md5": "eeb4d3aafdf1a726607295871cdb6d56",
                "sha256": "c96d225be0257320209efb0beab58232e8021b570e0363498b6ca18099e853ea"
            },
            "downloads": -1,
            "filename": "lemonpdf-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eeb4d3aafdf1a726607295871cdb6d56",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 4500,
            "upload_time": "2024-08-07T15:04:59",
            "upload_time_iso_8601": "2024-08-07T15:04:59.262955Z",
            "url": "https://files.pythonhosted.org/packages/54/22/f99684bf557dfe3bf5cc6e5d855acacd3cbd696635d72bbafbf2492b161b/lemonpdf-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c6ab511613ea8e3f8b638c2bfa0d7b8110cf78949d2ce10677b55670c60845de",
                "md5": "8dafc160abc42c0527880ff19ac9db36",
                "sha256": "15852f44f492e9b5a2772349c7a7afa37e0c5bb25a10bad3bf343ab2b0b54a6b"
            },
            "downloads": -1,
            "filename": "lemonpdf-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8dafc160abc42c0527880ff19ac9db36",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 3877,
            "upload_time": "2024-08-07T15:05:00",
            "upload_time_iso_8601": "2024-08-07T15:05:00.985372Z",
            "url": "https://files.pythonhosted.org/packages/c6/ab/511613ea8e3f8b638c2bfa0d7b8110cf78949d2ce10677b55670c60845de/lemonpdf-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-07 15:05:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "juanbindez",
    "github_project": "lemonpdf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "packaging",
            "specs": []
        },
        {
            "name": "pdf2image",
            "specs": []
        },
        {
            "name": "pillow",
            "specs": []
        },
        {
            "name": "PyMuPDF",
            "specs": []
        },
        {
            "name": "PyMuPDFb",
            "specs": []
        },
        {
            "name": "pytesseract",
            "specs": []
        }
    ],
    "lcname": "lemonpdf"
}
        