- Name: pdfdol
- Version: 0.1.21
- Summary: Data Object Layer for PDF data
- Upload time: 2025-09-08 12:42:51
- License: MIT

# pdfdol

Data Object Layer for PDF data

To install: `pip install pdfdol`

[Documentation](https://i2mint.github.io/pdfdol/)


# Examples

## Pdf "Stores"

Get a dict-like object to list and read the pdfs of a folder, as text:

    >>> from pdfdol import PdfFilesReader
    >>> from pdfdol.tests import get_test_pdf_folder
    >>> folder_path = get_test_pdf_folder()
    >>> pdfs = PdfFilesReader(folder_path)
    >>> sorted(pdfs)
    ['sample_pdf_1', 'sample_pdf_2']
    >>> assert pdfs['sample_pdf_2'] == [
    ...     'Page 1\nThis is a sample text for testing Python PDF tools.'
    ... ]
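
Because the reader is dict-like, the usual mapping idioms (`len`, `.items()`, comprehensions) should apply to it. The sketch below uses a plain `dict` as a stand-in for a `PdfFilesReader`, so it runs without any PDF files; only the `sample_pdf_2` value is taken from the example above, while the `sample_pdf_1` pages and the `docs_mentioning` helper are hypothetical.

```python
# A plain dict stands in for a PdfFilesReader (values are lists of page texts).
# The 'sample_pdf_1' pages are invented for illustration.
pdfs = {
    'sample_pdf_1': [
        'Page 1\nHypothetical first page.',
        'Page 2\nHypothetical second page.',
    ],
    'sample_pdf_2': [
        'Page 1\nThis is a sample text for testing Python PDF tools.'
    ],
}

# Number of pages per document
page_counts = {name: len(pages) for name, pages in pdfs.items()}


def docs_mentioning(store, term):
    """Names of documents with at least one page containing `term` (case-insensitive)."""
    term = term.lower()
    return sorted(
        name
        for name, pages in store.items()
        if any(term in page.lower() for page in pages)
    )


print(page_counts)
print(docs_mentioning(pdfs, 'python'))
```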

Note that the values of a `PdfFilesReader` are lists of pages, one string per page.
If you need a single string per PDF (all the pages joined together), you can add a decoder like so:

```python
from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
```
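
To make the idea of a decoder concrete, here is a minimal sketch of a read-only wrapper that applies a function to each value as it is read. This is not how `dol.add_decoder` is implemented; `DecodedStore` is a hypothetical class written only to illustrate the effect, and the `raw` data is invented.

```python
from collections.abc import Mapping


class DecodedStore(Mapping):
    """Hypothetical stand-in: a read-only mapping that decodes values on read."""

    def __init__(self, store, decoder):
        self._store = store
        self._decoder = decoder

    def __getitem__(self, key):
        # Fetch the raw value (a list of pages) and decode it (join into one string)
        return self._decoder(self._store[key])

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


page_separator = '---------------------'
raw = {'doc': ['Page 1\nfirst', 'Page 2\nsecond']}  # invented sample data
decoded = DecodedStore(raw, decoder=page_separator.join)
text = decoded['doc']
print(text)
```

The keys are unchanged; only the values differ, going from a list of page strings to one string with the separator between pages.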

To apply the same decoder at the class level, wrap the class instead of the instance:

```python
from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
```

If you need to concatenate several PDFs, there are many ways to do so.
Here's one:

```python
from dol import Files
from pdfdol import concat_pdfs

s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
```
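
Plain `sorted` orders keys lexicographically, which puts `page_10.pdf` before `page_2.pdf`. If the file names carry numbers, a natural-order function may be a better fit. The sketch below assumes that `key_order` accepts any callable that takes the keys and returns them in the desired order, as the use of `sorted` above suggests; `natural_order` is a hypothetical helper, not part of `pdfdol`.

```python
import re


def natural_order(keys):
    """Sort keys so that embedded numbers compare numerically."""

    def sort_key(key):
        # Split into alternating text and digit chunks: 'page_10.pdf' -> ['page_', 10, '.pdf']
        return [
            int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', key)
        ]

    return sorted(keys, key=sort_key)


keys = ['page_10.pdf', 'page_2.pdf', 'page_1.pdf']
print(sorted(keys))
print(natural_order(keys))
```

Under that assumption, it would be passed as `concat_pdfs(s, key_order=natural_order)`.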


## Get a PDF from various sources

Example with a URL:

```py
from pdfdol import get_pdf  # assumed import path; not shown in the original example

pdf_data = get_pdf("https://pypi.org", src_kind="url")
print("Got PDF data of length:", len(pdf_data))
```

Example with HTML content

```py
html_content = "<html><body><h1>Hello, PDF!</h1></body></html>"
pdf_data = get_pdf(html_content, src_kind="html")
print("Got PDF data of length:", len(pdf_data))
```

Example saving to file

```py
filepath = get_pdf("https://pypi.org", egress="output.pdf", src_kind="url")
print("PDF saved to:", filepath)
```
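
When converting remote pages, it can be useful to check that what came back is a PDF at all. A PDF file begins with the marker `%PDF-`, so a quick test is possible. The sketch below assumes that `get_pdf` returns raw bytes when no `egress` is given (the `len(pdf_data)` calls above suggest this but do not confirm it); `looks_like_pdf` is a hypothetical helper, and the inputs are stand-in bytes rather than real `get_pdf` output.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if `data` starts with the PDF header marker '%PDF-'."""
    return data[:5] == b'%PDF-'


# Stand-in byte strings, not real get_pdf output
print(looks_like_pdf(b'%PDF-1.7\n...'))
print(looks_like_pdf(b'<html>not a pdf</html>'))
```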


            
