Name | pdfdol |
Version | 0.1.11 |
home_page | None |
Summary | Data Object Layer for PDF data |
upload_time | 2024-11-26 12:01:57 |
maintainer | None |
docs_url | None |
author | OtoSense |
requires_python | None |
license | mit |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# pdfdol
Data Object Layer for PDF data
To install: ```pip install pdfdol```
[Documentation](https://i2mint.github.io/pdfdol/)
# Examples
Get a dict-like object to list and read the pdfs of a folder, as text:
    >>> from pdfdol import PdfFilesReader
    >>> from pdfdol.tests import get_test_pdf_folder
    >>> folder_path = get_test_pdf_folder()
    >>> pdfs = PdfFilesReader(folder_path)
    >>> sorted(pdfs)
    ['sample_pdf_1', 'sample_pdf_2']
    >>> assert pdfs['sample_pdf_2'] == [
    ...     'Page 1\nThis is a sample text for testing Python PDF tools.'
    ... ]
Note that the values of a `PdfFilesReader` are lists of pages, one string per page.
If you need a single string per PDF (i.e. all the pages joined together), you can add a decoder like so:
```python
from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
```
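To make the effect of that decoder concrete, here is a plain-Python sketch of what `page_separator.join` does to a single value. The `pages` list is made-up illustration data standing in for one value of a `PdfFilesReader`; it is not output from pdfdol.

```python
page_separator = '---------------------'

# Hypothetical pages, standing in for one value of a PdfFilesReader
pages = ['Page 1\nFirst page text.', 'Page 2\nSecond page text.']

# str.join places the separator between consecutive pages, not at the ends
text = page_separator.join(pages)
print(text)
```

Because the separator contains no newline, each page runs straight into it. If you want the separator on its own line, include the newlines in the separator itself, for example `'\n---------------------\n'`.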
If you want every instance to return strings, apply the decoder to the class instead:
```python
from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
```
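Conceptually, a decoder is a function applied to each value as it is read from the store. The sketch below mimics that idea with a hand-written read-only wrapper over a plain dict. `DecodedReader` and the `raw` data are hypothetical names invented for illustration, and this is not a description of how `dol.add_decoder` is implemented.

```python
from collections.abc import Mapping


class DecodedReader(Mapping):
    """Hypothetical stand-in: a read-only mapping that decodes values on read."""

    def __init__(self, store, decoder):
        self.store = store
        self.decoder = decoder

    def __getitem__(self, key):
        # Fetch the raw value, then transform it before returning it
        return self.decoder(self.store[key])

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


# Made-up data shaped like a PdfFilesReader: key -> list of page strings
raw = {'doc_a': ['page one', 'page two'], 'doc_b': ['only page']}
decoded = DecodedReader(raw, decoder='\n'.join)
print(decoded['doc_a'])
```

Keys and length pass through untouched; only the values change, which is why the wrapped store can still be listed and iterated exactly like the original.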
If you need to concatenate several PDFs into one, there are many ways to do so.
Here's one:
```python
from dol import Files
from pdfdol import concat_pdfs
s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
```
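One thing to watch with `key_order=sorted` is that plain string sorting puts `part_10` before `part_2`. The sketch below shows a natural-order alternative. `natural_order` and the file names are hypothetical, and passing it as `key_order` assumes that parameter accepts any callable that takes the keys and returns them in the desired order (an assumption based on `sorted` being passed above, not checked against the pdfdol API).

```python
import re


def natural_order(keys):
    """Hypothetical helper: sort keys so that 'part_2' comes before 'part_10'."""

    def sort_key(key):
        # Split into digit and non-digit runs; compare digit runs as integers
        return [
            int(token) if token.isdecimal() else token
            for token in re.split(r'(\d+)', key)
        ]

    return sorted(keys, key=sort_key)


# Made-up file names for illustration
keys = ['part_10.pdf', 'part_2.pdf', 'part_1.pdf']
print(sorted(keys))         # lexicographic order
print(natural_order(keys))  # numeric-aware order
```

Under that assumption, the call would read `concat_pdfs(s, key_order=natural_order)`.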
Raw data
{
"_id": null,
"home_page": null,
"name": "pdfdol",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "OtoSense",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/5f/11/c199841803301fdd295e6a8aabb5eb93aff600808210f7d009f32a81e67e/pdfdol-0.1.11.tar.gz",
"platform": null,
"description": "# pdfdol\n\nData Object Layer for PDF data\n\nTo install:\t```pip install pdfdol```\n\n[Documentation](https://i2mint.github.io/pdfdol/)\n\n\n# Examples\n\nGet a dict-like object to list and read the pdfs of a folder, as text:\n\n >>> from pdfdol import PdfFilesReader\n >>> from pdfdol.tests import get_test_pdf_folder\n >>> folder_path = get_test_pdf_folder()\n >>> pdfs = PdfFilesReader(folder_path)\n >>> sorted(pdfs)\n ['sample_pdf_1', 'sample_pdf_2']\n >>> assert pdfs['sample_pdf_2'] == [\n ... 'Page 1\\nThis is a sample text for testing Python PDF tools.'\n ... ]\n\nSee that the values of a `PdfFilesReader` are lists of pages. \nIf you need strings (i.e. all the pages together) you can add a decoder like so:\n\n```python\nfrom dol import add_decoder\npage_separator = '---------------------'\npdfs = add_decoder(pdfs, decoder=page_separator.join)\n```\n\nIf you need this at the level of the class, just do this:\n\n```python\nfrom dol import add_decoder\npage_separator = '---------------------'\nFilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)\n# and then\npdfs = FilesReader(folder_path)\n# ...\n```\n\nIf you need to concatinate a bunch of pdfs together, you can do so in many\nways. Here's one:\n\n```python\nfrom dol import Files\nfrom pdfdol import concat_pdfs\n\ns = Files('~/Downloads/cosmograph_documentation_pdfs/')\nconcat_pdfs(s, key_order=sorted)\n```\n",
"bugtrack_url": null,
"license": "mit",
"summary": "Data Object Layer for PDF data",
"version": "0.1.11",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "24f3daa12c2238cf82b13f1d23e0e9b28618f86e9e3c5d74e07a6a9927d24600",
"md5": "bc2e9a2f20a7d727dd36dfff59fd317a",
"sha256": "cb90fc79a15c447770de72348f9f3d1bd1515f819d82620c8cd1fe8012af0b37"
},
"downloads": -1,
"filename": "pdfdol-0.1.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bc2e9a2f20a7d727dd36dfff59fd317a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7786,
"upload_time": "2024-11-26T12:01:56",
"upload_time_iso_8601": "2024-11-26T12:01:56.287398Z",
"url": "https://files.pythonhosted.org/packages/24/f3/daa12c2238cf82b13f1d23e0e9b28618f86e9e3c5d74e07a6a9927d24600/pdfdol-0.1.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5f11c199841803301fdd295e6a8aabb5eb93aff600808210f7d009f32a81e67e",
"md5": "02c0c5f9f4e030ebbeba7a60b03c7932",
"sha256": "6cee1702eab25f6cc1ba9909c5785b0694850def05a68e954d141864355aad7f"
},
"downloads": -1,
"filename": "pdfdol-0.1.11.tar.gz",
"has_sig": false,
"md5_digest": "02c0c5f9f4e030ebbeba7a60b03c7932",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6576,
"upload_time": "2024-11-26T12:01:57",
"upload_time_iso_8601": "2024-11-26T12:01:57.875220Z",
"url": "https://files.pythonhosted.org/packages/5f/11/c199841803301fdd295e6a8aabb5eb93aff600808210f7d009f32a81e67e/pdfdol-0.1.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-26 12:01:57",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pdfdol"
}