# Convert PDFs into pandas DataFrames, remove restrictions, put/crack PDF passwords
## pip install pdferli
#### Tested against Windows 10 / Python 3.10 / Anaconda
```python
crack_password(file, chars, processes=4, minlen=None, maxlen=None, verbose=True)
    Attempt to crack a PDF password using a brute-force approach.

    Args:
        file (str): Path to the encrypted PDF file.
        chars (iterable): List of characters to generate passwords from.
        processes (int, optional): Number of parallel processes for password cracking. Defaults to 4.
        minlen (int, optional): Minimum length of generated passwords. Defaults to 1.
        maxlen (int, optional): Maximum length of generated passwords. Defaults to length of chars + 1.
        verbose (bool, optional): Whether to display progress information. Defaults to True.

    Returns:
        str: Cracked password if successful, None if not successful.


get_pdfdf(path, normalize_content=False, **kwargs)
    Extract structured data from a PDF document and return it as a pandas DataFrame.

    Args:
        path (str): Path to the PDF file.
        normalize_content (bool, optional): Whether to normalize content extraction. Defaults to False.
        **kwargs: Additional keyword arguments for pikepdf.open and extract_pages methods.

    Returns:
        pandas.DataFrame: DataFrame containing extracted structured data from the PDF.


put_password_encryption(inputfile, outputfile, password)
    Encrypt a PDF file using a specified password.

    Args:
        inputfile (str): Path to the input PDF file.
        outputfile (str): Path to the output encrypted PDF file.
        password (str): Password for encryption.


remove_restrictions(inputfile, outputfile, **kwargs)
    Remove encryption and restrictions from a PDF file.

    Args:
        inputfile (str): Path to the input encrypted PDF file.
        outputfile (str): Path to the output decrypted PDF file.
        **kwargs: Additional keyword arguments for pikepdf.save method.


Examples:

from time import perf_counter

from pdferli import (
    crack_password,
    put_password_encryption,
    remove_restrictions,
    get_pdfdf,
)


put_password_encryption(
    r"C:\sample.pdf",
    r"C:\sample4.pdf",
    password="1234",
)
path = r"C:\Arquivo.pdf"
remove_restrictions(path, "c:\\norestrictions.pdf")
df = get_pdfdf(path, normalize_content=False)


if __name__ == "__main__":  # necessary for crack_password since it uses multiprocessing
    start = perf_counter()
    x = crack_password(
        file=r"C:\sample4.pdf",
        chars=list("0123456789"),
        processes=4,
        minlen=0,
        maxlen=None,
        verbose=True,
    )
    print(perf_counter() - start)
    print(x)
    start = perf_counter()


# output df
aa_adv aa_bits aa_colorspace aa_element_index aa_element_type aa_evenodd aa_fill aa_fontname aa_height aa_imagemask aa_linewidth aa_name aa_size aa_srcsize aa_stream aa_stroke aa_text aa_text_element aa_text_line aa_upright aa_width aa_x0 aa_x1 aa_y0 aa_y1 bb_hierachy_element bb_hierachy_page
0 31.968 <NA> <NA> 0 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> A APENAS VISUALIZAÇÃO A True 11.336388 126.431281 137.767669 242.012331 298.558504 (0, 0, 0) (0, 0)
1 <NA> <NA> <NA> 1 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)
2 31.968 <NA> <NA> 2 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> P APENAS VISUALIZAÇÃO P True 11.336388 149.036174 160.372561 264.617224 321.163396 (0, 0, 0) (0, 0)
3 <NA> <NA> <NA> 3 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)
4 31.968 <NA> <NA> 4 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> E APENAS VISUALIZAÇÃO E True 11.336388 171.641066 182.977454 287.222116 343.768289 (0, 0, 0) (0, 0)
```
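
#### How large is the brute-force search?

The sketch below is not taken from pdferli. It is a minimal illustration, using only the standard library, of what a brute-force search over `chars` between `minlen` and `maxlen` has to enumerate. The helper names `iter_candidates` and `keyspace_size` are hypothetical, and the shortest-first ordering and the inclusive `maxlen` are assumptions, so the real `crack_password` may order or split the work differently across its processes.

```python
from itertools import product


def iter_candidates(chars, minlen=1, maxlen=4):
    """Yield every string over `chars` with length minlen..maxlen (inclusive), shortest first."""
    for length in range(minlen, maxlen + 1):
        for combo in product(chars, repeat=length):
            yield "".join(combo)


def keyspace_size(n_chars, minlen=1, maxlen=4):
    """Count the candidates that iter_candidates would produce for n_chars symbols."""
    return sum(n_chars**length for length in range(minlen, maxlen + 1))


digits = list("0123456789")
candidates = list(iter_candidates(digits, minlen=1, maxlen=4))

# 1-based position of the example password "1234" in this particular ordering
position = candidates.index("1234") + 1

print(keyspace_size(len(digits), 1, 4))  # 11110 candidates in total
print(position)  # 2345
```

The count grows as a sum of powers of `len(chars)`, so each extra character of allowed length multiplies the work by the alphabet size. Keeping `chars` small and setting `maxlen` explicitly is what keeps a run like the digit-only example above short.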
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/pdferli",
"name": "pdferli",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "pdf,parsing,passwords",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/47/65/19a603bfd5a7e44abd380da3b3df8f2a002f15f5417fdd996241ed82311e/pdferli-0.11.tar.gz",
"platform": null,
"description": "\r\n# Convert PDFs into pandas DataFrames, remove restrictions, put/crack PDF passwords\r\n\r\n## pip install pdferli \r\n\r\n#### Tested against Windows 10 / Python 3.10 / Anaconda \r\n\r\n```python\r\n\r\ncrack_password(file, chars, processes=4, minlen=None, maxlen=None, verbose=True)\r\n\tAttempt to crack a PDF password using a brute-force approach.\r\n\t\r\n\tArgs:\r\n\t\tfile (str): Path to the encrypted PDF file.\r\n\t\tchars (iterable): List of characters to generate passwords from.\r\n\t\tprocesses (int, optional): Number of parallel processes for password cracking. Defaults to 4.\r\n\t\tminlen (int, optional): Minimum length of generated passwords. Defaults to 1.\r\n\t\tmaxlen (int, optional): Maximum length of generated passwords. Defaults to length of chars + 1.\r\n\t\tverbose (bool, optional): Whether to display progress information. Defaults to True.\r\n\t\r\n\tReturns:\r\n\t\tstr: Cracked password if successful, None if not successful\r\n\r\n\r\nget_pdfdf(path, normalize_content=False, **kwargs)\r\n\tExtract structured data from a PDF document and return it as a pandas DataFrame.\r\n\t\r\n\tArgs:\r\n\t\tpath (str): Path to the PDF file.\r\n\t\tnormalize_content (bool, optional): Whether to normalize content extraction. 
Defaults to False.\r\n\t\t**kwargs: Additional keyword arguments for pikepdf.open and extract_pages methods.\r\n\t\r\n\tReturns:\r\n\t\tpandas.DataFrame: DataFrame containing extracted structured data from the PDF.\r\n\r\nput_password_encryption(inputfile, outputfile, password)\r\n\tEncrypt a PDF file using a specified password.\r\n\t\r\n\tArgs:\r\n\t\tinputfile (str): Path to the input PDF file.\r\n\t\toutputfile (str): Path to the output encrypted PDF file.\r\n\t\tpassword (str): Password for encryption.\r\n\r\n\r\nremove_restrictions(inputfile, outputfile, **kwargs)\r\n\tRemove encryption and restrictions from a PDF file.\r\n\t\r\n\tArgs:\r\n\t\tinputfile (str): Path to the input encrypted PDF file.\r\n\t\toutputfile (str): Path to the output decrypted PDF file.\r\n\t\t**kwargs: Additional keyword arguments for pikepdf.save method.\r\n\r\n\r\nExamples:\r\n\r\nfrom time import perf_counter\r\n\r\nfrom pdferli import (\r\n crack_password,\r\n put_password_encryption,\r\n remove_restrictions,\r\n get_pdfdf,\r\n)\r\n\r\n\r\nput_password_encryption(\r\n r\"C:\\sample.pdf\",\r\n r\"C:\\sample4.pdf\",\r\n password=\"1234\",\r\n)\r\npath = r\"C:\\Arquivo.pdf\"\r\nremove_restrictions(path, \"c:\\\\norestrictions.pdf\")\r\ndf = get_pdfdf(path, normalize_content=False)\r\n\r\n\r\n\r\n\r\nif __name__ == \"__main__\": # necessary for crack_password since it uses multiprocessing\r\n start = perf_counter()\r\n x = crack_password(\r\n file=r\"C:\\sample4.pdf\",\r\n chars=list(\"0123456789\"),\r\n processes=4,\r\n minlen=0,\r\n maxlen=None,\r\n verbose=True,\r\n )\r\n print(perf_counter() - start)\r\n print(x)\r\n start = perf_counter()\r\n\r\n\r\n\r\n# output df\r\n aa_adv aa_bits aa_colorspace aa_element_index aa_element_type aa_evenodd aa_fill aa_fontname aa_height aa_imagemask aa_linewidth aa_name aa_size aa_srcsize aa_stream aa_stroke aa_text aa_text_element aa_text_line aa_upright aa_width aa_x0 aa_x1 aa_y0 aa_y1 bb_hierachy_element bb_hierachy_page\r\n0 31.968 <NA> <NA> 0 
LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> A APENAS VISUALIZA\u00c7\u00c3O A True 11.336388 126.431281 137.767669 242.012331 298.558504 (0, 0, 0) (0, 0)\r\n1 <NA> <NA> <NA> 1 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \\n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)\r\n2 31.968 <NA> <NA> 2 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> P APENAS VISUALIZA\u00c7\u00c3O P True 11.336388 149.036174 160.372561 264.617224 321.163396 (0, 0, 0) (0, 0)\r\n3 <NA> <NA> <NA> 3 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \\n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)\r\n4 31.968 <NA> <NA> 4 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> E APENAS VISUALIZA\u00c7\u00c3O E True 11.336388 171.641066 182.977454 287.222116 343.768289 (0, 0, 0) (0, 0)\r\n```\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Convert PDFs into pandas DataFrames, remove restrictions, put/crack PDF passwords",
"version": "0.11",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/pdferli"
},
"split_keywords": [
"pdf",
"parsing",
"passwords"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2dc4aef642801ea4103f343a054a17ad9114e993b9bf5ef1585385690d94d02b",
"md5": "3b9bbfd9f4c54711271f86cd4fc34680",
"sha256": "ef12ce3e7b1d1288f7f5382e41abc20936c9f30d49f324aab6b3055e0b039bf2"
},
"downloads": -1,
"filename": "pdferli-0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3b9bbfd9f4c54711271f86cd4fc34680",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15041,
"upload_time": "2023-08-24T02:41:12",
"upload_time_iso_8601": "2023-08-24T02:41:12.729373Z",
"url": "https://files.pythonhosted.org/packages/2d/c4/aef642801ea4103f343a054a17ad9114e993b9bf5ef1585385690d94d02b/pdferli-0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "476519a603bfd5a7e44abd380da3b3df8f2a002f15f5417fdd996241ed82311e",
"md5": "9fd1d1fb264eaac6d962b042fbac48a4",
"sha256": "929dd3c8ed8d8c7083f448f4193b94274b49d8c252fc6578c90e8a3b55638f4e"
},
"downloads": -1,
"filename": "pdferli-0.11.tar.gz",
"has_sig": false,
"md5_digest": "9fd1d1fb264eaac6d962b042fbac48a4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14568,
"upload_time": "2023-08-24T02:41:14",
"upload_time_iso_8601": "2023-08-24T02:41:14.710681Z",
"url": "https://files.pythonhosted.org/packages/47/65/19a603bfd5a7e44abd380da3b3df8f2a002f15f5417fdd996241ed82311e/pdferli-0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-24 02:41:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "pdferli",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pdferli"
}