eml2pdf


- Name: eml2pdf
- Version: 0.1.1
- Summary: Convert .eml (email) files to PDF using Python.
- Upload time: 2025-02-09 17:45:56
- Requires Python: >=3.11
- Keywords: eml, pdf, pdf-converter, weasyprint, html, mime, multipart

# eml2pdf

Convert `.eml` (email) files to PDF using Python, making them easier
to archive, share, and view without requiring an email client.

Depends on [GNOME's Pango](https://gitlab.gnome.org/GNOME/pango) and
various Python libraries but NOT on a full rendering engine like
WebKit or Gecko. [python-pdfkit](https://github.com/JazzCore/python-pdfkit)
and [wkhtmltopdf](https://github.com/wkhtmltopdf/wkhtmltopdf) are
[deprecated libraries](
https://github.com/JazzCore/python-pdfkit?tab=readme-ov-file#deprecation-warning).

Should run on macOS and on Linux distributions that provide Pango and Python.
The Pango dependency is a challenge on Windows at the moment.

This software is in beta state. There are some unit tests, I use it in my own
workflow, but we need some actual users/downloads, basics for translations and
Debian packaging files to proceed to a 1.0 release.

## Features

- Converts the email body from either the HTML or the plain text message body.
- Tries to filter potential **security or privacy** issues.
- Preserves formatting, character encodings and embedded images.
- Generates a header section with email metadata (From, To, Subject, Date) and,
  if there are any, a list of attachments with size and MD5 sum.

## Dependencies

- Python 3.11+
- [weasyprint](https://weasyprint.org/) - a visual rendering engine for HTML
  and CSS that can export to PDF. Weasyprint depends on [GNOME's Pango](
  https://gitlab.gnome.org/GNOME/pango).
- [python-markdown](https://github.com/Python-Markdown/markdown) - converts
  plain text to HTML.
- [hurry.filesize](https://pypi.org/project/hurry.filesize/) - returns
  human-readable file sizes.
- [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/) - HTML
  sanitization.

## Installation

On a desktop system, chances are high that you already have Pango installed.
In this case you can install eml2pdf from PyPI using pip:

```bash
pip install eml2pdf
```

If weasyprint can't find Pango, the best option is to [install weasyprint using
your system's package manager](
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation).

Users of Arch Linux or derived distros like Manjaro can use the AUR package
[eml2pdf-git](https://aur.archlinux.org/packages/eml_to_pdf-git).

Check [INSTALL.md](INSTALL.md) for more detailed installation instructions if
you need more help.

## Usage

eml2pdf will convert all .eml files in an input directory and
save converted PDF files in a specified output directory.

The output filenames are formatted as:
`YYYY-MM-DD_subject[-counter].pdf`, where:

- The date prefix is taken from the email's sent date.
- The email subject is taken from the email headers.
- Should there be any duplicate filenames, then a counter will be added.
- The extension is changed to `.pdf`

For example, `some_file.eml` with subject "My Email" sent on March 15, 2024
will become `2024-03-15_My_Email.pdf`.
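
The naming scheme above can be sketched in a few lines of Python. This is an
illustration only, not eml2pdf's actual code: the function name `output_name`,
the rule that replaces unsafe characters with underscores, the `no_subject`
fallback and the counter starting at 1 are all assumptions.

```python
import re
from datetime import date


def output_name(sent: date, subject: str, taken: set[str]) -> str:
    """Build a `YYYY-MM-DD_subject[-counter].pdf` name that is not in `taken`."""
    # Assumption: any run of characters other than word characters, '.' and
    # '-' is collapsed into a single underscore.
    safe_subject = re.sub(r"[^\w.-]+", "_", subject).strip("_") or "no_subject"
    base = f"{sent:%Y-%m-%d}_{safe_subject}"
    name = f"{base}.pdf"
    counter = 1  # assumption: the first duplicate gets "-1"
    while name in taken:
        name = f"{base}-{counter}.pdf"
        counter += 1
    taken.add(name)
    return name
```

Calling `output_name(date(2024, 3, 15), "My Email", set())` produces the
filename from the example above.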

```text
usage: eml2pdf [-h] [-d] [-n number] [-p size] [-v] input_dir output_dir

Convert EML files to PDF

positional arguments:
  input_dir             Directory containing EML files
  output_dir            Directory for PDF output

options:
  -h, --help            Show this help message and exit.
  -d, --debug_html      Write intermediate html file next to pdf's.
  -n number, --number-of-procs number
                        Number of parallel processes. Defaults to the number
                        of available logical CPU's to eml2pdf.
  -p size, --page size  a3 a4 a5 b4 b5 letter legal or ledger with or without
                        'landscape', for example: 'a4 landscape' or 'a3'
                        including quotes. Defaults to 'a4', implying portrait.
  --unsafe              Don't sanitize HTML from potentially unsafe elements
                        such as remote images, scripts, etc. This may expose
                        sensitive user information.
  -v, --verbose         Show a lot of verbose debugging info. Forces number
                        of procs to 1.
```

The example below renders all .eml files in `./emails` to A4 landscape PDFs
in `./pdfs`:

```bash
eml2pdf -p 'a4 landscape' ./emails ./pdfs
```

### Debug HTML

eml2pdf will first parse email header info such as date, subject, etc. Next
the mail body will be parsed. If there is an HTML body, eml2pdf will clean
this HTML body (see Security below) and prepend a summary table to the
resulting HTML.

In the next step this HTML is rendered by weasyprint to a PDF.

The `--debug_html` flag will save this intermediate HTML. You can use this to
check whether a problem is an email parsing issue in eml2pdf or a PDF
conversion issue in weasyprint.
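
As a rough sketch of the steps described in this section, the intermediate
HTML could be assembled with the standard library's `email` package. This is
not eml2pdf's actual code: `build_debug_html` is a hypothetical helper,
sanitization is left out, and plain text is wrapped in `<pre>` here instead of
being converted with python-markdown.

```python
from email import message_from_string, policy
from html import escape


def build_debug_html(raw_eml: str) -> str:
    """Return a header table followed by the body HTML for one message."""
    msg = message_from_string(raw_eml, policy=policy.default)
    # Prefer the HTML body and fall back to the plain text body.
    body_part = msg.get_body(preferencelist=("html", "plain"))
    if body_part is None:
        body_html = ""
    elif body_part.get_content_subtype() == "html":
        body_html = body_part.get_content()  # sanitizing would happen here
    else:
        # Assumption: a simple <pre> wrapper stands in for Markdown rendering.
        body_html = f"<pre>{escape(body_part.get_content())}</pre>"
    rows = "".join(
        f"<tr><th>{name}</th><td>{escape(str(msg.get(name, '')))}</td></tr>"
        for name in ("From", "To", "Subject", "Date")
    )
    return f"<table>{rows}</table>\n{body_html}"
```

With weasyprint installed, the returned string could then be rendered with
`weasyprint.HTML(string=html).write_pdf("out.pdf")`.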

### Page size

Not all emails are properly formatted. Part of your mail might not be visible
in the PDF if the email doesn't limit the width of elements such as images or
tables. You can play with page sizes and orientations to try and accommodate
wide emails.
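
For illustration, a page value such as `'a4 landscape'` maps naturally onto a
CSS `@page` rule, which weasyprint understands. The sketch below assumes that
mapping; `page_css` is a hypothetical helper and may differ from how eml2pdf
passes the page size internally.

```python
PAGE_SIZES = {"a3", "a4", "a5", "b4", "b5", "letter", "legal", "ledger"}


def page_css(page: str = "a4") -> str:
    """Translate a `-p` style value into a CSS `@page` rule."""
    parts = page.lower().split()
    valid_size = bool(parts) and parts[0] in PAGE_SIZES
    valid_orientation = parts[1:] in ([], ["landscape"])
    if not (valid_size and valid_orientation):
        raise ValueError(f"unsupported page value: {page!r}")
    return f"@page {{ size: {' '.join(parts)}; }}"
```

The resulting rule can be placed in a `<style>` element of the intermediate
HTML before rendering.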

### Security

#### HTML Sanitization

Emails can contain HTML which can contain stuff you don't expect or want.

In the best case your emails contain clean HTML.

In common cases they will contain intentional tracking of end users using
forged remote sources for images and other resources. This is a common
practice in marketing or mass mailing solutions.

eml2pdf tries to both keep the formatting in your mails and clean up
potentially malicious content using custom filtering of tags, remote images,
remote stylesheets, etc.

We try to clean up. We can't give you a 100% guarantee. If you're very worried,
please clean up your mails yourself.

You can use the `--unsafe` flag if you don't want eml2pdf to try and
sanitize your mails. Check your mails' content before you use this flag!
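
To give an idea of what this kind of filtering involves, here is a minimal
sketch. It deliberately uses the standard library's `html.parser` instead of
beautifulsoup4, and its rule set (the `DROP_*` names, dropping `on*` event
attributes and remote `<img>` sources) is a hypothetical policy, not eml2pdf's
actual filter. CSS inside `<style>` blocks is passed through unchanged, and
comments and doctypes are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for this sketch only.
DROP_WITH_CONTENT = {"script", "iframe", "object"}
DROP_TAG_ONLY = {"link", "embed", "meta"}
REMOTE_PREFIXES = ("http://", "https://", "//")


class Sanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skipping = 0  # depth inside a dropped element
        self.raw_text = False  # True while inside <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skipping += 1
            return
        if self.skipping or tag in DROP_TAG_ONLY:
            return
        # Drop inline event handlers such as onclick or onload.
        attrs = [(k, v) for k, v in attrs if not k.startswith("on")]
        src = (dict(attrs).get("src") or "").strip().lower()
        if tag == "img" and src.startswith(REMOTE_PREFIXES):
            return  # remote image: a possible tracking pixel
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        self.raw_text = tag == "style"

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags have no separate end tag to balance.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skipping = max(0, self.skipping - 1)
        elif not self.skipping and tag not in DROP_TAG_ONLY:
            self.out.append(f"</{tag}>")
            self.raw_text = False

    def handle_data(self, data):
        if not self.skipping:
            # Text was unescaped by the parser, so escape it again.
            self.out.append(data if self.raw_text else escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Embedded images referenced through `cid:` or `data:` sources are kept, since
they do not trigger a network request when the PDF is rendered.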

#### MD5 sums of attachments

eml2pdf lists attachments with their MD5 sums. You can use these MD5 sums for
your convenience. A matching sum is a good indication that a file has not been
accidentally altered, but MD5 is not collision-resistant, so it cannot rule out
deliberate tampering. **They will not be usable as proof in courts of law.**
They are not intended to be.
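
If you want to recompute these sums yourself, something along the following
lines should work with the standard library alone. The helper name
`attachment_digests` is hypothetical, and sizes are reported in plain bytes
rather than in the human-readable form that eml2pdf shows.

```python
import hashlib
from email.message import EmailMessage


def attachment_digests(msg: EmailMessage) -> list[tuple[str, int, str]]:
    """Return (filename, size in bytes, MD5 hex digest) per attachment."""
    rows = []
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (base64, quoted-printable).
        payload = part.get_payload(decode=True) or b""
        name = part.get_filename() or "unnamed"
        digest = hashlib.md5(payload, usedforsecurity=False).hexdigest()
        rows.append((name, len(payload), digest))
    return rows
```

The message has to be parsed with `policy=email.policy.default` so that it is
an `EmailMessage` and `iter_attachments()` is available. Compare the digests
with the output of `md5sum` on the saved attachment.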

## Reporting issues

We've tested eml2pdf with a couple of cases with embedded images, tables,
unicode or specific encodings. Refer to [tests](tree/main/tests) for example
emails.

Please open an issue ticket if you have a mail where conversion results are
not usable. Describe what you think your message contains and the output you
expect. Attach verbose eml2pdf output of only this eml file and attach
the eml file itself. We're not promising a solution, but we can
have a look.

**Please clean up any attachments you add. Remove things you don't want to
share with the world.**

## Credits

eml2pdf was originally forked from [klokie/eml-to-pdf](
https://github.com/klokie/eml-to-pdf) by [Daniel Grossfeld](
https://github.com/klokie/).

## License

eml2pdf code is published under the MIT license.

Licenses for dependencies:

- weasyprint: BSD-3
- python-markdown: BSD-3
- hurry.filesize: ZPL 2.1
- beautifulsoup4: MIT
- Pango: LGPL

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "eml2pdf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "eml, pdf, pdf-converter, weasyprint, html, mime, multipart",
    "author": null,
    "author_email": "Pieter Lenaerts <pieter.lenaerts@outlook.be>, Daniel Grossfeld <github@klokie.com>",
    "download_url": "https://files.pythonhosted.org/packages/5b/77/f74f8a32e4cfe738260a8b21d10af962485d8d8daaf30b56fbad4c8735fd/eml2pdf-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Convert .eml (email) files to PDF using Python.",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/plenaerts/eml2pdf",
        "Issues": "https://github.com/plenaerts/eml2pdf/issues",
        "Repository": "https://github.com/plenaerts/eml2pdf"
    },
    "split_keywords": [
        "eml",
        " pdf",
        " pdf-converter",
        " weasyprint",
        " html",
        " mime",
        " multipart"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6fa1f58c64040bdc6ad10610a1a59cfcde04bf1ed0bb0dc412eb7165f1dc0f6c",
                "md5": "c6d2e770385568e07e9e70ca76d62b24",
                "sha256": "fd775f8ac4c26c3a0a671e7a2d021f00a2a8511bbe2db591cebc1427ea7f508f"
            },
            "downloads": -1,
            "filename": "eml2pdf-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c6d2e770385568e07e9e70ca76d62b24",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 11459,
            "upload_time": "2025-02-09T17:45:54",
            "upload_time_iso_8601": "2025-02-09T17:45:54.064806Z",
            "url": "https://files.pythonhosted.org/packages/6f/a1/f58c64040bdc6ad10610a1a59cfcde04bf1ed0bb0dc412eb7165f1dc0f6c/eml2pdf-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5b77f74f8a32e4cfe738260a8b21d10af962485d8d8daaf30b56fbad4c8735fd",
                "md5": "0f48b9a652b7c3e628ea7bbc9bacdc4a",
                "sha256": "b0a88b557c187db36e8dd5a650a126bb17c216c2ac5a3149014a9e5d31e341aa"
            },
            "downloads": -1,
            "filename": "eml2pdf-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "0f48b9a652b7c3e628ea7bbc9bacdc4a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 180019,
            "upload_time": "2025-02-09T17:45:56",
            "upload_time_iso_8601": "2025-02-09T17:45:56.082018Z",
            "url": "https://files.pythonhosted.org/packages/5b/77/f74f8a32e4cfe738260a8b21d10af962485d8d8daaf30b56fbad4c8735fd/eml2pdf-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-09 17:45:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "plenaerts",
    "github_project": "eml2pdf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "eml2pdf"
}
        