[![PyPI - Version](https://img.shields.io/pypi/v/mail-parser)](https://pypi.org/project/mail-parser/)
[![Coverage Status](https://coveralls.io/repos/github/SpamScope/mail-parser/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/mail-parser?branch=develop)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/mail-parser?color=blue)](https://pypistats.org/packages/mail-parser)
![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)
# mail-parser
mail-parser is more than a wrapper for the [email](https://docs.python.org/2/library/email.message.html) module of the Python Standard Library.
It gives you an easy way to go from a raw mail to a Python object that you can use in your code.
It's the key module of [SpamScope](https://github.com/SpamScope/spamscope).
mail-parser can parse the Outlook email format (.msg). To use this feature, you need to install the `libemail-outlook-message-perl` package. On Debian-based systems:
```
$ apt-get install libemail-outlook-message-perl
```
For more details:
```
$ apt-cache show libemail-outlook-message-perl
```
mail-parser supports Python 3.
# Apache 2 Open Source License
mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
## Support the project
If you find this project useful, you can support it by donating any amount you want. All donations are greatly appreciated and help maintain and develop the project.
[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif "Donate")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)
<a href="bitcoin:bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32">
<img src="https://github.com/SpamScope/mail-parser/blob/develop/docs/images/Bitcoin%20SpamScope.jpg?raw=true" alt="Bitcoin" width="200">
</a>
Bitcoin Address: `bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32`
# mail-parser on Web
- [Splunk app](https://splunkbase.splunk.com/app/4129/)
- [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)
- [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)
- [REMnux](https://docs.remnux.org/discover-the-tools/analyze+documents/email+messages#mail-parser)
# Description
mail-parser takes a raw email as input and generates a parsed object. The properties of this object have the same names as the
[RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):
- bcc
- cc
- date
- delivered_to
- from\_ (not `from`, because `from` is a Python keyword)
- message_id
- received
- reply_to
- subject
- to
There are other properties to get:
- body
- body html
- body plain
- headers
- attachments
- sender IP address
- to domains
- timezone
The `attachments` property is a list of objects. Every object has the following keys:
- binary: it's true if the attachment is a binary
- charset
- content_transfer_encoding
- content-disposition
- content-id
- filename
- mail_content_type
- payload: attachment payload in base64
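The keys above are enough to turn an attachment back into bytes. The following is a minimal sketch, assuming each entry of `mail.attachments` behaves like a dict with the listed keys and that `payload` is base64 text when `binary` is true. The helper name `attachment_to_bytes`, the sample dict, and the handling of non-binary payloads are hypothetical.

```python
import base64


def attachment_to_bytes(attachment):
    """Return the raw bytes of one entry of `mail.attachments`.

    Assumption: `payload` is base64 text when `binary` is true; otherwise
    it is treated as already-decoded text and encoded with `charset`.
    """
    payload = attachment["payload"]
    if attachment.get("binary"):
        return base64.b64decode(payload)
    return payload.encode(attachment.get("charset") or "utf-8")


# Hypothetical attachment shaped like the keys listed above.
sample = {
    "filename": "hello.bin",
    "binary": True,
    "charset": None,
    "payload": base64.b64encode(b"\x00\x01hello").decode("ascii"),
}
print(sample["filename"], attachment_to_bytes(sample))
```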
To get custom headers you should replace "-" with "\_".
Example for header `X-MSMail-Priority`:
```
$ mail.X_MSMail_Priority
```
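The renaming rule can be written as a one-line helper. This is only a sketch of the rule stated above; the function name `header_to_property` is hypothetical and not part of mail-parser.

```python
def header_to_property(header_name):
    """Map a header name to the property name by replacing "-" with "_"."""
    return header_name.replace("-", "_")


print(header_to_property("X-MSMail-Priority"))  # X_MSMail_Priority
```

With a parsed mail, the result could then be passed to `getattr(mail, ...)` to read the header.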
The `received` header is parsed and split into hops. The supported fields are:
- by
- date
- date_utc
- delay (between two hops)
- envelope_from
- envelope_sender
- for
- from
- hop
- with
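To illustrate what the `delay` field means, here is a small sketch that recomputes the time between consecutive hops from their `date_utc` values. mail-parser already provides `delay`, so this is only an illustration. The ISO 8601 string format of `date_utc`, the sample hops, and the helper name `hop_delays` are assumptions.

```python
from datetime import datetime


def hop_delays(hops):
    """Return seconds elapsed between consecutive hops (first hop gets 0.0)."""
    delays = []
    previous = None
    for hop in hops:
        # Assumption: date_utc is an ISO 8601 string.
        current = datetime.fromisoformat(hop["date_utc"])
        delays.append(0.0 if previous is None else (current - previous).total_seconds())
        previous = current
    return delays


# Hypothetical hops; real entries would come from `mail.received`.
hops = [
    {"hop": 1, "date_utc": "2024-11-11T21:40:50"},
    {"hop": 2, "date_utc": "2024-11-11T21:40:53"},
]
print(hop_delays(hops))  # [0.0, 3.0]
```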
mail-parser can detect defects in mail:
- [defects](https://docs.python.org/2/library/email.message.html#email.message.Message.defects): parts of the mail that do not comply with the RFCs
Every property also has a JSON and a raw variant, which you can get with:
- name_json
- name_raw
Example:
```
$ mail.to (Python object)
$ mail.to_json (JSON)
$ mail.to_raw (raw header)
```
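The naming convention above can be applied generically. The sketch below collects the three forms of one property by name. It uses a stand-in object instead of a real parsed mail so that it runs without any input file; the helper name `property_variants` and the stand-in values are hypothetical and do not show real mail-parser output.

```python
from types import SimpleNamespace


def property_variants(mail, name):
    """Return the Python, JSON, and raw forms of one parsed property."""
    return {
        "python": getattr(mail, name),
        "json": getattr(mail, name + "_json"),
        "raw": getattr(mail, name + "_raw"),
    }


# Stand-in for a parsed mail; the values are illustrative only.
fake_mail = SimpleNamespace(
    subject="Hello",
    subject_json='"Hello"',
    subject_raw='["Hello"]',
)
print(property_variants(fake_mail, "subject"))
```

With a real object, `mail` would come from one of the `mailparser.parse_from_*` functions.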
The command-line tool uses the JSON format.
## Defects
These defects can be used to evade antispam filters. One example is a mail with a malformed boundary, which can hide an illegitimate epilogue (often malware).
This library can extract these epilogues.
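Defects of this kind are reported by the standard library's `email` package, so the idea can be tried without mail-parser. The sketch below parses a multipart message whose closing boundary is missing and collects the names of the defects found. It uses only the standard library and does not show mail-parser's own implementation; the sample message and the helper name `defect_names` are hypothetical.

```python
import email

# A multipart message whose closing boundary ("--XYZ--") is missing.
RAW = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: broken boundary\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
)


def defect_names(message):
    """Collect defect class names from a message and all of its parts."""
    names = set()
    for part in message.walk():
        names.update(type(defect).__name__ for defect in part.defects)
    return names


message = email.message_from_string(RAW)
print(defect_names(message))
```

Working with class names rather than defect instances makes the result easy to compare against a set of defect categories of interest.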
# Authors
## Main Author
**Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)
# Installation
Clone the repository:
```
git clone https://github.com/SpamScope/mail-parser.git
```
and install mail-parser with `setup.py`:
```
$ cd mail-parser
$ python setup.py install
```
or use `pip`:
```
$ pip install mail-parser
```
# Usage in a project
Import `mailparser` module:
```
import mailparser
mail = mailparser.parse_from_bytes(byte_mail)
mail = mailparser.parse_from_file(f)
mail = mailparser.parse_from_file_msg(outlook_mail)
mail = mailparser.parse_from_file_obj(fp)
mail = mailparser.parse_from_string(raw_mail)
```
Then you can get all the parts:
```
mail.attachments: list of all attachments
mail.body
mail.date: datetime object in UTC
mail.defects: defects (parts not compliant with the RFCs)
mail.defects_categories: only defects categories
mail.delivered_to
mail.from_
mail.get_server_ipaddress(trust="my_server_mail_trust")
mail.headers
mail.mail: tokenized mail in an object
mail.message: email.message.Message object
mail.message_as_string: message as string
mail.message_id
mail.received
mail.subject
mail.text_plain: only text plain mail parts in a list
mail.text_html: only text html mail parts in a list
mail.text_not_managed: all not managed text (check the warning logs to find content subtype)
mail.to
mail.to_domains
mail.timezone: returns the timezone, offset from UTC
mail.mail_partial: returns only the main parts of the email
```
It's possible to write the attachments on disk with the method:
```
mail.write_attachments(base_path)
```
# Usage from command-line
If you installed mailparser with `pip` or `setup.py`, you can use it from the command line.
These are all the switches:
```
usage: mailparser [-h] (-f FILE | -s STRING | -k)
[-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]
[-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]
[-i Trust mail server string] [-p] [-z] [-v]
Wrapper for email Python Standard Library
optional arguments:
-h, --help show this help message and exit
-f FILE, --file FILE Raw email file (default: None)
-s STRING, --string STRING
Raw email string (default: None)
-k, --stdin Enable parsing from stdin (default: False)
-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Set log level (default: WARNING)
-j, --json Show the JSON of parsed mail (default: False)
-b, --body Print the body of mail (default: False)
-a, --attachments Print the attachments of mail (default: False)
-r, --headers Print the headers of mail (default: False)
-t, --to Print the to of mail (default: False)
-dt, --delivered-to Print the delivered-to of mail (default: False)
-m, --from Print the from of mail (default: False)
-u, --subject Print the subject of mail (default: False)
-c, --receiveds Print all receiveds of mail (default: False)
-d, --defects Print the defects of mail (default: False)
-o, --outlook Analyze Outlook msg (default: False)
-i Trust mail server string, --senderip Trust mail server string
Extract a reliable sender IP address heuristically
(default: None)
-p, --mail-hash Print mail fingerprints without headers (default:
False)
-z, --attachments-hash
Print attachments with fingerprints (default: False)
-sa, --store-attachments
Store attachments on disk (default: False)
-ap ATTACHMENTS_PATH, --attachments-path ATTACHMENTS_PATH
Path where store attachments (default: /tmp)
-v, --version show program's version number and exit
It takes as input a raw mail and generates a parsed object.
```
Example:
```shell
$ mailparser -f example_mail -j
```
This example shows the tokenized mail as pretty-printed JSON.
From [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to
[parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).
# Exceptions
Exception hierarchy of mail-parser:
```
MailParserError: Base MailParser Exception
|
\── MailParserOutlookError: Raised with Outlook integration errors
|
\── MailParserEnvironmentError: Raised when the environment is not correct
|
\── MailParserOSError: Raised when there is an OS error
|
\── MailParserReceivedParsingError: Raised when a received header cannot be parsed
```
# Development
The first step is to install the development environment:
```
$ python3.10 -m virtualenv venv
$ source venv/bin/activate
$ pip install -e ".[dev, test]"
```
The second step is to run the tests:
```
$ make unittest
```
Then you can try to run the command line tool:
```
$ mail-parser -f tests/mails/mail_malformed_3 -j
```
If everything works, you can start developing.
Raw data
{
"_id": null,
"home_page": "https://github.com/SpamScope/mail-parser",
"name": "mail-parser",
"maintainer": "Fedele Mantuano",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "mantuano.fedele@gmail.com",
"keywords": "email, mail, parser, spam, phishing, malware, forensic, analysis",
"author": "Fedele Mantuano",
"author_email": "mantuano.fedele@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/cb/d7/f1a78ace3f44d61d763c94b32d8a101f0d2b76a2e274c7e1f55bd91a1657/mail_parser-4.1.2.tar.gz",
"platform": "OS Independent",
"description": "[![PyPI - Version](https://img.shields.io/pypi/v/mail-parser)](https://pypi.org/project/mail-parser/)\n[![Coverage Status](https://coveralls.io/repos/github/SpamScope/mail-parser/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/mail-parser?branch=develop)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/mail-parser?color=blue)](https://pypistats.org/packages/mail-parser)\n\n\n![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)\n\n# mail-parser\n\nmail-parser is not only a wrapper for [email](https://docs.python.org/2/library/email.message.html) Python Standard Library.\nIt give you an easy way to pass from raw mail to Python object that you can use in your code.\nIt's the key module of [SpamScope](https://github.com/SpamScope/spamscope).\n\nmail-parser can parse Outlook email format (.msg). To use this feature, you need to install `libemail-outlook-message-perl` package. For Debian based systems:\n\n```\n$ apt-get install libemail-outlook-message-perl\n```\n\nFor more details:\n\n```\n$ apt-cache show libemail-outlook-message-perl\n```\n\nmail-parser supports Python 3.\n\n\n# Apache 2 Open Source License\nmail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.\n\n## Support the project\nIf you find this project useful, you can support it by donating any amount you want. 
All donations are greatly appreciated and help maintain and develop the project.\n\n[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif \"Donate\")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)\n\n<a href=\"bitcoin:bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32\">\n <img src=\"https://github.com/SpamScope/mail-parser/blob/develop/docs/images/Bitcoin%20SpamScope.jpg?raw=true\" alt=\"Bitcoin\" width=\"200\">\n</a>\n\nBitcoin Address: `bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32`\n\n# mail-parser on Web\n - [Splunk app](https://splunkbase.splunk.com/app/4129/)\n - [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)\n - [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)\n - [REMnux](https://docs.remnux.org/discover-the-tools/analyze+documents/email+messages#mail-parser)\n\n# Description\n\nmail-parser takes as input a raw email and generates a parsed object. The properties of this object are the same name of\n[RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):\n\n - bcc\n - cc\n - date\n - delivered_to\n - from\\_ (not `from` because is a keyword of Python)\n - message_id\n - received\n - reply_to\n - subject\n - to\n\nThere are other properties to get:\n - body\n - body html\n - body plain\n - headers\n - attachments\n - sender IP address\n - to domains\n - timezone\n\nThe `attachments` property is a list of objects. Every object has the following keys:\n - binary: it's true if the attachment is a binary\n - charset\n - content_transfer_encoding\n - content-disposition\n - content-id\n - filename\n - mail_content_type\n - payload: attachment payload in base64\n\nTo get custom headers you should replace \"-\" with \"\\_\".\nExample for header `X-MSMail-Priority`:\n\n```\n$ mail.X_MSMail_Priority\n```\n\nThe `received` header is parsed and splitted in hop. 
The fields supported are:\n - by\n - date\n - date_utc\n - delay (between two hop)\n - envelope_from\n - envelope_sender\n - for\n - from\n - hop\n - with\n\n\nmail-parser can detect defect in mail:\n - [defects](https://docs.python.org/2/library/email.message.html#email.message.Message.defects): mail with some not compliance RFC part\n\nAll properties have a JSON and raw property that you can get with:\n - name_json\n - name_raw\n\nExample:\n\n```\n$ mail.to (Python object)\n$ mail.to_json (JSON)\n$ mail.to_raw (raw header)\n```\n\nThe command line tool use the JSON format.\n\n## Defects\nThese defects can be used to evade the antispam filter. An example are the mails with a malformed boundary that can hide a not legitimate epilogue (often malware).\nThis library can take these epilogues.\n\n\n# Authors\n\n## Main Author\n**Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)\n\n\n# Installation\n\nClone repository\n\n```\ngit clone https://github.com/SpamScope/mail-parser.git\n```\n\nand install mail-parser with `setup.py`:\n\n```\n$ cd mail-parser\n\n$ python setup.py install\n```\n\nor use `pip`:\n\n```\n$ pip install mail-parser\n```\n\n# Usage in a project\n\nImport `mailparser` module:\n\n```\nimport mailparser\n\nmail = mailparser.parse_from_bytes(byte_mail)\nmail = mailparser.parse_from_file(f)\nmail = mailparser.parse_from_file_msg(outlook_mail)\nmail = mailparser.parse_from_file_obj(fp)\nmail = mailparser.parse_from_string(raw_mail)\n```\n\nThen you can get all parts\n\n```\nmail.attachments: list of all attachments\nmail.body\nmail.date: datetime object in UTC\nmail.defects: defect RFC not compliance\nmail.defects_categories: only defects categories\nmail.delivered_to\nmail.from_\nmail.get_server_ipaddress(trust=\"my_server_mail_trust\")\nmail.headers\nmail.mail: tokenized mail in a object\nmail.message: email.message.Message object\nmail.message_as_string: message as 
string\nmail.message_id\nmail.received\nmail.subject\nmail.text_plain: only text plain mail parts in a list\nmail.text_html: only text html mail parts in a list\nmail.text_not_managed: all not managed text (check the warning logs to find content subtype)\nmail.to\nmail.to_domains\nmail.timezone: returns the timezone, offset from UTC\nmail.mail_partial: returns only the mains parts of emails\n```\n\nIt's possible to write the attachments on disk with the method:\n\n```\nmail.write_attachments(base_path)\n```\n\n# Usage from command-line\n\nIf you installed mailparser with `pip` or `setup.py` you can use it with command-line.\n\nThese are all swithes:\n\n```\nusage: mailparser [-h] (-f FILE | -s STRING | -k)\n [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]\n [-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]\n [-i Trust mail server string] [-p] [-z] [-v]\n\nWrapper for email Python Standard Library\n\noptional arguments:\n -h, --help show this help message and exit\n -f FILE, --file FILE Raw email file (default: None)\n -s STRING, --string STRING\n Raw email string (default: None)\n -k, --stdin Enable parsing from stdin (default: False)\n -l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}\n Set log level (default: WARNING)\n -j, --json Show the JSON of parsed mail (default: False)\n -b, --body Print the body of mail (default: False)\n -a, --attachments Print the attachments of mail (default: False)\n -r, --headers Print the headers of mail (default: False)\n -t, --to Print the to of mail (default: False)\n -dt, --delivered-to Print the delivered-to of mail (default: False)\n -m, --from Print the from of mail (default: False)\n -u, --subject Print the subject of mail (default: False)\n -c, --receiveds Print all receiveds of mail (default: False)\n -d, --defects Print the defects of mail (default: False)\n -o, --outlook Analyze Outlook msg (default: False)\n -i Trust mail server string, --senderip Trust mail 
server string\n Extract a reliable sender IP address heuristically\n (default: None)\n -p, --mail-hash Print mail fingerprints without headers (default:\n False)\n -z, --attachments-hash\n Print attachments with fingerprints (default: False)\n -sa, --store-attachments\n Store attachments on disk (default: False)\n -ap ATTACHMENTS_PATH, --attachments-path ATTACHMENTS_PATH\n Path where store attachments (default: /tmp)\n -v, --version show program's version number and exit\n\nIt takes as input a raw mail and generates a parsed object.\n```\n\nExample:\n\n```shell\n$ mailparser -f example_mail -j\n```\n\nThis example will show you the tokenized mail in a JSON pretty format.\n\nFrom [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to\n[parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).\n\n\n# Exceptions\n\nExceptions hierarchy of mail-parser:\n\n```\nMailParserError: Base MailParser Exception\n|\n\\\u2500\u2500 MailParserOutlookError: Raised with Outlook integration errors\n|\n\\\u2500\u2500 MailParserEnvironmentError: Raised when the environment is not correct\n|\n\\\u2500\u2500 MailParserOSError: Raised when there is an OS error\n|\n\\\u2500\u2500 MailParserReceivedParsingError: Raised when a received header cannot be parsed\n```\n\n# Development\nThe first step is to install the development environment:\n\n```\n$ python3.10 -m virtualenv venv\n$ source venv/bin/activate\n$ pip install -e \".[dev, test]\"\n```\n\nThe second step is to run the tests:\n\n```\n$ make unittest\n```\n\nThen you can try to run the command line tool:\n\n```\n$ mail-parser -f tests/mails/mail_malformed_3 -j\n```\n\nIf all is ok, you can start to develop.\n",
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "Improved wrapper for email standard library",
"version": "4.1.2",
"project_urls": {
"Homepage": "https://github.com/SpamScope/mail-parser"
},
"split_keywords": [
"email",
" mail",
" parser",
" spam",
" phishing",
" malware",
" forensic",
" analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e8b17dffb0bb284339f542f9890b45c3accedc634aec898a84915f8eea7c776d",
"md5": "d9b6e09b776419a309c220eebeb3e8d7",
"sha256": "a6267daa42b9a2dd18a667aacb4891d662a50503c78749089d5f09ebf2a31d2b"
},
"downloads": -1,
"filename": "mail_parser-4.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d9b6e09b776419a309c220eebeb3e8d7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 27002,
"upload_time": "2024-11-11T21:40:50",
"upload_time_iso_8601": "2024-11-11T21:40:50.363593Z",
"url": "https://files.pythonhosted.org/packages/e8/b1/7dffb0bb284339f542f9890b45c3accedc634aec898a84915f8eea7c776d/mail_parser-4.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cbd7f1a78ace3f44d61d763c94b32d8a101f0d2b76a2e274c7e1f55bd91a1657",
"md5": "041c32db62002d762c73672aabbe2df9",
"sha256": "35e3568b84361a3caba0f86b3a27a5756cebbd07e458fd91ab793c45f8a09160"
},
"downloads": -1,
"filename": "mail_parser-4.1.2.tar.gz",
"has_sig": false,
"md5_digest": "041c32db62002d762c73672aabbe2df9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 26948,
"upload_time": "2024-11-11T21:40:51",
"upload_time_iso_8601": "2024-11-11T21:40:51.459404Z",
"url": "https://files.pythonhosted.org/packages/cb/d7/f1a78ace3f44d61d763c94b32d8a101f0d2b76a2e274c7e1f55bd91a1657/mail_parser-4.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-11 21:40:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SpamScope",
"github_project": "mail-parser",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "mail-parser"
}