[![PyPI version](https://badge.fury.io/py/mail-parser.svg)](https://badge.fury.io/py/mail-parser)
[![Build Status](https://travis-ci.org/SpamScope/mail-parser.svg?branch=develop)](https://travis-ci.org/SpamScope/mail-parser)
[![Coverage Status](https://coveralls.io/repos/github/SpamScope/mail-parser/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/mail-parser?branch=develop)
[![BCH compliance](https://bettercodehub.com/edge/badge/SpamScope/mail-parser?branch=develop)](https://bettercodehub.com/)
[![](https://images.microbadger.com/badges/image/fmantuano/spamscope-mail-parser.svg)](https://microbadger.com/images/fmantuano/spamscope-mail-parser "Get your own image badge on microbadger.com")
![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)
# mail-parser
mail-parser is not only a wrapper for [email](https://docs.python.org/2/library/email.message.html) Python Standard Library.
It give you an easy way to pass from raw mail to Python object that you can use in your code.
It's the key module of [SpamScope](https://github.com/SpamScope/spamscope).
mail-parser can parse Outlook email format (.msg). To use this feature, you need to install `libemail-outlook-message-perl` package. For Debian based systems:
```
$ apt-get install libemail-outlook-message-perl
```
For more details:
```
$ apt-cache show libemail-outlook-message-perl
```
mail-parser supports Python 3.
# Apache 2 Open Source License
mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
If you want support the project:
[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif "Donate")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)
![Bitcoin Donate](https://i.stack.imgur.com/MnQ6V.png)
![](https://github.com/SpamScope/mail-parser/raw/develop/docs/bitcoin-qrcode.png)
# mail-parser on Web
- [Splunk app](https://splunkbase.splunk.com/app/4129/)
- [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)
- [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)
# Description
mail-parser takes as input a raw email and generates a parsed object. The properties of this object are the same name of
[RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):
- bcc
- cc
- date
- delivered_to
- from\_ (not `from` because is a keyword of Python)
- message_id
- received
- reply_to
- subject
- to
There are other properties to get:
- body
- body html
- body plain
- headers
- attachments
- sender IP address
- to domains
- timezone
The `attachments` property is a list of objects. Every object has the following keys:
- binary: it's true if the attachment is a binary
- charset
- content_transfer_encoding
- content-disposition
- content-id
- filename
- mail_content_type
- payload: attachment payload in base64
To get custom headers you should replace "-" with "\_".
Example for header `X-MSMail-Priority`:
```
$ mail.X_MSMail_Priority
```
The `received` header is parsed and splitted in hop. The fields supported are:
- by
- date
- date_utc
- delay (between two hop)
- envelope_from
- envelope_sender
- for
- from
- hop
- with
mail-parser can detect defect in mail:
- [defects](https://docs.python.org/2/library/email.message.html#email.message.Message.defects): mail with some not compliance RFC part
All properties have a JSON and raw property that you can get with:
- name_json
- name_raw
Example:
```
$ mail.to (Python object)
$ mail.to_json (JSON)
$ mail.to_raw (raw header)
```
The command line tool use the JSON format.
## Defects
These defects can be used to evade the antispam filter. An example are the mails with a malformed boundary that can hide a not legitimate epilogue (often malware).
This library can take these epilogues.
# Authors
## Main Author
**Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)
# Installation
Clone repository
```
git clone https://github.com/SpamScope/mail-parser.git
```
and install mail-parser with `setup.py`:
```
$ cd mail-parser
$ python setup.py install
```
or use `pip`:
```
$ pip install mail-parser
```
# Usage in a project
Import `mailparser` module:
```
import mailparser
mail = mailparser.parse_from_bytes(byte_mail)
mail = mailparser.parse_from_file(f)
mail = mailparser.parse_from_file_msg(outlook_mail)
mail = mailparser.parse_from_file_obj(fp)
mail = mailparser.parse_from_string(raw_mail)
```
Then you can get all parts
```
mail.attachments: list of all attachments
mail.body
mail.date: datetime object in UTC
mail.defects: defect RFC not compliance
mail.defects_categories: only defects categories
mail.delivered_to
mail.from_
mail.get_server_ipaddress(trust="my_server_mail_trust")
mail.headers
mail.mail: tokenized mail in a object
mail.message: email.message.Message object
mail.message_as_string: message as string
mail.message_id
mail.received
mail.subject
mail.text_plain: only text plain mail parts in a list
mail.text_html: only text html mail parts in a list
mail.text_not_managed: all not managed text (check the warning logs to find content subtype)
mail.to
mail.to_domains
mail.timezone: returns the timezone, offset from UTC
mail.mail_partial: returns only the mains parts of emails
```
It's possible to write the attachments on disk with the method:
```
mail.write_attachments(base_path)
```
# Usage from command-line
If you installed mailparser with `pip` or `setup.py` you can use it with command-line.
These are all swithes:
```
usage: mailparser [-h] (-f FILE | -s STRING | -k)
[-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]
[-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]
[-i Trust mail server string] [-p] [-z] [-v]
Wrapper for email Python Standard Library
optional arguments:
-h, --help show this help message and exit
-f FILE, --file FILE Raw email file (default: None)
-s STRING, --string STRING
Raw email string (default: None)
-k, --stdin Enable parsing from stdin (default: False)
-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Set log level (default: WARNING)
-j, --json Show the JSON of parsed mail (default: False)
-b, --body Print the body of mail (default: False)
-a, --attachments Print the attachments of mail (default: False)
-r, --headers Print the headers of mail (default: False)
-t, --to Print the to of mail (default: False)
-dt, --delivered-to Print the delivered-to of mail (default: False)
-m, --from Print the from of mail (default: False)
-u, --subject Print the subject of mail (default: False)
-c, --receiveds Print all receiveds of mail (default: False)
-d, --defects Print the defects of mail (default: False)
-o, --outlook Analyze Outlook msg (default: False)
-i Trust mail server string, --senderip Trust mail server string
Extract a reliable sender IP address heuristically
(default: None)
-p, --mail-hash Print mail fingerprints without headers (default:
False)
-z, --attachments-hash
Print attachments with fingerprints (default: False)
-sa, --store-attachments
Store attachments on disk (default: False)
-ap ATTACHMENTS_PATH, --attachments-path ATTACHMENTS_PATH
Path where store attachments (default: /tmp)
-v, --version show program's version number and exit
It takes as input a raw mail and generates a parsed object.
```
Example:
```shell
$ mailparser -f example_mail -j
```
This example will show you the tokenized mail in a JSON pretty format.
From [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to
[parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).
# Exceptions
Exceptions hierarchy of mail-parser:
```
MailParserError: Base MailParser Exception
|
\── MailParserOutlookError: Raised with Outlook integration errors
|
\── MailParserEnvironmentError: Raised when the environment is not correct
|
\── MailParserOSError: Raised when there is an OS error
|
\── MailParserReceivedParsingError: Raised when a received header cannot be parsed
```
Raw data
{
"_id": null,
"home_page": "https://github.com/SpamScope/mail-parser",
"name": "mail-parser",
"maintainer": "Fedele Mantuano",
"docs_url": null,
"requires_python": "",
"maintainer_email": "mantuano.fedele@gmail.com",
"keywords": "mail,email,parser,wrapper",
"author": "Fedele Mantuano",
"author_email": "mantuano.fedele@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/2b/3d/7f096230c4b61857c7682dc5497000a987b5f7e8376f54f0a5ed73e2cc3d/mail-parser-3.15.0.tar.gz",
"platform": "Linux",
"description": "[![PyPI version](https://badge.fury.io/py/mail-parser.svg)](https://badge.fury.io/py/mail-parser)\n[![Build Status](https://travis-ci.org/SpamScope/mail-parser.svg?branch=develop)](https://travis-ci.org/SpamScope/mail-parser)\n[![Coverage Status](https://coveralls.io/repos/github/SpamScope/mail-parser/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/mail-parser?branch=develop)\n[![BCH compliance](https://bettercodehub.com/edge/badge/SpamScope/mail-parser?branch=develop)](https://bettercodehub.com/)\n[![](https://images.microbadger.com/badges/image/fmantuano/spamscope-mail-parser.svg)](https://microbadger.com/images/fmantuano/spamscope-mail-parser \"Get your own image badge on microbadger.com\")\n\n![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)\n\n# mail-parser\n\nmail-parser is not only a wrapper for [email](https://docs.python.org/2/library/email.message.html) Python Standard Library.\nIt give you an easy way to pass from raw mail to Python object that you can use in your code.\nIt's the key module of [SpamScope](https://github.com/SpamScope/spamscope).\n\nmail-parser can parse Outlook email format (.msg). To use this feature, you need to install `libemail-outlook-message-perl` package. For Debian based systems:\n\n```\n$ apt-get install libemail-outlook-message-perl\n```\n\nFor more details:\n\n```\n$ apt-cache show libemail-outlook-message-perl\n```\n\nmail-parser supports Python 3.\n\n\n# Apache 2 Open Source License\nmail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.\n\nIf you want support the project:\n\n\n[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif \"Donate\")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)\n\n![Bitcoin Donate](https://i.stack.imgur.com/MnQ6V.png)\n\n![](https://github.com/SpamScope/mail-parser/raw/develop/docs/bitcoin-qrcode.png)\n\n\n# mail-parser on Web\n - [Splunk app](https://splunkbase.splunk.com/app/4129/)\n - [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)\n - [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)\n\n\n# Description\n\nmail-parser takes as input a raw email and generates a parsed object. The properties of this object are the same name of\n[RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):\n\n - bcc\n - cc\n - date\n - delivered_to\n - from\\_ (not `from` because is a keyword of Python)\n - message_id\n - received\n - reply_to\n - subject\n - to\n\nThere are other properties to get:\n - body\n - body html\n - body plain\n - headers\n - attachments\n - sender IP address\n - to domains\n - timezone\n\nThe `attachments` property is a list of objects. Every object has the following keys:\n - binary: it's true if the attachment is a binary\n - charset\n - content_transfer_encoding\n - content-disposition\n - content-id\n - filename\n - mail_content_type\n - payload: attachment payload in base64\n\nTo get custom headers you should replace \"-\" with \"\\_\".\nExample for header `X-MSMail-Priority`:\n\n```\n$ mail.X_MSMail_Priority\n```\n\nThe `received` header is parsed and splitted in hop. The fields supported are:\n - by\n - date\n - date_utc\n - delay (between two hop)\n - envelope_from\n - envelope_sender\n - for\n - from\n - hop\n - with\n\n\nmail-parser can detect defect in mail:\n - [defects](https://docs.python.org/2/library/email.message.html#email.message.Message.defects): mail with some not compliance RFC part\n\nAll properties have a JSON and raw property that you can get with:\n - name_json\n - name_raw\n\nExample:\n\n```\n$ mail.to (Python object)\n$ mail.to_json (JSON)\n$ mail.to_raw (raw header)\n```\n\nThe command line tool use the JSON format.\n\n## Defects\nThese defects can be used to evade the antispam filter. An example are the mails with a malformed boundary that can hide a not legitimate epilogue (often malware).\nThis library can take these epilogues.\n\n\n# Authors\n\n## Main Author\n**Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)\n\n\n# Installation\n\nClone repository\n\n```\ngit clone https://github.com/SpamScope/mail-parser.git\n```\n\nand install mail-parser with `setup.py`:\n\n```\n$ cd mail-parser\n\n$ python setup.py install\n```\n\nor use `pip`:\n\n```\n$ pip install mail-parser\n```\n\n# Usage in a project\n\nImport `mailparser` module:\n\n```\nimport mailparser\n\nmail = mailparser.parse_from_bytes(byte_mail)\nmail = mailparser.parse_from_file(f)\nmail = mailparser.parse_from_file_msg(outlook_mail)\nmail = mailparser.parse_from_file_obj(fp)\nmail = mailparser.parse_from_string(raw_mail)\n```\n\nThen you can get all parts\n\n```\nmail.attachments: list of all attachments\nmail.body\nmail.date: datetime object in UTC\nmail.defects: defect RFC not compliance\nmail.defects_categories: only defects categories\nmail.delivered_to\nmail.from_\nmail.get_server_ipaddress(trust=\"my_server_mail_trust\")\nmail.headers\nmail.mail: tokenized mail in a object\nmail.message: email.message.Message object\nmail.message_as_string: message as string\nmail.message_id\nmail.received\nmail.subject\nmail.text_plain: only text plain mail parts in a list\nmail.text_html: only text html mail parts in a list\nmail.text_not_managed: all not managed text (check the warning logs to find content subtype)\nmail.to\nmail.to_domains\nmail.timezone: returns the timezone, offset from UTC\nmail.mail_partial: returns only the mains parts of emails\n```\n\nIt's possible to write the attachments on disk with the method:\n\n```\nmail.write_attachments(base_path)\n```\n\n# Usage from command-line\n\nIf you installed mailparser with `pip` or `setup.py` you can use it with command-line.\n\nThese are all swithes:\n\n```\nusage: mailparser [-h] (-f FILE | -s STRING | -k)\n [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]\n [-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]\n [-i Trust mail server string] [-p] [-z] [-v]\n\nWrapper for email Python Standard Library\n\noptional arguments:\n -h, --help show this help message and exit\n -f FILE, --file FILE Raw email file (default: None)\n -s STRING, --string STRING\n Raw email string (default: None)\n -k, --stdin Enable parsing from stdin (default: False)\n -l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}\n Set log level (default: WARNING)\n -j, --json Show the JSON of parsed mail (default: False)\n -b, --body Print the body of mail (default: False)\n -a, --attachments Print the attachments of mail (default: False)\n -r, --headers Print the headers of mail (default: False)\n -t, --to Print the to of mail (default: False)\n -dt, --delivered-to Print the delivered-to of mail (default: False)\n -m, --from Print the from of mail (default: False)\n -u, --subject Print the subject of mail (default: False)\n -c, --receiveds Print all receiveds of mail (default: False)\n -d, --defects Print the defects of mail (default: False)\n -o, --outlook Analyze Outlook msg (default: False)\n -i Trust mail server string, --senderip Trust mail server string\n Extract a reliable sender IP address heuristically\n (default: None)\n -p, --mail-hash Print mail fingerprints without headers (default:\n False)\n -z, --attachments-hash\n Print attachments with fingerprints (default: False)\n -sa, --store-attachments\n Store attachments on disk (default: False)\n -ap ATTACHMENTS_PATH, --attachments-path ATTACHMENTS_PATH\n Path where store attachments (default: /tmp)\n -v, --version show program's version number and exit\n\nIt takes as input a raw mail and generates a parsed object.\n```\n\nExample:\n\n```shell\n$ mailparser -f example_mail -j\n```\n\nThis example will show you the tokenized mail in a JSON pretty format.\n\nFrom [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to\n[parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).\n\n\n# Exceptions\n\nExceptions hierarchy of mail-parser:\n\n```\nMailParserError: Base MailParser Exception\n|\n\\\u2500\u2500 MailParserOutlookError: Raised with Outlook integration errors\n|\n\\\u2500\u2500 MailParserEnvironmentError: Raised when the environment is not correct\n|\n\\\u2500\u2500 MailParserOSError: Raised when there is an OS error\n|\n\\\u2500\u2500 MailParserReceivedParsingError: Raised when a received header cannot be parsed\n```\n\n\n",
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "Wrapper for email standard library",
"version": "3.15.0",
"project_urls": {
"Homepage": "https://github.com/SpamScope/mail-parser"
},
"split_keywords": [
"mail",
"email",
"parser",
"wrapper"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "675a6ad8c80b10fe92a6d123660b63fa3e23a7a0fe55c20201ab794c9d714f0e",
"md5": "e1d413df8a61bc187252f4fc87e6c74c",
"sha256": "b5d3cd8752cd7c0352c4520388b00881bca24e5ce613d667d8ff8a3ce4994f51"
},
"downloads": -1,
"filename": "mail_parser-3.15.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e1d413df8a61bc187252f4fc87e6c74c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 25964,
"upload_time": "2021-02-26T17:37:03",
"upload_time_iso_8601": "2021-02-26T17:37:03.610950Z",
"url": "https://files.pythonhosted.org/packages/67/5a/6ad8c80b10fe92a6d123660b63fa3e23a7a0fe55c20201ab794c9d714f0e/mail_parser-3.15.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2b3d7f096230c4b61857c7682dc5497000a987b5f7e8376f54f0a5ed73e2cc3d",
"md5": "835fc32d629824bca8c1fae5d240e11c",
"sha256": "d66638acf0633dfd8a718e1e3646a6d58f8e9d75080c94638c7b267b4b0d6c86"
},
"downloads": -1,
"filename": "mail-parser-3.15.0.tar.gz",
"has_sig": false,
"md5_digest": "835fc32d629824bca8c1fae5d240e11c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 18001,
"upload_time": "2021-02-26T17:37:05",
"upload_time_iso_8601": "2021-02-26T17:37:05.312872Z",
"url": "https://files.pythonhosted.org/packages/2b/3d/7f096230c4b61857c7682dc5497000a987b5f7e8376f54f0a5ed73e2cc3d/mail-parser-3.15.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-02-26 17:37:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SpamScope",
"github_project": "mail-parser",
"travis_ci": true,
"coveralls": true,
"github_actions": false,
"requirements": [],
"tox": true,
"lcname": "mail-parser"
}