Name | RTFDE |
Version | 0.1.2 |
home_page | https://github.com/seamustuohy/RTFDE |
Summary | A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format. |
upload_time | 2024-06-22 15:11:56 |
maintainer | None |
docs_url | None |
author | seamus tuohy |
requires_python | ~=3.8 |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# RTFDE: RTF De-Encapsulator
A python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.
De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
# Features
- De-encapsulate HTML from RTF encapsulated HTML.
- De-encapsulate plain text from RTF encapsulated text.
# Known Issues
- This library *fully* unquotes the text it de-encapsulates, because it cannot tell which text was quoted during the RTF conversion process and which was already quoted in the original HTML/text. For instance, escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.
- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.
# Anti-Features (I don't intend to have this library do this.)
- Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
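If you do need plain text from de-encapsulated HTML, a rough sketch using only Python's standard-library `html.parser` might look like the following. The names `TextExtractor` and `html_to_text` are hypothetical helpers, not part of RTFDE, and the function assumes it is handed a `str` (decode first if your value is `bytes`). A dedicated HTML library will handle real-world email markup more robustly.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This joins text nodes with newlines and makes no attempt to reproduce layout, links, or tables.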
# Installation
**To install the package with pip:**
```
pip3 install RTFDE
```
# Usage
## De-encapsulating HTML or TEXT
```python
from RTFDE.deencapsulate import DeEncapsulator

# Read the raw RTF body as bytes.
with open('rtf_file', 'rb') as fp:
    raw_rtf = fp.read()
    rtf_obj = DeEncapsulator(raw_rtf)
    rtf_obj.deencapsulate()
    # The de-encapsulated content is either HTML or plain text.
    if rtf_obj.content_type == 'html':
        print(rtf_obj.html)
    else:
        print(rtf_obj.text)
```
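If the result needs to be passed along rather than printed, the branch above can be wrapped in a small helper. This is only a sketch: `body_of` is a hypothetical name, not part of RTFDE, and it relies solely on the `content_type`, `html`, and `text` attributes used in the example above.

```python
def body_of(rtf_obj):
    """Return (content_type, body) for an already de-encapsulated object.

    Hypothetical helper: expects an object on which deencapsulate()
    has already been called.
    """
    if rtf_obj.content_type == 'html':
        return 'html', rtf_obj.html
    return 'text', rtf_obj.text
```

Returning the content type alongside the body lets the caller decide how to render or store it.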
# Enabling Logging
RTFDE logs through Python's standard `logging` module, so whether logs appear and how verbose they are is controlled by configuring logging. To enable RTFDE's logging at the top level, get the "RTFDE" logger and set its level.
```
import logging

log = logging.getLogger("RTFDE")
log.setLevel(logging.INFO)
```
To enable more in-depth logging for debugging, check out the CONTRIBUTING.md file.
```
import logging

# Get the logger you want to configure.
# The main logger is simply called "RTFDE"; it carries all the *normal* logs.
rtfde_log = logging.getLogger("RTFDE")
rtfde_log.setLevel(logging.DEBUG)
rtfde_log.propagate = True
```
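Setting a level alone may not show anything: if no handler is configured anywhere, Python's `logging` module falls back to a last-resort handler that only emits `WARNING` and above. A minimal sketch of attaching a handler to the "RTFDE" logger is shown below. The helper name `attach_rtfde_handler` and the format string are illustrative assumptions, not part of RTFDE.

```python
import logging
import sys


def attach_rtfde_handler(stream, level=logging.DEBUG):
    """Hypothetical helper: send "RTFDE" logs at `level` and above to `stream`."""
    logger = logging.getLogger("RTFDE")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # Return the handler so the caller can remove it again later.
    return handler


if __name__ == "__main__":
    attach_rtfde_handler(sys.stderr)
```

Calling `logging.getLogger("RTFDE").removeHandler(handler)` with the returned handler undoes the change.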
# Contribute
Please check the [contributing guidelines](./CONTRIBUTING.md).
# License
Please see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing, PLEASE create an issue about it on GitHub.
Raw data
{
"_id": null,
"home_page": "https://github.com/seamustuohy/RTFDE",
"name": "RTFDE",
"maintainer": null,
"docs_url": null,
"requires_python": "~=3.8",
"maintainer_email": null,
"keywords": null,
"author": "seamus tuohy",
"author_email": "code@seamustuohy.com",
"download_url": null,
"platform": null,
"description": "# RTFDE: RTF De-Encapsulator\n\nA python3 library for extracting encapsulated `HTML` & `plain text` content from the `RTF` bodies of .msg files.\n\nDe-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.\n\n# Features\n\n- De-encapsulate HTML from RTF encapsulated HTML.\n- De-encapsulate plain text from RTF encapsulated text.\n\n# Known Issues\n\n- This library *fully* unquotes text it de-encapsulates because it does not know which text was quoted in the RTF conversion process and which text was quoted in the original html/text. So, for instance escaped [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable) text will be returned un-escaped.\n- This library currently can't [combine attachments](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/b518f0bc-468c-4218-87a7-8f8859bf5773) from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML which had attachment objects that needed to be integrated back into the body of the HTML.\n\n# Anti-Features (I don't intend to have this library do this.)\n\n- Extract plain text from RTF encapsulated HTML. 
If you want this, then you will have to parse the HTML using another library.\n\n# Installation\n\n**To install from the pip package.**\n\n```\npip3 install RTFDE\n\n```\n\n# Usage\n\n## De-encapsulating HTML or TEXT\n\n```python\nfrom RTFDE.deencapsulate import DeEncapsulator\n\nwith open('rtf_file', 'rb') as fp:\n raw_rtf = fp.read()\n rtf_obj = DeEncapsulator(raw_rtf)\n rtf_obj.deencapsulate()\n if rtf_obj.content_type == 'html':\n print(rtf_obj.html)\n else:\n print(rtf_obj.text)\n```\n\n\n\n# Enabling Logging\n\nAny logging (including how verbose the logging is) can be handled by configuring logging. You can enable RTFDE's logging at the highest level by getting and setting the \"RTFDE\" logger.\n\n```\nlog = logging.getLogger(\"RTFDE\")\nlog.setLevel(logging.INFO)\n```\n\n\n\n\n\n\nTo see how to enable more in-depth logging for debugging check out the CONTRIBUTING.md file.\n\n```\n# Now, get the log that you want\n# The main logger is simply called RTFDE. That will get you all the *normal* logs.\nrequests_log = logging.getLogger(\"RTFDE\")\nrequests_log.setLevel(logging.DEBUG)\nrequests_log.propagate = True\n```\n\n\n# Contribute\n\nPlease check the [contributing guidelines](./CONTRIBUTING.md)\n\n# License\n\nPlease see the [license file](./LICENSE) for license information on RTFDE. If you have further questions related to licensing PLEASE create an issue about it on github.\n",
"bugtrack_url": null,
"license": null,
"summary": "A library for extracting HTML content from RTF encapsulated HTML as commonly found in the exchange MSG email format.",
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/seamustuohy/RTFDE"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c9321ad82739351117c0711767b828e8f2567a5ffb783741a87120d955564a19",
"md5": "cef70b2430f1ee66ec2bcd2cf19a620e",
"sha256": "f6d1450c99b04e930da130e8b419aa33b1f953623e1b94ad5c0f67f0362eb737"
},
"downloads": -1,
"filename": "RTFDE-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cef70b2430f1ee66ec2bcd2cf19a620e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.8",
"size": 36142,
"upload_time": "2024-06-22T15:11:56",
"upload_time_iso_8601": "2024-06-22T15:11:56.792260Z",
"url": "https://files.pythonhosted.org/packages/c9/32/1ad82739351117c0711767b828e8f2567a5ffb783741a87120d955564a19/RTFDE-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-22 15:11:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "seamustuohy",
"github_project": "RTFDE",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "rtfde"
}