rtfparse


- Name: rtfparse
- Version: 0.8.2
- Summary: Tool to parse Microsoft Rich Text Format (RTF)
- Upload time: 2024-03-05 04:39:58
- Requires Python: >=3.10
- Keywords: parse, rtf
- Requirements: No requirements were recorded.
- License:

  Copyright (c) 2023 Sven Siegmud

  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# rtfparse

RTF Parser. So far it can only de-encapsulate HTML content from an RTF file, but it properly parses the RTF structure and allows you to write your own custom RTF renderers. The HTML de-encapsulator provided with `rtfparse` is one such custom renderer: it liberates the HTML content from its RTF encapsulation and saves it to a given HTML file.

rtfparse can also decompress the RTF from MS Outlook `.msg` files and parse it.
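To illustrate what de-encapsulation involves, here is a simplified, self-contained sketch based on the encapsulation rules in MS-OXRTFEX (see the RTF specification links at the end of this README): HTML markup sits in `{\*\htmltagN ...}` destination groups, and text between `\htmlrtf` and `\htmlrtf0` is RTF-only and must be dropped. This is not rtfparse's implementation. The function name `de_encapsulate_html`, the fixed `cp1252` code page, the short list of skipped destinations, and treating the `\htmlrtf` state as scoped to its group are all assumptions; `\uN` escapes are not handled.

```python
import re
from dataclasses import dataclass, replace

# Tokenizer for the small subset of RTF syntax this sketch handles.
_TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word with optional numeric parameter
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte such as \'e9
    r"|\\([^a-zA-Z'])"          # control symbol such as \{ \} \\ \*
    r"|([{}])"                  # group start or end
    r"|([^\\{}\r\n]+)"          # plain text; raw CR/LF carry no meaning in RTF
)

# Assumption: a short, incomplete list of destinations that never hold HTML.
_SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


@dataclass
class _Group:
    skip: bool = False     # the whole group is ignored
    htmlrtf: bool = False  # between \htmlrtf and \htmlrtf0: RTF-only content
    first: bool = True     # no token has been seen in this group yet
    starred: bool = False  # the group opened with \* (ignorable destination)


def de_encapsulate_html(rtf: str, codepage: str = "cp1252") -> str:
    """Hypothetical helper: extract encapsulated HTML from an RTF string."""
    out: list[str] = []
    stack = [_Group()]
    for m in _TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, text = m.groups()
        state = stack[-1]
        if brace == "{":
            state.first = False
            # Assumption: a child group inherits skip/htmlrtf from its parent.
            stack.append(replace(state, first=True, starred=False))
            continue
        if brace == "}":
            if len(stack) > 1:
                stack.pop()
            continue
        was_first, state.first = state.first, False
        if symbol == "*" and was_first:
            state.starred, state.first = True, True  # destination name follows
            continue
        if word is not None and was_first and (state.starred or word in _SKIPPED):
            # Keep {\*\htmltagN ...} groups, drop every other destination.
            state.skip = state.skip or word != "htmltag"
            continue
        if word == "htmlrtf":
            state.htmlrtf = param != "0"
            continue
        if state.skip or state.htmlrtf:
            continue
        if text is not None:
            out.append(text)
        elif hexbyte is not None:
            out.append(bytes([int(hexbyte, 16)]).decode(codepage, errors="replace"))
        elif symbol in ("{", "}", "\\"):
            out.append(symbol)
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
    return "".join(out)
```

On a toy input such as `{\*\htmltag64 <p>}\htmlrtf {\htmlrtf0 text}\htmlrtf0 {\*\htmltag72 </p>}` this yields `<p>text</p>`; real Outlook RTF bodies use more control words than this sketch knows about, which is why rtfparse parses the full RTF structure instead.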

# Installation

Install rtfparse from PyPI with pip:

    pip install rtfparse

Installation creates an `rtfparse` executable in your Python scripts folder, which should be on your `$PATH`.

# Usage From Command Line

Run the `rtfparse` executable from the command line; see `rtfparse --help` for the available options.

rtfparse writes its logs to these files in `~/rtfparse/`:

```
rtfparse.debug.log
rtfparse.info.log
rtfparse.errors.log
```

## Example: De-encapsulate HTML from an uncompressed RTF file

    rtfparse --rtf-file "path/to/rtf_file.rtf" --de-encapsulate-html --output-file "path/to/extracted.html"

## Example: De-encapsulate HTML from MS Outlook email file

rtfparse internally uses [extract_msg](https://github.com/TeamMsgExtractor/msg-extractor) to read the `.msg` file and [compressed_rtf](https://github.com/delimitry/compressed_rtf) to decompress its RTF body:

    rtfparse --msg-file "path/to/email.msg" --de-encapsulate-html --output-file "path/to/extracted.html"

## Example: Only decompress the RTF from MS Outlook email file

    rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"

## Example: De-encapsulate HTML from MS Outlook email file and save (and later embed) the attachments

When extracting the RTF from the `.msg` file, you can save the attachments (which include images embedded in the email text) to a directory:

    rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"

In `rtfparse` version 1.x you will be able to embed these images in the de-encapsulated HTML. This functionality will be provided by the package [embedimg](https://github.com/fleetingbytes/embedimg).

    rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img

In the current version the option `--embed-img` does nothing.
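The actual embedding will come from embedimg, whose API is not described here. As a purely hypothetical sketch of the idea, assuming the de-encapsulated HTML references inline images as `src="cid:<file name>@..."` and that the attachments directory holds files saved under those names, the `cid:` references could be replaced with base64 `data:` URIs. `embed_images` is an illustrative name, not part of rtfparse or embedimg.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Assumption: inline images are referenced as src="cid:<file name>@<suffix>".
_CID_SRC = re.compile(r'src="cid:([^"@]+)(?:@[^"]*)?"', re.IGNORECASE)


def embed_images(html: str, attachments_dir: Path) -> str:
    """Hypothetical helper: inline cid: images from attachments_dir as data URIs."""

    def to_data_uri(match: re.Match) -> str:
        # Use only the final path component so a cid cannot escape the directory.
        path = attachments_dir / Path(match.group(1)).name
        if not path.is_file():
            return match.group(0)  # leave unresolved references untouched
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'src="data:{mime};base64,{payload}"'

    return _CID_SRC.sub(to_data_uri, html)
```

Leaving unmatched references untouched keeps the HTML valid even when an attachment was not saved or was saved under a different name.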

# Programmatic usage in a Python module

```python
from pathlib import Path

from rtfparse.parser import Rtf_Parser
from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML

source_path = Path(r"path/to/your/rtf/document.rtf")
target_path = Path(r"path/to/your/html/de_encapsulated.html")
# Create the parent directory of `target_path` if it does not already exist:
target_path.parent.mkdir(parents=True, exist_ok=True)

# Parse the RTF file into rtfparse's internal structure:
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()

# Render the parsed structure with the HTML de-encapsulator:
renderer = De_encapsulate_HTML()

with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
```

# RTF Specification Links

If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.

* [Swissmains Link to RTF Spec 1.9.1](https://manuals.swissmains.com/pages/viewpage.action?pageId=1376332&preview=%2F1376332%2F10620104%2FWord2007RTFSpec9.pdf)
* [Webarchive Link to RTF Spec 1.9.1](https://web.archive.org/web/20190708132914/http://www.kleinlercher.at/tools/Windows_Protocols/Word2007RTFSpec9.pdf)
* [RTF Extensions, MS-OXRTFEX](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/411d0d58-49f7-496c-b8c3-5859b045f6cf)

Package metadata

- Author: Sven Siegmund <sven.siegmund@iav.de>
- Documentation: https://github.com/fleetingbytes/rtfparse#readme
- Issues: https://github.com/fleetingbytes/rtfparse/issues
- Source: https://github.com/fleetingbytes/rtfparse
- Wheel: rtfparse-0.8.2-py3-none-any.whl (bdist_wheel, 15859 bytes, uploaded 2024-03-05T04:39:56), sha256 c77e0c4eed9618d3035558618273cace340ddeaa22f0e500b34124407fb5fb21
  https://files.pythonhosted.org/packages/29/01/b1d037592e1810cb18d17f2ab6ea8508eeb4883b2dc916de1ef9460cba82/rtfparse-0.8.2-py3-none-any.whl
- Source distribution: rtfparse-0.8.2.tar.gz (sdist, 13332 bytes, uploaded 2024-03-05T04:39:58), sha256 a037a0a8232e3e983c314da8926a1581bb681e54b02b8b53dcb011c0f8df6f00
  https://files.pythonhosted.org/packages/ed/a9/2cfa90ca3c88f8cf320e21d4e4dbc13b40621d303a56c2068b1bc0cfc454/rtfparse-0.8.2.tar.gz