html2text


| Field           | Value
|-----------------|---------------------------------------------------
| Name            | html2text
| Version         | 2024.2.26
| Home page       | https://github.com/Alir3z4/html2text/
| Summary         | Turn HTML into equivalent Markdown-structured text.
| Upload time     | 2024-02-27 18:49:24
| Maintainer      | Alireza Savand
| Author          | Aaron Swartz
| Requires Python | >=3.8
| License         | GNU GPL 3
| Requirements    | No requirements were recorded.

# html2text

[![CI](https://github.com/Alir3z4/html2text/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Alir3z4/html2text/graph/badge.svg?token=OoxiyymjgU)](https://codecov.io/gh/Alir3z4/html2text)



html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).


Usage: `html2text [filename [encoding]]`

| Option                                                 | Description
|--------------------------------------------------------|---------------------------------------------------
| `--version`                                            | Show program's version number and exit
| `-h`, `--help`                                         | Show this help message and exit
| `--ignore-links`                                       | Don't include any formatting for links
| `--escape-all`                                         | Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
| `--reference-links`                                    | Use reference links instead of inline links to create Markdown
| `--mark-code`                                          | Mark preformatted and code blocks with [code]...[/code]

For a complete list of options, see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).


Or you can use it from within `Python`:

```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

```


Or with some configuration options:
```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!

```

*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*


## How to install

`html2text` is available on PyPI: https://pypi.org/project/html2text/

```
$ pip install html2text
```


## How to run unit tests

    tox

To see the coverage results:

    coverage html

then open the `./htmlcov/index.html` file in your browser.

## Documentation

Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md)

            
