# html2text
[![CI](https://github.com/Alir3z4/html2text/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Alir3z4/html2text/graph/badge.svg?token=OoxiyymjgU)](https://codecov.io/gh/Alir3z4/html2text)
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text [filename [encoding]]`
| Option              | Description |
|---------------------|-------------|
| `--version`         | Show program's version number and exit |
| `-h`, `--help`      | Show this help message and exit |
| `--ignore-links`    | Don't include any formatting for links |
| `--escape-all`      | Escape all special characters. Output is less readable, but avoids corner-case formatting issues. |
| `--reference-links` | Use reference-style links instead of inline links to create Markdown |
| `--mark-code`       | Mark preformatted and code blocks with `[code]...[/code]` |
For a complete list of options, see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).
Or you can use it from within `Python`:
```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
```
Or with some configuration options:
```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore links when converting HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't ignore links anymore; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
```
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*
## How to install
`html2text` is available on [PyPI](https://pypi.org/project/html2text/):
```
$ pip install html2text
```
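To confirm that the install succeeded, the installed version can be read from the package metadata. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of `html2text`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("html2text")
print(version if version else "html2text is not installed")
```

Returning `None` rather than raising keeps the check usable in setup scripts that want to decide whether to run `pip install`.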
## How to run unit tests
    tox

To see the coverage results:

    coverage html

then open the `./htmlcov/index.html` file in your browser.
## Documentation
Documentation lives in [docs/usage.md](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).
Raw data
{
"_id": null,
"home_page": "https://github.com/Alir3z4/html2text/",
"name": "html2text",
"maintainer": "Alireza Savand",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "alireza.savand@gmail.com",
"keywords": "",
"author": "Aaron Swartz",
"author_email": "me@aaronsw.com",
"download_url": "https://files.pythonhosted.org/packages/1a/43/e1d53588561e533212117750ee79ad0ba02a41f52a08c1df3396bd466c05/html2text-2024.2.26.tar.gz",
"platform": "OS Independent",
"bugtrack_url": null,
"license": "GNU GPL 3",
"summary": "Turn HTML into equivalent Markdown-structured text.",
"version": "2024.2.26",
"project_urls": {
"Homepage": "https://github.com/Alir3z4/html2text/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1a43e1d53588561e533212117750ee79ad0ba02a41f52a08c1df3396bd466c05",
"md5": "b67974402e2e3ea0e7d611ce3096388c",
"sha256": "05f8e367d15aaabc96415376776cdd11afd5127a77fce6e36afc60c563ca2c32"
},
"downloads": -1,
"filename": "html2text-2024.2.26.tar.gz",
"has_sig": false,
"md5_digest": "b67974402e2e3ea0e7d611ce3096388c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 56527,
"upload_time": "2024-02-27T18:49:24",
"upload_time_iso_8601": "2024-02-27T18:49:24.855461Z",
"url": "https://files.pythonhosted.org/packages/1a/43/e1d53588561e533212117750ee79ad0ba02a41f52a08c1df3396bd466c05/html2text-2024.2.26.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-27 18:49:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Alir3z4",
"github_project": "html2text",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "html2text"
}