# html_to_markdown
This library is a refactored and modernized fork of [markdownify](https://pypi.org/project/markdownify/), supporting
Python 3.9 and above.
### Differences from markdownify
- The refactored codebase uses a strict functional approach - no classes are involved.
- The library is fully typed, passes strict MyPy checks, and includes a py.typed file.
- The `convert_to_markdown` function accepts a pre-configured instance of `BeautifulSoup` instead of an HTML string.
- Releases follow standard semver. Version v1.0.0 was branched from markdownify's v0.13.1; from that point on, the
  version numbers of the two projects are no longer aligned.
## Installation
```shell
pip install html_to_markdown
```
## Usage
Convert an HTML string to Markdown:
```python
from html_to_markdown import convert_to_markdown
convert_to_markdown('<b>Yay</b> <a href="http://github.com">GitHub</a>') # > '**Yay** [GitHub](http://github.com)'
```
Or pass a pre-configured instance of `BeautifulSoup`:
```python
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown
soup = BeautifulSoup('<b>Yay</b> <a href="http://github.com">GitHub</a>', 'lxml') # lxml requires an extra dependency.
convert_to_markdown(soup) # > '**Yay** [GitHub](http://github.com)'
```
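Passing a `BeautifulSoup` instance is useful when the document should be cleaned up before conversion. The following is a minimal sketch, not part of the library: `drop_tags` and `convert_without` are hypothetical helper names, and the stdlib `html.parser` backend is chosen only to avoid the extra lxml dependency.

```python
from typing import Any, Iterable


def drop_tags(soup: Any, names: Iterable[str]) -> Any:
    """Remove every element whose tag name is in `names`, in place."""
    # find_all accepts a list of tag names; decompose() deletes the element.
    for element in soup.find_all(list(names)):
        element.decompose()
    return soup


def convert_without(html: str, names: Iterable[str]) -> str:
    """Parse `html`, drop the unwanted tags, then convert what is left."""
    # Local imports keep drop_tags importable without the libraries installed.
    from bs4 import BeautifulSoup
    from html_to_markdown import convert_to_markdown

    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    return convert_to_markdown(drop_tags(soup, names))
```

For example, `convert_without(page_html, ["script", "style"])` would convert a page after discarding its scripts and stylesheets.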
### Options
The `convert_to_markdown` function accepts the following kwargs:
- autolinks (bool): Automatically convert valid URLs into Markdown links. Defaults to True.
- bullets (str): A string of characters to use for bullet points in lists. Defaults to '*+-'.
- code_language (str): Default language identifier for fenced code blocks. Defaults to an empty string.
- code_language_callback (Callable[[Any], str] | None): Function to dynamically determine the language for code blocks.
- convert (Iterable[str] | None): A list of tag names to convert to Markdown. If None, all supported tags are converted.
- default_title (bool): Use the default title when converting certain elements (e.g., links). Defaults to False.
- escape_asterisks (bool): Escape asterisks (*) to prevent unintended Markdown formatting. Defaults to True.
- escape_misc (bool): Escape miscellaneous characters to prevent conflicts in Markdown. Defaults to True.
- escape_underscores (bool): Escape underscores (_) to prevent unintended italic formatting. Defaults to True.
- heading_style (Literal["underlined", "atx", "atx_closed"]): The style to use for Markdown headings. Defaults to "underlined".
- keep_inline_images_in (Iterable[str] | None): Tags in which inline images should be preserved. Defaults to None.
- newline_style (Literal["spaces", "backslash"]): Style for handling newlines in text content. Defaults to "spaces".
- strip (Iterable[str] | None): Tags to strip from the output. Defaults to None.
- strong_em_symbol (Literal["*", "_"]): Symbol to use for strong/emphasized text. Defaults to "*".
- sub_symbol (str): Custom symbol for subscript text. Defaults to an empty string.
- sup_symbol (str): Custom symbol for superscript text. Defaults to an empty string.
- wrap (bool): Wrap text to the specified width. Defaults to False.
- wrap_width (int): The number of characters at which to wrap text. Defaults to 80.
- convert_as_inline (bool): Treat the content as inline elements (no block elements like paragraphs). Defaults to False.
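
The options above are passed as keyword arguments. The sketch below shows one possible combination; `OPTIONS`, `language_from_class`, and `convert_with_options` are hypothetical names introduced for illustration. It assumes, as in markdownify, that `code_language_callback` receives the parsed `<pre>` element and that the language is encoded in a `language-*` class on that element, which may not hold for every document.

```python
from typing import Any

# Keyword arguments taken from the option list above.
OPTIONS: dict[str, Any] = {
    "heading_style": "atx",   # '# Heading' instead of underlined headings
    "bullets": "-",           # use '-' for list bullets
    "strong_em_symbol": "_",  # use '_' instead of '*' for emphasis
    "wrap": True,
    "wrap_width": 100,
}


def language_from_class(el: Any) -> str:
    """Return the language named by a 'language-*' class, or '' if none."""
    classes = el.get("class") or []
    if isinstance(classes, str):  # some parsers return a plain string
        classes = classes.split()
    for name in classes:
        if name.startswith("language-"):
            return name[len("language-"):]
    return ""


def convert_with_options(html: str) -> str:
    """Convert `html` using OPTIONS and the callback defined above."""
    # Local import keeps the helpers above usable without the library.
    from html_to_markdown import convert_to_markdown

    return convert_to_markdown(
        html, code_language_callback=language_from_class, **OPTIONS
    )
```

Keeping the options in a single dictionary makes it easy to reuse the same settings across several calls.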
## CLI
For compatibility with the original markdownify, a CLI is provided. Use `html_to_markdown example.html > example.md` or
pipe input from stdin:
```shell
cat example.html | html_to_markdown > example.md
```
Use `html_to_markdown -h` to see all available options. They are the same as listed above and take the same arguments.
Raw data
{
"_id": null,
"home_page": null,
"name": "html-to-markdown",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "beautifulsoup, converter, html, markdown, text-processing",
"author": null,
"author_email": "Na'aman Hirschfeld <nhirschfeld@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/74/d3/52475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f/html_to_markdown-1.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Convert HTML to markdown",
"version": "1.1.0",
"project_urls": null,
"split_keywords": [
"beautifulsoup",
" converter",
" html",
" markdown",
" text-processing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "14e01c78aff17b862d2e0f0edea0f1f24a089ef71cd8393435afede9850f1f29",
"md5": "4057325f43bafd09479241f5214cd266",
"sha256": "1aa42c056b6f3606f7d137c90b893a655d11bc818b93fc534bafdde4ea21553b"
},
"downloads": -1,
"filename": "html_to_markdown-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4057325f43bafd09479241f5214cd266",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 13394,
"upload_time": "2024-09-09T06:26:32",
"upload_time_iso_8601": "2024-09-09T06:26:32.658647Z",
"url": "https://files.pythonhosted.org/packages/14/e0/1c78aff17b862d2e0f0edea0f1f24a089ef71cd8393435afede9850f1f29/html_to_markdown-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "74d352475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f",
"md5": "6980fa6fb5cfc30d9062d646d3ffd2c3",
"sha256": "f6912217f555f526261096ea886e1a87073b1c5327228954315d94965871c1cd"
},
"downloads": -1,
"filename": "html_to_markdown-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "6980fa6fb5cfc30d9062d646d3ffd2c3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 10771,
"upload_time": "2024-09-09T06:26:33",
"upload_time_iso_8601": "2024-09-09T06:26:33.873126Z",
"url": "https://files.pythonhosted.org/packages/74/d3/52475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f/html_to_markdown-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-09 06:26:33",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "html-to-markdown"
}