html-to-markdown


- Name: html-to-markdown
- Version: 1.1.0
- Summary: Convert HTML to markdown
- Upload time: 2024-09-09 06:26:33
- Requires Python: >=3.9
- License: MIT
- Keywords: beautifulsoup, converter, html, markdown, text-processing
- Requirements: none recorded

# html_to_markdown

This library is a refactored and modernized fork of [markdownify](https://pypi.org/project/markdownify/), supporting
Python 3.9 and above.

### Differences from markdownify

- The refactored codebase uses a strictly functional approach; no classes are involved.
- The code is fully typed, adheres to MyPy's strict mode, and includes a `py.typed` file.
- The `convert_to_markdown` function accepts a pre-configured instance of `BeautifulSoup` in place of an HTML string.
- Releases follow standard semver. Version v1.0.0 was branched from markdownify's v0.13.1; from that point on, the
  version numbers of the two projects are no longer aligned.

## Installation

```shell
pip install html_to_markdown
```

## Usage

Convert an HTML string to Markdown:

```python
from html_to_markdown import convert_to_markdown

convert_to_markdown('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'
```

Or pass a pre-configured instance of `BeautifulSoup`:

```python
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown

soup = BeautifulSoup('<b>Yay</b> <a href="http://github.com">GitHub</a>', 'lxml')  # lxml requires an extra dependency.

convert_to_markdown(soup)  # > '**Yay** [GitHub](http://github.com)'
```

### Options

The `convert_to_markdown` function accepts the following kwargs:

- autolinks (bool): Automatically convert valid URLs into Markdown links. Defaults to True.
- bullets (str): A string of characters to use for bullet points in lists. Defaults to '*+-'.
- code_language (str): Default language identifier for fenced code blocks. Defaults to an empty string.
- code_language_callback (Callable[[Any], str] | None): Function to dynamically determine the language for code blocks.
- convert (Iterable[str] | None): A list of tag names to convert to Markdown. If None, all supported tags are converted.
- default_title (bool): Use the default title when converting certain elements (e.g., links). Defaults to False.
- escape_asterisks (bool): Escape asterisks (*) to prevent unintended Markdown formatting. Defaults to True.
- escape_misc (bool): Escape miscellaneous characters to prevent conflicts in Markdown. Defaults to True.
- escape_underscores (bool): Escape underscores (_) to prevent unintended italic formatting. Defaults to True.
- heading_style (Literal["underlined", "atx", "atx_closed"]): The style to use for Markdown headings. Defaults to
  "underlined".
- keep_inline_images_in (Iterable[str] | None): Tags in which inline images should be preserved. Defaults to None.
- newline_style (Literal["spaces", "backslash"]): Style for handling newlines in text content. Defaults to "spaces".
- strip (Iterable[str] | None): Tags to strip from the output. Defaults to None.
- strong_em_symbol (Literal[`"*"`, `"_"`]): Symbol to use for strong/emphasized text. Defaults to `"*"`.
- sub_symbol (str): Custom symbol for subscript text. Defaults to an empty string.
- sup_symbol (str): Custom symbol for superscript text. Defaults to an empty string.
- wrap (bool): Wrap text to the specified width. Defaults to False.
- wrap_width (int): The number of characters at which to wrap text. Defaults to 80.
- convert_as_inline (bool): Treat the content as inline elements (no block elements like paragraphs). Defaults to False.

## CLI

For compatibility with the original markdownify, a CLI is provided. Use `html_to_markdown example.html > example.md` or
pipe input from stdin:

```shell
cat example.html | html_to_markdown > example.md
```

Use `html_to_markdown -h` to see all available options. They are the same as listed above and take the same arguments.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "html-to-markdown",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "beautifulsoup, converter, html, markdown, text-processing",
    "author": null,
    "author_email": "Na'aman Hirschfeld <nhirschfeld@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/74/d3/52475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f/html_to_markdown-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Convert HTML to markdown",
    "version": "1.1.0",
    "project_urls": null,
    "split_keywords": [
        "beautifulsoup",
        " converter",
        " html",
        " markdown",
        " text-processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "14e01c78aff17b862d2e0f0edea0f1f24a089ef71cd8393435afede9850f1f29",
                "md5": "4057325f43bafd09479241f5214cd266",
                "sha256": "1aa42c056b6f3606f7d137c90b893a655d11bc818b93fc534bafdde4ea21553b"
            },
            "downloads": -1,
            "filename": "html_to_markdown-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4057325f43bafd09479241f5214cd266",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 13394,
            "upload_time": "2024-09-09T06:26:32",
            "upload_time_iso_8601": "2024-09-09T06:26:32.658647Z",
            "url": "https://files.pythonhosted.org/packages/14/e0/1c78aff17b862d2e0f0edea0f1f24a089ef71cd8393435afede9850f1f29/html_to_markdown-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "74d352475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f",
                "md5": "6980fa6fb5cfc30d9062d646d3ffd2c3",
                "sha256": "f6912217f555f526261096ea886e1a87073b1c5327228954315d94965871c1cd"
            },
            "downloads": -1,
            "filename": "html_to_markdown-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "6980fa6fb5cfc30d9062d646d3ffd2c3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 10771,
            "upload_time": "2024-09-09T06:26:33",
            "upload_time_iso_8601": "2024-09-09T06:26:33.873126Z",
            "url": "https://files.pythonhosted.org/packages/74/d3/52475e5b023ced614b7738bec1d99386ad893c1cbdcdea63865a0db82d5f/html_to_markdown-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-09 06:26:33",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "html-to-markdown"
}
        