markdownify


:Name: markdownify
:Version: 0.14.1
:Summary: Convert HTML to markdown.
:Upload time: 2024-11-24 22:08:30
:Author email: Matthew Tretter <m@tthewwithanm.com>

|build| |version| |license| |downloads|

.. |build| image:: https://img.shields.io/github/actions/workflow/status/matthewwithanm/python-markdownify/python-app.yml?branch=develop
    :alt: GitHub Workflow Status
    :target: https://github.com/matthewwithanm/python-markdownify/actions/workflows/python-app.yml?query=workflow%3A%22Python+application%22

.. |version| image:: https://img.shields.io/pypi/v/markdownify
    :alt: Pypi version
    :target: https://pypi.org/project/markdownify/

.. |license| image:: https://img.shields.io/pypi/l/markdownify
    :alt: License
    :target: https://github.com/matthewwithanm/python-markdownify/blob/develop/LICENSE

.. |downloads| image:: https://pepy.tech/badge/markdownify
    :alt: Pypi Downloads
    :target: https://pepy.tech/project/markdownify

Installation
============

``pip install markdownify``


Usage
=====

Convert some HTML to Markdown:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'

Specify tags to exclude:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'

\...or specify the tags you want to include:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'
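
The ``strip`` and ``convert`` options cannot be combined. The following is a
minimal sketch of what that looks like in practice, assuming the converter
raises ``ValueError`` when both are passed (as the 0.14.x source does; the
exact message may differ between releases):

.. code:: python

    from markdownify import markdownify as md

    html = '<b>Yay</b> <a href="http://github.com">GitHub</a>'

    # Passing both options at once is rejected before any conversion happens.
    try:
        md(html, strip=['a'], convert=['b'])
    except ValueError as error:
        conflict = str(error)
    else:
        conflict = None

    # Each option on its own works as in the examples above.
    stripped = md(html, strip=['a'])
    converted = md(html, convert=['b'])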


Options
=======

Markdownify supports the following options:

strip
  A list of tags to strip. This option can't be used with the
  ``convert`` option.

convert
  A list of tags to convert. This option can't be used with the
  ``strip`` option.

autolinks
  A boolean indicating whether the "automatic link" style should be used when
  an ``a`` tag's contents match its href. Defaults to ``True``.

default_title
  A boolean to enable setting the title of a link to its href, if no title is
  given. Defaults to ``False``.

heading_style
  Defines how headings should be converted. Accepted values are ``ATX``,
  ``ATX_CLOSED``, ``SETEXT``, and ``UNDERLINED`` (which is an alias for
  ``SETEXT``). Defaults to ``UNDERLINED``.

bullets
  An iterable (string, list, or tuple) of bullet styles to be used. If the
  iterable only contains one item, it will be used regardless of how deeply
  lists are nested. Otherwise, the bullet will alternate based on nesting
  level. Defaults to ``'*+-'``.

strong_em_symbol
  In Markdown, both ``*`` and ``_`` can mark **strong** or *emphasized*
  text. Choose between them with ``ASTERISK`` (the default) or
  ``UNDERSCORE``.

sub_symbol, sup_symbol
  Define the chars that surround ``<sub>`` and ``<sup>`` text. Defaults to an
  empty string, because this is non-standard behavior. Could be something like
  ``~`` and ``^`` to result in ``~sub~`` and ``^sup^``.  If the value starts
  with ``<`` and ends with ``>``, it is treated as an HTML tag and a ``/`` is
  inserted after the ``<`` in the string used after the text; this allows
  specifying ``<sub>`` to use raw HTML in the output for subscripts, for
  example.

newline_style
  Defines how line breaks (``<br>``) are marked in Markdown. The default
  value, ``SPACES``, uses the usual two spaces and a newline, while
  ``BACKSLASH`` converts a line break to a backslash followed by a newline.
  Although the latter convention is non-standard, it is commonly preferred
  and supported by many Markdown renderers.

code_language
  Defines the language that should be assumed for all ``<pre>`` sections.
  Useful if all code on a page is in the same programming language and
  should be annotated with `````python`` or similar.
  Defaults to ``''`` (empty string) and can be any string.

code_language_callback
  When the HTML contains ``pre`` tags that indicate the code language in
  some way, for example as a class, this callback can be used to extract the
  language from the tag and prefix it to the converted ``pre`` tag.
  The callback receives a single argument, a BeautifulSoup object, and
  returns a string containing the code language, or ``None``.
  An example that uses the class name as the code language could be::

    def callback(el):
        return el['class'][0] if el.has_attr('class') else None

  Defaults to ``None``.

escape_asterisks
  If set to ``False``, do not escape ``*`` to ``\*`` in text.
  Defaults to ``True``.

escape_underscores
  If set to ``False``, do not escape ``_`` to ``\_`` in text.
  Defaults to ``True``.

escape_misc
  If set to ``True``, escape miscellaneous punctuation characters
  that sometimes have Markdown significance in text.
  Defaults to ``False``.

keep_inline_images_in
  Images located inside headings or table cells are converted to their
  alt text. If some inline images should be converted to Markdown images
  instead, set this option to a list of parent tags that are allowed to
  contain inline images, for example ``['td']``.
  Defaults to an empty list.

wrap, wrap_width
  If ``wrap`` is set to ``True``, all text paragraphs are wrapped at
  ``wrap_width`` characters. Defaults to ``False`` and ``80``.
  Use with ``newline_style=BACKSLASH`` to keep line breaks in paragraphs.
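
Several of the options above can be tried side by side. The following is a
sketch rather than a reference: the sample HTML and the helper name
``language_from_class`` are made up for illustration, and each result is
passed through ``strip()`` because the surrounding blank lines may vary
between releases.

.. code:: python

    from markdownify import markdownify as md
    from markdownify import ATX, BACKSLASH, UNDERSCORE

    # heading_style: ATX headings use leading '#' characters.
    heading = md('<h1>Title</h1>', heading_style=ATX).strip()  # '# Title'

    # bullets: a single bullet character is reused at every nesting level.
    items = md('<ul><li>a</li><li>b</li></ul>', bullets='-').strip()

    # strong_em_symbol: switch from asterisks to underscores.
    emphasis = md('<b>bold</b> and <em>em</em>',
                  strong_em_symbol=UNDERSCORE).strip()  # '__bold__ and _em_'

    # sub_symbol: a plain delimiter versus a raw HTML tag.
    tilde_sub = md('H<sub>2</sub>O', sub_symbol='~').strip()     # 'H~2~O'
    html_sub = md('H<sub>2</sub>O', sub_symbol='<sub>').strip()  # 'H<sub>2</sub>O'

    # newline_style: <br> becomes a backslash followed by a newline.
    line_break = md('a<br>b', newline_style=BACKSLASH).strip()

    # code_language_callback: use the first class of the <pre> tag, if any.
    def language_from_class(el):
        return el['class'][0] if el.has_attr('class') else None

    fenced = md('<pre class="python">print(1)</pre>',
                code_language_callback=language_from_class).strip()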

Options may be specified as kwargs to the ``markdownify`` function, or as a
nested ``Options`` class in ``MarkdownConverter`` subclasses.
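
The two ways of supplying options can be compared directly. In this sketch
the subclass name ``AtxConverter`` and the sample HTML are hypothetical; it
assumes that keyword arguments passed to the constructor take precedence over
the nested ``Options`` class, which is how the 0.14.x constructor merges them.

.. code:: python

    from markdownify import MarkdownConverter, ATX

    class AtxConverter(MarkdownConverter):
        """Hypothetical subclass that sets its defaults in a nested class."""
        class Options:
            heading_style = ATX
            bullets = '-'

    html = '<h2>Notes</h2><ul><li>one</li></ul>'

    # Options given as keyword arguments...
    from_kwargs = MarkdownConverter(heading_style=ATX, bullets='-').convert(html)

    # ...or baked into the subclass.
    from_class = AtxConverter().convert(html)

    # Keyword arguments still override the subclass defaults.
    overridden = AtxConverter(bullets='+').convert(html)

Putting shared settings in a subclass keeps call sites short when the same
configuration is used throughout a project.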


Converting BeautifulSoup objects
================================

.. code:: python

    from markdownify import MarkdownConverter

    # Create shorthand method for conversion
    def md(soup, **options):
        return MarkdownConverter(**options).convert_soup(soup)
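
As a usage sketch for the shorthand above, the soup can be narrowed to one
element before conversion. The sample HTML and the ``id`` values are invented
for illustration, and ``html.parser`` is just one of the parsers BeautifulSoup
accepts.

.. code:: python

    from bs4 import BeautifulSoup
    from markdownify import MarkdownConverter

    def md(soup, **options):
        return MarkdownConverter(**options).convert_soup(soup)

    html = ('<div id="main"><p>Hello <b>world</b></p></div>'
            '<div id="footer">ignored</div>')
    soup = BeautifulSoup(html, 'html.parser')

    # Convert only the element of interest instead of the whole document.
    main = soup.find(id='main')
    result = md(main)
    print(result.strip())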


Creating Custom Converters
==========================

If you have a special use case that calls for a special conversion, you can
always inherit from ``MarkdownConverter`` and override the method you want to
change.
The function that handles an HTML tag named ``abc`` is called
``convert_abc(self, el, text, convert_as_inline)`` and returns a string
containing the converted HTML tag.
The ``MarkdownConverter`` object dispatches each tag to the method whose
name matches it:

.. code:: python

    from markdownify import MarkdownConverter

    class ImageBlockConverter(MarkdownConverter):
        """
        Create a custom MarkdownConverter that adds two newlines after an image
        """
        def convert_img(self, el, text, convert_as_inline):
            return super().convert_img(el, text, convert_as_inline) + '\n\n'

    # Create shorthand method for conversion
    def md(html, **options):
        return ImageBlockConverter(**options).convert(html)

.. code:: python

    from markdownify import MarkdownConverter

    class IgnoreParagraphsConverter(MarkdownConverter):
        """
        Create a custom MarkdownConverter that ignores paragraphs
        """
        def convert_p(self, el, text, convert_as_inline):
            return ''

    # Create shorthand method for conversion
    def md(html, **options):
        return IgnoreParagraphsConverter(**options).convert(html)


Command Line Interface
======================

Use ``markdownify example.html > example.md`` or pipe input from stdin
(``cat example.html | markdownify > example.md``).
Call ``markdownify -h`` to see all available options.
They are the same as listed above and take the same arguments.


Development
===========

To run tests and the linter, run ``pip install tox`` once, then ``tox``.

            
