<div align="center">
<strong>
<samp>
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pandoc-filter?logo=python)](https://badge.fury.io/py/pandoc-filter)
[![PyPI - Version](https://img.shields.io/pypi/v/pandoc-filter?logo=pypi)](https://pypi.org/project/pandoc-filter)
[![DOI](https://zenodo.org/badge/741871139.svg)](https://zenodo.org/doi/10.5281/zenodo.10528322)
[![GitHub License](https://img.shields.io/github/license/Zhaopudark/pandoc-filter)](https://github.com/Zhaopudark/pandoc-filter?tab=GPL-3.0-1-ov-file#readme)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/Zhaopudark/pandoc-filter/test.yml?label=Test)](https://github.com/Zhaopudark/pandoc-filter/actions/workflows/test.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/Zhaopudark/pandoc-filter/build_and_deploy.yml?event=release&label=Build%20and%20Deploy)](https://github.com/Zhaopudark/pandoc-filter/actions/workflows/build_and_deploy.yml)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/Zhaopudark/pandoc-filter/post_deploy_test.yml?event=workflow_run&label=End%20Test)](https://github.com/Zhaopudark/pandoc-filter/actions/workflows/post_deploy_test.yml)
[![codecov](https://codecov.io/gh/Zhaopudark/pandoc-filter/graph/badge.svg?token=lb3cLoh3e5)](https://codecov.io/gh/Zhaopudark/pandoc-filter)
</samp>
</strong>
</div>
# pandoc-filter
This project provides a set of useful, highly customized [pandoc Python filters](https://pandoc.org/filters.html) based on [panflute](http://scorreia.com/software/panflute/). They cover some special needs when using [pandoc](https://pandoc.org) to
- [x] convert files from `markdown` to `gfm`
- [x] convert files from `markdown` to `html`
- [ ] convert other formats (planned)
Please see [Main Features](#main-features) for the concrete features.
Please see [Samples](#samples) for the recommended usage.
# Background
I take notes in Markdown and prefer clean Markdown syntax. I usually post these notes on [my site](https://little-train.com/) as web pages, so I need to convert Markdown to HTML. Many tools can do this conversion, and I eventually chose [pandoc](https://pandoc.org) for its powerful features.
Sometimes, however, I need more than pandoc offers by default when converting from `markdown` to `html`, and that is where pandoc filters come in. I have written several pandoc Python filters with advanced features, using [panflute](https://github.com/sergiocorreia/panflute) and other tools. This project gathers those filters into a single toolset.
# Installation
```
pip install -i https://pypi.org/simple/ -U pandoc-filter
```
# Main Features
The filters can be used in two ways:
- **command-line-mode**: use the filters that take no additional arguments on the command line with [pandoc](https://pandoc.org).
- **python-mode**: call the `run_filters_pyio` function from Python.
For example, `md2md_enhance_equation_filter` in [enhance_equation.py](https://github.com/Zhaopudark/pandoc-filter/blob/main/src/pandoc_filter/filters/md2md/enhance_equation.py) is a filter function in the sense of the [panflute user guide](http://scorreia.com/software/panflute/guide.html). Its registered command-line script is `md2md-enhance-equation-filter`.
- After installation, it can be used in **command-line-mode**:
```powershell
pandoc ./input.md -o ./output.md -f markdown -t gfm -s --filter md2md-enhance-equation-filter
```
- Or in **python-mode**:
```python
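import pathlib
import pandoc_filter

# Input and output paths for the conversion
file_path = pathlib.Path("./input.md")
output_path = pathlib.Path("./output.md")
# Convert markdown to gfm, applying the listed filter
pandoc_filter.run_filters_pyio(file_path,output_path,'markdown','gfm',[pandoc_filter.md2md_enhance_equation_filter])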
```
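
Because pandoc accepts several `--filter` options and applies them in the order given, the registered command-line scripts can also be chained. The following is a minimal sketch of building such a command from Python. The helper name `build_pandoc_command` is hypothetical and not part of this package, and the two scripts are only an example pair taken from the table of filters.

```python
import shlex


def build_pandoc_command(input_path, output_path, filters, to_format="gfm"):
    """Build a pandoc command line that applies each filter script in turn.

    Hypothetical helper for illustration; it is not part of pandoc-filter.
    """
    command = ["pandoc", str(input_path), "-o", str(output_path),
               "-f", "markdown", "-t", to_format, "-s"]
    for name in filters:
        # One --filter option per script; pandoc applies them in the order given.
        command += ["--filter", name]
    return command


command = build_pandoc_command(
    "./input.md",
    "./output.md",
    ["md2md-norm-footnote-filter", "md2md-enhance-equation-filter"],
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
# To run the conversion (requires pandoc and pandoc-filter to be installed):
# import subprocess; subprocess.run(command, check=True)
```

The command is kept as a list so that it can be passed to `subprocess.run` without shell quoting issues.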
**Runtime status** can be recorded. In **python-mode**, every filter function returns the resulting panflute `Doc`. Some filter functions add an instance attribute `runtime_dict` (a dict) to the returned `Doc` as a record of the **runtime status**, which may be useful for advanced users. For example, `md2md_enhance_equation_filter` adds a `runtime_dict` that may contain the mapping `{'math':True}` if the `Doc` contains any math element.
The following table lists every filter with its registered command-line script, its features, and the **runtime status** it records:
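
As a rough illustration of consuming this record, the sketch below reads `runtime_dict` defensively, since only some filters set it. The helper `read_runtime_status` is hypothetical, and `SimpleNamespace` objects stand in for the `Doc` returned by a filter function, so the example runs without pandoc installed.

```python
from types import SimpleNamespace


def read_runtime_status(doc):
    """Return a copy of doc.runtime_dict, or an empty dict if nothing was recorded.

    Hypothetical helper for illustration; it is not part of pandoc-filter.
    """
    return dict(getattr(doc, "runtime_dict", None) or {})


# Stand-ins for returned Doc objects (assumption: real ones come from a filter function).
doc_with_math = SimpleNamespace(runtime_dict={"math": True, "equations_count": 2})
doc_without_status = SimpleNamespace()

status = read_runtime_status(doc_with_math)
if status.get("math"):
    # For example, decide here whether the final page needs a math renderer.
    print("math found; equations:", status.get("equations_count"))

# A Doc without runtime_dict simply yields an empty record.
print(read_runtime_status(doc_without_status))
```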
> [!NOTE]
>
> Because some filters need additional arguments, not all filter functions support **command-line-mode**, although all of them support **python-mode**.
>
> All filters can be chained (cascaded).
| Filter Functions | Command Line | Additional Arguments | Features | Runtime status (`doc.runtime_dict`) |
| ----------------------------------------------------------- | ----------------------------------------------------------- | -------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ |
| md2md_convert_github_style_alert_to_hexo_style_alert_filter | md2md-convert-github-style-alert-to-hexo-style-alert-filter | - | Convert the [github-style alert](https://github.com/orgs/community/discussions/16925) to hexo-style alert. | - |
| md2md_enhance_equation_filter | md2md-enhance-equation-filter | - | Enhance math equations. Specifically, this filter will: adapt the AMS rule for math formulas; automatically number equations within `\begin{equation}` and `\end{equation}`, as in Typora; allow multiple tags, but only take the first one; allow multiple labels, but only take the first one. | `{'math':<bool>,'equations_count':<some_number>}` |
| md2md_norm_footnote_filter | md2md-norm-footnote-filter | - | Normalize the footnotes. Remove unnecessary `\n` in the footnote content. | - |
| md2md_norm_internal_link_filter | md2md-norm-internal-link-filter | - | Normalize internal links' URLs. Decode the URL if it is URL-encoded. | - |
| md2md_upload_figure_to_aliyun_filter | - | doc_path | Automatically upload local pictures to Aliyun OSS and replace the original `src` with the new one. The following environment variables must be set in advance: `$Env:OSS_ENDPOINT_NAME`, `$Env:OSS_BUCKET_NAME`, `$Env:OSS_ACCESS_KEY_ID`, and `$Env:OSS_ACCESS_KEY_SECRET`. The `doc_path` must also be given in advance. | `{'doc_path':<doc_path>,'oss_helper':<Oss_Helper>}` |
| md2html_centralize_figure_filter | md2html-centralize-figure-filter | - | **Deprecated** | - |
| md2html_enhance_link_like_filter | md2html-enhance-link-like-filter | - | Enhance the link-like string to a `link` element. | - |
| md2html_hash_anchor_and_internal_link_filter | md2html-hash-anchor-and-internal-link-filter | - | Hash both the anchor's `id` and the internal link's `url` simultaneously. | `{'anchor_count':<anchor_count_dict>,'internal_link_record':<internal_link_record_list>}` |
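
To make the "decode the URL if it is URL-encoded" idea concrete, here is a minimal standalone sketch using only the standard library. It is not the implementation of `md2md_norm_internal_link_filter`. It assumes that an internal link is one whose URL starts with `#`, and the function name `normalize_internal_link` is hypothetical.

```python
from urllib.parse import unquote


def normalize_internal_link(url):
    """Decode a percent-encoded internal link; leave other URLs unchanged.

    Assumption for this sketch: internal links are those starting with '#'.
    """
    if not url.startswith("#"):
        return url  # external link, not touched in this sketch
    return "#" + unquote(url[1:])


print(normalize_internal_link("#my%20section"))
print(normalize_internal_link("https://pandoc.org/a%20b"))
```

In an actual panflute filter, a function like this would be applied to the `url` of each link element while the document is walked.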
# Samples
Here are two basic types of examples:
## Convert markdown to markdown (Normalization)
- [Adapt AMS rule for math formula](https://github.com/Zhaopudark/pandoc-filter/blob/main/examples/md2md_adapt_ams_rule_for_math_formula.md)
- [Normalize footnotes](https://github.com/Zhaopudark/pandoc-filter/blob/main/examples/md2md_normalize_footnotes.md)
- [Normalize internal link](https://github.com/Zhaopudark/pandoc-filter/blob/main/examples/md2md_normalize_internal_link.md)
- [Sync local images to `Aliyun OSS`](https://github.com/Zhaopudark/pandoc-filter/blob/main/examples/md2md_sync_local_images_to_`Aliyun_OSS`.md)
## Convert markdown to html
- [Normalize headers, anchors, internal links and link-like strings](https://github.com/Zhaopudark/pandoc-filter/blob/main/examples/md2html_normalize_headers_anchors_internal_links_and_link-like_strings.md)
# Contribution
Contributions are welcome. However, the introduction and documentation are not yet complete, so please wait for a while.
A simple way to contribute is to open an issue to report bugs or request new features.
Raw data
{
"_id": null,
"home_page": "",
"name": "pandoc-filter",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": "",
"keywords": "pandoc,pandoc-filter,python,pandoc-python-filter,markdown,html",
"author": "",
"author_email": "Pu Zhao <zhaopudark@outlook.com>",
"download_url": "https://files.pythonhosted.org/packages/c1/07/83ed423edf80b5927d1a99cd1e29d15a2e2d7946598efcd1fc95732c17a2/pandoc-filter-0.2.13.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU General Public License (GPL)",
"summary": "A customized pandoc filters set that can be used to generate a useful pandoc python filter.",
"version": "0.2.13",
"project_urls": {
"Changelog": "https://github.com/Zhaopudark/pandoc-filter/blob/main/RELEASE.md",
"Documentation": "https://github.com/Zhaopudark/pandoc-filter/blob/main/README.md",
"Homepage": "https://github.com/Zhaopudark/pandoc-filter",
"Issues": "https://github.com/Zhaopudark/pandoc-filter/issues",
"Repository": "https://github.com/Zhaopudark/pandoc-filter.git"
},
"split_keywords": [
"pandoc",
"pandoc-filter",
"python",
"pandoc-python-filter",
"markdown",
"html"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a0b937e055493ac6122d5537f944907eb6a9eb8cd21e3562b3e0034d1f7958f2",
"md5": "9cbe8574824a6ec9a125652575b24c90",
"sha256": "2bcbeb05e2632192f781c91a977836a5e2ca0af0fd753e3a3de9e4bab5730bfb"
},
"downloads": -1,
"filename": "pandoc_filter-0.2.13-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9cbe8574824a6ec9a125652575b24c90",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 36166,
"upload_time": "2024-03-14T11:29:51",
"upload_time_iso_8601": "2024-03-14T11:29:51.236734Z",
"url": "https://files.pythonhosted.org/packages/a0/b9/37e055493ac6122d5537f944907eb6a9eb8cd21e3562b3e0034d1f7958f2/pandoc_filter-0.2.13-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c10783ed423edf80b5927d1a99cd1e29d15a2e2d7946598efcd1fc95732c17a2",
"md5": "3cfa21f6bad57227f9ea745e7331f69c",
"sha256": "30b54f0644f77ec025a09e9c6b101f8e954bb57db00a8103201024c7c6a19ccf"
},
"downloads": -1,
"filename": "pandoc-filter-0.2.13.tar.gz",
"has_sig": false,
"md5_digest": "3cfa21f6bad57227f9ea745e7331f69c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 32392,
"upload_time": "2024-03-14T11:29:57",
"upload_time_iso_8601": "2024-03-14T11:29:57.577554Z",
"url": "https://files.pythonhosted.org/packages/c1/07/83ed423edf80b5927d1a99cd1e29d15a2e2d7946598efcd1fc95732c17a2/pandoc-filter-0.2.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-14 11:29:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Zhaopudark",
"github_project": "pandoc-filter",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pandoc-filter"
}