# markpickle
Markpickle is a Python library for lossy serialization of markdown to simple python data types and back. Imagine if markdown headers were used to define nested dictionaries and Markdown lists were python lists.
It will create predictable markdown from a python object, but can't turn all markdown files into sensible python objects (for that use a markdown library that creates an AST). I created this because I wanted a way to turn json into Markdown. It is an accidental successor to [markdown-to-json](https://github.com/njvack/markdown-to-json).
```markdown
- 1
- 2
```
becomes the python list `[1, 2]`
```markdown
# Cat
## Name
Ringo
## Species
Felix
```
becomes the python dict `{"Cat":{"Name":"Ringo","Species":"Felix"}}`
See [examples](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/examples.md) for representable types.
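The mapping shown above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not markpickle's implementation: `to_markdown` is a hypothetical helper, and it assumes dict keys become ATX headers, lists become bullets, and scalars are rendered with `str`.

```python
def to_markdown(value, level=1):
    """Illustrative only: render nested dicts as ATX headers and lists as bullets."""
    if isinstance(value, dict):
        parts = []
        for key, child in value.items():
            # Each key becomes a header one level deeper than its parent.
            parts.append(f"{'#' * level} {key}")
            parts.append(to_markdown(child, level + 1))
        return "\n".join(parts)
    if isinstance(value, list):
        return "\n".join(f"- {item}" for item in value)
    # Scalars are written as plain text.
    return str(value)


print(to_markdown([1, 2]))
print(to_markdown({"Cat": {"Name": "Ringo", "Species": "Felix"}}))
```

The two calls print the list and the header-based documents from the examples above.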
Almost all markdown libraries treat markdown as a way to generate HTML fragments from untrusted sources for insertion into some other HTML template. We are using it to represent data. See [guidance](docs/choosing_a_library.md) for which library makes sense for you.
![Libraries.io dependency status for latest release](https://img.shields.io/librariesio/release/pypi/markpickle) [![Downloads](https://pepy.tech/badge/markpickle/month)](https://pepy.tech/project/markpickle/month)
______________________________________________________________________
## Installation
```shell
pip install markpickle
```
## Capabilities
This is a lossy serialization. Markdown is missing too many concepts to make a high fidelity representation of a python data structure. If you want an object model that faithfully represents each object in a Markdown document, use the AST of mistune or one of the other markdown parsers.
______________________________________________________________________
### Supported Types
- Scalar values
- Lists of scalar values
- Dictionaries with scalar values
- Lists of dictionaries of scalar values
- Dictionaries with list values
- Partial support for blanks/string with leading/trailing whitespace
See [examples](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/examples.md).
### Not Supported
- Things not ordinarily serializable
- Markdown that uses more than headers, lists, tables
- Blanks, falsy values, empty iterables don't round trip
- Scalar type inference doesn't round trip. After a scalar is converted to a markdown string, there is no indication if the original was a string or not.
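The last point is easy to see with a small sketch. It assumes scalars are written to markdown as their plain text form; `render_scalar` is a hypothetical stand-in, and markpickle's own rendering may differ in detail.

```python
def render_scalar(value):
    """Hypothetical helper: the text a scalar would occupy in the markdown."""
    return str(value)


# Each pair holds a typed value and a string that renders to the same text.
pairs = [(1, "1"), (1.5, "1.5"), (True, "True")]
for typed, text in pairs:
    # Once rendered, nothing distinguishes the typed value from the string.
    print(repr(typed), repr(text), render_scalar(typed) == render_scalar(text))
```

Because both members of each pair produce identical markdown, a deserializer has to either guess the type or return strings.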
______________________________________________________________________
## Serializing and Deserializing
______________________________________________________________________
### Serializing
Results can be formatted at the cost of speed. Dictionaries can be represented as tables or header-text pairs.
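As a rough illustration of the table form, a flat dictionary could be laid out as a one-row markdown table. `dict_to_table` is a hypothetical helper written for this README, not a markpickle function, and the exact spacing markpickle emits may differ.

```python
def dict_to_table(row):
    """Illustrative only: render a flat dict as a one-row markdown table."""
    headers = " | ".join(str(key) for key in row)
    divider = " | ".join("---" for _ in row)
    values = " | ".join(str(value) for value in row.values())
    # Wrap each of the three rows in leading and trailing pipes.
    return "\n".join(f"| {line} |" for line in (headers, divider, values))


print(dict_to_table({"Name": "Ringo", "Species": "Felix"}))
```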
### Deserializing
Markdown is deserialized by parsing the document to an abstract syntax tree (AST). This is done by `mistune`. If the markdown file has the same structure that markpickle uses, then it will create a sensible object. Deserializing a random README.md file is not expected to always work. For that, you should use mistune's AST.
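To show what "the same structure that markpickle uses" means for headers, here is a simplified stand-in that uses a regular expression instead of mistune's AST. `headers_to_dict` is a hypothetical name; it handles only ATX headers followed by a single line of text and is not how markpickle itself parses.

```python
import re

HEADER = re.compile(r"^(#+)\s+(.+)$")


def headers_to_dict(markdown):
    """Illustrative only: nest ATX headers into dicts, with body text as values."""
    root = {}
    stack = [(0, root)]  # (header level, dict holding that header's children)
    last = None          # (parent dict, key) of the most recent header
    for line in markdown.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            level, key = len(match.group(1)), match.group(2)
            # Close any headers at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            parent = stack[-1][1]
            parent[key] = {}
            stack.append((level, parent[key]))
            last = (parent, key)
        elif last is not None:
            # Body text replaces the empty dict created for the header.
            parent, key = last
            parent[key] = line
    return root


print(headers_to_dict("# Cat\n## Name\nRingo\n## Species\nFelix"))
```

Anything outside this narrow shape (paragraphs before the first header, multi-line bodies, inline formatting) is dropped or overwritten, which is the kind of loss the paragraph above warns about.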
### Round Tripping
Some but not all data structures will be round-trippable. The goal is that the sort of dicts you get from loading JSON will be round-trippable, provided everything is a string.
### Splitting Files
In typical serialization scenarios, many json files might be written to a single file, or in the case of yaml, you can put multiple documents into one file separated by `---`. markpickle can treat the horizontal rule as a document splitter if you use `split_file`. It works like [splitstream](https://github.com/rickardp/splitstream), but less efficiently.
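The splitting idea can be sketched without markpickle. `split_documents` below is a hypothetical helper, not the signature of `split_file`; it assumes the rule is a line containing only `---`.

```python
def split_documents(text, rule="---"):
    """Illustrative only: yield the chunks of text between horizontal-rule lines."""
    chunk = []
    for line in text.splitlines():
        if line.strip() == rule:
            # A rule line closes the current document.
            yield "\n".join(chunk)
            chunk = []
        else:
            chunk.append(line)
    # Whatever follows the last rule is the final document.
    yield "\n".join(chunk)


combined = "- 1\n- 2\n---\n# Cat\n## Name\nRingo\n"
for document in split_documents(combined):
    print(repr(document))
```

Each chunk can then be deserialized on its own.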
## CLI
This command will take a deserializable markdown file and output json.
```bash
python -m markpickle "docs/individual/list of scalars.md"
```
## Prior Art
Imagine you have json and want the same data as markdown. JSON looks like a python dict, so any python library that can convert json to markdown can probably convert a python dict to markdown.
Many tools turn tabular data into a markdown table.
### Serializing to Markdown
[json2md](https://github.com/IonicaBizau/json2md), a node package, will turn json that looks like the HTML document object model into markdown, e.g.
```python
{"h1": "Some Header",
"p": "Some Text"}
```
[tomark](https://pypi.org/project/tomark/) will turn a dict into a markdown table. Unmaintained.
[pytablewriter](https://pytablewriter.readthedocs.io/en/latest/pages/reference/writers/text/markup/md.html) also turns a dict into a table, but supports many tabular formats.
### Deserializing to Python
Most libraries turn markdown into a document object model. Markdown-to-json is the most similar to markpickle's goal of turning a markdown document into python data types, in this case nested dicts.
[markdown-to-json](https://github.com/njvack/markdown-to-json) is the library most similar to markpickle and is now maintained. It handles only deserialization and conversion to json.
[mistune](https://pypi.org/project/mistune/) will turn markdown into an Abstract Syntax Tree. The AST is a faithful representation of the Markdown, including concepts that have no semantic equivalent in python datatypes.
[beautifulsoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) will let you navigate the HTML DOM. So you can turn the markdown into HTML, then parse with Beautiful Soup.
## Representable Types
There is one optional root dictionary, representable with ATX headers, e.g. `#`, `##`, etc. Lists can contain nested lists or dicts. For the most part, this looks like the types that JSON can represent.
```python
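import datetime
from typing import TypeAlias, Union  # TypeAlias requires Python 3.10+

# ColumnsValuesTableType is defined elsewhere in markpickle and is not shown here.
SerializableTypes: TypeAlias = Union[
    ColumnsValuesTableType,
    dict[str, "SerializableTypes"],
    list["SerializableTypes"],
    str,
    int,
    float,
    bool,
    datetime.date,
    None,
]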
```
The deserialized types are the same, except that all scalars are strings.
## Schema Validation for Markdown
In the case of a serialization library, you'd want something that would indicate if your markdown file will successfully deserialize back into python.
I haven't really found anything that says, for example, "This markdown document shall have one # Header and a 3 column table and nothing else."
- [schema-markdown-js](https://craigahobbs.github.io/schema-markdown-js/language/) A json schema that happens to be using markdown as its syntax.
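In the absence of such a tool, a rule like the one quoted above could be checked by hand. The sketch below is only an assumption about what such a check might look like: `matches_shape` is a hypothetical function, it recognizes headers and table rows by their leading characters, and it does not parse markdown properly.

```python
def matches_shape(markdown, columns=3):
    """Illustrative only: one '# ' header, a table of the given width, nothing else."""
    lines = [line.strip() for line in markdown.splitlines() if line.strip()]
    headers = [line for line in lines if line.startswith("# ")]
    rows = [line for line in lines if line.startswith("|") and line.endswith("|")]
    other = [line for line in lines if line not in headers and line not in rows]
    # Count the cells in every table row, including the divider row.
    widths = {len(line.strip("|").split("|")) for line in rows}
    return len(headers) == 1 and not other and widths == {columns}


good = "# Pets\n\n| a | b | c |\n| --- | --- | --- |\n| 1 | 2 | 3 |"
bad = "# Pets\n\nSome prose.\n\n| a | b |\n| --- | --- |\n| 1 | 2 |"
print(matches_shape(good), matches_shape(bad))
```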
## Credits
I copied the ATX-dictionary-like header parsing from [markdown-to-json](https://github.com/njvack/markdown-to-json).
## Documentation
- [Choosing a Library](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/choosing_a_library.md)
- [Examples](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/examples.md)
- [TODO](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/TODO.md)
- [People solving similar problems on StackOverflow](https://github.com/matthewdeanmartin/markpickle/blob/main/docs/stackoverflow.md)
## Change Log
- 0.2.0 - Idea and reserve package name.
- 1.0.0 - Basic functionality.
- 1.1.0 - Basic functionality.
- 1.2.0 - Add support for binary data, which is serialized as images with data URLs.
- 1.3.0 - Improve CLI and more examples
- 1.4.0 - ATX as dictionary now works
- 1.5.0 - Tables less buggy
- 1.5.1 - Fix mypy typing. Pin mistune to <3.0.0
Raw data
{
"_id": null,
"home_page": "https://github.com/matthewdeanmartin/markpickle",
"name": "markpickle",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "serializer,deserializer,markdown",
"author": "Matthew Martin",
"author_email": "matthewdeanmartin@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/24/44/cfe33ad74120cde375d00ccd4f05ed5af6ececbed5eca7de6993837be015/markpickle-1.6.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Lossy python to markdown serializer",
"version": "1.6.1",
"project_urls": {
"Bug Tracker": "https://github.com/matthewdeanmartin/markpickle/issues",
"Change Log": "https://github.com/matthewdeanmartin/markpickle/blob/main/CHANGES.md",
"Documentation": "https://github.com/matthewdeanmartin/markpickle",
"Homepage": "https://github.com/matthewdeanmartin/markpickle",
"Repository": "https://github.com/matthewdeanmartin/markpickle"
},
"split_keywords": [
"serializer",
"deserializer",
"markdown"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fda3bce3d320f2e0279410831f8785754d0d5bf39613e5726cfa6ab0766d8bc5",
"md5": "0a1e05e138087111fd22a6d6723a36eb",
"sha256": "f180d8b62853f117cf0fa13a2686f023a0a595de4246e113e11885439080666b"
},
"downloads": -1,
"filename": "markpickle-1.6.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "0a1e05e138087111fd22a6d6723a36eb",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 23696,
"upload_time": "2023-06-09T03:07:19",
"upload_time_iso_8601": "2023-06-09T03:07:19.822715Z",
"url": "https://files.pythonhosted.org/packages/fd/a3/bce3d320f2e0279410831f8785754d0d5bf39613e5726cfa6ab0766d8bc5/markpickle-1.6.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2444cfe33ad74120cde375d00ccd4f05ed5af6ececbed5eca7de6993837be015",
"md5": "aa946a3b9f81e463f5dce65ea2c19dc7",
"sha256": "e281600ea95286ad4971f2b34eeeb660b48101f7027bc3e995378880abeae61b"
},
"downloads": -1,
"filename": "markpickle-1.6.1.tar.gz",
"has_sig": false,
"md5_digest": "aa946a3b9f81e463f5dce65ea2c19dc7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 22129,
"upload_time": "2023-06-09T03:07:21",
"upload_time_iso_8601": "2023-06-09T03:07:21.551922Z",
"url": "https://files.pythonhosted.org/packages/24/44/cfe33ad74120cde375d00ccd4f05ed5af6ececbed5eca7de6993837be015/markpickle-1.6.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-09 03:07:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "matthewdeanmartin",
"github_project": "markpickle",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "markpickle"
}