# json-flattener
Python library for denormalizing/flattening lists of complex objects to tables/data frames, with roundtripping
## Notebook Example
[EXAMPLE.ipynb](https://github.com/cmungall/json-flattener/blob/main/EXAMPLE.ipynb)
## Description
Given YAML/JSON/JSON-Lines such as:
```yaml
- id: S001
name: Lord of the Rings
genres:
- fantasy
creator:
name: JRR Tolkein
from_country: England
books:
- id: S001.1
name: Fellowship of the Ring
price: 5.99
summary: Hobbits
- id: S001.2
name: The Two Towers
price: 5.99
summary: More hobbits
- id: S001.3
name: Return of the King
price: 6.99
summary: Yet more hobbits
- id: S002
name: The Culture Series
genres:
- scifi
creator:
name: Ian M Banks
from_country: Scotland
books:
- id: S002.1
name: Consider Phlebas
price: 5.99
- id: S002.2
name: Player of Games
price: 5.99
```
Denormalize using `jfl` command:
```bash
jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv
```
|id|name|genres|creator_name|creator_from_country|books_name|books_summary|books_price|books_id|creator_genres
|---|---|---|---|---|---|---|---|---|---|
|S001|Lord of the Rings|[fantasy]|JRR Tolkein|England|[Fellowship of the Ring\|The Two Towers\|Return of the King]|[Hobbits\|More hobbits\|Yet more hobbits]|[5.99\|5.99\|6.99]|[S001.1\|S001.2\|S001.3]|
|S002|The Culture Series|[scifi]|Ian M Banks|Scotland|[Consider Phlebas\|Player of Games]||[5.99\|5.99]|[S002.1\|S002.2]|
Convert back to JSON/YAML:
```bash
jfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -o examples/books1.yaml
```
This library also allows complex fields to be directly serialized as json or yaml (the default is to append `_json` to the key). For example:
```bash
jfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv
```
|id|name|genres|creator_json|books_json|
|---|---|---|---|---|
|S001|Lord of the Rings|[fantasy]|{\"name\": \"JRR Tolkein\", \"from_country\": \"England\"}|[{\"id\": \"S001.1\", \"name\": \"Fellowship of the Ring\", \"summary\": \"Hobbits\", \"price\": 5.99}, {\"id\": \"S001.2\", \"name\": \"The Two Towers\", \"summary\": \"More hobbits\", \"price\": 5.99}, {\"id\": \"S001.3\", \"name\": \"Return of the King\", \"summary\": \"Yet more hobbits\", \"price\": 6.99}]|
|S002|The Culture Series|[scifi]|{\"name\": \"Ian M Banks\", \"from_country\": \"Scotland\"}|[{\"id\": \"S002.1\", \"name\": \"Consider Phlebas\", \"price\": 5.99}, {\"id\": \"S002.2\", \"name\": \"Player of Games\", \"price\": 5.99}]|
|S003|Book of the New Sun|[scifi, fantasy]|{\"name\": \"Gene Wolfe\", \"genres\": [\"scifi\", \"fantasy\"], \"from_country\": \"USA\"}|[{\"id\": \"S003.1\", \"name\": \"Shadow of the Torturer\"}, {\"id\": \"S003.2\", \"name\": \"Claw of the Conciliator\", \"price\": 6.99}]|
|S004|Example with single book||{\"name\": \"Ms Writer\", \"genres\": [\"romance\"], \"from_country\": \"USA\"}|[{\"id\": \"S004.1\", \"name\": \"Blah\"}]|
|S005|Example with no books||{\"name\": \"Mr Unproductive\", \"genres\": [\"romance\", \"scifi\", \"fantasy\"], \"from_country\": \"USA\"}||
See
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRyM06peU9BkrZbXJazuMlajw5s4Vbj5f0t0TE4hj_X9Ex_EASLSUZuaWUxYIhWbOC6CtPRtxrTGWQD/embed?start=false&loop=false&delayms=60000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
The primary use case is to go from a rich *normalized* data model (as python objects, JSON, or YAML) to a flatter representation that is amenable to processing with:
* Solr/Lucene
* Pandas/R Dataframes
* Excel/Google sheets
* Unix cut/grep/cat/etc
* Simple denormalized SQL database representations
The target denormalized format is a list of rows / a data matrix, where each cell is either an atom or a list of atoms.
## Method
* Each top level key becomes a column
* if the key value is a dict/object, then flatten
* by default a '_' is used to separate the parent key from the inner key
* e.g. the composition of `creator` and `from_country` becomes `creator_from_country`
* currently one level of flattening is supported
* if the key value is a list of atomic entities, then leave as is
* if the key value is a list of dicts/objects, then flatten each key of this inner dict into a list
* e.g. if `books` is a list of book objects, and `name` is a key on book, then `books_name` is a list of names of each book
* order is significant - the first element of `books_name` is matched to the first element of `books_price`, etc
* Allow any key to be serialized as yaml/json/pickle if configured
## Command line usage (TODO)
## Usage from Python
Documentation coming soon: see test folder for now
## use within LinkML
## Comparison
### Pandas json_normalize
- https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.io.json.json_normalize.html
### Java json-flattener
https://github.com/wnameless/json-flattener
### Python
### csvjson
https://csvjson.com/json2csv
Raw data
{
"_id": null,
"home_page": "https://github.com/cmungall/json-flattener",
"name": "json-flattener",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7.0",
"maintainer_email": "",
"keywords": "linkml data-science denormalization yaml json data-frames",
"author": "Chris Mungall",
"author_email": "cjmungall@lbl.gov",
"download_url": "https://files.pythonhosted.org/packages/6d/77/b00e46d904818826275661a690532d3a3a43a4ded0264b2d7fcdb5c0feea/json_flattener-0.1.9.tar.gz",
"platform": "",
"description": "# json-flattener\n\nPython library for denormalizing/flattening lists of complex objects to tables/data frames, with roundtripping\n\n## Notebook Example\n\n[EXAMPLE.ipynb](https://github.com/cmungall/json-flattener/blob/main/EXAMPLE.ipynb)\n\n## Description\n\nGiven YAML/JSON/JSON-Lines such as:\n\n```yaml\n- id: S001\n name: Lord of the Rings\n genres:\n - fantasy\n creator:\n name: JRR Tolkein\n from_country: England\n books:\n - id: S001.1\n name: Fellowship of the Ring\n price: 5.99\n summary: Hobbits\n - id: S001.2\n name: The Two Towers\n price: 5.99\n summary: More hobbits\n - id: S001.3\n name: Return of the King\n price: 6.99\n summary: Yet more hobbits\n- id: S002\n name: The Culture Series\n genres:\n - scifi\n creator:\n name: Ian M Banks\n from_country: Scotland\n books:\n - id: S002.1\n name: Consider Phlebas\n price: 5.99\n - id: S002.2\n name: Player of Games\n price: 5.99\n```\n\nDenormalize using `jfl` command:\n\n```bash\njfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv\n```\n\n\n\n|id|name|genres|creator_name|creator_from_country|books_name|books_summary|books_price|books_id|creator_genres\n|---|---|---|---|---|---|---|---|---|---|\n|S001|Lord of the Rings|[fantasy]|JRR Tolkein|England|[Fellowship of the Ring\\|The Two Towers\\|Return of the King]|[Hobbits\\|More hobbits\\|Yet more hobbits]|[5.99\\|5.99\\|6.99]|[S001.1\\|S001.2\\|S001.3]|\n|S002|The Culture Series|[scifi]|Ian M Banks|Scotland|[Consider Phlebas\\|Player of Games]||[5.99\\|5.99]|[S002.1\\|S002.2]|\n\n\nConvert back to JSON/YAML:\n\n```bash\njfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -o examples/books1.yaml\n```\n\n\n\nThis library also allows complex fields to be directly serialized as json or yaml (the default is to append `_json` to the key). For example:\n\n```bash\njfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv\n```\n\n|id|name|genres|creator_json|books_json|\n|---|---|---|---|---|\n|S001|Lord of the Rings|[fantasy]|{\\\"name\\\": \\\"JRR Tolkein\\\", \\\"from_country\\\": \\\"England\\\"}|[{\\\"id\\\": \\\"S001.1\\\", \\\"name\\\": \\\"Fellowship of the Ring\\\", \\\"summary\\\": \\\"Hobbits\\\", \\\"price\\\": 5.99}, {\\\"id\\\": \\\"S001.2\\\", \\\"name\\\": \\\"The Two Towers\\\", \\\"summary\\\": \\\"More hobbits\\\", \\\"price\\\": 5.99}, {\\\"id\\\": \\\"S001.3\\\", \\\"name\\\": \\\"Return of the King\\\", \\\"summary\\\": \\\"Yet more hobbits\\\", \\\"price\\\": 6.99}]|\n|S002|The Culture Series|[scifi]|{\\\"name\\\": \\\"Ian M Banks\\\", \\\"from_country\\\": \\\"Scotland\\\"}|[{\\\"id\\\": \\\"S002.1\\\", \\\"name\\\": \\\"Consider Phlebas\\\", \\\"price\\\": 5.99}, {\\\"id\\\": \\\"S002.2\\\", \\\"name\\\": \\\"Player of Games\\\", \\\"price\\\": 5.99}]|\n|S003|Book of the New Sun|[scifi, fantasy]|{\\\"name\\\": \\\"Gene Wolfe\\\", \\\"genres\\\": [\\\"scifi\\\", \\\"fantasy\\\"], \\\"from_country\\\": \\\"USA\\\"}|[{\\\"id\\\": \\\"S003.1\\\", \\\"name\\\": \\\"Shadow of the Torturer\\\"}, {\\\"id\\\": \\\"S003.2\\\", \\\"name\\\": \\\"Claw of the Conciliator\\\", \\\"price\\\": 6.99}]|\n|S004|Example with single book||{\\\"name\\\": \\\"Ms Writer\\\", \\\"genres\\\": [\\\"romance\\\"], \\\"from_country\\\": \\\"USA\\\"}|[{\\\"id\\\": \\\"S004.1\\\", \\\"name\\\": \\\"Blah\\\"}]|\n|S005|Example with no books||{\\\"name\\\": \\\"Mr Unproductive\\\", \\\"genres\\\": [\\\"romance\\\", \\\"scifi\\\", \\\"fantasy\\\"], \\\"from_country\\\": \\\"USA\\\"}||\n\n\nSee\n\n<iframe src=\"https://docs.google.com/presentation/d/e/2PACX-1vRyM06peU9BkrZbXJazuMlajw5s4Vbj5f0t0TE4hj_X9Ex_EASLSUZuaWUxYIhWbOC6CtPRtxrTGWQD/embed?start=false&loop=false&delayms=60000\" frameborder=\"0\" width=\"960\" height=\"569\" allowfullscreen=\"true\" mozallowfullscreen=\"true\" webkitallowfullscreen=\"true\"></iframe>\n\nThe primary use case is to go from a rich *normalized* data model (as python objects, JSON, or YAML) to a flatter representation that is amenable to processing with:\n\n * Solr/Lucene\n * Pandas/R Dataframes\n * Excel/Google sheets\n * Unix cut/grep/cat/etc\n * Simple denormalized SQL database representations\n\nThe target denormalized format is a list of rows / a data matrix, where each cell is either an atom or a list of atoms.\n\n## Method\n\n * Each top level key becomes a column\n * if the key value is a dict/object, then flatten\n * by default a '_' is used to separate the parent key from the inner key\n * e.g. the composition of `creator` and `from_country` becomes `creator_from_country`\n * currently one level of flattening is supported\n * if the key value is a list of atomic entities, then leave as is\n * if the key value is a list of dicts/objects, then flatten each key of this inner dict into a list\n * e.g. if `books` is a list of book objects, and `name` is a key on book, then `books_name` is a list of names of each book\n * order is significant - the first element of `books_name` is matched to the first element of `books_price`, etc\n * Allow any key to be serialized as yaml/json/pickle if configured\n\n## Command line usage (TODO)\n\n## Usage from Python\n\nDocumentation coming soon: see test folder for now\n\n\n## use within LinkML\n\n\n\n## Comparison\n\n### Pandas json_normalize\n\n\n - https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.io.json.json_normalize.html\n\n### Java json-flattener\n\n https://github.com/wnameless/json-flattener\n\n### Python\n\n### csvjson\n\nhttps://csvjson.com/json2csv\n\n\n\n",
"bugtrack_url": null,
"license": "BSD",
"summary": "Python library for denormalizing nested dicts or json objects to tables and back",
"version": "0.1.9",
"split_keywords": [
"linkml",
"data-science",
"denormalization",
"yaml",
"json",
"data-frames"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "00cc7fbd75d3362e939eb98bcf9bd22f3f7df8c237a85148899ed3d38e5614e5",
"md5": "903d1ae6cf748972dcff6871ec72dbda",
"sha256": "6b027746f08bf37a75270f30c6690c7149d5f704d8af1740c346a3a1236bc941"
},
"downloads": -1,
"filename": "json_flattener-0.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "903d1ae6cf748972dcff6871ec72dbda",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7.0",
"size": 10799,
"upload_time": "2022-02-26T01:36:03",
"upload_time_iso_8601": "2022-02-26T01:36:03.060064Z",
"url": "https://files.pythonhosted.org/packages/00/cc/7fbd75d3362e939eb98bcf9bd22f3f7df8c237a85148899ed3d38e5614e5/json_flattener-0.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6d77b00e46d904818826275661a690532d3a3a43a4ded0264b2d7fcdb5c0feea",
"md5": "f652ecf05bb3fbe29c17606b5613748c",
"sha256": "84cf8523045ffb124301a602602201665fcb003a171ece87e6f46ed02f7f0c15"
},
"downloads": -1,
"filename": "json_flattener-0.1.9.tar.gz",
"has_sig": false,
"md5_digest": "f652ecf05bb3fbe29c17606b5613748c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.0",
"size": 11479,
"upload_time": "2022-02-26T01:36:04",
"upload_time_iso_8601": "2022-02-26T01:36:04.545590Z",
"url": "https://files.pythonhosted.org/packages/6d/77/b00e46d904818826275661a690532d3a3a43a4ded0264b2d7fcdb5c0feea/json_flattener-0.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-02-26 01:36:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "cmungall",
"github_project": "json-flattener",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "json-flattener"
}