json-normalize


Name: json-normalize
Version: 1.1.0
Home page: https://github.com/funnel-io/json-normalize
Summary: Recursively flattens a JSON-like structure into a list of flat dicts.
Upload time: 2024-03-28 17:54:54
Author: The Funnel Dev Team
License: MIT
Keywords: json

# JSON Normalize

![PyPI](https://img.shields.io/pypi/v/json_normalize)
![PyPI - License](https://img.shields.io/pypi/l/json_normalize)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/json_normalize)
![PyPI - Status](https://img.shields.io/pypi/status/json_normalize)

This package contains one function, `json_normalize`. It takes a JSON-like structure and returns a `map` object that yields flat dicts. Each key in an output dict is the path to its value, joined by `"."` by default; the joiner can be customized with `key_joiner`.

Data association flows up and down inside dicts, whereas in iterables such as lists, data from different items is not associated and ends up in separate output dicts.
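For the dict-only case, the path joining described above can be sketched with a small recursive helper. `flatten_dict` is a hypothetical name used only for illustration; it is not the library's implementation and it deliberately ignores lists, which is where `json_normalize` starts producing several output dicts.

```python
def flatten_dict(node, path=()):
    """Flatten nested dicts into one dict keyed by dot-joined paths.

    Hypothetical helper for illustration only; not part of json_normalize.
    Lists are not handled: they are stored as plain leaf values.
    """
    flat = {}
    for key, value in node.items():
        child_path = path + (key,)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the path built so far
            flat.update(flatten_dict(value, child_path))
        else:
            # Leaf value: its output key is the full path joined by "."
            flat[".".join(child_path)] = value
    return flat


record = {"city": "Stockholm", "coords": {"lat": 59.331924, "long": 18.062297}}
print(flatten_dict(record))
# {'city': 'Stockholm', 'coords.lat': 59.331924, 'coords.long': 18.062297}
```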

## Installation

Install the package `json_normalize` version `1.1+` from PyPI.  
The recommended `requirements.txt` line is `json_normalize~=1.1`.

## json_normalize.json_normalize

```python
json_normalize.json_normalize(
    tree: Union[dict, Iterable],
    combine_lists: Literal["chain", "product"] = None,
    drop_nodes: Iterable[str] = (),
    freeze_nodes: Iterable[str] = (),
    key_joiner: Union[str, Callable] = ".",
)
```

- *`tree`* - A JSON-like structure. Any iterable inside the object that is not a dict or a string is treated as a list.
- *`combine_lists`*`=None` - If two different branches of the structure contain lists, the function needs to know how to combine them. With the default `None`, it does not know how to handle them and raises an error. With `combine_lists="chain"`, the branches are placed one after another, similar to `itertools.chain`. With `combine_lists="product"`, the branches are combined using `itertools.product`.
- *`drop_nodes`*`=()` - Names of nodes to ignore; these nodes and everything below them are left out of the output.
- *`freeze_nodes`*`=()` - Names of nodes to preserve as-is. The function does not recursively normalize anything below such a node, so if the node contains a dict, it is still a dict in the output.
- *`key_joiner`*`="."` - Customizes the output keys. `key_joiner` accepts either a function or a string. A function receives the path to a node in the form of a tuple. A string is converted to a function like this: `lambda p: key_joiner.join(p)`
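The string-to-function rule for `key_joiner` can be sketched as below. `as_key_joiner` is a hypothetical helper written only to mirror the conversion described above; it is not a function exported by the package.

```python
from typing import Callable, Tuple, Union

PathJoiner = Callable[[Tuple[str, ...]], str]


def as_key_joiner(key_joiner: Union[str, PathJoiner]) -> PathJoiner:
    """Return a callable that turns a path tuple into an output key.

    Hypothetical helper: a string s becomes `lambda p: s.join(p)`,
    while a callable is returned unchanged.
    """
    if isinstance(key_joiner, str):
        return lambda p: key_joiner.join(p)
    return key_joiner


dotted = as_key_joiner(".")
print(dotted(("measurements", "temp", "val")))  # measurements.temp.val

last_only = as_key_joiner(lambda p: p[-1])
print(last_only(("b", "x")))  # x
```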


## Examples

A general use case:

```python
>>> from json_normalize import json_normalize
>>> json_like = {
...     "city": "Stockholm",
...     "coords": {
...         "lat": 59.331924,
...         "long": 18.062297
...     },
...     "measurements": [
...         {
...             "time": 1624363200,
...             "temp": {"val": 28, "unit": "C"},
...             "wind": {"val": 2.8, "dir": 290, "unit": "m/s"},
...         },
...         {
...             "time": 1624366800,
...             "temp": {"val": 26, "unit": "C"},
...         }
...     ]
... }
>>> normal_json = json_normalize(json_like)
>>> normal_json
<map object at ...>

>>> list(normal_json)
[
    {
        'city': 'Stockholm',
        'coords.lat': 59.331924,
        'coords.long': 18.062297,
        'measurements.time': 1624363200,
        'measurements.temp.val': 28,
        'measurements.temp.unit': 'C',
        'measurements.wind.val': 2.8,
        'measurements.wind.dir': 290,
        'measurements.wind.unit': 'm/s'
    },
    {
        'city': 'Stockholm',
        'coords.lat': 59.331924,
        'coords.long': 18.062297,
        'measurements.time': 1624366800,
        'measurements.temp.val': 26,
        'measurements.temp.unit': 'C'
    }
]
```
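Because the result is a `map` object, it is a lazy, single-pass iterator: once consumed, iterating it again yields nothing. If you need the rows more than once, materialize them with `list(...)` a single time. The snippet below demonstrates this with a plain built-in `map` standing in for the return value of `json_normalize`, so it does not require the package.

```python
# A stand-in map object; json_normalize also returns a map,
# per the `<map object at ...>` repr shown above.
rows = map(dict, [[("city", "Stockholm")], [("city", "Uppsala")]])

first_pass = list(rows)   # consumes the iterator
second_pass = list(rows)  # nothing is left to consume

print(first_pass)   # [{'city': 'Stockholm'}, {'city': 'Uppsala'}]
print(second_pass)  # []
```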




Information always flows both into and out of each container. Here the data in nodes `a` and `c` is associated because their closest common node, the root, is a dict; `c` is linked to the root via `b`.

```python
>>> json_like = {
...     "a": 1,
...     "b": {
...         "c": "x",
...         "d": 2
...     }
... }
>>> list(json_normalize(json_like))
[
    {
        "a": 1,
        "b.c": "x",
        "b.d": 2
    }
]
```

However, if the closest common node is a list-like object, the information is not associated. For example, the closest common node of `g=2` and `h=3` is a list, so in the output that data ends up in different dicts.

```python
>>> tree = {
...     "a": 1,
...     "b": [
...         {
...             "c": "x",
...             "g": 2
...         },
...         {
...             "c": "y",
...             "h": 3
...         }
...     ]
... }
>>> list(json_normalize(tree))
[
    {
        "a": 1,
        "b.c": "x",
        "b.g": 2
    },
    {
        "a": 1,
        "b.c": "y",
        "b.h": 3
    }
]

```
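The association rules above (dict children are merged, list items fan out into separate rows) can be sketched as a small recursive generator. `normalize` is a hypothetical name and this is not the library's code; in particular, when several branches contain lists it always combines them with `itertools.product`, roughly like `combine_lists="product"`, whereas the real default raises an error.

```python
from itertools import product


def normalize(node, path=()):
    """Yield flat dicts for a JSON-like node (hypothetical sketch).

    Dict children are associated by merging one row from each child;
    list items are never associated, so each yields its own rows.
    """
    if isinstance(node, dict):
        # Rows produced by each child, keyed under the extended path
        child_rows = [
            list(normalize(value, path + (key,))) for key, value in node.items()
        ]
        # Pick one row per child and merge them into a single output row
        for combo in product(*child_rows):
            row = {}
            for part in combo:
                row.update(part)
            yield row
    elif isinstance(node, list):
        for item in node:
            yield from normalize(item, path)
    else:
        # Leaf: a single row holding this value under its dot-joined path
        yield {".".join(path): node}


tree = {"a": 1, "b": [{"c": "x", "g": 2}, {"c": "y", "h": 3}]}
print(list(normalize(tree)))
# [{'a': 1, 'b.c': 'x', 'b.g': 2}, {'a': 1, 'b.c': 'y', 'b.h': 3}]
```

One consequence of this sketch's design: an empty list contributes no rows, so `product` yields nothing and the enclosing record disappears. How the library handles empty lists is not covered here.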

Even if a branch contains data in a deeper layer, as long as that data is contained inside a `dict`, it is associated with the data in other branches.

```python
>>> tree = {
...     "a": {
...         "j": 1.1,
...         "k": 1.2
...     },
...     "b": [
...         {
...             "c": "x",
...             "d": 2
...         },
...         {
...             "c": "y",
...             "d": 3
...         }
...     ]
... }
>>> list(json_normalize(tree))
[
    {
        "a.j": 1.1,
        "a.k": 1.2,
        "b.c": "x",
        "b.d": 2
    },
    {
        "a.j": 1.1,
        "a.k": 1.2,
        "b.c": "y",
        "b.d": 3
    }
]

```

When there are lists in multiple different branches, the function needs to know how to combine them. The default, `None`, raises an error in that case. `"chain"` places the rows from each branch one after another, and `"product"` combines them as shown below.

```python
>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
...     "c": [
...         {"y": "3"},
...         {"y": "4"}
...     ]
... }
>>> list(json_normalize(tree))
Traceback (most recent call last):
    ...
ValueError

>>> list(json_normalize(tree, combine_lists="chain"))
[
    {"a": 1, "b.x": "1"},
    {"a": 1, "b.x": "2"},
    {"a": 1, "c.y": "3"},
    {"a": 1, "c.y": "4"},
]

>>> list(json_normalize(tree, combine_lists="product"))
[
    {"a": 1, "b.x": "1", "c.y": "3"},
    {"a": 1, "b.x": "1", "c.y": "4"},
    {"a": 1, "b.x": "2", "c.y": "3"},
    {"a": 1, "b.x": "2", "c.y": "4"},
]

```
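The difference between the two strategies maps directly onto `itertools.chain` and `itertools.product`. The sketch below applies them by hand to the per-branch rows from the example; `merge` is a hypothetical helper and the row lists are written out manually rather than produced by the library.

```python
from itertools import chain, product

# Rows produced independently by each branch of the example tree
shared = {"a": 1}
b_rows = [{"b.x": "1"}, {"b.x": "2"}]
c_rows = [{"c.y": "3"}, {"c.y": "4"}]


def merge(*parts):
    """Merge partial rows left to right into a new dict."""
    out = {}
    for part in parts:
        out.update(part)
    return out


# "chain": rows from b, then rows from c, each with the shared data
chained = [merge(shared, row) for row in chain(b_rows, c_rows)]

# "product": every b row paired with every c row
producted = [merge(shared, b, c) for b, c in product(b_rows, c_rows)]

print(len(chained), len(producted))  # 4 4
```

With `n` and `m` rows in two branches, chaining yields `n + m` rows while the product yields `n * m`, which is why the product can copy shared data into many more rows.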

If you want to make sure you do not copy information into too many rows, you can leave `combine_lists=None` and instead drop the problematic nodes with the argument `drop_nodes=("b",)`.
```python
>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
...     "c": [
...         {"y": "1"},
...         {"y": "2"}
...     ]
... }
>>> list(json_normalize(tree, drop_nodes=("b",)))
[
    {"a": 1, "c.y": "1"},
    {"a": 1, "c.y": "2"},
]
```
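The effect of `drop_nodes` on this example can be approximated by pruning the tree before normalizing it. `drop_named_nodes` is a hypothetical helper, and it assumes names are matched against dict keys at any depth; the library's exact matching rule is not spelled out in this README.

```python
def drop_named_nodes(node, names):
    """Return a copy of node without any dict entry whose key is in names.

    Hypothetical pre-filter; assumes names match dict keys at any depth.
    """
    if isinstance(node, dict):
        return {
            key: drop_named_nodes(value, names)
            for key, value in node.items()
            if key not in names
        }
    if isinstance(node, list):
        return [drop_named_nodes(item, names) for item in node]
    # Leaves are returned unchanged
    return node


tree = {
    "a": 1,
    "b": [{"x": "1"}, {"x": "2"}],
    "c": [{"y": "1"}, {"y": "2"}],
}
print(drop_named_nodes(tree, {"b"}))
# {'a': 1, 'c': [{'y': '1'}, {'y': '2'}]}
```

With `b` removed only one branch contains a list, so no combination strategy is needed.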


If you wish to customize the generated keys, you can do that by passing the `key_joiner` argument.
```python
>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
... }

>>> def key_joiner(path: tuple) -> str:
...     return path[-1]

>>> list(json_normalize(tree, key_joiner=key_joiner))
[
    {"a": 1, "x": "1"},
    {"a": 1, "x": "2"},
]

>>> list(json_normalize(tree, key_joiner=" -> "))
[
    {"a": 1, "b -> x": "1"},
    {"a": 1, "b -> x": "2"},
]
```
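A joiner that keeps only part of the path, such as `path[-1]`, can map two different paths to the same key. What the library does on such a collision is not documented here, so a hypothetical check like `colliding_keys` below can help before choosing a joiner; the list of paths is supplied by hand.

```python
from collections import Counter


def colliding_keys(paths, key_joiner):
    """Return output keys that more than one path would map to.

    Hypothetical helper; `paths` is a list of path tuples.
    """
    counts = Counter(key_joiner(path) for path in paths)
    return sorted(key for key, n in counts.items() if n > 1)


paths = [("a",), ("b", "x"), ("c", "x")]

print(colliding_keys(paths, lambda p: p[-1]))  # ['x']
print(colliding_keys(paths, " -> ".join))      # []
```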


The function also accepts generators and similar iterables.
```python
>>> from itertools import chain


>>> def meta_generator():
...     yield {"who": "generator", "val": a_generator(1)}
...     yield {"who": "range", "val": range(10, 12)}
...     yield {"who": "map", "val": map(lambda x: x**2, range(20, 22))}
...     yield {"who": "chain", "val": chain([30], [31])}


>>> def a_generator(n):
...     yield n
...     yield 2 * n


>>> list(json_normalize(meta_generator()))
[
    {'who': 'generator', 'val': 1},
    {'who': 'generator', 'val': 2},
    {'who': 'range', 'val': 10},
    {'who': 'range', 'val': 11},
    {'who': 'map', 'val': 400},
    {'who': 'map', 'val': 441},
    {'who': 'chain', 'val': 30},
    {'who': 'chain', 'val': 31},
]
```
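The rule from the parameter list, that any iterable other than a dict or a string is treated as a list, can be expressed as a small predicate. `is_list_like` is a hypothetical name; note that `collections.abc.Iterable` only recognizes objects defining `__iter__`, and the sketch follows the README literally, so types like `bytes` would also count as list-like.

```python
from collections.abc import Iterable
from itertools import chain


def is_list_like(value):
    """True for iterables that are neither dicts nor strings (hypothetical)."""
    return isinstance(value, Iterable) and not isinstance(value, (dict, str))


values = [
    range(10, 12),
    map(abs, [-1]),
    chain([30], [31]),
    "text",
    {"k": 1},
    42,
]
print([is_list_like(v) for v in values])
# [True, True, True, False, False, False]
```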


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/funnel-io/json-normalize",
    "name": "json-normalize",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "JSON",
    "author": "The Funnel Dev Team",
    "author_email": "open-source@funnel.io",
    "download_url": "https://files.pythonhosted.org/packages/a0/2d/0003aaee1fe285df9a7ca7a4233f8e05dc82daa075670f7a179b2e7ccf29/json-normalize-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Recursively flattens a JSON-like structure into a list of flat dicts.",
    "version": "1.1.0",
    "project_urls": {
        "Bug Reports": "https://github.com/funnel-io/json-normalize/issues",
        "Homepage": "https://github.com/funnel-io/json-normalize",
        "Source": "https://github.com/funnel-io/json-normalize"
    },
    "split_keywords": [
        "json"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a7c09739c8f5b556067929fa5b28cdf9c6afeb7ffc6aee63b5adfdf4655494df",
                "md5": "5b49edeeaf48c9d0208095264642f370",
                "sha256": "5eb82bb07cae8321f8d186d739ad1a114c9cc96484515287d897c7d7c1f894f9"
            },
            "downloads": -1,
            "filename": "json_normalize-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5b49edeeaf48c9d0208095264642f370",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6700,
            "upload_time": "2024-03-28T17:54:53",
            "upload_time_iso_8601": "2024-03-28T17:54:53.285308Z",
            "url": "https://files.pythonhosted.org/packages/a7/c0/9739c8f5b556067929fa5b28cdf9c6afeb7ffc6aee63b5adfdf4655494df/json_normalize-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a02d0003aaee1fe285df9a7ca7a4233f8e05dc82daa075670f7a179b2e7ccf29",
                "md5": "8522e2b2bb5b216de53869be5f60166c",
                "sha256": "35d7fe742acfae3d5b0b87c6f6f12c703010a825401c63ca9889107fcbdaf31e"
            },
            "downloads": -1,
            "filename": "json-normalize-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8522e2b2bb5b216de53869be5f60166c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7636,
            "upload_time": "2024-03-28T17:54:54",
            "upload_time_iso_8601": "2024-03-28T17:54:54.564325Z",
            "url": "https://files.pythonhosted.org/packages/a0/2d/0003aaee1fe285df9a7ca7a4233f8e05dc82daa075670f7a179b2e7ccf29/json-normalize-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-28 17:54:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "funnel-io",
    "github_project": "json-normalize",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "json-normalize"
}
        