kn-flatten-json


| Field | Value |
| :-------- | :------- |
| Name | kn-flatten-json |
| Version | 0.0.10 |
| Summary | kn_flatten_json |
| Upload time | 2023-07-18 11:35:37 |
| Author | Sukriti |
| Keywords | flatten, json, normalize, normalize pyspark dataframe, complex datatypes, flatten dataframe |
| Requirements | No requirements were recorded. |

# Flatten Nested API Data/DataFrame with the kpt_flatten_json Package

The kpt_flatten_json package converts complex, nested JSON data, whether returned by an API or already held in a DataFrame, into a flat DataFrame that is easy to analyze. Its functions transform the nested JSON into a tabular format in which each row represents a record and each column holds a single attribute or value. Because fields buried deep in the nested structure become ordinary columns, the result can be used directly for analysis and visualization, even by users with limited programming experience.
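
The column-naming idea can be illustrated without Spark. The sketch below is not the package's implementation; `flatten_keys` is a hypothetical helper that joins nested dictionary keys with a separator, which is the naming pattern visible in output columns such as `batters_batter_id`. It handles nested dictionaries only, not arrays.

```python
def flatten_keys(record, sep="_", prefix=""):
    """Flatten nested dicts into one dict whose keys are the joined paths."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path so far.
            flat.update(flatten_keys(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Hypothetical input: one record with a two-level nested field.
nested = {"id": "0001", "batters": {"batter": {"id": "1001", "type": "Regular"}}}
flat = flatten_keys(nested)
print(flat)
```

Each leaf value ends up under a key that spells out its path, so the origin of every column stays readable after flattening.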


# The kpt_flatten_json package consists of two functions:

1. kpt_flatten_api (to flatten API data)

2. kpt_flatten_json (to flatten a DataFrame)


# 1. kpt_flatten_api (to flatten API data)

Consider an API that returns nested JSON containing details about batters and toppings. The kpt_flatten_api function flattens this response into a flat table, as shown below:

## API Information:

url='https://bxray-dev.kockpit.in:6789/userauthentication'

method='post'

uid = "xyz"

pwd = "12345"

body = {
     "userId": uid,
     "password": pwd
     }


## API Data
 
{
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]}


## Installation

$ [sudo] pip install kpt_flatten_json

## Function

kpt_flatten_api: Returns a flattened DataFrame built from the API response

## Usage

To use the kpt_flatten_api function, import the function and pass the required API parameters:
```python
from kpt_flatten_json import kpt_flatten_api

# `spark` is an existing SparkSession; the other variables are defined above.
flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body,
                         username=uid, password=pwd, sep="_")
```

## Flattened Data from API

| id| name| ppu | type | topping_id | topping_type               | batters_batter_id | batters_batter_type               |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |:-------- | :------- |
|0001|Cake|0.55|donut| 5001| None| 1001| Regular|
|0001|Cake|0.55|donut| 5001| None| 1002| Chocolate|
|0001|Cake|0.55|donut| 5001| None| 1003| Blueberry|
|0001|Cake|0.55|donut| 5001| None| 1004| Devil's Food|
|0001|Cake|0.55|donut| 5002| Glazed| 1001| Regular|
|0001|Cake|0.55|donut| 5002| Glazed| 1002| Chocolate|
|0001|Cake|0.55|donut| 5002| Glazed| 1003| Blueberry|
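
The rows above pair each topping with each batter. Assuming the two arrays are exploded independently (which the rows shown suggest, though it is not stated), the complete result for this record would contain 7 × 4 = 28 rows, of which the table lists the first seven. The following plain-Python sketch reproduces that pairing with `itertools.product`; it illustrates the row layout only and is not the package's code.

```python
from itertools import product

record = {
    "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55,
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}

rows = []
# Pair every topping with every batter; scalar fields repeat on each row.
for topping, batter in product(record["topping"], record["batters"]["batter"]):
    rows.append({
        "id": record["id"],
        "name": record["name"],
        "ppu": record["ppu"],
        "type": record["type"],
        "topping_id": topping["id"],
        "topping_type": topping["type"],
        "batters_batter_id": batter["id"],
        "batters_batter_type": batter["type"],
    })

print(len(rows))
```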

# Note:

1. The Spark session must be passed to kpt_flatten_api as the `spark` argument.

2. Argument names must match those listed below.

| Parameter | Type |
| :-------- | :------- |
| spark | SparkSession |
| url | String |
| method | String ("post" or "get") |
| body | Dictionary |
| username | String |
| password | String |
| authToken | String |
| sep | String; must be "_" (no other separator is accepted) |
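
Since the argument names and the separator are fixed, it can help to validate inputs before calling the function. The helper below, `check_flatten_api_args`, is hypothetical and not part of the package; it is a minimal sketch that mirrors the constraints in the table above.

```python
def check_flatten_api_args(url, method, sep="_", body=None):
    """Hypothetical pre-flight check mirroring the parameter table."""
    if not isinstance(url, str) or not url:
        raise ValueError("url must be a non-empty string")
    if not isinstance(method, str) or method.lower() not in ("post", "get"):
        raise ValueError("method must be 'post' or 'get'")
    if body is not None and not isinstance(body, dict):
        raise ValueError("body must be a dictionary")
    if sep != "_":
        raise ValueError("sep must be '_'")
    # Return normalized keyword arguments ready to be passed on.
    return {"url": url, "method": method.lower(), "body": body or {}, "sep": sep}


checked = check_flatten_api_args("https://example", "POST", body={"userId": "xyz"})
print(checked["method"])
```

The returned dictionary could then be unpacked into the real call together with `spark` and any credentials.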

# Methods To Fetch Data From API:
## 1. Basic Authentication (using username and password, or an API key):

url='https://bxray-dev.kockpit.in:6789/userauthentication'

method='post'

uid = "xyz"

pwd = "12345"

body = {
     "userId": uid,
     "password": pwd
     }

flatdf= kpt_flatten_api(spark=spark,url=url,method=method,body=body,username=uid,password=pwd,sep="_")


## 2. OAuth/Bearer Token Authentication (using authToken):

authToken = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJEb21haW4iOiJUQTAxMzAiLCJpYXQiOjE2NzU2NzM2NTksImV4cCI6MTY3ODI2NTY1OX0.t8A8vYWiIinyCWNOlk6q2IA-C2KajvUUTB8uD_4dQOM'

url = 'https://bxray-dev.kockpit.in:6789/test/tokenauth'

body = {}

method = "post"  # or "get"

flatdf= kpt_flatten_api(spark=spark,url=url,method=method,body=body,authToken=authToken,sep="_")

## 3. Without Authentication (using url only):

url='https://bxray-dev.kockpit.in:6789/test/withoutParameter'

method = "post"  # or "get"

flatdf= kpt_flatten_api(spark=spark,url=url,method=method,sep="_")
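
How the package transmits credentials is not documented here. As a point of comparison, the sketch below shows how the three modes map onto standard HTTP conventions (a `Basic` or `Bearer` value in the `Authorization` header, or no header at all) using only the standard library. `build_request` and its parameters are hypothetical, and the request is built but never sent.

```python
import base64
import json
from urllib.request import Request


def build_request(url, method="post", body=None, username=None,
                  password=None, auth_token=None):
    """Build (but do not send) a request for one of the three auth modes."""
    headers = {"Content-Type": "application/json"}
    if auth_token:
        # Bearer-token mode.
        headers["Authorization"] = f"Bearer {auth_token}"
    elif username is not None and password is not None:
        # HTTP Basic mode: base64("username:password").
        creds = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {creds}"
    # An empty or missing body results in no request payload.
    data = json.dumps(body).encode() if body else None
    return Request(url, data=data, headers=headers, method=method.upper())


basic = build_request("https://example/login", body={"userId": "xyz"},
                      username="xyz", password="12345")
token = build_request("https://example/data", method="get", auth_token="abc123")
anonymous = build_request("https://example/open", method="get")
print(basic.get_method(), token.get_header("Authorization"))
```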


## Example Code:

from pyspark.sql import SparkSession

from kpt_flatten_json import kpt_flatten_api

spark = SparkSession.builder.appName("ReadDarwinAPIWithAuth").getOrCreate()

username = "example"

password = "examplepassword"

api_key = 'ExampleAPIKEY'

processed_from = '04-01-2023 00:00:00'

processed_to = '04-01-2023 23:55:00'

url = "https://example"

method = "post"

body = {
    "api_key": api_key,
    "processed_from": processed_from,
    "processed_to": processed_to
    }

flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body, username=username, password=password, sep="_")

flatdf.show(2)

# 2. kpt_flatten_json (to flatten a DataFrame):

## Example
Consider a list of nested dictionaries containing details about batters and toppings. We can use the kpt_flatten_json function to flatten this JSON data structure into a flat table as shown:

data = [
    {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]}

]

# Convert the data to a complex DataFrame

complexdf = spark.createDataFrame(data=data)

## Complex Dataframe


| batters| id| name | ppu | topping| type               |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |
| {[{1001, Regular}... | 0001 | Cake|0.55 | [{5001, None}, {5... | donut|

## Installation

$ [sudo] pip install kpt_flatten_json

## Function

kpt_flatten_json: Returns a flattened dataframe


# Flatten the DataFrame
## Usage
To use the kpt_flatten_json function, import the function and pass in your complex DataFrame as a parameter:
```python
from kpt_flatten_json import kpt_flatten_json

flatdf = kpt_flatten_json(complexdf)
```


## Flattened Dataframe
| id| name| ppu | type | topping_id | topping_type               | batters_batter_id | batters_batter_type               |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |:-------- | :------- |
|0001|Cake|0.55|donut| 5001| None| 1001| Regular|
|0001|Cake|0.55|donut| 5001| None| 1002| Chocolate|
|0001|Cake|0.55|donut| 5001| None| 1003| Blueberry|
|0001|Cake|0.55|donut| 5001| None| 1004| Devil's Food|
|0001|Cake|0.55|donut| 5002| Glazed| 1001| Regular|
|0001|Cake|0.55|donut| 5002| Glazed| 1002| Chocolate|
|0001|Cake|0.55|donut| 5002| Glazed| 1003| Blueberry|


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "kn-flatten-json",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "flatten,json,normalize,normalize pyspark dataframe,complex datatypes,flatten dataframe",
    "author": "Sukriti",
    "author_email": "sukriti.saluja@kockpit.in",
    "download_url": "https://files.pythonhosted.org/packages/76/5a/0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336/kn_flatten_json-0.0.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "kn_flatten_json",
    "version": "0.0.10",
    "project_urls": null,
    "split_keywords": [
        "flatten",
        "json",
        "normalize",
        "normalize pyspark dataframe",
        "complex datatypes",
        "flatten dataframe"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2605f0be17faaff4fe32d20b752f5c38c59097c346495cca92adb7c41ac81edd",
                "md5": "e0b687c717200c186e7020f5e7512667",
                "sha256": "3c1c2ae1e239deeb010bf1900f78e81f56e75bfc804824ea2ea8e40f0c2bf762"
            },
            "downloads": -1,
            "filename": "kn_flatten_json-0.0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e0b687c717200c186e7020f5e7512667",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6669,
            "upload_time": "2023-07-18T11:35:35",
            "upload_time_iso_8601": "2023-07-18T11:35:35.654841Z",
            "url": "https://files.pythonhosted.org/packages/26/05/f0be17faaff4fe32d20b752f5c38c59097c346495cca92adb7c41ac81edd/kn_flatten_json-0.0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "765a0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336",
                "md5": "4ef39cdb12cd8219289ce5dcdf215d5e",
                "sha256": "9e77ed0a5ea4f7a09eaab5728bd9b5fd9b84f7c926d81c9cac59900b9ad9e3d9"
            },
            "downloads": -1,
            "filename": "kn_flatten_json-0.0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "4ef39cdb12cd8219289ce5dcdf215d5e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6536,
            "upload_time": "2023-07-18T11:35:37",
            "upload_time_iso_8601": "2023-07-18T11:35:37.515458Z",
            "url": "https://files.pythonhosted.org/packages/76/5a/0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336/kn_flatten_json-0.0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-18 11:35:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "kn-flatten-json"
}
        