# Flatten Nested API Data/Dataframe with kpt_flatten_json Package
The kpt_flatten_json package converts nested JSON data, whether fetched from an API or already loaded into a dataframe, into a flat dataframe that is easy to analyze. Its functions turn the nested structure into a table in which each row represents a record and each column holds a single attribute or value. This lets you pull values from deep inside the nested structure without writing the traversal yourself, which keeps data analysis and visualization accessible even for users with limited programming experience.
# The kpt_flatten_json package consists of two functions:
1. kpt_flatten_api (to flatten API data)
2. kpt_flatten_json (to flatten a dataframe)
# 1. kpt_flatten_api (to flatten API data)
Consider an API that returns nested JSON (dictionaries that contain lists of further dictionaries) with details about batters and toppings. The kpt_flatten_api function flattens this response into a flat table, as shown below:
## API Information:
    url = 'https://bxray-dev.kockpit.in:6789/userauthentication'
    method = 'post'
    uid = "xyz"
    pwd = "12345"
    body = {
        "userId": uid,
        "password": pwd
    }
## API Data
    {
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" },
                { "id": "1003", "type": "Blueberry" },
                { "id": "1004", "type": "Devil's Food" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    }
## Installation
$ [sudo] pip install kpt_flatten_json
## Function
kpt_flatten_api: Returns the data fetched from the API as a flattened dataframe
## Usage
To use the kpt_flatten_api function, import the function and pass the required API parameters:
```python
from kpt_flatten_json import *

# `spark` is an existing SparkSession; url, method, body, uid and pwd are defined above.
flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body,
                         username=uid, password=pwd, sep="_")
```
## Flattened Data from API
| id| name| ppu | type | topping_id | topping_type | batters_batter_id | batters_batter_type |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |:-------- | :------- |
|0001|Cake|0.55|donut| 5001| None| 1001| Regular|
|0001|Cake|0.55|donut| 5001| None| 1002| Chocolate|
|0001|Cake|0.55|donut| 5001| None| 1003| Blueberry|
|0001|Cake|0.55|donut| 5001| None| 1004| Devil's Food|
|0001|Cake|0.55|donut| 5002| Glazed| 1001| Regular|
|0001|Cake|0.55|donut| 5002| Glazed| 1002| Chocolate|
|0001|Cake|0.55|donut| 5002| Glazed| 1003| Blueberry|
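The rows above pair each topping with each batter, which suggests that nested lists are expanded into one row per element and that nested keys are joined with the separator. The following plain-Python sketch illustrates that idea on a shortened copy of the sample data. It is only an illustration under that assumption, not the package's implementation (which works on Spark dataframes), and `flatten_record` is a hypothetical helper name.

```python
from itertools import product


def flatten_record(record, sep="_", prefix=""):
    """Flatten one nested dict into a list of flat dicts (one per output row)."""
    rows = [{}]
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse and prefix its keys with the parent name.
            parts = flatten_record(value, sep, name)
        elif isinstance(value, list):
            # List: every element becomes its own row.
            parts = []
            for item in value:
                if isinstance(item, dict):
                    parts.extend(flatten_record(item, sep, name))
                else:
                    parts.append({name: item})
            if not parts:
                parts = [{name: None}]  # keep the column for empty lists
        else:
            parts = [{name: value}]
        # Combine the rows built so far with the rows for this key.
        rows = [{**left, **right} for left, right in product(rows, parts)]
    return rows


# Shortened version of the sample record (two batters, two toppings).
sample = {
    "id": "0001",
    "name": "Cake",
    "batters": {"batter": [{"id": "1001", "type": "Regular"},
                           {"id": "1002", "type": "Chocolate"}]},
    "topping": [{"id": "5001", "type": "None"},
                {"id": "5002", "type": "Glazed"}],
}

flat_rows = flatten_record(sample)
for row in flat_rows:
    print(row)
```

Each printed dictionary corresponds to one row of a flat table, with keys such as `batters_batter_id` and `topping_type` playing the role of column names.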
# Note:
1. You need to pass a Spark session to the kpt_flatten_api function through the spark argument.
2. Function arguments must be named exactly as specified below.
| Parameter | Type |
| :-------- | :------- |
| spark | SparkSession |
| url | String |
| method | String ("post" or "get") |
| body | Dictionary |
| username | String |
| password | String |
| authToken | String (used for token authentication instead of username and password) |
| sep | String (fixed to "_"; no other separator is accepted) |
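Because the argument names and the separator are fixed, it can be convenient to assemble and check the keyword arguments in one place before calling the function. The sketch below is one possible way to do that. `build_api_kwargs` is a hypothetical helper that is not part of the package, and the URL check is an extra assumption rather than a documented requirement.

```python
ALLOWED_METHODS = {"post", "get"}


def build_api_kwargs(spark, url, method, body=None, username=None,
                     password=None, auth_token=None):
    """Collect keyword arguments using the names listed in the parameter table."""
    method = method.lower()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be 'post' or 'get', got {method!r}")
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"url does not look like an HTTP(S) URL: {url!r}")

    # The separator is always "_", as the table requires.
    kwargs = {"spark": spark, "url": url, "method": method, "sep": "_"}
    if body is not None:
        kwargs["body"] = body
    if auth_token is not None:
        # Token-based call: the README's token example passes `authToken`.
        kwargs["authToken"] = auth_token
    elif username is not None and password is not None:
        kwargs["username"] = username
        kwargs["password"] = password
    return kwargs


# `None` stands in for a real SparkSession in this standalone demonstration.
example_kwargs = build_api_kwargs(
    spark=None,
    url="https://example",
    method="POST",
    body={"userId": "xyz", "password": "12345"},
    username="xyz",
    password="12345",
)
print(sorted(example_kwargs))
```

With a real session, the result could then be passed on as `kpt_flatten_api(**example_kwargs)`.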
# Methods To Fetch Data From API:
## 1. Basic Authentication (using username and password, or an API key):
    url = 'https://bxray-dev.kockpit.in:6789/userauthentication'
    method = 'post'
    uid = "xyz"
    pwd = "12345"
    body = {
        "userId": uid,
        "password": pwd
    }
    flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body, username=uid, password=pwd, sep="_")
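The README does not say how the package transmits the username and password. Assuming "Basic Authentication" refers to standard HTTP Basic authentication, the request would carry an `Authorization` header built from the base64-encoded `username:password` pair. The sketch below builds such a request with the standard library without sending it. `basic_auth_request` and the `example.invalid` URL are placeholders introduced for illustration only.

```python
import base64
import json
from urllib.request import Request


def basic_auth_request(url, username, password, body):
    """Build (but do not send) a JSON POST request with an HTTP Basic auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


req = basic_auth_request(
    "https://example.invalid/userauthentication",  # placeholder URL
    "xyz",
    "12345",
    {"userId": "xyz", "password": "12345"},
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Base64 is an encoding, not encryption, so credentials sent this way are only protected when the URL uses HTTPS.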
## 2. OAuth Token/Bearer Token Authentication (using authToken):
    authToken = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJEb21haW4iOiJUQTAxMzAiLCJpYXQiOjE2NzU2NzM2NTksImV4cCI6MTY3ODI2NTY1OX0.t8A8vYWiIinyCWNOlk6q2IA-C2KajvUUTB8uD_4dQOM'
    url = 'https://bxray-dev.kockpit.in:6789/test/tokenauth'
    body = {}
    method = "post"  # or "get"
    flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body, authToken=authToken, sep="_")
## 3. Without Authentication (using the URL only):
    url = 'https://bxray-dev.kockpit.in:6789/test/withoutParameter'
    method = "post"  # or "get"
    flatdf = kpt_flatten_api(spark=spark, url=url, method=method, sep="_")
## Example Code:
    from pyspark.sql import SparkSession
    from kpt_flatten_json import kpt_flatten_api

    # Create (or reuse) the Spark session that the function requires.
    spark = SparkSession.builder.appName("ReadDarwinAPIWithAuth").getOrCreate()

    # Placeholder credentials and request parameters.
    username = "example"
    password = "examplepassword"
    api_key = 'ExampleAPIKEY'
    processed_from = '04-01-2023 00:00:00'
    processed_to = '04-01-2023 23:55:00'
    url = "https://example"
    method = "post"
    body = {
        "api_key": api_key,
        "processed_from": processed_from,
        "processed_to": processed_to
    }

    # Fetch the API response, flatten it, and show the first two rows.
    flatdf = kpt_flatten_api(spark=spark, url=url, method=method, body=body, username=username, password=password, sep="_")
    flatdf.show(2)
# 2. kpt_flatten_json (to flatten a dataframe):
## Example
Consider a list of nested dictionaries containing details about batters and toppings. We can use the kpt_flatten_json function to flatten this JSON data structure into a flat table as shown:
    data = [
        {
            "id": "0001",
            "type": "donut",
            "name": "Cake",
            "ppu": 0.55,
            "batters": {
                "batter": [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
            },
            "topping": [
                { "id": "5001", "type": "None" },
                { "id": "5002", "type": "Glazed" },
                { "id": "5005", "type": "Sugar" },
                { "id": "5007", "type": "Powdered Sugar" },
                { "id": "5006", "type": "Chocolate" },
                { "id": "5003", "type": "Chocolate" },
                { "id": "5004", "type": "Maple" }
            ]
        }
    ]

    # Convert the data to a complex DataFrame
    complexdf = spark.createDataFrame(data=data)
## Complex Dataframe
| batters| id| name | ppu | topping| type |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |
| {[{1001, Regular}... | 0001 | Cake|0.55 | [{5001, None}, {5... | donut|
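Because the nested cells are truncated in the display above, it can help to list which flat columns to expect before flattening. The sketch below walks the original Python data and joins each path with the separator. `leaf_columns` is a hypothetical helper written for this illustration, and the column order it returns may differ from the order the package produces.

```python
def leaf_columns(value, sep="_", prefix=""):
    """Return the flat column names implied by a nested value, in first-seen order."""
    names = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}{sep}{key}" if prefix else key
            for name in leaf_columns(child, sep, child_prefix):
                if name not in names:
                    names.append(name)
        return names
    if isinstance(value, list):
        # List elements share their parent's name; merge the names they contribute.
        for item in value:
            for name in leaf_columns(item, sep, prefix):
                if name not in names:
                    names.append(name)
        # An empty list still maps to a single column named after its parent.
        return names or ([prefix] if prefix else [])
    # Scalar leaf: the accumulated path is the column name.
    return [prefix]


# Shortened version of the sample data.
records = [
    {
        "id": "0001",
        "ppu": 0.55,
        "batters": {"batter": [{"id": "1001", "type": "Regular"}]},
        "topping": [{"id": "5001", "type": "None"}],
    }
]

columns = leaf_columns(records)
print(columns)
```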
## Installation
$ [sudo] pip install kpt_flatten_json
## Function
kpt_flatten_json: Returns a flattened dataframe
# Flatten the DataFrame
## Usage
To use the kpt_flatten_json function, import the function and pass in your complex DataFrame as a parameter:
```python
from kpt_flatten_json import *

# complexdf is the nested DataFrame created above.
flatdf = kpt_flatten_json(complexdf)
```
## Flattened Dataframe
| id| name| ppu | type | topping_id | topping_type | batters_batter_id | batters_batter_type |
| :-------- | :------- | :------------------------- |:-------- | :------- | :------------------------- |:-------- | :------- |
|0001|Cake|0.55|donut| 5001| None| 1001| Regular|
|0001|Cake|0.55|donut| 5001| None| 1002| Chocolate|
|0001|Cake|0.55|donut| 5001| None| 1003| Blueberry|
|0001|Cake|0.55|donut| 5001| None| 1004| Devil's Food|
|0001|Cake|0.55|donut| 5002| Glazed| 1001| Regular|
|0001|Cake|0.55|donut| 5002| Glazed| 1002| Chocolate|
|0001|Cake|0.55|donut| 5002| Glazed| 1003| Blueberry|
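The table above lists seven rows. If every batter is paired with every topping, as those rows suggest, the row count grows multiplicatively with each additional nested list, so it is worth estimating the size of the result before flattening large inputs. The sketch below computes that estimate in plain Python. `expected_rows` is a hypothetical helper, and the cross-product behaviour is an assumption inferred from the sample output rather than a documented guarantee.

```python
from math import prod


def expected_rows(value):
    """Estimate how many flat rows a nested value expands to."""
    if isinstance(value, dict):
        # Sibling keys are combined with each other, so their counts multiply.
        return prod(expected_rows(child) for child in value.values())
    if isinstance(value, list):
        # Each element contributes its own rows; an empty list still yields one row.
        return sum(expected_rows(item) for item in value) or 1
    # A scalar never adds rows.
    return 1


# Same shape as the sample record: four batters and seven toppings.
donut = {
    "id": "0001",
    "batters": {"batter": [{"id": str(1001 + i)} for i in range(4)]},
    "topping": [{"id": str(5001 + i)} for i in range(7)],
}

print(expected_rows(donut))  # 4 batters x 7 toppings -> 28
```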
Raw data
{
"_id": null,
"home_page": "",
"name": "kn-flatten-json",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "flatten,json,normalize,normalize pyspark dataframe,complex datatypes,flatten dataframe",
"author": "Sukriti",
"author_email": "sukriti.saluja@kockpit.in",
"download_url": "https://files.pythonhosted.org/packages/76/5a/0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336/kn_flatten_json-0.0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "kn_flatten_json",
"version": "0.0.10",
"project_urls": null,
"split_keywords": [
"flatten",
"json",
"normalize",
"normalize pyspark dataframe",
"complex datatypes",
"flatten dataframe"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2605f0be17faaff4fe32d20b752f5c38c59097c346495cca92adb7c41ac81edd",
"md5": "e0b687c717200c186e7020f5e7512667",
"sha256": "3c1c2ae1e239deeb010bf1900f78e81f56e75bfc804824ea2ea8e40f0c2bf762"
},
"downloads": -1,
"filename": "kn_flatten_json-0.0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e0b687c717200c186e7020f5e7512667",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6669,
"upload_time": "2023-07-18T11:35:35",
"upload_time_iso_8601": "2023-07-18T11:35:35.654841Z",
"url": "https://files.pythonhosted.org/packages/26/05/f0be17faaff4fe32d20b752f5c38c59097c346495cca92adb7c41ac81edd/kn_flatten_json-0.0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "765a0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336",
"md5": "4ef39cdb12cd8219289ce5dcdf215d5e",
"sha256": "9e77ed0a5ea4f7a09eaab5728bd9b5fd9b84f7c926d81c9cac59900b9ad9e3d9"
},
"downloads": -1,
"filename": "kn_flatten_json-0.0.10.tar.gz",
"has_sig": false,
"md5_digest": "4ef39cdb12cd8219289ce5dcdf215d5e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6536,
"upload_time": "2023-07-18T11:35:37",
"upload_time_iso_8601": "2023-07-18T11:35:37.515458Z",
"url": "https://files.pythonhosted.org/packages/76/5a/0e3261f6eb9f8952aa69737fb100956e55e30c68ca63fe0d7e409a156336/kn_flatten_json-0.0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-18 11:35:37",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "kn-flatten-json"
}