Name | voiladata |
Version | 1.0.1 |
home_page | None |
Summary | A versatile Python library to read various file formats into a pandas DataFrame, with robust handling of nested data. |
upload_time | 2025-07-25 14:46:36 |
maintainer | None |
docs_url | None |
author | Debrup Mukherjee |
requires_python | >=3.8 |
license | MIT |
keywords | pandas, dataframe, etl, json, yaml, toml, parquet |
requirements | No requirements were recorded. |
# VoilaData
A versatile Python library to read various file formats into a pandas DataFrame, with robust handling of deeply nested data structures.
This package provides a single class, `DataFrameReader`, that detects the file type from its extension and loads the file into a pandas DataFrame with a reader suited to that format. For nested formats such as JSON and YAML, it flattens the data into a wide format with one column per nested value.
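To make the extension-based detection concrete, here is a minimal sketch of how such dispatch could work. It is not VoilaData's actual implementation: the `READERS` mapping and the `read_by_extension` helper are hypothetical names introduced for illustration, and only a few of the supported formats are wired up, using standard pandas readers.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader.
# This illustrates the idea only; it is not VoilaData's internal table.
READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda path: pd.read_csv(path, sep="\t"),
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def read_by_extension(path):
    """Pick a reader based on the file's extension and load a DataFrame."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    return reader(path)
```

Under this design, supporting another format amounts to adding one entry to the mapping, which is one plausible way to keep a reader extensible.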
## Key Features
- **Simple Interface**: A single `read()` method for all supported file types.
- **Wide Format Support**: Handles a large variety of common data file formats.
- **Intelligent Flattening**: Converts deeply nested JSON and YAML into a flat, wide DataFrame.
- **Extensible**: Easily add support for more file types.
## Supported Formats
- `.csv`, `.tsv`
- `.xls`, `.xlsx`
- `.json`, `.ndjson`
- `.yaml`, `.yml`
- `.toml`
- `.parquet`
- `.orc`
- `.feather`
- `.avro`
- `.html`
- `.dta` (Stata)
- `.sav` (SPSS)
## Usage
Using the `DataFrameReader` is straightforward.
```python
from voiladata import DataFrameReader
# 1. Initialize the reader with a file path
reader = DataFrameReader('path/to/your/data.csv')
# 2. Read the file into a pandas DataFrame
df = reader.read()
print(df.head())
```
### Reading Nested JSON
The real power comes when dealing with nested data. Consider this JSON file (`data.json`):
```json
[
  {
    "id": "user1",
    "profile": {
      "name": "Alice",
      "age": 30
    },
    "logins": [
      {"timestamp": "2024-01-10T10:00:00Z", "ip": "192.168.1.1"},
      {"timestamp": "2024-01-11T12:30:00Z", "ip": "192.168.1.2"}
    ]
  }
]
```
`DataFrameReader` will automatically flatten it:
```python
from voiladata import DataFrameReader
# Read the nested JSON
reader = DataFrameReader('data.json')
df = reader.read()
# The resulting DataFrame is wide and flat
print(df)
```
**Output:**
| id | profile_name | profile_age | logins_0_timestamp | logins_0_ip | logins_1_timestamp | logins_1_ip |
|:------|:-------------|:------------|:------------------------|:------------|:------------------------|:------------|
| user1 | Alice | 30 | 2024-01-10T10:00:00Z | 192.168.1.1 | 2024-01-11T12:30:00Z | 192.168.1.2 |
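
The column names in the table above follow a simple pattern: nested keys and list indices are joined with underscores. The sketch below reproduces that naming with a small recursive helper. It is an assumption about how such flattening might be done, not VoilaData's own code; the `flatten` function and its `sep` parameter are hypothetical.

```python
import pandas as pd


def flatten(value, prefix="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Dict keys and list indices are joined with `sep` to form column names.
    Note: empty dicts and empty lists contribute no columns in this sketch.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Leaf value: store it under the accumulated name.
        return {prefix: value}

    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat


records = [
    {
        "id": "user1",
        "profile": {"name": "Alice", "age": 30},
        "logins": [
            {"timestamp": "2024-01-10T10:00:00Z", "ip": "192.168.1.1"},
            {"timestamp": "2024-01-11T12:30:00Z", "ip": "192.168.1.2"},
        ],
    }
]

# One flattened dict per record becomes one row in the DataFrame.
df = pd.DataFrame([flatten(record) for record in records])
print(list(df.columns))
```

Because each list element gets its own indexed columns, records with longer lists produce wider frames, and records with shorter lists are padded with missing values in the extra columns.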
## License
This project is licensed under the [MIT License](LICENSE).
Raw data
{
"_id": null,
"home_page": null,
"name": "voiladata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "pandas, dataframe, ETL, json, yaml, toml, parquet",
"author": "Debrup Mukherjee",
"author_email": "Debrup Mukherjee <dmukherjeetextiles@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/11/d4/97497cf1e953fc9b4a61526c7410793e1321ab4b3e971bf5356d329a36f5/voiladata-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A versatile Python library to read various file formats into a pandas DataFrame, with robust handling of nested data.",
"version": "1.0.1",
"project_urls": {
"Repository": "https://github.com/Dmukherjeetextiles/VoilaData"
},
"split_keywords": [
"pandas",
" dataframe",
" etl",
" json",
" yaml",
" toml",
" parquet"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f17a0834dda892cae06b563208f9e9a723107af1c91d2156080c3a79b1a04245",
"md5": "437b2cead5498adef391d3df7187d7d8",
"sha256": "ca54129b066f5dc6b699928cdb23abdc45c8f9aa715f5a9dd15869bc29d79b30"
},
"downloads": -1,
"filename": "voiladata-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "437b2cead5498adef391d3df7187d7d8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5969,
"upload_time": "2025-07-25T14:46:35",
"upload_time_iso_8601": "2025-07-25T14:46:35.339581Z",
"url": "https://files.pythonhosted.org/packages/f1/7a/0834dda892cae06b563208f9e9a723107af1c91d2156080c3a79b1a04245/voiladata-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "11d497497cf1e953fc9b4a61526c7410793e1321ab4b3e971bf5356d329a36f5",
"md5": "33116035530c391f5970ed5bdae43017",
"sha256": "4efd24426da5a6687bfbe1fa3d70b1e4e21f2d49fd6c2d28442b2f938da85f9d"
},
"downloads": -1,
"filename": "voiladata-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "33116035530c391f5970ed5bdae43017",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 6291,
"upload_time": "2025-07-25T14:46:36",
"upload_time_iso_8601": "2025-07-25T14:46:36.673458Z",
"url": "https://files.pythonhosted.org/packages/11/d4/97497cf1e953fc9b4a61526c7410793e1321ab4b3e971bf5356d329a36f5/voiladata-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-25 14:46:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Dmukherjeetextiles",
"github_project": "VoilaData",
"github_not_found": true,
"lcname": "voiladata"
}