voiladata

- **Name:** voiladata
- **Version:** 1.0.1
- **Summary:** A versatile Python library to read various file formats into a pandas DataFrame, with robust handling of nested data.
- **Author:** Debrup Mukherjee (dmukherjeetextiles@gmail.com)
- **Requires Python:** >=3.8
- **License:** MIT
- **Keywords:** pandas, dataframe, etl, json, yaml, toml, parquet
- **Uploaded:** 2025-07-25 14:46:36
- **Repository:** https://github.com/Dmukherjeetextiles/VoilaData

# VoilaData
A versatile Python library to read various file formats into a pandas DataFrame, with robust handling of deeply nested data structures.

This package provides a single class, `DataFrameReader`, that detects the file type from its extension and loads the file with a reader suited to that format. For nested formats such as JSON and YAML, it flattens the data into a wide, easy-to-use DataFrame.
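Here is a rough illustration of extension-based dispatch. The sketch maps each supported suffix to a pandas reader function and any extra arguments that reader needs. It is not voiladata's actual code. The table layout, the `resolve_reader` name, and the `"parse_and_flatten"` marker for nested formats are assumptions made for this example. The pandas function names themselves (`read_csv`, `read_excel`, `read_parquet`, and so on) are real.

```python
from pathlib import Path
from typing import Any, Dict, Tuple

# Hypothetical dispatch table (not voiladata's code):
# suffix -> (pandas reader function name, extra keyword arguments).
TABULAR_READERS: Dict[str, Tuple[str, Dict[str, Any]]] = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),  # same parser, tab separator
    ".xls": ("read_excel", {}),
    ".xlsx": ("read_excel", {}),
    ".parquet": ("read_parquet", {}),
    ".orc": ("read_orc", {}),
    ".feather": ("read_feather", {}),
    ".html": ("read_html", {}),
    ".dta": ("read_stata", {}),
    ".sav": ("read_spss", {}),
}
# .avro is omitted here: pandas has no built-in Avro reader, so it would
# need a third-party library such as fastavro.

# Assumed: nested formats are parsed into Python objects and flattened instead.
NESTED_SUFFIXES = {".json", ".ndjson", ".yaml", ".yml", ".toml"}


def resolve_reader(path: str) -> Tuple[str, Dict[str, Any]]:
    """Choose a loading strategy from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix in NESTED_SUFFIXES:
        return "parse_and_flatten", {}
    if suffix in TABULAR_READERS:
        name, kwargs = TABULAR_READERS[suffix]
        return name, dict(kwargs)  # return a copy so callers cannot alter the table
    raise ValueError(f"Unsupported file extension: {suffix!r}")


print(resolve_reader("exports/Sales.TSV"))  # ('read_csv', {'sep': '\t'})
```

For a tabular format, a caller could then run `getattr(pandas, name)(path, **kwargs)`. Note that `pandas.read_html` returns a list of DataFrames rather than a single one. Several readers also depend on optional packages: for example, `read_parquet` needs pyarrow or fastparquet, and `read_spss` needs pyreadstat.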

## Key Features

- **Simple Interface**: A single `read()` method for all supported file types.
- **Broad Format Support**: Reads the common tabular, nested, and statistical-software formats listed under Supported Formats.
- **Intelligent Flattening**: Converts deeply nested JSON and YAML into a flat, wide DataFrame.
- **Extensible**: Easily add support for more file types.

## Supported Formats

- `.csv`, `.tsv`
- `.xls`, `.xlsx`
- `.json`, `.ndjson`
- `.yaml`, `.yml`
- `.toml`
- `.parquet`
- `.orc`
- `.feather`
- `.avro`
- `.html`
- `.dta` (Stata)
- `.sav` (SPSS)
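The `.json` and `.ndjson` entries differ in layout. An NDJSON (newline-delimited JSON) file holds one JSON value per line rather than a single document. The minimal standard-library sketch below parses such text into a list of records and tolerates blank lines. `parse_ndjson` is a hypothetical helper, not part of voiladata.

```python
import json
from typing import Any, Iterable, List


def parse_ndjson(lines: Iterable[str]) -> List[Any]:
    """Parse newline-delimited JSON: one JSON value per non-blank line."""
    records: List[Any] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report which line failed, since each line is an independent record.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
    return records


sample = '{"id": "user1", "profile": {"age": 30}}\n\n{"id": "user2", "profile": {"age": 41}}\n'
print(parse_ndjson(sample.splitlines()))
```

A file object iterates line by line, so the same function works on an open UTF-8 file inside a `with` block. For comparison, pandas reads NDJSON with `pandas.read_json(path, lines=True)`. That reader leaves nested objects as dict-valued cells rather than flattening them.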


## Usage

Using the `DataFrameReader` is straightforward.

```python
from voiladata import DataFrameReader

# 1. Initialize the reader with a file path
reader = DataFrameReader('path/to/your/data.csv')

# 2. Read the file into a pandas DataFrame
df = reader.read()

print(df.head())
```

### Reading Nested JSON

The real power comes when dealing with nested data. Consider this JSON file (`data.json`):

```json
[
    {
        "id": "user1",
        "profile": {
            "name": "Alice",
            "age": 30
        },
        "logins": [
            {"timestamp": "2024-01-10T10:00:00Z", "ip": "192.168.1.1"},
            {"timestamp": "2024-01-11T12:30:00Z", "ip": "192.168.1.2"}
        ]
    }
]
```

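`DataFrameReader` flattens it automatically, joining nested keys with underscores (`profile_name`) and numbering list elements by their index (`logins_0_ip`):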

```python
from voiladata import DataFrameReader

# Read the nested JSON
reader = DataFrameReader('data.json')
df = reader.read()

# The resulting DataFrame is wide and flat
print(df)
```

**Output** (rendered as a table):

| id    | profile_name | profile_age | logins_0_timestamp      | logins_0_ip | logins_1_timestamp      | logins_1_ip |
|:------|:-------------|:------------|:------------------------|:------------|:------------------------|:------------|
| user1 | Alice        | 30          | 2024-01-10T10:00:00Z | 192.168.1.1 | 2024-01-11T12:30:00Z | 192.168.1.2 |
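The column names in that output follow a simple rule: nested keys are joined with underscores, and each list element contributes its index. Below is a minimal recursive sketch of that rule in plain Python. The `flatten` function and its `sep` parameter are assumptions for illustration, not voiladata's internals. In this sketch, empty dicts and lists are kept as single values rather than dropped.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts/lists into one level; keys are joined with `sep`."""
    if isinstance(obj, dict):
        items = [(str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(str(i), v) for i, v in enumerate(obj)]  # list index becomes part of the key
    else:
        return {prefix: obj}  # a bare scalar: nothing to expand
    flat: Dict[str, Any] = {}
    for key, value in items:
        new_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, new_key, sep))  # recurse into non-empty containers
        else:
            flat[new_key] = value  # scalars and empty containers are kept as-is
    return flat


records = json.loads("""
[{"id": "user1",
  "profile": {"name": "Alice", "age": 30},
  "logins": [{"timestamp": "2024-01-10T10:00:00Z", "ip": "192.168.1.1"},
             {"timestamp": "2024-01-11T12:30:00Z", "ip": "192.168.1.2"}]}]
""")
rows = [flatten(r) for r in records]
print(list(rows[0]))
# ['id', 'profile_name', 'profile_age', 'logins_0_timestamp', 'logins_0_ip', 'logins_1_timestamp', 'logins_1_ip']
```

Passing `rows` to `pandas.DataFrame(rows)` would give the one-row table above, with one column per key. By comparison, `pandas.json_normalize(records)` also joins nested dict keys (with `.` by default, configurable through `sep`). However, it keeps a list such as `logins` as a single list-valued column instead of spreading it into indexed columns.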


## License

This project is licensed under the MIT [License](LICENSE).

            
