tidyxl

Name	tidyxl JSON
Version	0.1.8 JSON
	download
home_page	None
Summary	Import Excel files into tidy format with cell-level data extraction
upload_time	2025-08-04 14:32:51
maintainer	None
docs_url	None
author	None
requires_python	>=3.8
license	MIT
keywords	excel xlsx tidy data spreadsheet openpyxl pandas data-analysis data-science cell-extraction
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            <p align="center">
  <img src="assets/logo.svg" alt="tidyxl Logo" width="400"/>
</p>

[![PyPI version](https://badge.fury.io/py/tidyxl.svg)](https://badge.fury.io/py/tidyxl)
[![PyPI downloads](https://img.shields.io/pypi/dm/tidyxl.svg)](https://pypi.org/project/tidyxl/)
[![Python versions](https://img.shields.io/pypi/pyversions/tidyxl.svg)](https://pypi.org/project/tidyxl/)

A Python package that imports Excel files (.xlsx, .xlsm) into a tidy format where each cell is represented as a single row with detailed metadata. This package is inspired by and closely mirrors the functionality of the R tidyxl package.

## Installation

```bash
pip install tidyxl
```

## Quick Start

```python
from tidyxl import xlsx_cells, xlsx_sheet_names

# Get all sheet names
sheets = xlsx_sheet_names("data.xlsx")
print(f"Found sheets: {sheets}")

# Read all cells from all sheets
cells = xlsx_cells("data.xlsx")
print(f"Total cells: {len(cells)}")

# Read specific sheet only
employees = xlsx_cells("data.xlsx", sheets="Employees")

# Filter for cells with content
content_cells = cells[~cells['is_blank']]
print(content_cells[['sheet', 'address', 'data_type', 'character', 'numeric']].head())
```

## Features

- **Tidy Format**: Each Excel cell becomes one row with comprehensive metadata
- **Complete Cell Information**: Extract values, formulas, formatting, comments, and more
- **Multiple Worksheets**: Process all sheets or specify particular ones
- **Named Ranges**: Extract and analyze Excel named ranges and formulas
- **Data Validation**: Discover data validation rules applied to cells
- **Type Safety**: Separate columns for different data types (numeric, character, logical, date, error)
- **R tidyxl Compatible**: Identical API and output structure to the R package

## Core Functions

### xlsx_cells()
Extract all cell data in tidy format:

```python
from tidyxl import xlsx_cells

# Read all sheets
cells = xlsx_cells("file.xlsx")

# Read specific sheets
cells = xlsx_cells("file.xlsx", sheets=["Sheet1", "Sheet2"])
cells = xlsx_cells("file.xlsx", sheets="Data")

# Include/exclude blank cells
cells = xlsx_cells("file.xlsx", include_blank_cells=False)
```

### xlsx_sheet_names()
List all worksheet names:

```python
from tidyxl import xlsx_sheet_names

sheets = xlsx_sheet_names("file.xlsx")
# Returns: ['Sheet1', 'Data', 'Summary']
```

### xlsx_names()
Extract named ranges and formulas:

```python
from tidyxl import xlsx_names

names = xlsx_names("file.xlsx")
print(names[['name', 'formula', 'sheet', 'is_range']])
```

### xlsx_validation()
Extract data validation rules:

```python
from tidyxl import xlsx_validation

validation = xlsx_validation("file.xlsx")
print(validation[['sheet', 'ref', 'type', 'formula1']])
```

### xlsx_formats()
Extract formatting information:

```python
from tidyxl import xlsx_formats

formats = xlsx_formats("file.xlsx")
# Returns dict with keys: fonts, fills, borders, number_formats
```

## Output Structure

The `xlsx_cells()` function returns a pandas DataFrame with 23 columns matching the R tidyxl package:

| Column | Type | Description |
|--------|------|-------------|
| `sheet` | str | Worksheet name |
| `address` | str | Cell address (A1 notation) |
| `row` | int | Row number |
| `col` | int | Column number |
| `is_blank` | bool | Whether cell has a value |
| `content` | str | Raw cell value before type conversion |
| `data_type` | str | Cell type (character, numeric, logical, date, error, blank) |
| `error` | str | Error value if cell contains error |
| `logical` | bool | Boolean value if cell contains TRUE/FALSE |
| `numeric` | float | Numeric value if cell contains number |
| `date` | datetime | Date value if cell contains date |
| `character` | str | String value if cell contains text |
| `formula` | str | Formula if cell contains formula |
| `is_array` | bool | Whether formula is array formula |
| `formula_ref` | str | Range address for array/shared formulas |
| `formula_group` | int | Formula group identifier |
| `comment` | str | Cell comment text |
| `height` | float | Row height in Excel units |
| `width` | float | Column width in Excel units |
| `row_outline_level` | int | Row outline/grouping level |
| `col_outline_level` | int | Column outline/grouping level |
| `style_format` | str | Style format identifier |
| `local_format_id` | int | Local formatting identifier |

## License

This project is licensed under the MIT License.

## Acknowledgments

- Inspired by the excellent R tidyxl package by Duncan Garmonsway
- Built on top of openpyxl for Excel file processing
- Uses pandas for data structure management

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "tidyxl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Wendi Wang <wwd1015@gmail.com>",
    "keywords": "excel, xlsx, tidy, data, spreadsheet, openpyxl, pandas, data-analysis, data-science, cell-extraction",
    "author": null,
    "author_email": "Wendi Wang <wwd1015@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e2/bc/d590aeb3c9c4e8bb27c9f415d2e3d64864f62ff62d4a043aca8260dfcafd/tidyxl-0.1.8.tar.gz",
    "platform": null,
    "description": "<p align=\"center\">\n  <img src=\"assets/logo.svg\" alt=\"tidyxl Logo\" width=\"400\"/>\n</p>\n\n[![PyPI version](https://badge.fury.io/py/tidyxl.svg)](https://badge.fury.io/py/tidyxl)\n[![PyPI downloads](https://img.shields.io/pypi/dm/tidyxl.svg)](https://pypi.org/project/tidyxl/)\n[![Python versions](https://img.shields.io/pypi/pyversions/tidyxl.svg)](https://pypi.org/project/tidyxl/)\n\nA Python package that imports Excel files (.xlsx, .xlsm) into a tidy format where each cell is represented as a single row with detailed metadata. This package is inspired by and closely mirrors the functionality of the R tidyxl package.\n\n## Installation\n\n```bash\npip install tidyxl\n```\n\n## Quick Start\n\n```python\nfrom tidyxl import xlsx_cells, xlsx_sheet_names\n\n# Get all sheet names\nsheets = xlsx_sheet_names(\"data.xlsx\")\nprint(f\"Found sheets: {sheets}\")\n\n# Read all cells from all sheets\ncells = xlsx_cells(\"data.xlsx\")\nprint(f\"Total cells: {len(cells)}\")\n\n# Read specific sheet only\nemployees = xlsx_cells(\"data.xlsx\", sheets=\"Employees\")\n\n# Filter for cells with content\ncontent_cells = cells[~cells['is_blank']]\nprint(content_cells[['sheet', 'address', 'data_type', 'character', 'numeric']].head())\n```\n\n## Features\n\n- **Tidy Format**: Each Excel cell becomes one row with comprehensive metadata\n- **Complete Cell Information**: Extract values, formulas, formatting, comments, and more\n- **Multiple Worksheets**: Process all sheets or specify particular ones\n- **Named Ranges**: Extract and analyze Excel named ranges and formulas\n- **Data Validation**: Discover data validation rules applied to cells\n- **Type Safety**: Separate columns for different data types (numeric, character, logical, date, error)\n- **R tidyxl Compatible**: Identical API and output structure to the R package\n\n## Core Functions\n\n### xlsx_cells()\nExtract all cell data in tidy format:\n\n```python\nfrom tidyxl import xlsx_cells\n\n# Read all sheets\ncells = xlsx_cells(\"file.xlsx\")\n\n# Read specific sheets\ncells = xlsx_cells(\"file.xlsx\", sheets=[\"Sheet1\", \"Sheet2\"])\ncells = xlsx_cells(\"file.xlsx\", sheets=\"Data\")\n\n# Include/exclude blank cells\ncells = xlsx_cells(\"file.xlsx\", include_blank_cells=False)\n```\n\n### xlsx_sheet_names()\nList all worksheet names:\n\n```python\nfrom tidyxl import xlsx_sheet_names\n\nsheets = xlsx_sheet_names(\"file.xlsx\")\n# Returns: ['Sheet1', 'Data', 'Summary']\n```\n\n### xlsx_names()\nExtract named ranges and formulas:\n\n```python\nfrom tidyxl import xlsx_names\n\nnames = xlsx_names(\"file.xlsx\")\nprint(names[['name', 'formula', 'sheet', 'is_range']])\n```\n\n### xlsx_validation()\nExtract data validation rules:\n\n```python\nfrom tidyxl import xlsx_validation\n\nvalidation = xlsx_validation(\"file.xlsx\")\nprint(validation[['sheet', 'ref', 'type', 'formula1']])\n```\n\n### xlsx_formats()\nExtract formatting information:\n\n```python\nfrom tidyxl import xlsx_formats\n\nformats = xlsx_formats(\"file.xlsx\")\n# Returns dict with keys: fonts, fills, borders, number_formats\n```\n\n## Output Structure\n\nThe `xlsx_cells()` function returns a pandas DataFrame with 23 columns matching the R tidyxl package:\n\n| Column | Type | Description |\n|--------|------|-------------|\n| `sheet` | str | Worksheet name |\n| `address` | str | Cell address (A1 notation) |\n| `row` | int | Row number |\n| `col` | int | Column number |\n| `is_blank` | bool | Whether cell has a value |\n| `content` | str | Raw cell value before type conversion |\n| `data_type` | str | Cell type (character, numeric, logical, date, error, blank) |\n| `error` | str | Error value if cell contains error |\n| `logical` | bool | Boolean value if cell contains TRUE/FALSE |\n| `numeric` | float | Numeric value if cell contains number |\n| `date` | datetime | Date value if cell contains date |\n| `character` | str | String value if cell contains text |\n| `formula` | str | Formula if cell contains formula |\n| `is_array` | bool | Whether formula is array formula |\n| `formula_ref` | str | Range address for array/shared formulas |\n| `formula_group` | int | Formula group identifier |\n| `comment` | str | Cell comment text |\n| `height` | float | Row height in Excel units |\n| `width` | float | Column width in Excel units |\n| `row_outline_level` | int | Row outline/grouping level |\n| `col_outline_level` | int | Column outline/grouping level |\n| `style_format` | str | Style format identifier |\n| `local_format_id` | int | Local formatting identifier |\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Acknowledgments\n\n- Inspired by the excellent R tidyxl package by Duncan Garmonsway\n- Built on top of openpyxl for Excel file processing\n- Uses pandas for data structure management\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Import Excel files into tidy format with cell-level data extraction",
    "version": "0.1.8",
    "project_urls": {
        "Bug Tracker": "https://github.com/wwd1015/tidyxl/issues",
        "Changelog": "https://github.com/wwd1015/tidyxl/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/wwd1015/tidyxl#readme",
        "Homepage": "https://github.com/wwd1015/tidyxl",
        "Source Code": "https://github.com/wwd1015/tidyxl"
    },
    "split_keywords": [
        "excel",
        " xlsx",
        " tidy",
        " data",
        " spreadsheet",
        " openpyxl",
        " pandas",
        " data-analysis",
        " data-science",
        " cell-extraction"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c965d8ab97601cab79adaf8f6badf98e01b06e4ba2bf82d9b157320f02636082",
                "md5": "ed85aebb1950e92438b26fedb7816ac2",
                "sha256": "108290167e0e6f431e49b9a9ac7bc6125cf31b282f00e6a6042f1ffc3771973b"
            },
            "downloads": -1,
            "filename": "tidyxl-0.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ed85aebb1950e92438b26fedb7816ac2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 17322,
            "upload_time": "2025-08-04T14:32:50",
            "upload_time_iso_8601": "2025-08-04T14:32:50.775812Z",
            "url": "https://files.pythonhosted.org/packages/c9/65/d8ab97601cab79adaf8f6badf98e01b06e4ba2bf82d9b157320f02636082/tidyxl-0.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e2bcd590aeb3c9c4e8bb27c9f415d2e3d64864f62ff62d4a043aca8260dfcafd",
                "md5": "fcf65d5ee158134b8f85fb725982e4da",
                "sha256": "a6e5e1d70a8b111c0b5593add6b4c65f8d4058577bde7e1a3b9fb867f9b0e782"
            },
            "downloads": -1,
            "filename": "tidyxl-0.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "fcf65d5ee158134b8f85fb725982e4da",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 27822,
            "upload_time": "2025-08-04T14:32:51",
            "upload_time_iso_8601": "2025-08-04T14:32:51.916417Z",
            "url": "https://files.pythonhosted.org/packages/e2/bc/d590aeb3c9c4e8bb27c9f415d2e3d64864f62ff62d4a043aca8260dfcafd/tidyxl-0.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-04 14:32:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wwd1015",
    "github_project": "tidyxl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tidyxl"
}

None