# tidyxl
A Python package that imports Excel files (.xlsx, .xlsm) into a tidy format where each cell is represented as a single row with detailed metadata. This package is inspired by and closely mirrors the functionality of the R tidyxl package.
## Installation
```bash
pip install tidyxl
```
## Quick Start
```python
from tidyxl import xlsx_cells, xlsx_sheet_names
# Get all sheet names
sheets = xlsx_sheet_names("data.xlsx")
print(f"Found sheets: {sheets}")
# Read all cells from all sheets
cells = xlsx_cells("data.xlsx")
print(f"Total cells: {len(cells)}")
# Read specific sheet only
employees = xlsx_cells("data.xlsx", sheets="Employees")
# Filter for cells with content
content_cells = cells[~cells['is_blank']]
print(content_cells[['sheet', 'address', 'data_type', 'character', 'numeric']].head())
```
## Features
- **Tidy Format**: Each Excel cell becomes one row with comprehensive metadata
- **Complete Cell Information**: Extract values, formulas, formatting, comments, and more
- **Multiple Worksheets**: Process all sheets or specify particular ones
- **Named Ranges**: Extract and analyze Excel named ranges and formulas
- **Data Validation**: Discover data validation rules applied to cells
- **Type Safety**: Separate columns for different data types (numeric, character, logical, date, error)
- **R tidyxl Compatible**: Identical API and output structure to the R package
## Core Functions
### xlsx_cells()
Extract all cell data in tidy format:
```python
from tidyxl import xlsx_cells
# Read all sheets
cells = xlsx_cells("file.xlsx")
# Read specific sheets
cells = xlsx_cells("file.xlsx", sheets=["Sheet1", "Sheet2"])
cells = xlsx_cells("file.xlsx", sheets="Data")
# Include/exclude blank cells
cells = xlsx_cells("file.xlsx", include_blank_cells=False)
```
### xlsx_sheet_names()
List all worksheet names:
```python
from tidyxl import xlsx_sheet_names
sheets = xlsx_sheet_names("file.xlsx")
# Returns: ['Sheet1', 'Data', 'Summary']
```
### xlsx_names()
Extract named ranges and formulas:
```python
from tidyxl import xlsx_names
names = xlsx_names("file.xlsx")
print(names[['name', 'formula', 'sheet', 'is_range']])
```
### xlsx_validation()
Extract data validation rules:
```python
from tidyxl import xlsx_validation
validation = xlsx_validation("file.xlsx")
print(validation[['sheet', 'ref', 'type', 'formula1']])
```
### xlsx_formats()
Extract formatting information:
```python
from tidyxl import xlsx_formats
formats = xlsx_formats("file.xlsx")
# Returns dict with keys: fonts, fills, borders, number_formats
```
## Output Structure
The `xlsx_cells()` function returns a pandas DataFrame with 23 columns matching the R tidyxl package:
| Column | Type | Description |
|--------|------|-------------|
| `sheet` | str | Worksheet name |
| `address` | str | Cell address (A1 notation) |
| `row` | int | Row number |
| `col` | int | Column number |
| `is_blank` | bool | Whether cell has a value |
| `content` | str | Raw cell value before type conversion |
| `data_type` | str | Cell type (character, numeric, logical, date, error, blank) |
| `error` | str | Error value if cell contains error |
| `logical` | bool | Boolean value if cell contains TRUE/FALSE |
| `numeric` | float | Numeric value if cell contains number |
| `date` | datetime | Date value if cell contains date |
| `character` | str | String value if cell contains text |
| `formula` | str | Formula if cell contains formula |
| `is_array` | bool | Whether formula is array formula |
| `formula_ref` | str | Range address for array/shared formulas |
| `formula_group` | int | Formula group identifier |
| `comment` | str | Cell comment text |
| `height` | float | Row height in Excel units |
| `width` | float | Column width in Excel units |
| `row_outline_level` | int | Row outline/grouping level |
| `col_outline_level` | int | Column outline/grouping level |
| `style_format` | str | Style format identifier |
| `local_format_id` | int | Local formatting identifier |
## License
This project is licensed under the MIT License.
## Acknowledgments
- Inspired by the excellent R tidyxl package by Duncan Garmonsway
- Built on top of openpyxl for Excel file processing
- Uses pandas for data structure management
Raw data
{
"_id": null,
"home_page": null,
"name": "tidyxl",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": "Wendi Wang <wwd1015@gmail.com>",
"keywords": "excel, xlsx, tidy, data, spreadsheet, openpyxl, pandas, data-analysis, data-science, cell-extraction",
"author": null,
"author_email": "Wendi Wang <wwd1015@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ef/ea/bbeea189c96256d6c89ade9cd0c927155e33141ffa07fb8058fe2e2477df/tidyxl-0.1.7.tar.gz",
"platform": null,
"description": "# tidyxl\n\nA Python package that imports Excel files (.xlsx, .xlsm) into a tidy format where each cell is represented as a single row with detailed metadata. This package is inspired by and closely mirrors the functionality of the R tidyxl package.\n\n## Installation\n\n```bash\npip install tidyxl\n```\n\n## Quick Start\n\n```python\nfrom tidyxl import xlsx_cells, xlsx_sheet_names\n\n# Get all sheet names\nsheets = xlsx_sheet_names(\"data.xlsx\")\nprint(f\"Found sheets: {sheets}\")\n\n# Read all cells from all sheets\ncells = xlsx_cells(\"data.xlsx\")\nprint(f\"Total cells: {len(cells)}\")\n\n# Read specific sheet only\nemployees = xlsx_cells(\"data.xlsx\", sheets=\"Employees\")\n\n# Filter for cells with content\ncontent_cells = cells[~cells['is_blank']]\nprint(content_cells[['sheet', 'address', 'data_type', 'character', 'numeric']].head())\n```\n\n## Features\n\n- **Tidy Format**: Each Excel cell becomes one row with comprehensive metadata\n- **Complete Cell Information**: Extract values, formulas, formatting, comments, and more\n- **Multiple Worksheets**: Process all sheets or specify particular ones\n- **Named Ranges**: Extract and analyze Excel named ranges and formulas\n- **Data Validation**: Discover data validation rules applied to cells\n- **Type Safety**: Separate columns for different data types (numeric, character, logical, date, error)\n- **R tidyxl Compatible**: Identical API and output structure to the R package\n\n## Core Functions\n\n### xlsx_cells()\nExtract all cell data in tidy format:\n\n```python\nfrom tidyxl import xlsx_cells\n\n# Read all sheets\ncells = xlsx_cells(\"file.xlsx\")\n\n# Read specific sheets\ncells = xlsx_cells(\"file.xlsx\", sheets=[\"Sheet1\", \"Sheet2\"])\ncells = xlsx_cells(\"file.xlsx\", sheets=\"Data\")\n\n# Include/exclude blank cells\ncells = xlsx_cells(\"file.xlsx\", include_blank_cells=False)\n```\n\n### xlsx_sheet_names()\nList all worksheet names:\n\n```python\nfrom tidyxl import xlsx_sheet_names\n\nsheets = xlsx_sheet_names(\"file.xlsx\")\n# Returns: ['Sheet1', 'Data', 'Summary']\n```\n\n### xlsx_names()\nExtract named ranges and formulas:\n\n```python\nfrom tidyxl import xlsx_names\n\nnames = xlsx_names(\"file.xlsx\")\nprint(names[['name', 'formula', 'sheet', 'is_range']])\n```\n\n### xlsx_validation()\nExtract data validation rules:\n\n```python\nfrom tidyxl import xlsx_validation\n\nvalidation = xlsx_validation(\"file.xlsx\")\nprint(validation[['sheet', 'ref', 'type', 'formula1']])\n```\n\n### xlsx_formats()\nExtract formatting information:\n\n```python\nfrom tidyxl import xlsx_formats\n\nformats = xlsx_formats(\"file.xlsx\")\n# Returns dict with keys: fonts, fills, borders, number_formats\n```\n\n## Output Structure\n\nThe `xlsx_cells()` function returns a pandas DataFrame with 23 columns matching the R tidyxl package:\n\n| Column | Type | Description |\n|--------|------|-------------|\n| `sheet` | str | Worksheet name |\n| `address` | str | Cell address (A1 notation) |\n| `row` | int | Row number |\n| `col` | int | Column number |\n| `is_blank` | bool | Whether cell has a value |\n| `content` | str | Raw cell value before type conversion |\n| `data_type` | str | Cell type (character, numeric, logical, date, error, blank) |\n| `error` | str | Error value if cell contains error |\n| `logical` | bool | Boolean value if cell contains TRUE/FALSE |\n| `numeric` | float | Numeric value if cell contains number |\n| `date` | datetime | Date value if cell contains date |\n| `character` | str | String value if cell contains text |\n| `formula` | str | Formula if cell contains formula |\n| `is_array` | bool | Whether formula is array formula |\n| `formula_ref` | str | Range address for array/shared formulas |\n| `formula_group` | int | Formula group identifier |\n| `comment` | str | Cell comment text |\n| `height` | float | Row height in Excel units |\n| `width` | float | Column width in Excel units |\n| `row_outline_level` | int | Row outline/grouping level |\n| `col_outline_level` | int | Column outline/grouping level |\n| `style_format` | str | Style format identifier |\n| `local_format_id` | int | Local formatting identifier |\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Acknowledgments\n\n- Inspired by the excellent R tidyxl package by Duncan Garmonsway\n- Built on top of openpyxl for Excel file processing\n- Uses pandas for data structure management\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Import Excel files into tidy format with cell-level data extraction",
"version": "0.1.7",
"project_urls": {
"Bug Tracker": "https://github.com/yourusername/tidyxl/issues",
"Changelog": "https://github.com/yourusername/tidyxl/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/yourusername/tidyxl#readme",
"Homepage": "https://github.com/yourusername/tidyxl",
"Source Code": "https://github.com/yourusername/tidyxl"
},
"split_keywords": [
"excel",
" xlsx",
" tidy",
" data",
" spreadsheet",
" openpyxl",
" pandas",
" data-analysis",
" data-science",
" cell-extraction"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e3799310b77d563ae1ab3bfb9f57fe88e432461f0934f9f465746ff6ccff3ad3",
"md5": "2d513be5bec9790b2fd9e68795c08cbb",
"sha256": "2f4f218e589cb7a094c46348dc08a52c4612d751598cfefb0608146762fa1ed8"
},
"downloads": -1,
"filename": "tidyxl-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2d513be5bec9790b2fd9e68795c08cbb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 11985,
"upload_time": "2025-07-31T01:51:23",
"upload_time_iso_8601": "2025-07-31T01:51:23.048343Z",
"url": "https://files.pythonhosted.org/packages/e3/79/9310b77d563ae1ab3bfb9f57fe88e432461f0934f9f465746ff6ccff3ad3/tidyxl-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "efeabbeea189c96256d6c89ade9cd0c927155e33141ffa07fb8058fe2e2477df",
"md5": "64c4ea896a7f93b38a7c9e80ff5b1cf7",
"sha256": "3dc7035b778e74fc5084e6349bbfdb69b9d52db0f0e07b19ff9a4bf898b6753d"
},
"downloads": -1,
"filename": "tidyxl-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "64c4ea896a7f93b38a7c9e80ff5b1cf7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 21547,
"upload_time": "2025-07-31T01:51:24",
"upload_time_iso_8601": "2025-07-31T01:51:24.037137Z",
"url": "https://files.pythonhosted.org/packages/ef/ea/bbeea189c96256d6c89ade9cd0c927155e33141ffa07fb8058fe2e2477df/tidyxl-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-31 01:51:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yourusername",
"github_project": "tidyxl",
"github_not_found": true,
"lcname": "tidyxl"
}