| Field | Value |
| :-------------- | :---------------------------------------------------------------------- |
| Name | fastxlsx |
| Version | 0.2.0 |
| Summary | A high-performance Excel XLSX reader/writer for Python built with Rust. |
| Upload time | 2025-02-04 05:13:09 |
| Requires Python | `<3.13,>=3.8` |
| License | MIT |
| Author email | shuangluoxss <shuangluoxss@qq.com> |
| Keywords | excel, xlsx |
# FastXLSX
[PyPI](https://pypi.org/project/fastxlsx/)
[License: MIT](https://github.com/shuangluoxss/fastxlsx/blob/main/LICENSE)
**A lightweight, high-performance Python library for blazing-fast XLSX I/O operations.**
Powered by Rust's [Calamine](https://github.com/tafia/calamine) (reading) and [Rust-XlsxWriter](https://github.com/jmcnamara/rust_xlsxwriter) (writing), with seamless Python integration via [PyO3](https://github.com/PyO3/pyo3).
## ✨ Key Features
### ✅ Supported Capabilities
- **Data Types**: Native support for `bool`, `int`, `float`, `date`, `datetime`, and `str`.
- **Data Operations**: Scalars, rows, columns, matrices, and batch processing.
- **Coordinate Systems**: Dual support for **A1** notation (e.g., `"B2"`) and R1C1-style zero-based `(row, column)` tuples (e.g., `(1, 1)` for `B2`).
- **Parallel Processing**: Multi-threaded read/write operations for massive datasets.
- **Type Safety**: Full type hints and IDE-friendly documentation.
- **Blazing Performance**: About 6-12x faster writes and 13-20x faster reads than `openpyxl` in the benchmarks below.
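A cell can be addressed either as an A1 string such as `"B1"` or as a zero-based `(row, column)` tuple; assuming tuples are ordered `(row, column)` as the quick-start examples suggest, `"B1"` corresponds to `(0, 1)`. The sketch below illustrates that mapping in plain Python, assuming Excel's standard column lettering (A to Z, then AA, AB, ...). The helpers `a1_to_tuple` and `tuple_to_a1` are hypothetical and not part of the fastxlsx API, which accepts both forms directly.

```python
import re
from typing import Tuple

# Hypothetical helpers, not part of fastxlsx: map an A1 reference such as "B2"
# to a zero-based (row, column) tuple and back.
_A1_PATTERN = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def a1_to_tuple(ref: str) -> Tuple[int, int]:
    """Convert an A1 reference like "B2" to a zero-based (row, col) tuple."""
    match = _A1_PATTERN.match(ref.upper())
    if match is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def tuple_to_a1(row: int, col: int) -> str:
    """Convert a zero-based (row, col) tuple back to an A1 reference."""
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    letters = ""
    n = col + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


print(a1_to_tuple("B2"))  # (1, 1)
print(tuple_to_a1(0, 0))  # A1
```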
### 🚫 Current Limitations
- **File Formats**: Only XLSX is supported (no XLS/XLSB).
- **Formulas & Styling**: Cell formulas, merged cells, and formatting are not supported.
- **Modifications**: Existing files cannot be appended to or updated.
- **Advanced Features**: Charts, images, and other advanced features are not supported.
## 🏆 Performance Benchmarks
Tested on an AMD Ryzen 5 5600X @ 3.7GHz (Ubuntu 24.04 VM) using `pytest-benchmark`.
Full details are available in [benchmarks](./benchmarks).
### 📝 Writing Performance (Lower is Better)
| Library | Mixed Data (ms) | 5000x10 Matrix (ms) | Batch Write (ms) |
| :------------------- | :-------------- | :----------------- | :--------------- |
| **fastxlsx** | 0.97(1.00x) | 62.06(1.00x) | 7.77(1.00x) |
| pyexcelerate | 2.65(2.73x) | 256.89(4.14x) | 50.33(6.48x) |
| xlsxwriter | 5.03(5.19x) | 297.14(4.79x) | 61.25(7.89x) |
| openpyxl(write_only) | 5.91(6.09x) | 422.22(6.80x) | 83.89(10.80x) |
| openpyxl | 6.25(6.44x) | 737.30(11.88x) | 83.65(10.77x) |
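Each bracketed factor in these tables is a library's time divided by fastxlsx's time for the same test. A minimal sketch of that arithmetic, using the writing timings above (the dictionary keys are just labels chosen for this example):

```python
from typing import Dict

# Timings in ms, copied from the writing benchmark table.
write_times_ms: Dict[str, Dict[str, float]] = {
    "fastxlsx": {"mixed": 0.97, "matrix": 62.06, "batch": 7.77},
    "pyexcelerate": {"mixed": 2.65, "matrix": 256.89, "batch": 50.33},
    "xlsxwriter": {"mixed": 5.03, "matrix": 297.14, "batch": 61.25},
    "openpyxl(write_only)": {"mixed": 5.91, "matrix": 422.22, "batch": 83.89},
    "openpyxl": {"mixed": 6.25, "matrix": 737.30, "batch": 83.65},
}


def relative_to(
    baseline: str, times: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Divide each library's time by the baseline library's time for the same test."""
    base = times[baseline]
    return {
        lib: {test: round(ms / base[test], 2) for test, ms in row.items()}
        for lib, row in times.items()
    }


ratios = relative_to("fastxlsx", write_times_ms)
print(ratios["openpyxl"])  # {'mixed': 6.44, 'matrix': 11.88, 'batch': 10.77}
```

Recomputing from the rounded timings can differ in the last digit: xlsxwriter's batch write comes out at 7.88 rather than the published 7.89, presumably because the table was computed from unrounded measurements.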
### 📖 Reading Performance (Lower is Better)
| Library | Mixed Data (ms) | 5000x10 Matrix (ms) | Batch Read (ms) |
| :-------------- | :-------------- | :----------------- | :--------------- |
| **fastxlsx** | 0.24(1.00x) | 24.22(1.00x) | 3.14(1.00x) |
| pycalamine | 0.32(1.30x) | 33.51(1.38x) | 28.25(8.99x) |
| openpyxl | 3.93(16.07x) | 330.63(13.65x) | 62.71(19.96x) |
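⚠️ **Windows Users Note**: Batch operations use `multiprocessing.Pool`, which may underperform on Windows because worker processes there are started with the `spawn` method, which launches a fresh Python interpreter for each worker.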
## 🛠️ Installation
### PyPI Install
```bash
pip install fastxlsx
```
### Source Build (Requires Rust Toolchain)
```bash
git clone https://github.com/shuangluoxss/fastxlsx.git
cd fastxlsx
pip install .
```
## 🚀 Quick Start Guide
### Writing
```python
import datetime

import numpy as np
from fastxlsx import DType, WriteOnlyWorkbook, WriteOnlyWorksheet, write_many

# Initialize a workbook and add a sheet
wb = WriteOnlyWorkbook()
ws = wb.create_sheet("sheet1")

# Cells can be addressed by zero-based (row, col) tuples or A1 strings
ws.write_cell((0, 0), "Hello World!")
ws.write_cell((1, 0), True, dtype=DType.Bool)
ws.write_cell("B1", datetime.datetime.now(), dtype=DType.DateTime)
ws.write_row((4, 2), ["var_a", "var_b", "var_c"], dtype=DType.Str)
ws.write_column((4, 0), [2.5, "xyz", datetime.date.today()], dtype=DType.Any)
# If `dtype` is DType.Bool, DType.Int, or DType.Float, a numpy array must be passed
ws.write_matrix((5, 2), np.random.random((3, 3)), dtype=DType.Float)

# Save to file
wb.save("./example.xlsx")

# Write multiple files in parallel: map each output path to its list of worksheets
workbooks_to_write = {}
for i_workbook in range(10):
    ws_list = []
    for i_sheet in range(6):
        ws = WriteOnlyWorksheet(f"Sheet{i_sheet}")
        ws.write_cell("A1", 10 * i_workbook + i_sheet, dtype=DType.Int)
        ws.write_matrix((1, 1), np.random.random((3, 3)), dtype=DType.Float)
        ws_list.append(ws)
    workbooks_to_write[f"example_{i_workbook:02d}.xlsx"] = ws_list
write_many(workbooks_to_write)
```
### Reading
```python
from fastxlsx import DShape, DType, RangeInfo, ReadOnlyWorkbook, read_many

# Load an xlsx file
wb = ReadOnlyWorkbook("./example.xlsx")
# List all sheet names
print(wb.sheetnames)
# Get a worksheet by index or name
ws = wb.get_by_idx(0)
# Read a single cell; note that tuple indices are 0-based
print(ws.cell_value((0, 0)))
print(ws.cell_value("B1", dtype=DType.DateTime))
# Read a column with `read_value` and `RangeInfo`
print(ws.read_value(RangeInfo((4, 0), DShape.Column(3), dtype=DType.Any)))
# Read several named ranges in one call with `read_values`
print(
    ws.read_values(
        {
            "var_a": RangeInfo((5, 2), DShape.Column(3), dtype=DType.Float),
            "matrix": RangeInfo((5, 2), DShape.Matrix(3, 3), dtype=DType.Float),
        }
    )
)

# Read multiple sheets
print(wb.read_worksheets({"sheet1": [RangeInfo((2, 2), DShape.Scalar())]}))
# Read multiple files in parallel
print(
    read_many(
        {
            f"./example_{i_workbook:02d}.xlsx": {
                f"Sheet{i_sheet}": [
                    RangeInfo((0, 0), DShape.Scalar()),
                    RangeInfo((1, 1), DShape.Matrix(3, 3)),
                ]
                for i_sheet in range(6)
            }
            for i_workbook in range(10)
        }
    )
)
```
_For full details, see [docs](./docs)._
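In the reading example, each `RangeInfo` pairs a zero-based start cell with a `DShape`. The hypothetical helper below (`covered_cells` is not part of fastxlsx, and its string shape names only mirror `DShape`) lists the A1 cells such a range would cover, assuming a column extends downward, a row extends to the right, and `DShape.Matrix(m, n)` means m rows by n columns. Under those assumptions, `"var_a"` at `(5, 2)` covers C6:C8, i.e. the first column of the 3x3 matrix written at `(5, 2)`, just below the `var_a` header in C5.

```python
from typing import List, Tuple


def _to_a1(row: int, col: int) -> str:
    """Zero-based (row, col) -> A1 reference, e.g. (5, 2) -> "C6"."""
    letters, n = "", col + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


def covered_cells(origin: Tuple[int, int], shape: str, *dims: int) -> List[str]:
    """Hypothetical helper (not part of fastxlsx): list the A1 cells a range covers.

    `origin` is a zero-based (row, col) tuple; `shape` is "scalar", "row",
    "column" or "matrix", loosely mirroring DShape. For "matrix", `dims` is
    assumed to be (n_rows, n_cols).
    """
    row0, col0 = origin
    if shape == "scalar":
        n_rows, n_cols = 1, 1
    elif shape == "row":
        n_rows, n_cols = 1, dims[0]
    elif shape == "column":
        n_rows, n_cols = dims[0], 1
    elif shape == "matrix":
        n_rows, n_cols = dims
    else:
        raise ValueError(f"unknown shape: {shape!r}")
    # Walk the range row by row, left to right
    return [_to_a1(row0 + r, col0 + c) for r in range(n_rows) for c in range(n_cols)]


print(covered_cells((5, 2), "column", 3))  # ['C6', 'C7', 'C8']
matrix = covered_cells((5, 2), "matrix", 3, 3)
print(matrix[0], matrix[-1])  # C6 E8
```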
## 📖 Motivation
Excel is well known not to be a performance-oriented format, but because it is so widely used, we sometimes have to handle massive XLSX datasets. When I post-process data in Python, much of the time is wasted on reading and writing files, and when I tried to speed this up with parallelization, the `spawn` start method on Windows got in my way again. So I decided to build an XLSX read/write library with Rust and PyO3 to solve the problem.
Thanks to the high performance of `calamine` and `rust_xlsxwriter`, and the great work behind `PyO3` and `maturin`, this was possible by simply binding them together for Python. Thanks also to DeepSeek, whose help enabled me, a Rust beginner, to finish the project.
## 📌 Future Plans
- Add support for formulas and cell formatting
  `rust_xlsxwriter` already supports formulas and cell formatting well, so implementing them in `fastxlsx` should not be too hard. Personally, though, formatting is usually unimportant when I export large amounts of data, so this item is low priority.
- Improve error handling