fastxlsx

- **Name**: fastxlsx
- **Version**: 0.2.0
- **Summary**: A high-performance Excel XLSX reader/writer for Python built with Rust.
- **Upload time**: 2025-02-04 05:13:09
- **Requires Python**: `<3.13,>=3.8`
- **License**: MIT
- **Keywords**: excel, xlsx
- **Requirements**: No requirements were recorded.

# FastXLSX

[![PyPI](https://img.shields.io/pypi/v/fastxlsx)](https://pypi.org/project/fastxlsx/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/shuangluoxss/fastxlsx/blob/main/LICENSE)

**A lightweight, high-performance Python library for blazing-fast XLSX I/O operations.**  
Powered by Rust's [Calamine](https://github.com/tafia/calamine) (reading) and [Rust-XlsxWriter](https://github.com/jmcnamara/rust_xlsxwriter) (writing), with seamless Python integration via [PyO3](https://github.com/PyO3/pyo3).

## ✨ Key Features

### ✅ Supported Capabilities

- **Data Types**: Native support for `bool`, `int`, `float`, `date`, `datetime`, and `str`.
- **Data Operations**: Scalars, rows, columns, matrices, and batch processing.
- **Coordinate Systems**: Dual support for **A1** notation (e.g., `B2`) and **R1C1**-style 0-based `(row, column)` tuples (e.g., `(2, 3)`).
- **Parallel Processing**: Multi-threaded read/write operations for massive datasets.
- **Type Safety**: Full type hints and IDE-friendly documentation.
- **Blazing Performance**: 5-10x faster than `openpyxl`.
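
To make the two coordinate styles concrete, here is a minimal sketch of converting between an A1 reference and a 0-based `(row, column)` tuple. The helpers `a1_to_rc` and `rc_to_a1` are hypothetical illustrations and are not part of `fastxlsx`; the sketch assumes that tuples are 0-based and ordered as `(row, column)`, as the quick-start comments suggest.

```python
import re
from typing import Tuple

# One or more column letters followed by a 1-based row number, e.g. "B2".
_A1_PATTERN = re.compile(r"([A-Za-z]+)([1-9][0-9]*)")


def a1_to_rc(ref: str) -> Tuple[int, int]:
    """Convert an A1 reference such as "B2" to a 0-based (row, col) tuple."""
    match = _A1_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def rc_to_a1(row: int, col: int) -> str:
    """Convert a 0-based (row, col) tuple back to an A1 reference."""
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    letters = ""
    n = col + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


print(a1_to_rc("B2"))
print(rc_to_a1(0, 1))
```

Under these assumptions, `"B1"` and `(0, 1)` address the same cell, which matches how the quick-start example writes to `"B1"` and to `(0, 0)` without overlap.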

### 🚫 Current Limitations

- **File Formats**: Only XLSX (no XLS/XLSB support).
- **Formulas & Styling**: Cell formulas, merged cells, and formatting are not supported.
- **Modifications**: Append/update operations on existing files are unavailable.
- **Advanced Features**: Charts, images, and other advanced features are not supported.

## 🏆 Performance Benchmarks

Tested on AMD Ryzen 7 5600X @ 3.7GHz (Ubuntu 24.04 VM) using `pytest-benchmark`.  
Full details are available in [benchmarks](./benchmarks).

### 📝 Writing Performance (Lower is Better)

| library              | Mixed Data (ms) | 5000x10 Matrix (ms) | Batch Write (ms) |
| :------------------- | :-------------- | :------------------ | :--------------- |
| **fastxlsx**         | 0.97 (1.00x)    | 62.06 (1.00x)       | 7.77 (1.00x)     |
| pyexcelerate         | 2.65 (2.73x)    | 256.89 (4.14x)      | 50.33 (6.48x)    |
| xlsxwriter           | 5.03 (5.19x)    | 297.14 (4.79x)      | 61.25 (7.89x)    |
| openpyxl(write_only) | 5.91 (6.09x)    | 422.22 (6.80x)      | 83.89 (10.80x)   |
| openpyxl             | 6.25 (6.44x)    | 737.30 (11.88x)     | 83.65 (10.77x)   |
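
The factors in parentheses are each library's time divided by the `fastxlsx` time in the same column. As a small sketch of that arithmetic, the snippet below recomputes the factors for the 5000x10 matrix column from the rounded timings in the table above; the helper name `relative_to` is an illustrative assumption, not part of any benchmark tooling.

```python
# Timings in milliseconds for the 5000x10 matrix write, copied from the table.
matrix_write_ms = {
    "fastxlsx": 62.06,
    "pyexcelerate": 256.89,
    "xlsxwriter": 297.14,
    "openpyxl(write_only)": 422.22,
    "openpyxl": 737.30,
}


def relative_to(baseline, timings):
    """Return each timing divided by the baseline timing, rounded to 2 places."""
    base = timings[baseline]
    return {name: round(ms / base, 2) for name, ms in timings.items()}


factors = relative_to("fastxlsx", matrix_write_ms)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```

Because the published timings are already rounded, factors recomputed this way can differ from the table in the last digit for some cells.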

### 📖 Reading Performance (Lower is Better)

| library         | Mixed Data (ms) | 5000x10 Matrix (ms) | Batch Read (ms) |
| :-------------- | :-------------- | :------------------ | :-------------- |
| **fastxlsx**    | 0.24 (1.00x)    | 24.22 (1.00x)       | 3.14 (1.00x)    |
| pycalamine      | 0.32 (1.30x)    | 33.51 (1.38x)       | 28.25 (8.99x)   |
| openpyxl        | 3.93 (16.07x)   | 330.63 (13.65x)     | 62.71 (19.96x)  |

⚠️ **Windows Users Note**: Batch operations use `multiprocessing.Pool`, which may underperform due to `spawn` method limitations.

## 🛠️ Installation

### PyPI Install

```bash
pip install fastxlsx
```
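
The package metadata for version 0.2.0 declares `requires_python` as `<3.13,>=3.8`, so `pip` may fail to find a matching release on interpreters outside that range. A minimal sketch for checking the running interpreter before installing follows; the function `is_supported` and the two constants are hypothetical names used only for illustration.

```python
import sys

MIN_VERSION = (3, 8)   # inclusive lower bound, from ">=3.8"
MAX_VERSION = (3, 13)  # exclusive upper bound, from "<3.13"


def is_supported(version=None):
    """Return True if the (major, minor) version falls in the declared range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


print("supported interpreter:", is_supported())
```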

### Source Build (Requires Rust Toolchain)

```bash
git clone https://github.com/shuangluoxss/fastxlsx.git
cd fastxlsx
pip install .
```

## 🚀 Quick Start Guide

### Writing

```python
import datetime
import numpy as np
from fastxlsx import DType, WriteOnlyWorkbook, WriteOnlyWorksheet, write_many

# Initialize workbook
wb = WriteOnlyWorkbook()
ws = wb.create_sheet("sheet1")

ws.write_cell((0, 0), "Hello World!")
ws.write_cell((1, 0), True, dtype=DType.Bool)
ws.write_cell("B1", datetime.datetime.now(), dtype=DType.DateTime)
ws.write_row((4, 2), ["var_a", "var_b", "var_c"], dtype=DType.Str)
ws.write_column((4, 0), [2.5, "xyz", datetime.date.today()], dtype=DType.Any)
# If `dtype` is one of [DType.Bool, DType.Int, DType.Float], must pass a numpy array
ws.write_matrix((5, 2), np.random.random((3, 3)), dtype=DType.Float)

# Save to file
wb.save("./example.xlsx")

# Write multiple files in parallel
workbooks_to_write = {}
for i_workbook in range(10):
    ws_list = []
    for i_sheet in range(6):
        ws = WriteOnlyWorksheet(f"Sheet{i_sheet}")
        ws.write_cell("A1", 10 * i_workbook + i_sheet, dtype=DType.Int)
        ws.write_matrix((1, 1), np.random.random((3, 3)), dtype=DType.Float)
        ws_list.append(ws)
    workbooks_to_write[f"example_{i_workbook:02d}.xlsx"] = ws_list
write_many(workbooks_to_write)
```

### Reading

```python
from fastxlsx import DShape, DType, RangeInfo, ReadOnlyWorkbook, read_many

# Load xlsx file
wb = ReadOnlyWorkbook("./example.xlsx")
# List all sheet names
print(wb.sheetnames)
# Get a worksheet by index or name
ws = wb.get_by_idx(0)
# Read a single cell; note that indices are 0-based
print(ws.cell_value((0, 0)))
print(ws.cell_value("B1", dtype=DType.DateTime))
# Read a column with `read_value` and `RangeInfo`
print(ws.read_value(RangeInfo((4, 0), DShape.Column(3), dtype=DType.Any)))
print(
    ws.read_values(
        {
            "var_a": RangeInfo((5, 2), DShape.Column(3), dtype=DType.Float),
            "matrix": RangeInfo((5, 2), DShape.Matrix(3, 3), dtype=DType.Float),
        }
    )
)

# Read multiple sheets
print(wb.read_worksheets({"sheet1": [RangeInfo((2, 2), DShape.Scalar())]}))
# Read multiple files in parallel
print(
    read_many(
        {
            f"./example_{i_workbook:02d}.xlsx": {
                f"Sheet{i_sheet}": [
                    RangeInfo((0, 0), DShape.Scalar()),
                    RangeInfo((1, 1), DShape.Matrix(3, 3)),
                ]
                for i_sheet in range(6)
            }
            for i_workbook in range(10)
        }
    )
)
```

_For full details, see [docs](./docs)._

## 📖 Motivation

Excel is well known to be a poor format for performance, but because it is so widely used, we sometimes have to handle massive XLSX datasets. When I do postprocessing work in Python, a lot of time is wasted on reading and writing, and when I tried to speed it up with parallelization, the `spawn` start method on Windows got in the way again. I therefore decided to develop an XLSX read/write library with Rust and PyO3 to solve that.

Thanks to the high performance of `calamine` and `rust_xlsxwriter`, as well as the great work behind `PyO3` and `maturin`, this is possible simply by binding them together for Python. Thanks also to DeepSeek, whose help enabled me, a Rust beginner, to finish the project.

## 📌 Future Plans

- Add support for formulas and cell formatting  
  `rust_xlsxwriter` supports formulas and cell formatting well, so it should not be too hard to implement them in `fastxlsx`. Personally, though, when I export a large amount of data, formatting is usually not important, so this item is not a high priority.
- Improve error handling


            
