| Field | Value |
| --- | --- |
| Name | turboxl |
| Version | 0.1.11 |
| home_page | None |
| Summary | Fast XLSX to CSV converter (C++ core with Python bindings) |
| upload_time | 2025-09-19 22:44:45 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | xlsx, csv, excel, converter, fast, c++ |
# TurboXL
<p align="center">
<img src="assets/logo.svg" alt="TurboXL Logo" width="400"/>
</p>
A fast, read-only XLSX to CSV converter with a C++20 core and Python bindings.
## Performance
**Real-world benchmarks** on Chicago Crime dataset (21.9MB, 146,574 rows):
| Metric | TurboXL | OpenPyXL | Improvement |
| -------------- | --------------- | -------------- | ---------------- |
| **Speed** | 2.4s | 63.1s | **26.7x faster** |
| **Memory** | 33.5MB | 66.9MB | **2.0x less** |
| **Throughput** | 62,040 rows/sec | 2,321 rows/sec | **26.7x faster** |
_Dataset: [Chicago Crimes 2025](https://data.cityofchicago.org/Public-Safety/Crimes-2025/t7ek-mgzi/about_data)_
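
The ratios in the "Improvement" column can be re-derived from the raw figures. The following is only a sanity-check sketch using the numbers from the table; the variable names are illustrative and not part of TurboXL.

```python
ROWS = 146_574  # rows in the benchmark dataset

# Figures copied from the benchmark table.
turboxl = {"seconds": 2.4, "memory_mb": 33.5, "rows_per_sec": 62_040}
openpyxl = {"seconds": 63.1, "memory_mb": 66.9, "rows_per_sec": 2_321}

# Higher throughput is better, so TurboXL goes in the numerator.
throughput_ratio = turboxl["rows_per_sec"] / openpyxl["rows_per_sec"]
# Lower memory is better, so OpenPyXL goes in the numerator.
memory_ratio = openpyxl["memory_mb"] / turboxl["memory_mb"]
# Wall-clock time implied by the reported throughput.
implied_seconds = ROWS / turboxl["rows_per_sec"]

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"memory ratio: {memory_ratio:.1f}x")
print(f"seconds implied by throughput: {implied_seconds:.2f}")
```

The throughput ratio reproduces the 26.7x figure, and the implied time of about 2.36 s suggests that the 2.4s in the table is a rounded value.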
🚀 **Recent Optimizations Implemented:**
- **zlib-ng integration** - Up to 2.5x faster ZIP decompression
- **Release build optimizations** - `-O3 -march=native -flto` for GCC/Clang, `/O2 /GL /arch:AVX2` for MSVC
- **Arena-based shared strings** - Memory-efficient string storage
- **Chunked ZIP reading** - 512 KiB buffer optimization
## What It Does
- ✅ Read XLSX files and convert to CSV
- ✅ Handle shared strings, numbers, dates, booleans
- ✅ Process multiple worksheets
- ✅ Memory-efficient streaming (33.5MB for 146k rows)
- ✅ Cross-platform (Linux, macOS, Windows)
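
Processing multiple worksheets can be sketched as one conversion call per sheet. The helper below is hypothetical (`convert_sheets` is not part of TurboXL); it only assumes a converter callable that accepts a path and a `sheet` keyword, as `turboxl.read_sheet_to_csv` does. The caller has to supply the sheet names or indices.

```python
from typing import Callable, Dict, Iterable, Union

SheetRef = Union[str, int]


def convert_sheets(
    convert: Callable[..., str],
    xlsx_path: str,
    sheets: Iterable[SheetRef],
) -> Dict[SheetRef, str]:
    """Convert each requested sheet and return {sheet reference: CSV text}."""
    return {sheet: convert(xlsx_path, sheet=sheet) for sheet in sheets}


# Intended usage (requires turboxl and a real workbook):
# import turboxl
# csv_by_sheet = convert_sheets(turboxl.read_sheet_to_csv, "data.xlsx", ["Sheet1", "Sheet2"])
```

Passing the converter in as an argument keeps the helper testable without a workbook on disk.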
## What It Doesn't Do
- ❌ Write or modify XLSX files
- ❌ Evaluate formulas (cached values are used instead)
- ❌ Extract charts, images, or pivot tables
- ❌ Open password-protected files
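
Because unsupported inputs have to be filtered out by the caller, a cheap pre-check before conversion may be useful. This sketch uses only the Python standard library and is not part of TurboXL. It assumes that a regular XLSX file is a ZIP package with its workbook part at the conventional location `xl/workbook.xml`, and that password-protected workbooks are usually stored in a non-ZIP container and would therefore fail the check.

```python
import zipfile


def looks_like_plain_xlsx(path: str) -> bool:
    """Return True if `path` is a ZIP archive containing a workbook part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Conventional workbook location; an assumption, not a guarantee.
        return "xl/workbook.xml" in archive.namelist()
```

A `True` result does not guarantee that the conversion will succeed; it only rules out files that are clearly not unencrypted XLSX packages.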
## Quick Start
### Python
```python
import turboxl

# Convert the first sheet
csv_data = turboxl.read_sheet_to_csv("data.xlsx")

# Convert a specific sheet by name
csv_data = turboxl.read_sheet_to_csv("data.xlsx", sheet="Sheet2")

# Custom options, selecting the sheet by index
csv_data = turboxl.read_sheet_to_csv(
    "data.xlsx",
    sheet=0,
    delimiter=";",
    date_mode="iso",
)

# Save to file
with open("output.csv", "w", encoding="utf-8") as f:
    f.write(csv_data)
```
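
Since the converter returns the CSV as a single string, it can be parsed in memory rather than written to disk first. The sketch below uses the standard `csv` module on a literal string standing in for the return value; `parse_csv_text` is a hypothetical helper, and it assumes TurboXL's quoting is compatible with the `csv` module's default dialect.

```python
import csv
import io
from typing import List


def parse_csv_text(csv_text: str, delimiter: str = ",") -> List[List[str]]:
    """Parse CSV text (such as a converter's return value) into rows of strings."""
    # newline="" lets the csv module handle quoted embedded line breaks itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    return list(reader)


# Stand-in for: turboxl.read_sheet_to_csv("data.xlsx", delimiter=";")
rows = parse_csv_text('id;name\n1;"Doe; Jane"\n', delimiter=";")
print(rows)  # [['id', 'name'], ['1', 'Doe; Jane']]
```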
### C++
```cpp
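#include <xlsxcsv.hpp>

#include <exception>
#include <iostream>
#include <string>

int main() {
    try {
        // Convert the first sheet with default options
        std::string csv = xlsxcsv::readSheetToCsv("data.xlsx");
        std::cout << csv << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;  // Report failure to the caller
    }
    return 0;
}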
```
## Building
### Prerequisites
Install system dependencies (used via pkg-config/CMake):
```bash
# macOS (Recommended for best performance)
brew install libxml2 minizip-ng zlib-ng cmake pybind11 pkg-config
# Ubuntu/Debian (Recommended for best performance)
sudo apt-get install -y libxml2-dev libminizip-dev cmake build-essential pkg-config
# For zlib-ng on Ubuntu/Debian, build from source:
# git clone https://github.com/zlib-ng/zlib-ng.git
# cd zlib-ng && cmake -B build && cmake --build build -j && sudo cmake --install build
# Windows (vcpkg)
vcpkg install libxml2 minizip-ng zlib-ng
```
**Performance Note:** Installing `zlib-ng` provides significant performance improvements (up to 2.5x faster decompression). The build system automatically detects and uses zlib-ng if available, falling back to standard zlib otherwise.
### Build C++ Core (library only)
Build the C++ core without Python bindings (no Python/pybind11 required):
```bash
# From repo root
cmake -S . -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_TESTS=OFF \
  -DBUILD_PYTHON=OFF \
  -DBUILD_CLI=OFF
cmake --build build -j4
```
Artifacts:
- Static library: `build/libturboxl_core.a`
**Build Modes:**
- **Release** (Recommended): Enables `-O3 -march=native -flto` optimizations
- **Debug**: Enables debugging symbols and assertions
### Build Options
- `BUILD_TESTS=ON/OFF` - Build test suite (default: ON)
- `BUILD_PYTHON=ON/OFF` - Build Python bindings (default: ON)
- `BUILD_CLI=ON/OFF` - Build command-line tool (default: OFF)
---
## Python Wheel
TurboXL ships a PEP 517/518 build powered by scikit-build-core. The wheel builds the C++ core and Python extension in Release mode using CMake.
### Python prerequisites
```bash
python3 -m pip install -U pip build scikit-build-core pybind11
```
System dependencies listed above (libxml2, minizip-ng, zlib-ng, cmake, compiler) must be installed and discoverable by CMake/pkg-config.
### Build the wheel
```bash
# From repo root
python3 -m build -w
```
Outputs go to `dist/`, for example:
- `dist/turboxl-<version>-<python>-<abi>-<platform>.whl`
Install the built wheel locally:
```bash
pip install dist/turboxl-*.whl
```
Tips:
- Parallel CMake build: `CMAKE_BUILD_PARALLEL_LEVEL=4 python3 -m build -w`
- macOS arch (defaults to arm64 via `pyproject.toml`): to override, you can pass
`--config-setting=cmake.define.CMAKE_OSX_ARCHITECTURES="arm64;x86_64"` to `python -m build`.
## Requirements
- **C++**: C++20 compiler (GCC 10+, Clang 12+, MSVC 2019+)
- **Build**: CMake 3.20+
- **Python**: 3.8-3.12 (for Python bindings)
## API Reference
### Python
```python
turboxl.read_sheet_to_csv(
    xlsx_path: str,
    sheet: Optional[Union[str, int]] = None,  # First sheet if None
    delimiter: str = ",",
    newline: Literal["LF", "CRLF"] = "LF",
    include_bom: bool = False,
    date_mode: Literal["iso", "rawNumber"] = "iso",
) -> str
```
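
The `newline` option fixes the line endings inside the returned string, but Python's text-mode `open()` translates `\n` on write by default (for example to `\r\n` on Windows). A minimal sketch of a writer that leaves the converter's line endings untouched follows; `write_csv_text` is a hypothetical helper, not part of TurboXL.

```python
def write_csv_text(path: str, csv_text: str) -> None:
    """Write CSV text to disk without altering its line endings."""
    # newline="" disables Python's newline translation, so the LF or CRLF
    # chosen at conversion time is written unchanged on every platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(csv_text)
```

With this helper, a string produced with `newline="CRLF"` ends up on disk with exactly one `\r\n` per row, regardless of the operating system.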
### C++
```cpp
struct CsvOptions {
    std::string sheetByName;
    int sheetByIndex = -1;
    char delimiter = ',';
    bool includeBom = false;
    // ... more options
};

std::string readSheetToCsv(
    const std::string& xlsxPath,
    const CsvOptions& opts = {}
);
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "turboxl",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "xlsx, csv, excel, converter, fast, c++",
"author": null,
"author_email": "Michail Kaseris <mich.kaseris@gmail.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Fast XLSX to CSV converter (C++ core with Python bindings)",
"version": "0.1.11",
"project_urls": null,
"split_keywords": [
"xlsx",
" csv",
" excel",
" converter",
" fast",
" c++"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "18d3398cd41b30043792a77fd1aa3eb9f68ac4d6d4adc0e4521f930f24b2a49e",
"md5": "05826c39f79c59a4c393b706318d7320",
"sha256": "fce0a315b33c1e373fc5267d23a9e296dd44cc8443a78381bea2627e7d088afa"
},
"downloads": -1,
"filename": "turboxl-0.1.11-cp311-cp311-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "05826c39f79c59a4c393b706318d7320",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.8",
"size": 602407,
"upload_time": "2025-09-19T22:44:45",
"upload_time_iso_8601": "2025-09-19T22:44:45.811155Z",
"url": "https://files.pythonhosted.org/packages/18/d3/398cd41b30043792a77fd1aa3eb9f68ac4d6d4adc0e4521f930f24b2a49e/turboxl-0.1.11-cp311-cp311-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5347382c518c60355fa594403fe24cab70aae381d7663fad5332a5d369b933fd",
"md5": "e82cad03dc0d726dbc0fa42a3692846f",
"sha256": "0b355a860ef76d6b126bb3a331ebb132d53fdbf8e213f20924ec891b34ee048e"
},
"downloads": -1,
"filename": "turboxl-0.1.11-cp311-cp311-macosx_11_0_x86_64.whl",
"has_sig": false,
"md5_digest": "e82cad03dc0d726dbc0fa42a3692846f",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.8",
"size": 712364,
"upload_time": "2025-09-19T22:44:47",
"upload_time_iso_8601": "2025-09-19T22:44:47.504794Z",
"url": "https://files.pythonhosted.org/packages/53/47/382c518c60355fa594403fe24cab70aae381d7663fad5332a5d369b933fd/turboxl-0.1.11-cp311-cp311-macosx_11_0_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-19 22:44:45",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "turboxl"
}