turboxl

- Name: turboxl
- Version: 0.1.11
- Summary: Fast XLSX to CSV converter (C++ core with Python bindings)
- Upload time: 2025-09-19 22:44:45
- Requires Python: >=3.8
- License: MIT
- Keywords: xlsx, csv, excel, converter, fast, c++
- Requirements: No requirements were recorded.

# TurboXL

<p align="center">
  <img src="assets/logo.svg" alt="TurboXL Logo" width="400"/>
</p>

Fast, read-only XLSX-to-CSV converter with a C++20 core and Python bindings.

## Performance

**Real-world benchmarks** on the Chicago Crimes 2025 dataset (21.9 MB, 146,574 rows):

| Metric         | TurboXL         | OpenPyXL       | Improvement      |
| -------------- | --------------- | -------------- | ---------------- |
| **Speed**      | 2.4s            | 63.1s          | **26.7x faster** |
| **Memory**     | 33.5MB          | 66.9MB         | **2.0x less**    |
| **Throughput** | 62,040 rows/sec | 2,321 rows/sec | **26.7x faster** |

_Dataset: [Chicago Crimes 2025](https://data.cityofchicago.org/Public-Safety/Crimes-2025/t7ek-mgzi/about_data)_
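
The improvement factors can be recomputed from the table itself. The short check below uses only the figures reported above; because those figures are rounded, the recomputed ratios are approximate.

```python
# Figures copied from the benchmark table above (rounded as reported).
turboxl_seconds, openpyxl_seconds = 2.4, 63.1
turboxl_mb, openpyxl_mb = 33.5, 66.9
turboxl_rows_per_sec, openpyxl_rows_per_sec = 62_040, 2_321

# Speedup derived from the reported throughputs and from the wall times.
throughput_ratio = turboxl_rows_per_sec / openpyxl_rows_per_sec
time_ratio = openpyxl_seconds / turboxl_seconds
memory_ratio = openpyxl_mb / turboxl_mb

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"time ratio:       {time_ratio:.1f}x")
print(f"memory ratio:     {memory_ratio:.1f}x")
```

The throughput figures reproduce the stated 26.7x. The rounded wall times give a slightly lower ratio (about 26.3x), which suggests the 2.4s entry is a rounded value.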

🚀 **Recent Optimizations Implemented:**

- **zlib-ng integration** - Up to 2.5x faster ZIP decompression
- **Release build optimizations** - `-O3 -march=native -flto` for GCC/Clang, `/O2 /GL /arch:AVX2` for MSVC
- **Arena-based shared strings** - Memory-efficient string storage
- **Chunked ZIP reading** - 512 KiB buffer optimization
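
To illustrate the idea behind chunked ZIP reading, here is a minimal Python sketch that streams one archive member in 512 KiB pieces using the standard `zipfile` module. It is not TurboXL's implementation (the core is C++); the helper name `iter_member_chunks` and the stand-in payload are assumptions made for this example.

```python
import io
import zipfile

CHUNK_SIZE = 512 * 1024  # 512 KiB, the buffer size mentioned above


def iter_member_chunks(archive, member, chunk_size=CHUNK_SIZE):
    """Yield the decompressed bytes of one ZIP member in bounded chunks."""
    with zipfile.ZipFile(archive) as zf, zf.open(member) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Build a small in-memory archive so the example runs without a real workbook.
payload = b"<row/>" * 200_000  # stand-in XML, about 1.2 MB
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("xl/worksheets/sheet1.xml", payload)

buffer.seek(0)
chunks = list(iter_member_chunks(buffer, "xl/worksheets/sheet1.xml"))
print(len(chunks), sum(len(c) for c in chunks))
```

Reading in bounded chunks keeps peak memory tied to the buffer size rather than to the size of the decompressed worksheet.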

## What It Does

- ✅ Read XLSX files and convert to CSV
- ✅ Handle shared strings, numbers, dates, booleans
- ✅ Process multiple worksheets
- ✅ Memory-efficient streaming (33.5MB for 146k rows)
- ✅ Cross-platform (Linux, macOS, Windows)
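
For readers unfamiliar with the XLSX format, the sketch below shows what "shared strings" means: string cells (`t="s"`) store an index into a workbook-wide string table instead of the text itself. This is a simplified Python illustration of the file format, not TurboXL's code. The sample XML values and the `TRUE`/`FALSE` rendering of booleans are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}

# Stand-in for xl/sharedStrings.xml (made-up values).
SHARED_STRINGS_XML = f"""<sst xmlns="{MAIN_NS}">
  <si><t>THEFT</t></si>
  <si><t>BATTERY</t></si>
</sst>"""

# Stand-in for one row of a worksheet part.
SHEET_XML = f"""<worksheet xmlns="{MAIN_NS}">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>1</v></c>
      <c r="B1"><v>42</v></c>
      <c r="C1" t="b"><v>1</v></c>
    </row>
  </sheetData>
</worksheet>"""


def load_shared_strings(xml_text):
    """Return the shared-string table as a list, in index order."""
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in si.iter(f"{{{MAIN_NS}}}t"))
        for si in root.findall("m:si", NS)
    ]


def cell_value(cell, shared):
    """Resolve one <c> element to the text a CSV field would carry."""
    kind = cell.get("t")
    raw = cell.findtext("m:v", default="", namespaces=NS)
    if kind == "s":  # shared string: <v> holds an index into the table
        return shared[int(raw)]
    if kind == "b":  # boolean: rendering as TRUE/FALSE is an assumption
        return "TRUE" if raw == "1" else "FALSE"
    return raw  # numbers (and dates, stored as numbers) pass through


shared = load_shared_strings(SHARED_STRINGS_XML)
sheet = ET.fromstring(SHEET_XML)
row = [cell_value(c, shared) for c in sheet.iterfind(".//m:c", NS)]
print(row)
```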

## What It Doesn't Do

- ❌ Write or modify XLSX files
- ❌ Formula evaluation (uses cached values)
- ❌ Charts, images, pivot tables
- ❌ Password-protected files
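
"Uses cached values" refers to how XLSX stores formulas: a formula cell usually carries both the formula text (`<f>`) and the last result computed by the application that saved the file (`<v>`). The Python sketch below illustrates the file format only; the sample cell is made up and this is not TurboXL's code.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}

# A formula cell as it typically appears in a worksheet part (made-up sample).
CELL_XML = f'<c xmlns="{MAIN_NS}" r="D1"><f>SUM(A1:C1)</f><v>6</v></c>'

cell = ET.fromstring(CELL_XML)
formula = cell.findtext("m:f", namespaces=NS)  # formula text, not evaluated
cached = cell.findtext("m:v", namespaces=NS)   # result stored at save time

print(formula, "->", cached)
```

If a workbook was written by a tool that does not compute formula results, the `<v>` element may be missing, in which case there is no cached value to read.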

## Quick Start

### Python

```python
import turboxl

# Convert first sheet
csv_data = turboxl.read_sheet_to_csv("data.xlsx")

# Convert specific sheet
csv_data = turboxl.read_sheet_to_csv("data.xlsx", sheet="Sheet2")

# Custom options
csv_data = turboxl.read_sheet_to_csv(
    "data.xlsx",
    sheet=0,
    delimiter=";",
    date_mode="iso"
)

# Save to file (newline="" keeps the converter's line endings unchanged)
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    f.write(csv_data)
```
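
Because `read_sheet_to_csv` returns the CSV as a single string, it can be passed to Python's standard `csv` module without going through a file. The sketch below uses a hard-coded sample string in place of a real conversion result and assumes the output follows the usual CSV quoting conventions. `csv_text_to_rows` is a hypothetical helper, not part of TurboXL.

```python
import csv
import io


def csv_text_to_rows(csv_text, delimiter=","):
    """Parse CSV text into a list of rows using the standard csv module."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    return list(reader)


# Stand-in for the string returned by turboxl.read_sheet_to_csv("data.xlsx").
sample = 'id,description\n1,"THEFT, OVER $500"\n'

rows = csv_text_to_rows(sample)
print(rows)
```

When a non-default `delimiter` is passed to the converter, pass the same value to `csv_text_to_rows`.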

### C++

```cpp
#include <xlsxcsv.hpp>
#include <exception>
#include <iostream>
#include <string>

int main() {
    try {
        std::string csv = xlsxcsv::readSheetToCsv("data.xlsx");
        std::cout << csv << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }
    return 0;
}
```

## Building

### Prerequisites

Install system dependencies (used via pkg-config/CMake):

```bash
# macOS (Recommended for best performance)
brew install libxml2 minizip-ng zlib-ng cmake pybind11 pkg-config

# Ubuntu/Debian (Recommended for best performance)
sudo apt-get install -y libxml2-dev libminizip-dev cmake build-essential pkg-config
# For zlib-ng on Ubuntu/Debian, build from source:
# git clone https://github.com/zlib-ng/zlib-ng.git
# cd zlib-ng && cmake -B build && cmake --build build -j && sudo cmake --install build

# Windows (vcpkg)
vcpkg install libxml2 minizip-ng zlib-ng
```

**Performance Note:** Installing `zlib-ng` provides significant performance improvements (up to 2.5x faster decompression). The build system automatically detects and uses zlib-ng if available, falling back to standard zlib otherwise.

### Build C++ Core (library only)

Build the C++ core without Python bindings (no Python/pybind11 required):

```bash
# From repo root
cmake -S . -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_TESTS=OFF \
  -DBUILD_PYTHON=OFF \
  -DBUILD_CLI=OFF
cmake --build build -j4
```

Artifacts:

- Static library: `build/libturboxl_core.a`

**Build Modes:**

- **Release** (Recommended): Enables `-O3 -march=native -flto` optimizations
- **Debug**: Enables debugging symbols and assertions

### Build Options

- `BUILD_TESTS=ON/OFF` - Build test suite (default: ON)
- `BUILD_PYTHON=ON/OFF` - Build Python bindings (default: ON)
- `BUILD_CLI=ON/OFF` - Build command-line tool (default: OFF)

---

## Python Wheel

TurboXL ships a PEP 517/518 build powered by scikit-build-core. The wheel builds the C++ core and Python extension in Release mode using CMake.

### Python prerequisites

```bash
python3 -m pip install -U pip build scikit-build-core pybind11
```

System dependencies listed above (libxml2, minizip-ng, zlib-ng, cmake, compiler) must be installed and discoverable by CMake/pkg-config.

### Build the wheel

```bash
# From repo root
python3 -m build -w
```

Outputs go to `dist/`, for example:

- `dist/turboxl-0.1.11-<python>-<abi>-<platform>.whl`

Install the built wheel locally:

```bash
pip install dist/turboxl-*.whl
```

Tips:

- Parallel CMake build: `CMAKE_BUILD_PARALLEL_LEVEL=4 python3 -m build -w`
- macOS architecture (defaults to arm64 via `pyproject.toml`): to override, pass
  `--config-setting=cmake.define.CMAKE_OSX_ARCHITECTURES="arm64;x86_64"` to `python3 -m build`.

## Requirements

- **C++**: C++20 compiler (GCC 10+, Clang 12+, MSVC 2019+)
- **Build**: CMake 3.20+
- **Python**: 3.8-3.12 (for Python bindings)

## API Reference

### Python

```python
turboxl.read_sheet_to_csv(
    xlsx_path: str,
    sheet: Optional[Union[str, int]] = None,  # First sheet if None
    delimiter: str = ",",
    newline: Literal["LF", "CRLF"] = "LF",
    include_bom: bool = False,
    date_mode: Literal["iso", "rawNumber"] = "iso"
) -> str
```
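
The two `date_mode` values reflect how XLSX stores dates: as serial day numbers. As a rough illustration of the relationship between a raw number and an ISO string, the Python sketch below converts a serial from the default 1900 date system. It is not TurboXL's implementation. The helper name is hypothetical, and the exact ISO format the library emits is not specified here.

```python
from datetime import datetime, timedelta

# Day zero that lines up with the 1900 date system for serials >= 61
# (earlier serials are affected by Excel's fictitious 1900-02-29).
EXCEL_1900_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert a 1900-system serial number to an ISO 8601 string."""
    moment = EXCEL_1900_EPOCH + timedelta(days=serial)
    if moment.time() == datetime.min.time():
        return moment.date().isoformat()  # whole days: date only
    return moment.isoformat(timespec="seconds")  # fractional days: date and time


print(excel_serial_to_iso(44927))
print(excel_serial_to_iso(45000.5))
```

Workbooks that use the 1904 date system need a different epoch, so this sketch applies only to the default case.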

### C++

```cpp
struct CsvOptions {
    std::string sheetByName;
    int sheetByIndex = -1;
    char delimiter = ',';
    bool includeBom = false;
    // ... more options
};

std::string readSheetToCsv(
    const std::string& xlsxPath,
    const CsvOptions& opts = {}
);
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "turboxl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "xlsx, csv, excel, converter, fast, c++",
    "author": null,
    "author_email": "Michail Kaseris <mich.kaseris@gmail.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Fast XLSX to CSV converter (C++ core with Python bindings)",
    "version": "0.1.11",
    "project_urls": null,
    "split_keywords": [
        "xlsx",
        " csv",
        " excel",
        " converter",
        " fast",
        " c++"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "18d3398cd41b30043792a77fd1aa3eb9f68ac4d6d4adc0e4521f930f24b2a49e",
                "md5": "05826c39f79c59a4c393b706318d7320",
                "sha256": "fce0a315b33c1e373fc5267d23a9e296dd44cc8443a78381bea2627e7d088afa"
            },
            "downloads": -1,
            "filename": "turboxl-0.1.11-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "05826c39f79c59a4c393b706318d7320",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.8",
            "size": 602407,
            "upload_time": "2025-09-19T22:44:45",
            "upload_time_iso_8601": "2025-09-19T22:44:45.811155Z",
            "url": "https://files.pythonhosted.org/packages/18/d3/398cd41b30043792a77fd1aa3eb9f68ac4d6d4adc0e4521f930f24b2a49e/turboxl-0.1.11-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5347382c518c60355fa594403fe24cab70aae381d7663fad5332a5d369b933fd",
                "md5": "e82cad03dc0d726dbc0fa42a3692846f",
                "sha256": "0b355a860ef76d6b126bb3a331ebb132d53fdbf8e213f20924ec891b34ee048e"
            },
            "downloads": -1,
            "filename": "turboxl-0.1.11-cp311-cp311-macosx_11_0_x86_64.whl",
            "has_sig": false,
            "md5_digest": "e82cad03dc0d726dbc0fa42a3692846f",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.8",
            "size": 712364,
            "upload_time": "2025-09-19T22:44:47",
            "upload_time_iso_8601": "2025-09-19T22:44:47.504794Z",
            "url": "https://files.pythonhosted.org/packages/53/47/382c518c60355fa594403fe24cab70aae381d7663fad5332a5d369b933fd/turboxl-0.1.11-cp311-cp311-macosx_11_0_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-19 22:44:45",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "turboxl"
}