vortex-data


- Name: vortex-data
- Version: 0.42.1
- Home page: https://github.com/spiraldb/vortex
- Summary: Python bindings for Vortex, an Apache Arrow-compatible toolkit for working with compressed array data.
- Upload time: 2025-07-24 18:45:33
- Maintainer: None
- Author: Vortex Authors <hello@vortex.dev>
- Requires Python: >=3.11
- License: Apache-2.0
- Keywords: vortex
- Requirements: No requirements were recorded.

# 🌪️ Vortex

[![Build Status](https://github.com/vortex-data/vortex/actions/workflows/ci.yml/badge.svg)](https://github.com/vortex-data/vortex/actions)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10567/badge)](https://www.bestpractices.dev/projects/10567)
[![Documentation](https://docs.rs/vortex/badge.svg)](https://docs.vortex.dev)
[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/vortex-data/vortex)
[![Crates.io](https://img.shields.io/crates/v/vortex.svg)](https://crates.io/crates/vortex)
[![PyPI - Version](https://img.shields.io/pypi/v/vortex-data)](https://pypi.org/project/vortex-data/)
[![Maven - Version](https://img.shields.io/maven-central/v/dev.vortex/vortex-spark)](https://central.sonatype.com/artifact/dev.vortex/vortex-spark)

📚 [Documentation](https://docs.vortex.dev/) | 📊 [Performance Benchmarks](https://bench.vortex.dev)

## Overview

Vortex is a next-generation columnar file format and toolkit designed for high-performance data processing.
It is the fastest and most extensible format for building data systems backed by object storage. It provides:

- **⚡️ Blazing Fast Performance**
  - 200x faster random access reads (vs. modern Apache Parquet)
  - 2-10x faster scans
  - 2-10x faster writes
  - Similar compression ratios
  - Efficient support for wide tables with zero-copy/zero-parse metadata

- **🔧 Extensible Architecture**
  - Modeled after Apache DataFusion's extensible approach
  - Pluggable encoding system, type system, compression strategy, & layout strategy
  - Zero-copy compatibility with Apache Arrow

- **🗳️ Open Source, Neutral Governance**
  - A Linux Foundation (LF AI & Data) Project
  - Apache-2.0 Licensed

- **↔️ Integrations**
  - Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, & more
  - Apache Iceberg (coming soon)

> 🟢 **Development Status**: Library APIs may change from version to version, but we now consider
> the file format <ins>*stable*</ins>. From release 0.36.0, all future releases of Vortex should
> maintain backwards compatibility of the file format (i.e., be able to read files written by
> any earlier version >= 0.36.0).
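
The compatibility statement above can be read as a simple version rule. The following is a minimal sketch of that rule, not part of the Vortex API: the helper names `parse_version` and `expected_readable` are hypothetical, and it assumes plain `major.minor.patch` release strings.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted release string such as "0.42.1" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# First release covered by the stability statement (taken from the note above).
STABLE_SINCE = parse_version("0.36.0")


def expected_readable(writer: str, reader: str) -> bool:
    """True when the stated guarantee covers reading `writer`'s files with `reader`.

    Hypothetical helper: it only encodes the rule "writer >= 0.36.0 and
    reader at least as new as writer"; it does not inspect any file.
    """
    w, r = parse_version(writer), parse_version(reader)
    return STABLE_SINCE <= w <= r


print(expected_readable("0.36.0", "0.42.1"))  # True
print(expected_readable("0.35.0", "0.42.1"))  # False
```

A `False` result only means the combination falls outside the stated guarantee; it does not by itself show that a read would fail.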

## Key Features

### Core Capabilities

- ✨ **Logical Types** - Clean separation between logical schema and physical layout
- 🔄 **Zero-Copy Arrow Integration** - Seamless conversion to/from Apache Arrow arrays
- 🧩 **Extensible Encodings** - Pluggable physical layouts with built-in optimizations
- 📦 **Cascading Compression** - Support for nested encoding schemes
- 🚀 **High-Performance Computing** - Optimized compute kernels for encoded data
- 📊 **Rich Statistics** - Lazy-loaded summary statistics for optimization
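
To illustrate why lazily loaded summary statistics help, here is a toy sketch in plain Python. It is not Vortex code: the `Chunk` class and `scan_greater_than` function are hypothetical, and the statistics are computed on first use rather than read from file metadata.

```python
from functools import cached_property


class Chunk:
    """A hypothetical chunk of integer values with lazily computed statistics."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    @cached_property
    def min_value(self) -> int:
        # Computed on first access, then cached on the instance.
        return min(self.values)

    @cached_property
    def max_value(self) -> int:
        return max(self.values)


def scan_greater_than(chunks: list[Chunk], threshold: int) -> list[int]:
    """Return values > threshold, skipping chunks whose maximum rules them out."""
    out: list[int] = []
    for chunk in chunks:
        if chunk.max_value <= threshold:
            continue  # the statistic proves no row in this chunk can match
        out.extend(v for v in chunk.values if v > threshold)
    return out


chunks = [Chunk([1, 2, 3]), Chunk([10, 20, 30]), Chunk([4, 5, 6])]
print(scan_greater_than(chunks, 5))  # [10, 20, 30, 6]
```

In this sketch the first chunk is never scanned row by row, and `min_value` is never computed at all because the query does not need it.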

### Technical Architecture

#### Logical vs Physical Design

Vortex strictly separates logical and physical concerns:

- **Logical Layer**: Defines data types and schema
- **Physical Layer**: Handles encoding and storage implementation
- **Built-in Encodings**: Compatible with Apache Arrow's memory format
- **Extension Encodings**: Optimized compression schemes (RLE, dictionary, etc.)
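
The separation described above can be sketched with two toy encodings. This is an illustration under stated assumptions, not Vortex's implementation: the functions `dict_encode`, `rle_encode`, and `decode` are hypothetical, and they show a dictionary encoding whose codes are themselves run-length encoded (a small cascade) while the logical values stay the same.

```python
from itertools import groupby


def dict_encode(values):
    """Dictionary encoding: the distinct values plus one integer code per row."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def rle_encode(codes):
    """Run-length encoding: a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(codes)]


def decode(dictionary, runs):
    """Undo both layers and recover the logical values."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


logical = ["a", "a", "a", "b", "b", "a", "c", "c"]
dictionary, codes = dict_encode(logical)
runs = rle_encode(codes)

print(dictionary)  # ['a', 'b', 'c']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 2)]
assert decode(dictionary, runs) == logical
```

The logical content (a sequence of strings) is unchanged by the choice of physical representation, which is the property the list above describes.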

## Quick Start

### Installation

#### Rust Crate

All features are exported through the main `vortex` crate.

```bash
cargo add vortex
```

#### Python Package

```bash
uv add vortex-data
```

#### Command Line UI (vx)

For browsing the structure of Vortex files, you can use the `vx` command-line tool.

```bash
# Install latest release
cargo install vortex-tui --locked

# Or build from source
cargo install --path vortex-tui --locked

# Usage
vx browse <file>
```

### Development Setup

#### Prerequisites (macOS)

```bash
# Optional but recommended dependencies
brew install flatbuffers protobuf  # For .fbs and .proto files
brew install duckdb               # For benchmarks

# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup

# Initialize submodules
git submodule update --init --recursive

# Setup dependencies with uv
uv sync --all-packages
```

### Performance Optimization

For optimal performance, we suggest using [MiMalloc](https://github.com/microsoft/mimalloc):

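```rust,ignore
// Requires the `mimalloc` crate as a dependency (e.g. `cargo add mimalloc`).
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;
```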

## Project Information

### License

Licensed under the Apache License, Version 2.0.

### Governance

Vortex is an independent open-source project and not controlled by any single company. The Vortex Project is a
sub-project of the Linux Foundation Projects. The governance model is documented in
[CONTRIBUTING.md](CONTRIBUTING.md) and is subject to the terms of
the [Technical Charter](https://vortex.dev/charter.pdf).

### Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Reporting Vulnerabilities

If you discover a security vulnerability, please email <vuln-report@vortex.dev>.

### Trademarks

Copyright © Vortex a Series of LF Projects, LLC.
For terms of use, trademark policy, and other project policies, please see <https://lfprojects.org>.

## Acknowledgments 🏆

The Vortex project benefits enormously from groundbreaking work from the academic & open-source communities.

### Research in Vortex

- [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) - Efficient columnar compression
- [FastLanes](https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf) - High-performance integer compression
- [FSST](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf) - Fast random access string compression
- [ALP](https://ir.cwi.nl/pub/33334/33334.pdf) - Adaptive lossless floating-point compression
- [Procella](https://dl.acm.org/citation.cfm?id=3360438) - YouTube's unified data system
- [Anyblob](https://www.durner.dev/app/media/papers/anyblob-vldb23.pdf) - High-performance
  access to object storage
- [ClickHouse](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) - Fast analytics for everyone

### Vortex in Research

- [Anyblox](https://gienieczko.com/anyblox-paper) - A Framework for Self-Decoding Datasets

### Open Source Inspiration

- [Apache Arrow](https://arrow.apache.org)
- [Apache DataFusion](https://github.com/apache/datafusion)
- [parquet2](https://github.com/jorgecarleitao/parquet2) by Jorge Leitao
- [DuckDB](https://github.com/duckdb/duckdb)
- [Velox](https://github.com/facebookincubator/velox) & [Nimble](https://github.com/facebookincubator/nimble)

#### Thanks to all contributors who have shared their knowledge and code with the community! 🚀


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/spiraldb/vortex",
    "name": "vortex-data",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "vortex",
    "author": "Vortex Authors <hello@vortex.dev>",
    "author_email": "Vortex Authors <hello@vortex.dev>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Python bindings for Vortex, an Apache Arrow-compatible toolkit for working with compressed array data.",
    "version": "0.42.1",
    "project_urls": {
        "Benchmarks": "https://bench.vortex.dev",
        "Changelog": "https://github.com/vortex-data/vortex/blob/develop/CHANGELOG.md",
        "Documentation": "https://docs.vortex.dev",
        "Homepage": "https://github.com/spiraldb/vortex",
        "Issues": "https://github.com/vortex-data/vortex/issues"
    },
    "split_keywords": [
        "vortex"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a96c21e684c278562e6dda78797f5fa2b2802a9f31c87494b963fee030d87a47",
                "md5": "12ea6ca64eb8db05d83f39ed68a18189",
                "sha256": "f7bfc1f301dd1c40dd4537683056835cc4e50bbec8ff22175e1ac64a636b49a4"
            },
            "downloads": -1,
            "filename": "vortex_data-0.42.1-cp310-abi3-macosx_10_12_x86_64.whl",
            "has_sig": false,
            "md5_digest": "12ea6ca64eb8db05d83f39ed68a18189",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.11",
            "size": 11020757,
            "upload_time": "2025-07-24T18:45:33",
            "upload_time_iso_8601": "2025-07-24T18:45:33.187654Z",
            "url": "https://files.pythonhosted.org/packages/a9/6c/21e684c278562e6dda78797f5fa2b2802a9f31c87494b963fee030d87a47/vortex_data-0.42.1-cp310-abi3-macosx_10_12_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5d492d4dea664e801359cf1796739174b2d9c59be7ff3d50860a82570634ca23",
                "md5": "84771fa336717a72387ed51571846195",
                "sha256": "b62121c917d3dde013a79a696a8c87553e4c92a0dc5c3b0d0960a465cd88d337"
            },
            "downloads": -1,
            "filename": "vortex_data-0.42.1-cp310-abi3-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "84771fa336717a72387ed51571846195",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.11",
            "size": 10269363,
            "upload_time": "2025-07-24T18:45:35",
            "upload_time_iso_8601": "2025-07-24T18:45:35.609229Z",
            "url": "https://files.pythonhosted.org/packages/5d/49/2d4dea664e801359cf1796739174b2d9c59be7ff3d50860a82570634ca23/vortex_data-0.42.1-cp310-abi3-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4083d9637880afb65db35ceaca5a69f71a46181fa27020a0c2173657bb50c27f",
                "md5": "cfacf7b1a55bed009aa6c0d98b8dc96e",
                "sha256": "138b86999ae02c9f87a46fb84f435016ad69399204d8ea37d1cf029ff5844387"
            },
            "downloads": -1,
            "filename": "vortex_data-0.42.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "cfacf7b1a55bed009aa6c0d98b8dc96e",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.11",
            "size": 9448432,
            "upload_time": "2025-07-24T18:45:38",
            "upload_time_iso_8601": "2025-07-24T18:45:38.184337Z",
            "url": "https://files.pythonhosted.org/packages/40/83/d9637880afb65db35ceaca5a69f71a46181fa27020a0c2173657bb50c27f/vortex_data-0.42.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0c59afa1d3c4d70ee6dba3b2258d60342aaeb22da615a1d41a2efa0ff81b4611",
                "md5": "d5344664883a78275670fb96f18c394d",
                "sha256": "b6a2471ad7043de9ed19fe556ceb68b526561102965d1a9365706434767a130a"
            },
            "downloads": -1,
            "filename": "vortex_data-0.42.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "d5344664883a78275670fb96f18c394d",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.11",
            "size": 10177627,
            "upload_time": "2025-07-24T18:45:40",
            "upload_time_iso_8601": "2025-07-24T18:45:40.454331Z",
            "url": "https://files.pythonhosted.org/packages/0c/59/afa1d3c4d70ee6dba3b2258d60342aaeb22da615a1d41a2efa0ff81b4611/vortex_data-0.42.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-24 18:45:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "spiraldb",
    "github_project": "vortex",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vortex-data"
}
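
The `digests` entries in the record above can be used to check a downloaded wheel. The following is a minimal sketch using only the Python standard library; the function name `sha256_matches` is hypothetical, and the file name and digest in the usage section are copied from the `manylinux_2_17_x86_64` entry above, assuming the wheel has been downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-256 and compare with the published digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size blocks so large wheels are not loaded into memory at once.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    # File name and digest taken from the record above; the local path is an assumption.
    wheel = Path("vortex_data-0.42.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
    expected = "b6a2471ad7043de9ed19fe556ceb68b526561102965d1a9365706434767a130a"
    if wheel.exists():
        print("digest ok" if sha256_matches(wheel, expected) else "digest mismatch")
```

Installers such as pip can perform an equivalent check themselves when hashes are pinned in a requirements file; the sketch only shows what that comparison amounts to.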
        