# 🌪️ Vortex
[CI](https://github.com/vortex-data/vortex/actions) | [OpenSSF Best Practices](https://www.bestpractices.dev/projects/10567) | [Docs](https://docs.vortex.dev) | [CodSpeed](https://codspeed.io/vortex-data/vortex) | [crates.io](https://crates.io/crates/vortex) | [PyPI](https://pypi.org/project/vortex-data/) | [Maven Central](https://central.sonatype.com/artifact/dev.vortex/vortex-spark)

📚 [Documentation](https://docs.vortex.dev/) | 📊 [Performance Benchmarks](https://bench.vortex.dev)
## Overview
Vortex is a next-generation columnar file format and toolkit designed for high-performance data processing.
It is the fastest and most extensible format for building data systems backed by object storage. It provides:
- **⚡️ Blazing Fast Performance**
  - 200x faster random access reads (vs. modern Apache Parquet)
  - 2-10x faster scans
  - 2-10x faster writes
  - Similar compression ratios
  - Efficient support for wide tables with zero-copy/zero-parse metadata
- **🔧 Extensible Architecture**
  - Modeled after Apache DataFusion's extensible approach
  - Pluggable encoding system, type system, compression strategy, & layout strategy
  - Zero-copy compatibility with Apache Arrow
- **🗳️ Open Source, Neutral Governance**
  - A Linux Foundation (LF AI & Data) Project
  - Apache-2.0 Licensed
- **↔️ Integrations**
  - Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, & more
  - Apache Iceberg (coming soon)

> 🟢 **Development Status**: Library APIs may change from version to version, but we now consider
> the file format <ins>*stable*</ins>. From release 0.36.0, all future releases of Vortex should
> maintain backwards compatibility of the file format (i.e., be able to read files written by
> any earlier version >= 0.36.0).
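
The compatibility note above can be read as a simple rule on version numbers. The following is a minimal sketch of that rule using only the Python standard library; the helper names `parse_version` and `covered_by_guarantee` are hypothetical and are not part of the Vortex API, and the sketch assumes plain `MAJOR.MINOR.PATCH` release strings with no pre-release suffixes.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted release string such as "0.42.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# First release covered by the file-format stability statement above.
FIRST_STABLE = parse_version("0.36.0")


def covered_by_guarantee(writer_version: str, reader_version: str) -> bool:
    """True if reading a file written by `writer_version` with a library at
    `reader_version` falls under the stated backwards-compatibility guarantee.

    False only means the statement does not cover the combination; it does
    not claim the read would fail.
    """
    writer = parse_version(writer_version)
    reader = parse_version(reader_version)
    return FIRST_STABLE <= writer <= reader


print(covered_by_guarantee("0.36.0", "0.42.1"))  # True
print(covered_by_guarantee("0.35.0", "0.42.1"))  # False: written before 0.36.0
print(covered_by_guarantee("0.43.0", "0.42.1"))  # False: writer newer than reader
```

Note that the guarantee is stated for the file format only; library APIs may still change between the two versions.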
## Key Features
### Core Capabilities
- ✨ **Logical Types** - Clean separation between logical schema and physical layout
- 🔄 **Zero-Copy Arrow Integration** - Seamless conversion to/from Apache Arrow arrays
- 🧩 **Extensible Encodings** - Pluggable physical layouts with built-in optimizations
- 📦 **Cascading Compression** - Support for nested encoding schemes
- 🚀 **High-Performance Computing** - Optimized compute kernels for encoded data
- 📊 **Rich Statistics** - Lazy-loaded summary statistics for optimization
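
To illustrate why lazily available summary statistics help, here is a conceptual sketch in plain Python of min/max-based chunk skipping. It is not the Vortex implementation or API: the `Chunk` class and `scan_greater_than` function are hypothetical, and a real file format would typically read such statistics from stored metadata rather than recompute them from the values as this toy does. Chunks are assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    values: List[int]
    # Cached (min, max); stays None until the statistics are first requested.
    _stats: Optional[Tuple[int, int]] = None

    def stats(self) -> Tuple[int, int]:
        # Compute the statistics on first use and cache them for later calls.
        if self._stats is None:
            self._stats = (min(self.values), max(self.values))
        return self._stats


def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return the values greater than `threshold` and the number of chunks skipped."""
    matches: List[int] = []
    skipped = 0
    for chunk in chunks:
        _, highest = chunk.stats()
        if highest <= threshold:
            skipped += 1  # no value in this chunk can match, so skip it entirely
            continue
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, skipped


chunks = [Chunk([1, 2, 3]), Chunk([10, 20, 30]), Chunk([4, 5, 6])]
result, skipped = scan_greater_than(chunks, 6)
print(result)   # [10, 20, 30]
print(skipped)  # 2
```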
### Technical Architecture
#### Logical vs Physical Design
Vortex strictly separates logical and physical concerns:
- **Logical Layer**: Defines data types and schema
- **Physical Layer**: Handles encoding and storage implementation
- **Built-in Encodings**: Compatible with Apache Arrow's memory format
- **Extension Encodings**: Optimized compression schemes (RLE, dictionary, etc.)
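
The separation between logical values and physical encodings, and the idea of cascading one encoding into another, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Vortex code: the function names are hypothetical, and the choice of dictionary encoding followed by run-length encoding of the codes is just one example of a cascade.

```python
def dict_encode(values):
    """Dictionary encoding: the unique values plus one integer code per element."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(values):
    """Run-length encoding: a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


def dict_decode(dictionary, codes):
    """Map integer codes back to their dictionary values."""
    return [dictionary[c] for c in codes]


# The logical data: a column of strings.
logical = ["us", "us", "us", "eu", "eu", "us", "us"]

# Physical representation: dictionary-encode, then run-length-encode the codes.
dictionary, codes = dict_encode(logical)
runs = rle_encode(codes)
print(dictionary)  # ['us', 'eu']
print(runs)        # [(0, 3), (1, 2), (0, 2)]

# Decoding the cascade in reverse order recovers the same logical values.
assert dict_decode(dictionary, rle_decode(runs)) == logical
```

The logical column is identical before and after; only its physical representation changes, which is the property the layering above relies on.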
## Quick Start
### Installation
#### Rust Crate
All features are exported through the main `vortex` crate.
```bash
cargo add vortex
```
#### Python Package
```bash
uv add vortex-data
```
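
After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata. This sketch uses only the standard library (`importlib.metadata`) and makes no assumptions about the Vortex API itself; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("vortex-data")
print(found if found else "vortex-data is not installed in this environment")
```

Run it with the same interpreter or virtual environment that `uv` manages for the project, otherwise the lookup may inspect a different environment.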
#### Command Line UI (vx)
For browsing the structure of Vortex files, you can use the `vx` command-line tool.
```bash
# Install latest release
cargo install vortex-tui --locked
# Or build from source
cargo install --path vortex-tui --locked
# Usage
vx browse <file>
```
### Development Setup
#### Prerequisites (macOS)
```bash
# Optional but recommended dependencies
brew install flatbuffers protobuf # For .fbs and .proto files
brew install duckdb # For benchmarks
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup
# Initialize submodules
git submodule update --init --recursive
# Setup dependencies with uv
uv sync --all-packages
```
### Performance Optimization
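For optimal performance, we suggest using [MiMalloc](https://github.com/microsoft/mimalloc) as the global allocator, for example via the `mimalloc` crate:
```rust,ignore
// Assumes the `mimalloc` crate has been added as a dependency.
use mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;
```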
## Project Information
### License
Licensed under the Apache License, Version 2.0.
### Governance
Vortex is an independent open-source project and not controlled by any single company. The Vortex Project is a
sub-project of the Linux Foundation Projects. The governance model is documented in
[CONTRIBUTING.md](CONTRIBUTING.md) and is subject to the terms of
the [Technical Charter](https://vortex.dev/charter.pdf).
### Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Reporting Vulnerabilities
If you discover a security vulnerability, please email <vuln-report@vortex.dev>.
### Trademarks
Copyright © Vortex a Series of LF Projects, LLC.
For terms of use, trademark policy, and other project policies, please see <https://lfprojects.org>.
## Acknowledgments 🏆
The Vortex project benefits enormously from groundbreaking work from the academic & open-source communities.
### Research in Vortex
- [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) - Efficient columnar compression
- [FastLanes](https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf) - High-performance integer compression
- [FSST](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf) - Fast random access string compression
- [ALP](https://ir.cwi.nl/pub/33334/33334.pdf) - Adaptive lossless floating-point compression
- [Procella](https://dl.acm.org/citation.cfm?id=3360438) - YouTube's unified data system
- [Anyblob](https://www.durner.dev/app/media/papers/anyblob-vldb23.pdf) - High-performance access to object storage
- [ClickHouse](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) - Fast analytics for everyone
### Vortex in Research
- [Anyblox](https://gienieczko.com/anyblox-paper) - A Framework for Self-Decoding Datasets
### Open Source Inspiration
- [Apache Arrow](https://arrow.apache.org)
- [Apache DataFusion](https://github.com/apache/datafusion)
- [parquet2](https://github.com/jorgecarleitao/parquet2) by Jorge Leitao
- [DuckDB](https://github.com/duckdb/duckdb)
- [Velox](https://github.com/facebookincubator/velox) & [Nimble](https://github.com/facebookincubator/nimble)
#### Thanks to all contributors who have shared their knowledge and code with the community! 🚀