veloxx


- **Name:** veloxx
- **Version:** 0.2.4
- **Home page:** https://github.com/Conqxeror/veloxx
- **Summary:** Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Featuring DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.
- **Upload time:** 2025-07-09 19:18:04
- **Author:** Wali Mohammad Kadri
- **License:** MIT
- **Keywords:** dataframe, data-processing, analytics, high-performance, lightweight
# <img src="docs/veloxx_logo.png" alt="Veloxx Logo" height="70px"> Veloxx: Lightweight Rust-Powered Data Processing & Analytics Library

[![crates.io](https://img.shields.io/crates/v/veloxx.svg)](https://crates.io/crates/veloxx)

> **New in 0.2.1:** Major performance improvements across all core operations. See CHANGELOG for details.

Veloxx is a new Rust library for high-performance, **extremely lightweight** in-memory data processing and analytics. It prioritizes minimal dependencies, a small memory footprint, and compile-time guarantees, which makes it a good fit for resource-constrained environments, high-performance computing, and applications where every byte and cycle counts.

## Core Principles & Design Goals

- **Extreme Lightweighting:** Aims for few or no external crates, each carefully selected, and keeps overhead and binary size small.
- **Performance First:** Leverages Rust's zero-cost abstractions, with potential for SIMD and parallelism. Data structures are optimized for cache efficiency.
- **Safety & Reliability:** Fully utilizes Rust's ownership and borrowing system to ensure memory safety and prevent common data manipulation errors. Unsafe code is minimized and thoroughly audited.
- **Ergonomics & Idiomatic Rust API:** Designed for a clean, discoverable, and user-friendly API that feels natural to Rust developers, supporting method chaining and strong static typing.
- **Composability & Extensibility:** Uses a modular design whose components are independent, easy to combine, and easy to extend.

## Key Features

### Core Data Structures

- **DataFrame:** A columnar data store in which each column can hold a different data type (i32, f64, bool, String, DateTime), with efficient storage and handling of missing values.
- **Series (or Column):** A single-typed, named column of data within a DataFrame, providing type-specific operations.
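To make the column model concrete, here is a minimal, std-only sketch of a columnar frame in which every column is a single-typed vector and `None` marks a missing value. It is an illustration under assumptions, not Veloxx's internal representation: `Column`, `Frame`, and their methods are hypothetical names, and DateTime is omitted for brevity.

```rust
use std::collections::BTreeMap;

/// One single-typed column; `None` marks a missing value.
/// Hypothetical illustration, not Veloxx's internal type.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    I32(Vec<Option<i32>>),
    F64(Vec<Option<f64>>),
    Bool(Vec<Option<bool>>),
    Str(Vec<Option<String>>),
}

impl Column {
    /// Number of rows, including nulls.
    pub fn len(&self) -> usize {
        match self {
            Column::I32(v) => v.len(),
            Column::F64(v) => v.len(),
            Column::Bool(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }

    /// Number of missing values in the column.
    pub fn null_count(&self) -> usize {
        match self {
            Column::I32(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::F64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Bool(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Str(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

/// A frame is a set of named columns that must all have the same length.
#[derive(Debug)]
pub struct Frame {
    columns: BTreeMap<String, Column>,
}

impl Frame {
    /// Reject column sets whose lengths disagree.
    pub fn new(columns: BTreeMap<String, Column>) -> Result<Self, String> {
        let mut lengths = columns.values().map(Column::len);
        if let Some(first) = lengths.next() {
            if lengths.any(|len| len != first) {
                return Err("all columns must have the same length".to_string());
            }
        }
        Ok(Frame { columns })
    }

    /// Row count of the frame (0 if it has no columns).
    pub fn row_count(&self) -> usize {
        self.columns.values().next().map_or(0, Column::len)
    }

    /// Look up a column by name.
    pub fn column(&self, name: &str) -> Option<&Column> {
        self.columns.get(name)
    }
}
```

Because each column is its own typed vector, type-specific operations (sums over `I32`, string matching over `Str`) can work on contiguous data without per-cell type checks.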

### Data Ingestion & Loading

- **From `Vec<Vec<T>>` / Iterator:** Basic in-memory construction from Rust native collections.
- **CSV Support:** Minimalistic, highly efficient CSV parser for loading data.
- **JSON Support:** Efficient parsing for common JSON structures.
- **Custom Data Sources:** Traits/interfaces for users to implement their own data loading mechanisms.
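As a rough picture of what column-oriented CSV loading involves, the sketch below reads a header row and splits each following line into per-column vectors, turning empty cells into `None`. It is not Veloxx's parser: `parse_simple_csv` is a hypothetical helper, and it deliberately ignores quoting, so it assumes no cell contains a comma, quote, or line break.

```rust
use std::collections::BTreeMap;

/// Parse simple CSV text (header row, comma-separated, no quoted fields)
/// into named string columns; empty cells become `None`.
/// Hypothetical sketch: real CSV needs RFC 4180 quoting rules.
pub fn parse_simple_csv(text: &str) -> Result<BTreeMap<String, Vec<Option<String>>>, String> {
    // Skip blank lines; the first remaining line is the header.
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<String> = match lines.next() {
        Some(h) => h.split(',').map(|s| s.trim().to_string()).collect(),
        None => return Err("empty input".to_string()),
    };

    // One vector per column, filled row by row.
    let mut columns: Vec<Vec<Option<String>>> = vec![Vec::new(); header.len()];
    for (row_idx, line) in lines.enumerate() {
        let fields: Vec<&str> = line.split(',').collect();
        if fields.len() != header.len() {
            return Err(format!(
                "row {} has {} fields, expected {}",
                row_idx + 1,
                fields.len(),
                header.len()
            ));
        }
        for (col, field) in columns.iter_mut().zip(fields) {
            let value = field.trim();
            col.push(if value.is_empty() { None } else { Some(value.to_string()) });
        }
    }

    // Pair each header name with its column.
    Ok(header.into_iter().zip(columns).collect())
}
```

A typed loader would then infer or cast each string column (for example, to i32 or f64) before building Series from it.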

### Data Cleaning & Preparation

- `drop_nulls()`: Remove rows with any null values.
- `fill_nulls(value)`: Fill nulls with a specified value (type-aware, including DateTime).
- `interpolate_nulls()`: Basic linear interpolation for numeric and DateTime series.
- **Type Casting:** Efficient conversion between compatible data types for Series (e.g., i32 to f64).
- `rename_column(old_name, new_name)`: Rename columns.
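The fill and interpolation operations above can be sketched on a plain `&[Option<f64>]` column. This is an illustration under assumptions rather than Veloxx's code: the function names mirror the list but are standalone hypothetical helpers, only numeric values are handled (not DateTime), and leading or trailing nulls are left as `None` because they have only one neighbour to interpolate from.

```rust
/// Replace every `None` with `value` (the idea behind a `fill_nulls`-style call).
pub fn fill_nulls(values: &[Option<f64>], value: f64) -> Vec<f64> {
    values.iter().map(|v| v.unwrap_or(value)).collect()
}

/// Linearly interpolate interior nulls between the nearest non-null
/// neighbours, using row position as the x-axis. Leading and trailing
/// nulls are left untouched in this sketch.
pub fn interpolate_nulls(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    let mut prev: Option<(usize, f64)> = None; // last known (index, value)
    for (i, v) in values.iter().enumerate() {
        if let Some(y1) = *v {
            if let Some((i0, y0)) = prev {
                // Fill every gap position strictly between i0 and i.
                let span = (i - i0) as f64;
                for j in (i0 + 1)..i {
                    let t = (j - i0) as f64 / span;
                    out[j] = Some(y0 + t * (y1 - y0));
                }
            }
            prev = Some((i, y1));
        }
    }
    out
}
```

For example, `[None, Some(1.0), None, None, Some(4.0), None]` becomes `[None, Some(1.0), Some(2.0), Some(3.0), Some(4.0), None]`.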

### Data Transformation & Manipulation

- **Selection:** `select_columns(names)`, `drop_columns(names)`.
- **Filtering:** Predicate-based row selection using logical (`AND`, `OR`, `NOT`) and comparison operators (`==`, `!=`, `<`, `>`, `<=`, `>=`).
- **Projection:** `with_column(new_name, expression)`, `apply()` for user-defined functions.
- **Sorting:** Sort DataFrame by one or more columns (ascending/descending).
- **Joining:** Basic inner, left, and right join operations on common keys.
- **Concatenation/Append:** Combine DataFrames vertically.
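Key-based joins are commonly implemented as a hash join: index one side by key, then probe that index with the other side. The sketch below shows an inner join on a single key column and returns matching row-index pairs. It is a generic illustration of the technique; `inner_join_indices` is a hypothetical name, not part of Veloxx.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Inner hash join on one key column.
/// Returns (left_row, right_row) index pairs for every key match,
/// ordered by left row, then by right row.
pub fn inner_join_indices<K: Hash + Eq>(left: &[K], right: &[K]) -> Vec<(usize, usize)> {
    // Build phase: map each right-side key to the rows that contain it.
    let mut index: HashMap<&K, Vec<usize>> = HashMap::new();
    for (j, key) in right.iter().enumerate() {
        index.entry(key).or_default().push(j);
    }

    // Probe phase: look up each left key; unmatched left rows are dropped.
    let mut pairs = Vec::new();
    for (i, key) in left.iter().enumerate() {
        if let Some(rows) = index.get(key) {
            pairs.extend(rows.iter().map(|&j| (i, j)));
        }
    }
    pairs
}
```

The index pairs can then be used to gather the output columns from both frames. A left join would additionally emit each unmatched left row paired with a null right side, and a right join is the mirror image.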

### Aggregation & Reduction

- **Simple Aggregations:** `sum()`, `mean()`, `median()`, `min()`, `max()`, `count()`, `std_dev()`.
- **Group By:** Perform aggregations on groups defined by one or more columns.
- **Unique Values:** `unique()` for a Series or for DataFrame columns.
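A group-by aggregation boils down to accumulating per-key state and then finalizing it. This std-only sketch computes the mean and the non-null count of a numeric column for each key; `group_mean_count` is a hypothetical helper, not Veloxx's `group_by`/`agg` API.

```rust
use std::collections::BTreeMap;

/// Group rows by a string key and compute (mean, count) of a numeric
/// column, skipping nulls. BTreeMap keeps groups in sorted key order.
/// Panics if the two columns have different lengths.
pub fn group_mean_count(keys: &[&str], values: &[Option<f64>]) -> BTreeMap<String, (f64, usize)> {
    assert_eq!(keys.len(), values.len(), "columns must have equal length");

    // Accumulate a running sum and non-null count per group.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        let entry = acc.entry(key.to_string()).or_insert((0.0, 0));
        if let Some(v) = value {
            entry.0 += v;
            entry.1 += 1;
        }
    }

    // Turn each (sum, count) into (mean, count); all-null groups get a NaN mean.
    acc.into_iter()
        .map(|(k, (sum, n))| (k, (if n > 0 { sum / n as f64 } else { f64::NAN }, n)))
        .collect()
}
```

With the cities and ages from the Rust usage example below, New York gets a mean of 23.5 over 2 rows, London 30.0 over 1, and Paris 35.0 over 1.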

### Basic Analytics & Statistics

- `describe()`: Provides summary statistics for numeric columns (count, mean, std, min, max, quartiles).
- `correlation()`: Calculate Pearson correlation between two numeric Series.
- `covariance()`: Calculate covariance.
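For reference, these are the textbook formulas, written as a std-only sketch over two equal-length, null-free `f64` slices. Whether Veloxx divides by n or n - 1 for covariance is not stated here; the sketch assumes the sample (n - 1) form. The divisor cancels out in the Pearson correlation, so r is the same either way. The function names are illustrative and are not Veloxx's signatures.

```rust
/// Sample covariance (divides by n - 1) of two equal-length series.
/// Returns None if the lengths differ or there are fewer than two points.
pub fn covariance(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let mean_x = x.iter().sum::<f64>() / n as f64;
    let mean_y = y.iter().sum::<f64>() / n as f64;
    // Sum of products of deviations from the means.
    let sum: f64 = x.iter().zip(y).map(|(a, b)| (a - mean_x) * (b - mean_y)).sum();
    Some(sum / (n - 1) as f64)
}

/// Pearson correlation r = cov(x, y) / (s_x * s_y).
/// Returns None if either series has zero variance or the inputs are invalid.
pub fn correlation(x: &[f64], y: &[f64]) -> Option<f64> {
    let sx = covariance(x, x)?.sqrt();
    let sy = covariance(y, y)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        return None;
    }
    Some(covariance(x, y)? / (sx * sy))
}
```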

### Output & Export

- **To `Vec<Vec<T>>`:** Export DataFrame content back to standard Rust collections.
- **To CSV:** Efficiently write DataFrame to a CSV file.
- **Display/Pretty Print:** User-friendly console output for DataFrame and Series.
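Writing CSV safely mostly comes down to quoting. The sketch below writes a header and rows to any `std::io::Write` sink, quoting fields that contain commas, quotes, or line breaks and doubling embedded quotes in the style of RFC 4180; missing values become empty fields. `write_csv` is a hypothetical helper rather than Veloxx's writer, and it uses `\n` line endings where RFC 4180 specifies CRLF.

```rust
use std::io::{self, Write};

/// Quote a field if it contains a comma, double quote, or line break,
/// doubling any embedded quotes (RFC 4180 style).
fn escape_csv_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Write a header row and data rows as CSV to any `Write` sink.
/// `None` cells are written as empty fields.
pub fn write_csv<W: Write>(out: &mut W, headers: &[&str], rows: &[Vec<Option<String>>]) -> io::Result<()> {
    let header: Vec<String> = headers.iter().map(|h| escape_csv_field(h)).collect();
    writeln!(out, "{}", header.join(","))?;
    for row in rows {
        let cells: Vec<String> = row
            .iter()
            .map(|cell| cell.as_deref().map_or(String::new(), escape_csv_field))
            .collect();
        writeln!(out, "{}", cells.join(","))?;
    }
    Ok(())
}
```

Passing a `std::fs::File` (ideally wrapped in a `BufWriter`) writes to disk; passing a `Vec<u8>` collects the output in memory.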

## Installation

### Rust

Veloxx is available on [crates.io](https://crates.io/crates/veloxx).

Add the following to your `Cargo.toml` file:

```toml
[dependencies]
veloxx = "0.2.4" # Or the latest version
```

To build your Rust project with Veloxx, run:

```bash
cargo build
```

To run tests:

```bash
cargo test
```

## Usage Examples

### Rust Usage

Here's a quick example demonstrating how to create a DataFrame, filter it, add a computed column, and perform a group-by aggregation:

```rust
use veloxx::dataframe::DataFrame;
use veloxx::series::Series;
use veloxx::types::Value;
use veloxx::conditions::Condition;
use veloxx::expressions::Expr;
use std::collections::BTreeMap;

fn main() -> Result<(), String> {
    // 1. Create a DataFrame from named Series (`None` entries would mark missing values)
    let mut columns = BTreeMap::new();
    columns.insert(
        "name".to_string(),
        Series::new_string("name", vec![Some("Alice".to_string()), Some("Bob".to_string()), Some("Charlie".to_string()), Some("David".to_string())]),
    );
    columns.insert("age".to_string(), Series::new_i32("age", vec![Some(25), Some(30), Some(22), Some(35)]));
    columns.insert(
        "city".to_string(),
        Series::new_string("city", vec![Some("New York".to_string()), Some("London".to_string()), Some("New York".to_string()), Some("Paris".to_string())]),
    );
    columns.insert(
        "last_login".to_string(),
        Series::new_datetime("last_login", vec![Some(1678886400), Some(1678972800), Some(1679059200), Some(1679145600)]),
    );

    let df = DataFrame::new(columns)?;
    println!("Original DataFrame:\n{}", df);

    // 2. Filter data: age > 25 AND city == "New York"
    //    (no row in this sample satisfies both conditions, so the result is empty)
    let condition = Condition::And(
        Box::new(Condition::Gt("age".to_string(), Value::I32(25))),
        Box::new(Condition::Eq("city".to_string(), Value::String("New York".to_string()))),
    );
    let filtered_df = df.filter(&condition)?;
    println!("\nFiltered DataFrame (age > 25 AND city == \"New York\"):\n{}", filtered_df);

    // 3. Add a computed column: age_in_10_years = age + 10
    let expr_add_10 = Expr::Add(
        Box::new(Expr::Column("age".to_string())),
        Box::new(Expr::Literal(Value::I32(10))),
    );
    let df_with_new_col = df.with_column("age_in_10_years", &expr_add_10)?;
    println!("\nDataFrame with new column (age_in_10_years):\n{}", df_with_new_col);

    // 4. Group by city, then compute the mean age and the number of users per city
    let grouped_df = df.group_by(vec!["city".to_string()])?;
    let aggregated_df = grouped_df.agg(vec![("age", "mean"), ("name", "count")])?;
    println!("\nAggregated DataFrame (average age and user count by city):\n{}", aggregated_df);

    // 5. DateTime filtering: keep users whose last_login is after a given timestamp
    //    (with this sample data, Charlie and David match)
    let specific_date_timestamp = 1679000000; // Example timestamp, in the same units as last_login
    let condition_dt = Condition::Gt("last_login".to_string(), Value::DateTime(specific_date_timestamp));
    let filtered_df_dt = df.filter(&condition_dt)?;
    println!("\nFiltered DataFrame (users logged in after {}):\n{}", specific_date_timestamp, filtered_df_dt);

    Ok(())
}
```

### Python Usage

```python
import veloxx

# 1. Create a DataFrame
df = veloxx.PyDataFrame({
    "name": veloxx.PySeries("name", ["Alice", "Bob", "Charlie", "David"]),
    "age": veloxx.PySeries("age", [25, 30, 22, 35]),
    "city": veloxx.PySeries("city", ["New York", "London", "New York", "Paris"]),
})
print("Original DataFrame:")
print(df)

# 2. Filter data: age > 25
filtered_df = df.filter([i for i, age in enumerate(df.get_column("age").to_vec_f64()) if age > 25])
print("\nFiltered DataFrame (age > 25):")
print(filtered_df)

# 3. Select columns
selected_df = df.select_columns(["name", "city"])
print("\nSelected Columns (name, city):")
print(selected_df)

# 4. Rename a column
renamed_df = df.rename_column("age", "years")
print("\nRenamed Column (age to years):")
print(renamed_df)

# 5. Series operations
age_series = df.get_column("age")
print(f"\nAge Series Sum: {age_series.sum()}")
print(f"Age Series Mean: {age_series.mean()}")
print(f"Age Series Max: {age_series.max()}")
print(f"Age Series Unique: {age_series.unique().to_vec_f64()}")
```

### WebAssembly Usage (Node.js)

```javascript
const veloxx = require('veloxx');

async function runWasmExample() {
    // 1. Create a DataFrame
    const df = new veloxx.WasmDataFrame({
        name: ["Alice", "Bob", "Charlie", "David"],
        age: [25, 30, 22, 35],
        city: ["New York", "London", "New York", "Paris"],
    });
    console.log("Original DataFrame:");
    console.log(df);

    // 2. Filter data: age > 25
    const ageSeries = df.getColumn("age");
    const filteredIndices = [];
    for (let i = 0; i < ageSeries.len; i++) {
        if (ageSeries.getValue(i) > 25) {
            filteredIndices.push(i);
        }
    }
    const filteredDf = df.filter(new Uint32Array(filteredIndices));
    console.log("\nFiltered DataFrame (age > 25):");
    console.log(filteredDf);

    // 3. Series operations
    console.log(`\nAge Series Sum: ${ageSeries.sum()}`);
    console.log(`Age Series Mean: ${ageSeries.mean()}`);
    console.log(`Age Series Unique: ${ageSeries.unique().toVecF64()}`);
}

runWasmExample();
```
