petepak

- Name: petepak
- Version: 0.1.0
- Home page: https://pypi.org/project/petepak/
- Summary: SQL-like data manipulation for Python lists of dictionaries
- Upload time: 2025-10-23 19:54:41
- Author: Peter Bernard
- Requires Python: >=3.7
- License: MIT
- Keywords: sql, data, manipulation, pandas, alternative, list, dictionaries
- Requirements: none recorded

# Petepak

**SQL-like data manipulation for Python lists of dictionaries**

Petepak provides a set of functions that mimic SQL operations (select, filter, join, group by, order by) but work directly on Python lists of dictionaries. It is a lightweight alternative to pandas for simple data operations.

## Features

- 🔍 **SQL-like operations**: select, filter, join, group_by, order_by
- 🔗 **Multiple join types**: inner, left, right, outer joins  
- 📁 **CSV I/O**: read_csv, write_csv with schema support
- 🔄 **Data transformation**: rename, transform, distinct
- 📊 **Sorting algorithms**: bubble, merge, quick sort
- 🛡️ **Safe expressions**: Secure string-to-lambda conversion
- ⚡ **Lightweight**: No heavy dependencies like pandas

## Installation

### From PyPI

```bash
pip install petepak
```


## Quick Start

```python
from petepak import select, filter, join, group_by, order_by

# Sample data
users = [
    {'id': 1, 'name': 'Alice', 'age': 25, 'city': 'New York'},
    {'id': 2, 'name': 'Bob', 'age': 30, 'city': 'Boston'},
    {'id': 3, 'name': 'Charlie', 'age': 35, 'city': 'New York'}
]

orders = [
    {'user_id': 1, 'product': 'Laptop', 'amount': 1200},
    {'user_id': 2, 'product': 'Mouse', 'amount': 25},
    {'user_id': 1, 'product': 'Monitor', 'amount': 300}
]

# Filter users by age
young_users = filter(users, "a.age < 30")
print(young_users)  # [{'id': 1, 'name': 'Alice', 'age': 25, 'city': 'New York'}]

# Select specific columns
names = select(users, ['name', 'city'])
print(names)  # [{'name': 'Alice', 'city': 'New York'}, ...]

# Join users with orders
user_orders = join(users, orders, "a.id == b.user_id", join_type="inner")
print(user_orders)  # Combined data with prefixed columns

# Group by city
by_city = group_by(users, 'city')
print(by_city)  # [[users from New York], [users from Boston]]

# Sort by age
sorted_users = order_by(users, 'age', reverse=True)
print(sorted_users)  # Users sorted by age descending
```

## Core Operations

### Filtering Data

```python
from petepak import filter

data = [{'score': 90}, {'score': 75}, {'score': 85}]

# Using expression strings (SQL-like)
high_scores = filter(data, "a.score >= 80")
print(high_scores)  # [{'score': 90}, {'score': 85}]

# Using lambda functions (Python-like)
high_scores = filter(data, lambda row: row.get('score', 0) >= 80)
print(high_scores)  # [{'score': 90}, {'score': 85}]
```
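
To make the idea of a "safe" expression string concrete, here is a minimal sketch of one way a string such as `"a.score >= 80"` could be turned into a predicate without calling `eval`. It is not petepak's actual implementation: the helper name `compile_predicate` and the set of supported operators are assumptions made for illustration. The sketch parses the string with the standard `ast` module and only evaluates constants, `a.<field>` lookups, single comparisons, and `and`/`or`.

```python
import ast
import operator

# Comparison operators this sketch is willing to evaluate.
_COMPARE = {
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def compile_predicate(expr):
    """Hypothetical helper: turn "a.score >= 80" into a function of one row."""
    tree = ast.parse(expr, mode="eval").body

    def evaluate(node, row):
        if isinstance(node, ast.Constant):
            return node.value
        # `a.field` reads a key from the row; a missing key yields None.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == "a"):
            return row.get(node.attr)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _COMPARE.get(type(node.ops[0]))
            if op is not None:
                return op(evaluate(node.left, row),
                          evaluate(node.comparators[0], row))
        if isinstance(node, ast.BoolOp):
            values = (evaluate(value, row) for value in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        # Anything else (calls, imports, subscripts, ...) is rejected.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return lambda row: bool(evaluate(tree, row))


data = [{'score': 90}, {'score': 75}, {'score': 85}]
is_high = compile_predicate("a.score >= 80")
high_scores = [row for row in data if is_high(row)]
print(high_scores)  # [{'score': 90}, {'score': 85}]
```

In this sketch an unsupported construct such as a function call raises `ValueError` when the predicate is evaluated, rather than being executed.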

### Joining Data

```python
from petepak import join

users = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
orders = [{'user_id': 1, 'product': 'Laptop'}, {'user_id': 2, 'product': 'Mouse'}]

# Inner join (list1_name/list2_name set the column prefixes shown below)
result = join(users, orders, "a.id == b.user_id", join_type="inner",
              list1_name="user", list2_name="order")
print(result)
# [{'user_id': 1, 'user_name': 'Alice', 'order_user_id': 1, 'order_product': 'Laptop'}, ...]

# Left join
result = join(users, orders, "a.id == b.user_id", join_type="left")
# Includes all users, even those without orders
```
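
For readers who want to see what inner and left joins with prefixed columns amount to, the following plain-Python sketch reproduces the shape of the result shown above. It does not use petepak: `join_rows` and its parameters are hypothetical names, the join condition is a two-argument function instead of an expression string, and padding unmatched rows with `None` is an assumption about reasonable left-join behavior.

```python
def join_rows(left, right, on, how="inner", left_name="user", right_name="order"):
    """Nested-loop join of two lists of dicts; `on` is a function (a, b) -> bool."""
    def prefixed(row, name):
        return {f"{name}_{key}": value for key, value in row.items()}

    # Collect right-hand column names so unmatched left rows can be padded.
    right_keys = []
    for b in right:
        for key in b:
            if key not in right_keys:
                right_keys.append(key)

    result = []
    for a in left:
        matches = [b for b in right if on(a, b)]
        for b in matches:
            result.append({**prefixed(a, left_name), **prefixed(b, right_name)})
        if not matches and how == "left":
            padding = {f"{right_name}_{key}": None for key in right_keys}
            result.append({**prefixed(a, left_name), **padding})
    return result


users = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Charlie'}]
orders = [{'user_id': 1, 'product': 'Laptop'}, {'user_id': 2, 'product': 'Mouse'}]


def same_user(a, b):
    return a['id'] == b['user_id']


inner = join_rows(users, orders, same_user)
left = join_rows(users, orders, same_user, how="left")
print(len(inner), len(left))  # 2 3
print(left[-1])
# {'user_id': 3, 'user_name': 'Charlie', 'order_user_id': None, 'order_product': None}
```

The nested loop costs time proportional to `len(left) * len(right)`, which is acceptable for small lists; an equality join on larger inputs would normally index one side by key first.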

### CSV Operations

```python
from petepak import read_csv, write_csv

# Read CSV with schema
data = read_csv('users.csv', schema={'id': int, 'age': int, 'score': float})

# Write to CSV
write_csv(data, 'output.csv')
```
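
The schema argument maps column names to converter callables. As a rough illustration of what that conversion involves, the sketch below does the same thing with the standard `csv` module. The function `read_rows` is a hypothetical stand-in, not petepak's `read_csv`; it reads from a file-like object so the example runs without a file on disk, and leaving empty cells unconverted is an assumption.

```python
import csv
import io


def read_rows(handle, schema=None):
    """Read CSV rows as dicts, converting the columns listed in `schema`."""
    schema = schema or {}
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for column, convert in schema.items():
            # Leave missing or empty cells untouched instead of failing.
            if row.get(column) not in (None, ""):
                row[column] = convert(row[column])
        rows.append(row)
    return rows


sample = io.StringIO("id,name,age,score\n1,Alice,25,90.5\n2,Bob,30,75\n")
rows = read_rows(sample, schema={'id': int, 'age': int, 'score': float})
print(rows[0])  # {'id': 1, 'name': 'Alice', 'age': 25, 'score': 90.5}
```

Columns that are not named in the schema, such as `name` here, stay as strings.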

### Data Transformation

```python
from petepak import transform, rename, distinct

# Add computed columns
data = [{'name': 'Alice', 'score': 90}]
result = transform(data, 'grade', lambda row: 'A' if row['score'] >= 90 else 'B')
print(result)  # [{'name': 'Alice', 'score': 90, 'grade': 'A'}]

# Rename columns
renamed = rename(data, {'name': 'full_name'})
print(renamed)  # [{'full_name': 'Alice', 'score': 90}]

# Remove duplicates
unique = distinct(data, 'name')
```
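
The `distinct` call above has a single input row, so its effect is not visible. The sketch below shows de-duplication by key on a list that actually contains duplicates. It is written in plain Python with a hypothetical helper `distinct_by`; keeping the first row seen for each key is an assumption, and petepak may choose differently.

```python
def distinct_by(rows, key):
    """Keep the first row seen for each value of `key` (values must be hashable)."""
    seen = set()
    unique = []
    for row in rows:
        value = row.get(key)
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


scores = [
    {'name': 'Alice', 'score': 90},
    {'name': 'Bob', 'score': 75},
    {'name': 'Alice', 'score': 60},
]
print(distinct_by(scores, 'name'))
# [{'name': 'Alice', 'score': 90}, {'name': 'Bob', 'score': 75}]
```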

## Advanced Examples

### E-commerce Analysis

```python
from petepak import read_csv, join, filter, group_by, display_grouped

# Load data
customers = read_csv('customers.csv', schema={'id': int, 'age': int})
orders = read_csv('orders.csv', schema={'customer_id': int, 'amount': float})

# Join customers with orders
customer_orders = join(customers, orders, "a.id == b.customer_id", 
                      join_type="left", list1_name="customer", list2_name="order")

# Filter high-value customers
high_value = filter(customer_orders, "a.order_amount > 100")

# Group by age ranges
age_groups = group_by(high_value, lambda row: (row['customer_age'] // 10) * 10)

# Display results
display_grouped(age_groups, 'age_range')
```
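
The age-range step relies on integer division: `(age // 10) * 10` maps 25 to 20, 31 and 38 to 30, and so on. The following sketch shows that bucketing with a small grouping helper in plain Python. `group_rows` is a hypothetical function that returns a dict keyed by bucket so the labels stay visible, which differs from the list-of-lists shape shown for `group_by` in the Quick Start.

```python
def group_rows(rows, key):
    """Group rows by a column name or by a key function, keeping first-seen order."""
    key_of = key if callable(key) else (lambda row: row[key])
    groups = {}
    for row in rows:
        groups.setdefault(key_of(row), []).append(row)
    return groups


customers = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 31},
    {'name': 'Charlie', 'age': 38},
    {'name': 'Dana', 'age': 42},
]

by_decade = group_rows(customers, lambda row: (row['age'] // 10) * 10)
for decade, members in by_decade.items():
    print(decade, [member['name'] for member in members])
# 20 ['Alice']
# 30 ['Bob', 'Charlie']
# 40 ['Dana']
```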

### Data Processing Pipeline

```python
from petepak import read_csv, filter, transform, group_by, order_by, display

# 1. Load and clean data
raw_data = read_csv('sales.csv', schema={'amount': float, 'date': str})
clean_data = filter(raw_data, "a.amount > 0")

# 2. Transform data
# Assumes dates shaped like 'YYYY-MM-DD', so index 1 after the split is the month
processed = transform(clean_data, {
    'month': lambda row: row['date'].split('-')[1],
    'category': lambda row: 'high' if row['amount'] > 1000 else 'low'
})

# 3. Aggregate by month
monthly = group_by(processed, 'month')

# 4. Calculate totals
totals = []
for group in monthly:
    total = sum(row['amount'] for row in group)
    totals.append({'month': group[0]['month'], 'total': total})

# 5. Sort and display
final = order_by(totals, 'total', reverse=True)
display(final)
```
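
To check what the pipeline above should produce, it can help to run the same steps on a few in-memory rows using only the standard library. The sketch below does that; the sample rows are made up for illustration and the dates are assumed to be in `YYYY-MM-DD` form.

```python
from collections import defaultdict

# Made-up sample rows standing in for sales.csv.
sales = [
    {'amount': 1200.0, 'date': '2024-01-15'},
    {'amount': -50.0, 'date': '2024-01-20'},   # dropped by the amount > 0 filter
    {'amount': 300.0, 'date': '2024-02-03'},
    {'amount': 450.0, 'date': '2024-02-17'},
]

# Steps 1-4: filter, derive the month, and sum amounts per month.
totals_by_month = defaultdict(float)
for row in sales:
    if row['amount'] > 0:
        month = row['date'].split('-')[1]   # '2024-01-15' -> '01'
        totals_by_month[month] += row['amount']

# Step 5: sort by total, largest first.
final = sorted(
    ({'month': month, 'total': total} for month, total in totals_by_month.items()),
    key=lambda row: row['total'],
    reverse=True,
)
print(final)
# [{'month': '01', 'total': 1200.0}, {'month': '02', 'total': 750.0}]
```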

## API Reference

### Core Functions

- `select(rows, columns)` - Select specific columns
- `filter(rows, predicate)` - Filter rows using expressions or functions
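- `join(list1, list2, expr, join_type=..., list1_name=..., list2_name=...)` - Join two datasets (`inner`, `left`, `right`, or `outer`); the name arguments set the column prefixes in the result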
- `group_by(rows, keys)` - Group rows by key values
- `order_by(rows, keys, reverse)` - Sort rows
- `distinct(rows, keys)` - Remove duplicate rows

### I/O Functions

- `read_csv(file_path, schema=None)` - Read CSV files
- `write_csv(rows, file_path)` - Write CSV files
- `display(rows)` - Pretty print data
- `display_grouped(groups, keys)` - Display grouped data

### Sorting Algorithms

- `bubble_sort(data, key, reverse)` - Bubble sort implementation
- `merge_sort(data, key, reverse)` - Merge sort implementation  
- `quick_sort(data, key, reverse)` - Quick sort implementation
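
As an illustration of what a sort with a `key` column and a `reverse` flag does on a list of dictionaries, here is a short merge sort sketch. It is an independent example, not petepak's `merge_sort`: the function name `merge_sort_rows` is hypothetical, and `key` is assumed to be a column name.

```python
def merge_sort_rows(rows, key, reverse=False):
    """Return a new list of dicts sorted by rows[i][key] using a stable merge sort."""
    if len(rows) <= 1:
        return list(rows)
    middle = len(rows) // 2
    left = merge_sort_rows(rows[:middle], key, reverse)
    right = merge_sort_rows(rows[middle:], key, reverse)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right half only when it strictly precedes the left
        # element, so rows with equal keys keep their original order.
        if reverse:
            take_right = right[j][key] > left[i][key]
        else:
            take_right = right[j][key] < left[i][key]
        if take_right:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


people = [
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Charlie', 'age': 30},
    {'name': 'Dana', 'age': 35},
]
print([p['name'] for p in merge_sort_rows(people, 'age')])
# ['Bob', 'Alice', 'Charlie', 'Dana']
print([p['name'] for p in merge_sort_rows(people, 'age', reverse=True)])
# ['Dana', 'Alice', 'Charlie', 'Bob']
```

Merge sort runs in O(n log n) time, whereas bubble sort is O(n²), so the choice of algorithm matters mainly for larger lists.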

## Development


### Running tests

```bash
pytest
```

### Code formatting

```bash
black petepak tests
```

### Linting

```bash
flake8 petepak tests
```


## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

### 0.1.0 (2024-01-01)

- Initial release
- SQL-like data manipulation functions
- CSV I/O with schema support
- Multiple sorting algorithms
- Comprehensive test suite (137 tests)
- 78% code coverage

            

Raw data

            {
    "_id": null,
    "home_page": "https://pypi.org/project/petepak/",
    "name": "petepak",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "sql, data, manipulation, pandas, alternative, list, dictionaries",
    "author": "Peter Bernard",
    "author_email": "Peter Bernard <peter.a.bernard1@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ff/95/c8c597a9a3325550bc8ce21bb79b4b5af89cdc6aab3d304304309b00976e/petepak-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "SQL-like data manipulation for Python lists of dictionaries",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://pypi.org/project/petepak/"
    },
    "split_keywords": [
        "sql",
        " data",
        " manipulation",
        " pandas",
        " alternative",
        " list",
        " dictionaries"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2558ac223527c9e9b622b6191d99b6ecbe08a08973183f798b5713286713cdb3",
                "md5": "d3c63aebe1ebc1035ebbf004ddb7186a",
                "sha256": "952fc056c530e06cfeb2f5d9faeffc00940ed5b0b2bba142a5fc4f91aed0837c"
            },
            "downloads": -1,
            "filename": "petepak-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d3c63aebe1ebc1035ebbf004ddb7186a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 13956,
            "upload_time": "2025-10-23T19:54:39",
            "upload_time_iso_8601": "2025-10-23T19:54:39.973061Z",
            "url": "https://files.pythonhosted.org/packages/25/58/ac223527c9e9b622b6191d99b6ecbe08a08973183f798b5713286713cdb3/petepak-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ff95c8c597a9a3325550bc8ce21bb79b4b5af89cdc6aab3d304304309b00976e",
                "md5": "7a453ce46653f326e5eadfd702cfd5d4",
                "sha256": "2b38cb7a9c5226e5e30a5a085ca3f34ee299f207edf02ea3c42b8b5e1756afbc"
            },
            "downloads": -1,
            "filename": "petepak-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7a453ce46653f326e5eadfd702cfd5d4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 21905,
            "upload_time": "2025-10-23T19:54:41",
            "upload_time_iso_8601": "2025-10-23T19:54:41.411326Z",
            "url": "https://files.pythonhosted.org/packages/ff/95/c8c597a9a3325550bc8ce21bb79b4b5af89cdc6aab3d304304309b00976e/petepak-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-23 19:54:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "petepak"
}
        