# Petepak
**SQL-like data manipulation for Python lists of dictionaries**
Petepak provides a comprehensive set of functions for data manipulation that mimics SQL operations but works directly on Python lists of dictionaries. It's a lightweight alternative to pandas for simple data operations.
## Features
- 🔍 **SQL-like operations**: select, filter, join, group_by, order_by
- 🔗 **Multiple join types**: inner, left, right, outer joins
- 📁 **CSV I/O**: read_csv, write_csv with schema support
- 🔄 **Data transformation**: rename, transform, distinct
- 📊 **Sorting algorithms**: bubble, merge, quick sort
- 🛡️ **Safe expressions**: Secure string-to-lambda conversion
- ⚡ **Lightweight**: No heavy dependencies like pandas
## Installation
### From PyPI
```bash
pip install petepak
```
## Quick Start
```python
from petepak import select, filter, join, group_by, order_by
# Sample data
users = [
{'id': 1, 'name': 'Alice', 'age': 25, 'city': 'New York'},
{'id': 2, 'name': 'Bob', 'age': 30, 'city': 'Boston'},
{'id': 3, 'name': 'Charlie', 'age': 35, 'city': 'New York'}
]
orders = [
{'user_id': 1, 'product': 'Laptop', 'amount': 1200},
{'user_id': 2, 'product': 'Mouse', 'amount': 25},
{'user_id': 1, 'product': 'Monitor', 'amount': 300}
]
# Filter users by age
young_users = filter(users, "a.age < 30")
print(young_users) # [{'id': 1, 'name': 'Alice', 'age': 25, 'city': 'New York'}]
# Select specific columns
names = select(users, ['name', 'city'])
print(names) # [{'name': 'Alice', 'city': 'New York'}, ...]
# Join users with orders
user_orders = join(users, orders, "a.id == b.user_id", join_type="inner")
print(user_orders) # Combined data with prefixed columns
# Group by city
by_city = group_by(users, 'city')
print(by_city) # [[users from New York], [users from Boston]]
# Sort by age
sorted_users = order_by(users, 'age', reverse=True)
print(sorted_users) # Users sorted by age descending
```
## Core Operations
### Filtering Data
```python
from petepak import filter
data = [{'score': 90}, {'score': 75}, {'score': 85}]
# Using expression strings (SQL-like)
high_scores = filter(data, "a.score >= 80")
print(high_scores) # [{'score': 90}, {'score': 85}]
# Using lambda functions (Python-like)
high_scores = filter(data, lambda row: row.get('score', 0) >= 80)
print(high_scores) # [{'score': 90}, {'score': 85}]
```
### Joining Data
```python
from petepak import join
users = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
orders = [{'user_id': 1, 'product': 'Laptop'}, {'user_id': 2, 'product': 'Mouse'}]
# Inner join
result = join(users, orders, "a.id == b.user_id", join_type="inner")
print(result)
# [{'user_id': 1, 'user_name': 'Alice', 'order_user_id': 1, 'order_product': 'Laptop'}, ...]
# Left join
result = join(users, orders, "a.id == b.user_id", join_type="left")
# Includes all users, even those without orders
```
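The right and outer joins listed under Features follow the same call shape. The sketch below assumes only `join_type` changes and that unmatched rows simply lack the columns from the other side; treat it as an illustration rather than guaranteed output.

```python
from petepak import join

users = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
orders = [{'user_id': 1, 'product': 'Laptop'}, {'user_id': 3, 'product': 'Desk'}]

# Right join: keeps every order, even when no matching user exists
right_joined = join(users, orders, "a.id == b.user_id", join_type="right")

# Outer join: keeps unmatched rows from both sides
outer_joined = join(users, orders, "a.id == b.user_id", join_type="outer")
```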
### CSV Operations
```python
from petepak import read_csv, write_csv
# Read CSV with schema
data = read_csv('users.csv', schema={'id': int, 'age': int, 'score': float})
# Write to CSV
write_csv(data, 'output.csv')
```
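A round-trip sketch tying the two calls together, assuming `write_csv` derives the header from the dict keys and `read_csv` applies each schema callable to its column. CSV stores everything as text, so without a schema the values would come back as strings.

```python
from petepak import read_csv, write_csv

rows = [
    {'id': 1, 'age': 25, 'score': 91.5},
    {'id': 2, 'age': 30, 'score': 84.0},
]

write_csv(rows, 'scores.csv')
restored = read_csv('scores.csv', schema={'id': int, 'age': int, 'score': float})

# If the schema callables are applied per column as assumed,
# the restored rows match the originals, types included.
print(restored)
```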
### Data Transformation
```python
from petepak import transform, rename, distinct
# Add computed columns
data = [{'name': 'Alice', 'score': 90}]
result = transform(data, 'grade', lambda row: 'A' if row['score'] >= 90 else 'B')
print(result) # [{'name': 'Alice', 'score': 90, 'grade': 'A'}]
# Rename columns
renamed = rename(data, {'name': 'full_name'})
print(renamed) # [{'full_name': 'Alice', 'score': 90}]
# Remove duplicates
unique = distinct(data, 'name')
```
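With only one row above, `distinct` has nothing to drop. A slightly larger sketch, assuming duplicates are detected by the given key and the first occurrence is kept:

```python
from petepak import distinct

people = [
    {'name': 'Alice', 'score': 90},
    {'name': 'Bob', 'score': 75},
    {'name': 'Alice', 'score': 88},
]

unique = distinct(people, 'name')
print(unique)  # expected: one row each for Alice and Bob, assuming first occurrence wins
```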
## Advanced Examples
### E-commerce Analysis
```python
from petepak import *
# Load data
customers = read_csv('customers.csv', schema={'id': int, 'age': int})
orders = read_csv('orders.csv', schema={'customer_id': int, 'amount': float})
# Join customers with orders
customer_orders = join(customers, orders, "a.id == b.customer_id",
join_type="left", list1_name="customer", list2_name="order")
# Filter high-value customers
high_value = filter(customer_orders, "a.order_amount > 100")
# Group by age ranges
age_groups = group_by(high_value, lambda row: (row['customer_age'] // 10) * 10)
# Display results
display_grouped(age_groups, 'age_range')
```
### Data Processing Pipeline
```python
from petepak import *
# 1. Load and clean data
raw_data = read_csv('sales.csv', schema={'amount': float, 'date': str})
clean_data = filter(raw_data, "a.amount > 0")
# 2. Transform data
processed = transform(clean_data, 'month', lambda row: row['date'].split('-')[1])
processed = transform(processed, 'category', lambda row: 'high' if row['amount'] > 1000 else 'low')
# 3. Aggregate by month
monthly = group_by(processed, 'month')
# 4. Calculate totals
totals = []
for group in monthly:
total = sum(row['amount'] for row in group)
totals.append({'month': group[0]['month'], 'total': total})
# 5. Sort and display
final = order_by(totals, 'total', reverse=True)
display(final)
```
## API Reference
### Core Functions
- `select(rows, columns)` - Select specific columns
- `filter(rows, predicate)` - Filter rows using expressions or functions
- `join(list1, list2, expr, join_type)` - Join two datasets
- `group_by(rows, keys)` - Group rows by key values
- `order_by(rows, keys, reverse)` - Sort rows
- `distinct(rows, keys)` - Remove duplicate rows
### I/O Functions
- `read_csv(file_path, schema=None)` - Read CSV files
- `write_csv(rows, file_path)` - Write CSV files
- `display(rows)` - Pretty print data
- `display_grouped(groups, keys)` - Display grouped data
### Sorting Algorithms
- `bubble_sort(data, key, reverse)` - Bubble sort implementation
- `merge_sort(data, key, reverse)` - Merge sort implementation
- `quick_sort(data, key, reverse)` - Quick sort implementation
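A usage sketch for the sorters, assuming each takes a list of dicts, a key name, and a reverse flag (as listed above) and returns a new sorted list, like `order_by`:

```python
from petepak import bubble_sort, merge_sort, quick_sort

scores = [{'name': 'Alice', 'score': 90},
          {'name': 'Bob', 'score': 75},
          {'name': 'Charlie', 'score': 85}]

print(merge_sort(scores, 'score', reverse=False))   # ascending by score
print(quick_sort(scores, 'score', reverse=True))    # descending by score
print(bubble_sort(scores, 'name', reverse=False))   # alphabetical by name
```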
## Development
### Running tests
```bash
pytest
```
### Code formatting
```bash
black petepak tests
```
### Linting
```bash
flake8 petepak tests
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
### 0.1.0 (2024-01-01)
- Initial release
- SQL-like data manipulation functions
- CSV I/O with schema support
- Multiple sorting algorithms
- Comprehensive test suite (137 tests)
- 78% code coverage