[![Python Package](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml/badge.svg)](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml)[![PyPI](https://img.shields.io/pypi/v/perse)](https://img.shields.io/pypi/v/perse)[![Supported Python Versions](https://img.shields.io/pypi/pyversions/perse)](https://pypi.org/project/perse/)
# Perse
**Perse** is an experimental Python package that combines widely used functionality from **Pandas**, **Polars**, and **DuckDB** in a single, unified `DataFrame` object. It aims to provide one streamlined interface that draws on the strengths of all three libraries.
The package is still experimental and focuses on essential functions. We plan to integrate more features from Pandas, Polars, and DuckDB in future versions.
## Key Features
The `Perse` DataFrame currently supports the following features:
### 1. Data Manipulation
Core data-handling tools inspired by Pandas and Polars.
- **Indexing and Selection**: Access specific rows or columns with `.loc` and `.iloc` properties.
- **Column Operations**: Add, modify, or delete columns efficiently.
- **Row Filtering**: Filter rows based on specific conditions.
- **Aggregation**: Summarize data with aggregations like `sum`, `mean`, `count`.
- **Sorting**: Sort data based on column values.
- **Custom Function Application**: Apply custom functions to columns, supporting both element-wise operations and complex transformations.
### 2. SQL Querying
Use DuckDB's SQL engine to run SQL queries directly on the DataFrame, ideal for complex filtering and data manipulation.
- **Direct SQL Queries**: Query the DataFrame in place; the Usage example refers to it by the table name `this`.
- **Seamless Integration**: Convert between Polars and DuckDB seamlessly for efficient querying on large datasets.
- **Advanced Filtering**: Filter, join, and group data using SQL syntax.
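
To make the shape of such a query concrete, here is a minimal sketch of the same `GROUP BY` statement that appears in the Usage section. It deliberately swaps DuckDB for Python's built-in `sqlite3` module so that it runs with the standard library alone; the table name `this` is taken from the Usage example, while the sample rows and the explicit `CREATE TABLE` step are assumptions made for illustration and say nothing about how Perse registers its data with DuckDB.

```python
import sqlite3

# Hypothetical sample data: (A, B) pairs.
rows = [(10, 0.5), (10, 1.5), (20, 2.0)]

# sqlite3 is a stand-in engine here, not what Perse uses internally.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE this (A INTEGER, B REAL)")
con.executemany("INSERT INTO this VALUES (?, ?)", rows)

# Same statement as in the Usage section, with ORDER BY added
# so that the row order is deterministic.
result = con.execute(
    "SELECT A, AVG(B) AS avg_B FROM this GROUP BY A ORDER BY A"
).fetchall()
con.close()

print(result)
```
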
### 3. Data Transformation
A collection of versatile data transformation functions.
- **Pivot and Unpivot**: Reshape data for summary reports and visualizations.
- **Melt/Stack**: Transform data between wide and long formats.
- **Mapping and Replacing**: Map values based on conditions or replace them in columns.
- **Grouping and Window Functions**: Group by specific columns and apply aggregations or window functions for advanced data summarization.
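
The wide/long reshaping mentioned above can be illustrated without any DataFrame library. The following is a plain-Python sketch of the idea, not Perse's implementation: the function names `melt` and `pivot`, the `variable`/`value` column names, and the list-of-dicts representation are all assumptions chosen for the example.

```python
def melt(rows, id_vars):
    """Wide -> long: one output row per (record, non-id column)."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows


def pivot(long_rows, id_vars):
    """Long -> wide: inverse of melt, grouping by the id columns."""
    wide = {}
    for row in long_rows:
        key = tuple(row[k] for k in id_vars)
        record = wide.setdefault(key, {k: row[k] for k in id_vars})
        record[row["variable"]] = row["value"]
    return list(wide.values())


wide_rows = [
    {"id": 1, "x": 10, "y": 20},
    {"id": 2, "x": 30, "y": 40},
]
long_rows = melt(wide_rows, id_vars=["id"])
round_trip = pivot(long_rows, id_vars=["id"])
```

Melting two records with two value columns each yields four long rows, and pivoting them back recovers the original records.
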
### 4. Compatibility and Conversion
Interoperability between Pandas, Polars, and DuckDB formats, offering flexibility in data manipulation.
- **Pandas Compatibility**: Conversion utilities to easily move data between Pandas and Polars.
- **Automatic Data Handling**: Automatically convert and handle data depending on the operation, allowing users to work flexibly with either Pandas or Polars.
- **File I/O Support**: Read from and write to common file formats (e.g., CSV, Parquet, JSON).
### 5. Visualization
Basic plotting capabilities that make it easy to visualize data directly from the Perse DataFrame.
- **Line, Bar, and Scatter Plots**: Quick visualizations with common plot types.
- **Customization**: Customize plot titles, labels, and legends with Matplotlib.
- **Direct Plotting**: Plot directly from the Perse DataFrame, which internally uses Pandas’ Matplotlib integration.
### 6. Data Integrity and Locking
Features designed to prevent accidental modifications and ensure data integrity.
- **Locking Mechanism**: Lock the DataFrame to prevent accidental edits.
- **Unlocking**: Explicitly unlock to allow modifications.
- **Validation**: Ensure data type consistency across columns for critical operations.
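
This README does not show the method names behind these features, so the following is only a sketch of how a lock guard and a type-consistency check might work. The class `LockableTable` and its `lock`, `unlock`, `add_column`, and `validate` methods are hypothetical and should not be read as Perse's API.

```python
class LockableTable:
    """Hypothetical lockable table; names are assumptions, not Perse's API."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self._locked = False

    def lock(self):
        self._locked = True

    def unlock(self):
        self._locked = False

    def add_column(self, name, values):
        # Every mutating method checks the flag before touching the data.
        if self._locked:
            raise PermissionError("table is locked; call unlock() first")
        self._columns[name] = list(values)

    def validate(self):
        # Each column must hold values of a single Python type.
        for name, values in self._columns.items():
            type_names = sorted({type(v).__name__ for v in values})
            if len(type_names) > 1:
                raise TypeError(f"column {name!r} mixes types: {type_names}")


table = LockableTable({"A": [1, 2, 3]})
table.lock()
try:
    table.add_column("B", [4, 5, 6])  # rejected while locked
except PermissionError as exc:
    blocked = str(exc)

table.unlock()
table.add_column("B", [4, 5, 6])      # accepted after unlocking
table.validate()
```
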
## Installation
To install Perse, run:
```bash
pip install perse
```
## Usage
```python
from perse import DataFrame
import numpy as np
# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)
# 1. Add a New Column
df.add_column("D", np.random.random(10), inplace=True)
print("DataFrame with new column D:\n", df)
# 2. Filter Rows
df2 = df.filter_rows(df.dl["A"] > 50, inplace=False) # default inplace = False
print("Filtered DataFrame (A > 50):\n", df2)
# 3. SQL Querying with DuckDB
df2 = df.query("SELECT A, AVG(B) AS avg_B FROM this GROUP BY A")
print("SQL Query Result:\n", df2)
# 4. Visualization
df.plot(kind="scatter", x="A", y="B", title="Scatter Plot of A vs B", xlabel="A values", ylabel="B values")
# 5. Convert to Pandas
df2 = df.to_pandas()
print("Converted to Pandas DataFrame:\n", df2)
```
## Pipe Operator
In Python, the `|` operator normally performs a bitwise OR. The `DataFrame` class overloads it so that `df | func` passes the DataFrame to `func`, giving a functional, chainable style similar to that of other modern data processing libraries. This makes pipelines more readable and flexible.
```python
from perse import DataFrame
import numpy as np
# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)
# Applying the print function to the DataFrame instance
df | print
# Chaining functions: the instance is returned if no modification is made
df2 = df | print | print
# Using a lambda function to call `to_csv` with arguments, demonstrating flexibility in piping
_ = df | (lambda x: x.to_csv('example.csv'))
```
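
For readers curious how such an operator can be built, the sketch below overloads `__or__` on a small stand-in class. It is not Perse's source code: the class `Pipeable`, the helper `double`, and the rule "return the same object when the function returns `None`" are assumptions, the last one being one plausible reading of the comment about the instance being returned when no modification is made.

```python
class Pipeable:
    """Hypothetical stand-in for a DataFrame-like object."""

    def __init__(self, rows):
        self.rows = rows

    def __or__(self, func):
        # `obj | func` is evaluated as `obj.__or__(func)`, i.e. func(obj).
        result = func(self)
        # Functions such as print return None; hand back the same object
        # so that further `| func` steps can still be chained.
        return self if result is None else result

    def __repr__(self):
        return f"Pipeable({self.rows!r})"


def double(p):
    # A transforming step: returns a new object instead of None.
    return Pipeable([r * 2 for r in p.rows])


p = Pipeable([1, 2, 3])
same = p | print | print       # prints the object twice
doubled = p | double | double  # each step feeds the next
```

Because `|` is left-associative, `p | f | g` is evaluated as `(p | f) | g`, which is what makes the chain read from left to right.
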
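DataFrames can be exported with the `to_csv`, `to_excel`, and `to_json` methods, or with the overloaded `>` operator:
```python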
from perse import DataFrame
import numpy as np
# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)
# Export as CSV file
df.to_csv('example.csv')
# Export as Excel file
df.to_excel('example.xlsx')
# Export as JSON file
df.to_json('example.json')
# Alternatively this concise expression can also be used
df > 'example.csv'
df > 'example.xlsx'
df > 'example.json'
```
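
The `>` shorthand can be understood as a comparison operator repurposed to dispatch on the file extension. The sketch below shows one way to do that with `__gt__`; the class `Exportable`, its column-dictionary layout, and the error raised for unknown extensions are assumptions for illustration, and Excel output is left out because it would need a third-party writer.

```python
import csv
import json
import os
import tempfile
from pathlib import Path


class Exportable:
    """Hypothetical column-oriented table; not Perse's actual class."""

    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

    def to_csv(self, path):
        names = list(self.columns)
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(names)
            writer.writerows(zip(*(self.columns[n] for n in names)))

    def to_json(self, path):
        with open(path, "w") as fh:
            json.dump(self.columns, fh)

    def __gt__(self, path):
        # `obj > "file.ext"` picks a writer based on the file extension.
        writers = {".csv": self.to_csv, ".json": self.to_json}
        suffix = Path(path).suffix.lower()
        if suffix not in writers:
            raise ValueError(f"unsupported extension: {suffix!r}")
        writers[suffix](path)
        return self


table = Exportable({"A": [1, 2], "B": ["x", "y"]})
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "example.csv")
json_path = os.path.join(out_dir, "example.json")

table > csv_path   # evaluated only for its side effect
table > json_path
```

One trade-off of this design is that `>` no longer behaves as a comparison for the object, so the explicit `to_*` methods remain the clearer choice in code that other people will read.
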
Raw data
{
"_id": null,
"home_page": null,
"name": "perse",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "dataframe, polars, pandas, duckdb, data-science, data-analysis, SQL, visualization",
"author": null,
"author_email": "Sermet Pekin <Sermet.Pekin@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/28/14/767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5/perse-0.1.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Perse is a unified DataFrame package that combines the best of Pandas, Polars, and DuckDB for efficient data handling, querying, and visualization.",
"version": "0.1.9",
"project_urls": {
"documentation": "https://perse.readthedocs.io/en/latest/home.html",
"issue_tracker": "https://github.com/SermetPekin/perse/issues",
"repository": "https://github.com/SermetPekin/perse"
},
"split_keywords": [
"dataframe",
" polars",
" pandas",
" duckdb",
" data-science",
" data-analysis",
" sql",
" visualization"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "27132f8b76b1245140c6ec80af910f7314943a57009b99abe221560a69756dba",
"md5": "3a22df8c7bf79ac2e83b2b004e2e7d48",
"sha256": "80a029bff6eea69a50f9d6809c3da3d6a66ea126ee740bd528e8259f350445d8"
},
"downloads": -1,
"filename": "perse-0.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3a22df8c7bf79ac2e83b2b004e2e7d48",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 13822,
"upload_time": "2024-11-08T23:11:31",
"upload_time_iso_8601": "2024-11-08T23:11:31.602273Z",
"url": "https://files.pythonhosted.org/packages/27/13/2f8b76b1245140c6ec80af910f7314943a57009b99abe221560a69756dba/perse-0.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2814767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5",
"md5": "4421d56346d3372fe5075ce1b459e3ea",
"sha256": "eae9aa3fd846000ef6bfac479bb0e2a2db0c38d6952abb37ab09f502902a9c6c"
},
"downloads": -1,
"filename": "perse-0.1.9.tar.gz",
"has_sig": false,
"md5_digest": "4421d56346d3372fe5075ce1b459e3ea",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 12648,
"upload_time": "2024-11-08T23:11:33",
"upload_time_iso_8601": "2024-11-08T23:11:33.105028Z",
"url": "https://files.pythonhosted.org/packages/28/14/767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5/perse-0.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-08 23:11:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SermetPekin",
"github_project": "perse",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "perse"
}