perse


- Name: perse
- Version: 0.1.9
- Home page: None
- Summary: Perse is a unified DataFrame package that combines the best of Pandas, Polars, and DuckDB for efficient data handling, querying, and visualization.
- Upload time: 2024-11-08 23:11:33
- Maintainer: None
- Docs URL: None
- Author: None
- Requires Python: >=3.10
- License: None
- Keywords: dataframe, polars, pandas, duckdb, data-science, data-analysis, sql, visualization
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

[![Python Package](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml/badge.svg)](https://github.com/SermetPekin/perse/actions/workflows/python-package.yml) [![PyPI](https://img.shields.io/pypi/v/perse)](https://pypi.org/project/perse/) [![Supported Python Versions](https://img.shields.io/pypi/pyversions/perse)](https://pypi.org/project/perse/)





# Perse

**Perse** is an experimental Python package that combines widely used functionality from **Pandas**, **Polars**, and **DuckDB** in a single, unified `DataFrame` object. Its goal is a streamlined, efficient interface that draws on the strengths of all three libraries.

The current release focuses on essential functions. We plan to integrate more features from Pandas, Polars, and DuckDB in future versions.

## Key Features

The `Perse` DataFrame currently supports the following features:

### 1. Data Manipulation
Core data-handling tools inspired by Pandas and Polars.

- **Indexing and Selection**: Access specific rows or columns with `.loc` and `.iloc` properties.
- **Column Operations**: Add, modify, or delete columns efficiently.
- **Row Filtering**: Filter rows based on specific conditions.
- **Aggregation**: Summarize data with aggregations like `sum`, `mean`, `count`.
- **Sorting**: Sort data based on column values.
- **Custom Function Application**: Apply custom functions to columns, supporting both element-wise operations and complex transformations.

### 2. SQL Querying
Run SQL directly on the DataFrame with DuckDB's engine, which suits complex filtering and data manipulation.

- **Direct SQL Queries**: Query the DataFrame's data in place, without exporting it first.
- **Seamless Integration**: Convert between Polars and DuckDB without manual steps, for efficient querying on large datasets.
- **Advanced Filtering**: Filter, join, and group data using SQL syntax.

### 3. Data Transformation
A collection of versatile data transformation functions.

- **Pivot and Unpivot**: Reshape data for summary reports and visualizations.
- **Melt/Stack**: Transform data between wide and long formats.
- **Mapping and Replacing**: Map values based on conditions or replace them in columns.
- **Grouping and Window Functions**: Group by specific columns and apply aggregations or window functions for advanced data summarization.
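
The wide-to-long and long-to-wide reshaping mentioned above can be illustrated with plain pandas. This is a sketch of the general idea, not of Perse's own method names, which may differ; the column names are made up for the example.

```python
import pandas as pd

# Wide format: one row per id, one column per measurement.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Melt to long format: one row per (id, variable) pair.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# Pivot back to wide format, restoring one column per variable.
back = long.pivot(index="id", columns="variable", values="value").reset_index()

print(long)
print(back)
```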

### 4. Compatibility and Conversion
Interoperability between Pandas, Polars, and DuckDB formats, offering flexibility in data manipulation.

- **Pandas Compatibility**: Conversion utilities to easily move data between Pandas and Polars.
- **Automatic Data Handling**: Automatically convert and handle data depending on the operation, allowing users to work flexibly with either Pandas or Polars.
- **File I/O Support**: Read and write from common file formats (e.g., CSV, Parquet, JSON).

### 5. Visualization
Basic plotting capabilities that make it easy to visualize data directly from the Perse DataFrame.

- **Line, Bar, and Scatter Plots**: Quick visualizations with common plot types.
- **Customization**: Customize plot titles, labels, and legends with Matplotlib.
- **Direct Plotting**: Plot directly from the Perse DataFrame, which internally uses Pandas’ Matplotlib integration.

### 6. Data Integrity and Locking
Features designed to prevent accidental modifications and ensure data integrity.

- **Locking Mechanism**: Lock the DataFrame to prevent accidental edits.
- **Unlocking**: Explicitly unlock to allow modifications.
- **Validation**: Ensure data type consistency across columns for critical operations.
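
The lock/unlock idea can be pictured with a small stand-alone sketch. It does not use Perse: `LockableTable` and its `lock`, `unlock`, and `add_column` methods are hypothetical names chosen for illustration, and Perse's actual locking API may look different.

```python
class LockableTable:
    """Toy column store that refuses edits while locked (illustration only)."""

    def __init__(self, columns):
        # Copy the input so later changes to the caller's lists do not leak in.
        self._columns = {name: list(values) for name, values in columns.items()}
        self._locked = False

    def lock(self):
        self._locked = True
        return self

    def unlock(self):
        self._locked = False
        return self

    def add_column(self, name, values):
        # Every mutating method checks the lock before touching the data.
        if self._locked:
            raise PermissionError("table is locked; call unlock() before editing")
        self._columns[name] = list(values)
        return self

    @property
    def column_names(self):
        return list(self._columns)


table = LockableTable({"A": [1, 2, 3]}).lock()
try:
    table.add_column("B", [4, 5, 6])
except PermissionError as exc:
    print("Rejected:", exc)

# Editing works again after an explicit unlock.
table.unlock().add_column("B", [4, 5, 6])
print(table.column_names)
```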

## Installation

To install Perse, run:

```bash
pip install perse
```

## Usage

```python 

from perse import DataFrame
import numpy as np

# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)

# 1. Add a New Column 
df.add_column("D", np.random.random(10), inplace=True)
print("DataFrame with new column D:\n", df)

# 2. Filter Rows
df2 = df.filter_rows(df.dl["A"] > 50, inplace=False)  # inplace=False is the default
print("Filtered DataFrame (A > 50):\n", df2)

# 3. SQL Querying with DuckDB
df2 = df.query("SELECT A, AVG(B) AS avg_B FROM this GROUP BY A")
print("SQL Query Result:\n", df2)

# 4. Visualization
df.plot(kind="scatter", x="A", y="B", title="Scatter Plot of A vs B", xlabel="A values", ylabel="B values")

# 5. Convert to Pandas
df2 = df.to_pandas()
print("Converted to Pandas DataFrame:\n", df2)


```





## Pipe Operator

In Python, `|` is normally the bitwise OR operator. The `DataFrame` class overloads it to pass the DataFrame to a function, giving a functional, chainable style similar to pipes in other modern data processing libraries. This can make expressions more readable and flexible.

```python 

from perse import DataFrame
import numpy as np

# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)
# Applying the print function to the DataFrame instance
df | print

# Chaining functions: the instance is returned if no modification is made
df2 = df | print | print

# Using a lambda function to call `to_csv` with arguments, demonstrating flexibility in piping
_ = df | (lambda x: x.to_csv('example.csv'))

```
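
For readers curious how such an operator can work, here is a minimal sketch of overloading `|` through Python's `__or__` method. The `Pipeable` class and the `doubled` helper are hypothetical and are not Perse's implementation. The sketch assumes that a function returning `None` (such as `print`) means "no modification", in which case the original object is passed along; Perse's exact rule may differ.

```python
class Pipeable:
    """Toy container that overloads `|` to apply a function to itself."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __or__(self, func):
        # `obj | func` calls func(obj).
        result = func(self)
        # Functions such as print return None; keep the original object
        # so the chain can continue.
        return self if result is None else result

    def __repr__(self):
        return f"Pipeable({self.rows})"


def doubled(table):
    """Return a new Pipeable with every row multiplied by two."""
    return Pipeable(row * 2 for row in table.rows)


table = Pipeable([1, 2, 3])
same = table | print | print        # prints twice, returns the original object
changed = table | doubled | print   # prints the transformed object
```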

```python 
from perse import DataFrame
import numpy as np

# Sample data
data = {"A": np.random.randint(0, 100, 10), "B": np.random.random(10), "C": np.random.choice(["X", "Y", "Z"], 10)}
df = DataFrame(data)

# Export as CSV file
df.to_csv('example.csv')

# Export as Excel file
df.to_excel('example.xlsx')

# Export as JSON file
df.to_json('example.json')

# Alternatively, use the concise `>` form
df > 'example.csv'
df > 'example.xlsx'
df > 'example.json'

```
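
The `>` shorthand can be understood as an overload of Python's `__gt__` method. The sketch below is a hypothetical, standard-library-only illustration, not Perse's code: the `Exportable` class is invented for the example, and choosing the writer from the file extension is an assumption about how such an operator might dispatch.

```python
import csv
import json
from pathlib import Path


class Exportable:
    """Toy table (a list of dict records) that overloads `>` for export."""

    def __init__(self, records):
        self.records = list(records)

    def to_csv(self, path):
        # Column order follows the keys of the first record.
        fieldnames = list(self.records[0]) if self.records else []
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.records)

    def to_json(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self.records, handle)

    def __gt__(self, path):
        # `table > "out.csv"` calls this method with path="out.csv".
        writers = {".csv": self.to_csv, ".json": self.to_json}
        suffix = Path(path).suffix.lower()
        if suffix not in writers:
            raise ValueError(f"unsupported file extension: {suffix!r}")
        writers[suffix](path)
        return self
```

With this sketch, `Exportable([{"A": 1}]) > "example.csv"` writes a CSV file. Python reads `a > b > c` as the chained comparison `a > b and b > c`, so an operator like this is best used with one target per statement, as in the example above.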

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "perse",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "dataframe, polars, pandas, duckdb, data-science, data-analysis, SQL, visualization",
    "author": null,
    "author_email": "Sermet Pekin <Sermet.Pekin@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/28/14/767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5/perse-0.1.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Perse is a unified DataFrame package that combines the best of Pandas, Polars, and DuckDB for efficient data handling, querying, and visualization.",
    "version": "0.1.9",
    "project_urls": {
        "documentation": "https://perse.readthedocs.io/en/latest/home.html",
        "issue_tracker": "https://github.com/SermetPekin/perse/issues",
        "repository": "https://github.com/SermetPekin/perse"
    },
    "split_keywords": [
        "dataframe",
        " polars",
        " pandas",
        " duckdb",
        " data-science",
        " data-analysis",
        " sql",
        " visualization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "27132f8b76b1245140c6ec80af910f7314943a57009b99abe221560a69756dba",
                "md5": "3a22df8c7bf79ac2e83b2b004e2e7d48",
                "sha256": "80a029bff6eea69a50f9d6809c3da3d6a66ea126ee740bd528e8259f350445d8"
            },
            "downloads": -1,
            "filename": "perse-0.1.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3a22df8c7bf79ac2e83b2b004e2e7d48",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 13822,
            "upload_time": "2024-11-08T23:11:31",
            "upload_time_iso_8601": "2024-11-08T23:11:31.602273Z",
            "url": "https://files.pythonhosted.org/packages/27/13/2f8b76b1245140c6ec80af910f7314943a57009b99abe221560a69756dba/perse-0.1.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2814767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5",
                "md5": "4421d56346d3372fe5075ce1b459e3ea",
                "sha256": "eae9aa3fd846000ef6bfac479bb0e2a2db0c38d6952abb37ab09f502902a9c6c"
            },
            "downloads": -1,
            "filename": "perse-0.1.9.tar.gz",
            "has_sig": false,
            "md5_digest": "4421d56346d3372fe5075ce1b459e3ea",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12648,
            "upload_time": "2024-11-08T23:11:33",
            "upload_time_iso_8601": "2024-11-08T23:11:33.105028Z",
            "url": "https://files.pythonhosted.org/packages/28/14/767d6c83c0c7e8346d59e328676fb77d1840551577cfbbf8f7c6c06c42d5/perse-0.1.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-08 23:11:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SermetPekin",
    "github_project": "perse",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "perse"
}
        