## polars_partitions
**GitHub:** [polars_partitions](https://github.com/dwenlvov/polars_partitions)
**Version**: 0.1.1
### Python
```
pip install polars_partitions
```
## Description
This library is not a replacement for [Polars](https://pola.rs/).
Its main goal is to simplify writing, reading, and filtering partitioned data by creating a Table Of Contents file (hereinafter referred to as "TOC").
### Write Partition
### polars_parquet.wr_partition()
**polars_parquet.wr_partition(**
_df_: DataFrame,
_columns_: array | string,
_output_path_: str
**)**
#### Parameters
**df**
Polars DataFrame
**columns**
Column name or array of column names to partition by
**output_path**
Path to save to
### TOC record
### polars_parquet.wr_toc()
**polars_parquet.wr_toc(**
_df_: DataFrame,
_columns_: array | string,
_output_path_: str
**)**
#### Parameters
**df**
Polars DataFrame on which the partitions are based
**columns**
Column name or array of column names the partitions are created for
**output_path**
Path to save to
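To make the idea of a TOC more concrete, here is a minimal pure-Python sketch of one way such an index could be built: one entry per distinct combination of partition values, each pointing at a directory. This is not the library's implementation. The helper name `build_toc`, the `path` field, and the Hive-style `column=value` directory layout are all assumptions made for illustration.

```python
from datetime import date


def build_toc(rows, columns):
    """Return one TOC entry per distinct combination of partition values."""
    if isinstance(columns, str):  # accept a single column name as well as a list
        columns = [columns]
    toc = {}
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in toc:
            # Hypothetical Hive-style directory name, e.g. "col1=2024-01-01/col2=A2"
            path = "/".join(f"{c}={row[c]}" for c in columns)
            toc[key] = {**{c: row[c] for c in columns}, "path": path}
    return list(toc.values())


rows = [
    {"col1": date(2024, 1, 1), "col2": "A2", "col3": 1},
    {"col1": date(2024, 1, 1), "col2": "A2", "col3": 2},
    {"col1": date(2024, 1, 2), "col2": "B2", "col3": 3},
]

toc = build_toc(rows, ["col1", "col2"])
for entry in toc:
    print(entry["path"])
```

Because the index holds only the partition values and not the data, it stays small and can be filtered before any parquet file is opened.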
### Reading TOC
### polars_parquet.rd_toc()
**polars_parquet.rd_toc(**
_output_path_: str,
_filters_: dict = None,
_btwn_: str = None
**)**
#### Parameters
**output_path**
Path the partitions were written to (the same **output_path** that was passed to `wr_partition()`).
**filters**
Dictionary that maps a column name to an array of values to filter on
**btwn**
Works in conjunction with **filters**. Takes the **column name** to which a **between** filter is applied. The first two values of that column's array in **filters** are used as the bounds of the range.
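The interaction between **filters** and **btwn** can be sketched in plain Python. This is an illustration of the semantics described above, not the library's code: the helper `match` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import date


def match(entry, filters=None, btwn=None):
    """Return True if a TOC entry passes every filter."""
    for column, values in (filters or {}).items():
        if column == btwn:
            low, high = values[0], values[1]  # only the first two values are used
            if not (low <= entry[column] <= high):  # inclusive bounds: an assumption
                return False
        elif entry[column] not in values:  # every other column: exact membership
            return False
    return True


toc = [
    {"col1": date(2024, 1, 1), "col2": "A2"},
    {"col1": date(2024, 1, 2), "col2": "A2"},
    {"col1": date(2024, 1, 3), "col2": "B2"},
]
filters = {"col1": [date(2024, 1, 1), date(2024, 1, 3)]}

exact = [e for e in toc if match(e, filters)]                # listed dates only
ranged = [e for e in toc if match(e, filters, btwn="col1")]  # whole date range
print(len(exact), len(ranged))
```

Without `btwn` the two dates act as a list of allowed values, so 2024-01-02 is excluded. With `btwn="col1"` the same two dates become range bounds and 2024-01-02 is included.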
### Read Partition
### polars_parquet.rd_partition()
**polars_parquet.rd_partition(**
_output_path_: str,
_columns_: array | string = "*",
_filters_: dict = None,
_btwn_: str = None
**)** → LazyFrame
#### Parameters
**output_path**
Path to the parquet file or to the partitions folder
**columns**
Column name or array of column names to return. The default `"*"` returns all columns.
**filters**
Dictionary that maps a column name to an array of values to filter on
**btwn**
Works in conjunction with **filters**. Takes the **column name** to which a **between** filter is applied. The first two values of that column's array in **filters** are used as the bounds of the range.
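Since **columns** accepts either a string or an array and defaults to `"*"`, a caller-side way to think about it is as a normalization step. The sketch below is only an illustration under that reading of the signature. The helper `select_columns` is hypothetical and not part of the library.

```python
def select_columns(available, columns="*"):
    """Normalize a `columns` argument into an explicit list of column names."""
    if columns == "*":  # default: keep every available column
        return list(available)
    if isinstance(columns, str):  # a single column name
        return [columns]
    return list(columns)  # already an array of names


available = ["col1", "col2", "col3"]
print(select_columns(available))                    # all columns
print(select_columns(available, "col3"))            # one column
print(select_columns(available, ["col1", "col3"]))  # explicit subset
```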
## How to use (example)
``` python
import polars_partitions as plp
from datetime import date
import polars as pl
# Create a test dataset
# All columns must have the same number of values (8 here)
df = pl.DataFrame({
    'col1': [date(2024,1,1), date(2024,1,1), date(2024,1,2), date(2024,1,2),
             date(2024,1,2), date(2024,1,3), date(2024,1,3), date(2024,1,3)],
    'col2': ['A2','A2','A2','A2','B2','B2','B2','B2'],
    'col3': [1,2,3,4,5,6,7,8]
})
output_path = 'your_path/folder_name_where_to_save'
# Which columns are partitioned by
columns = ['col1', 'col2']
pp = plp.polars_partitions()
# Write the partitions
pp.wr_partition(df, columns, output_path)
# Read TOC
# print(pp.rd_toc(output_path))
# Read partitions and apply filters
# filters = {'col1':[date(2024,1,1),date(2024,1,3)]}
# df = pp.rd_partition(output_path, filters=filters, btwn='col1', columns=['col1', 'col3'])
# print(df.collect())
```
Raw data
{
"_id": null,
"home_page": "https://github.com/dwenlvov/",
"name": "polars-partitions",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "",
"keywords": "polars,partitions,parquet",
"author": "denis_lvov",
"author_email": "dwenlvov@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b7/34/fdf988e3155849f0ba478d9739cdfce40fa1929d513cfabfc2abaa1c725e/polars_partitions-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Simplified work with partitions based on Polars library",
"version": "0.1.1",
"project_urls": {
"Documentation": "https://github.com/dwenlvov/polars_partitions",
"Homepage": "https://github.com/dwenlvov/"
},
"split_keywords": [
"polars",
"partitions",
"parquet"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f9a8e7b5a3aef5756d67e0a685a7adccb60b8aac3c82cb6e5913dd0c865ddd86",
"md5": "617ee1eb6832704224384e721033ee13",
"sha256": "782455d46aa5db27ff49de82570a15a9c23581fa77e7346d127617a26e5d38db"
},
"downloads": -1,
"filename": "polars_partitions-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "617ee1eb6832704224384e721033ee13",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 4965,
"upload_time": "2024-02-19T18:38:57",
"upload_time_iso_8601": "2024-02-19T18:38:57.570457Z",
"url": "https://files.pythonhosted.org/packages/f9/a8/e7b5a3aef5756d67e0a685a7adccb60b8aac3c82cb6e5913dd0c865ddd86/polars_partitions-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b734fdf988e3155849f0ba478d9739cdfce40fa1929d513cfabfc2abaa1c725e",
"md5": "63962e340b40244b27b7465cd82ad601",
"sha256": "446a83959b8db36d6a0a685c3dc11bc7e133b40677d5682aa352f5c0f9ee8471"
},
"downloads": -1,
"filename": "polars_partitions-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "63962e340b40244b27b7465cd82ad601",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 4879,
"upload_time": "2024-02-19T18:38:59",
"upload_time_iso_8601": "2024-02-19T18:38:59.125139Z",
"url": "https://files.pythonhosted.org/packages/b7/34/fdf988e3155849f0ba478d9739cdfce40fa1929d513cfabfc2abaa1c725e/polars_partitions-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-19 18:38:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dwenlvov",
"github_project": "polars_partitions",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "polars-partitions"
}