polars-partitions

- Name: polars-partitions
- Version: 0.1.1
- Home page: https://github.com/dwenlvov/
- Summary: Simplified work with partitions based on Polars library
- Upload time: 2024-02-19 18:38:59
- Author: denis_lvov
- Requires Python: >=3.11
- Keywords: polars, partitions, parquet
- Requirements: No requirements were recorded.

## polars_partitions

**GitHub:** [polars_partitions](https://github.com/dwenlvov/polars_partitions)  
**Version**: 0.1.1

### Python
```
pip install polars_partitions
```

## Description
This library is not a replacement for [Polars](https://pola.rs/).
Its goal is to simplify writing, reading, and filtering partitioned data by maintaining a Table Of Contents file (hereinafter "TOC").

### Write Partition
### polars_parquet.wr_partition() 
**polars_parquet.wr_partition(**  
          _df_: DataFrame,  
          _columns_: array | string,  
          _output_path_: str  
**)**

#### Parameters
**df**  
          Polars DataFrame  
**columns**  
          Array of columns on which to create partitions  
**output_path**  
          Path to save to  
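
To illustrate what partitioning on **columns** means, the sketch below groups rows by each distinct combination of values in the chosen columns, using only the standard library. It is not the library's implementation: `group_by_partition` and the row-dictionary format are hypothetical and exist only for this example.

```python
from collections import defaultdict
from datetime import date


def group_by_partition(rows, columns):
    """Group row dicts by the values of the partition columns (illustrative helper)."""
    if isinstance(columns, str):  # accept a single column name as well as a list
        columns = [columns]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in columns)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"col1": date(2024, 1, 1), "col2": "A2", "col3": 1},
    {"col1": date(2024, 1, 1), "col2": "A2", "col3": 2},
    {"col1": date(2024, 1, 2), "col2": "A2", "col3": 3},
    {"col1": date(2024, 1, 3), "col2": "B2", "col3": 4},
]

partitions = group_by_partition(rows, ["col1", "col2"])
for key, members in partitions.items():
    print(key, len(members))
```

Each key in `partitions` corresponds to one partition that would be written separately; how the library names and stores those partitions is not shown here.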

### TOC record
### polars_parquet.wr_toc() 
**polars_parquet.wr_toc(**  
          _df_: DataFrame,  
          _columns_: array | string,  
          _output_path_: str  
**)**

#### Parameters
**df**  
          Polars DataFrame on which the partitions are based  
**columns**  
          Array of columns to create partitions for  
**output_path**  
          Path to save to  
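
The TOC format itself is not documented here. As a purely hypothetical sketch, a TOC can be thought of as a small index recording which partition values exist and where each partition is stored, so a reader can pick files without scanning the whole folder. The `build_toc` helper, the `key=value` folder naming, and the `data.parquet` file name are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def build_toc(partition_keys, columns, output_path):
    """Return one TOC entry per partition: its column values plus an assumed file path."""
    toc = []
    for key in partition_keys:
        # Assumed hive-style folder naming, e.g. col1=2024-01-01/col2=A2
        folder = "/".join(f"{c}={v}" for c, v in zip(columns, key))
        entry = dict(zip(columns, key))
        entry["path"] = str(PurePosixPath(output_path) / folder / "data.parquet")
        toc.append(entry)
    return toc


toc = build_toc(
    partition_keys=[("2024-01-01", "A2"), ("2024-01-03", "B2")],
    columns=["col1", "col2"],
    output_path="your_path/folder_name_where_to_save",
)
for entry in toc:
    print(entry)
```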

### Reading TOC
### polars_parquet.rd_toc() 
**polars_parquet.rd_toc(**  
          _output_path_: str,  
          _filters_: dict = None,  
          _btwn_: str = None  
**)**

#### Parameters
**output_path**  
          Path that the partitions were written to  
**filters**  
          Dictionary where each key is a column name and each value is an array of values to filter on  
**btwn**  
          Works in conjunction with **filters**. Takes the **column name** to which a **between** filter is applied; the bounds are the first two values of that column's array in **filters**.  
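
One way to read the **filters** / **btwn** combination, sketched in plain Python: without **btwn**, a column's array acts as a list of allowed values; when **btwn** names that column, only the first two array values are used, as range bounds. The text does not say whether the bounds are inclusive, so inclusivity here is an assumption, and `matches` is a hypothetical helper rather than part of the package.

```python
from datetime import date


def matches(record, filters=None, btwn=None):
    """Check one record against filters; btwn names the column filtered as a range."""
    for column, values in (filters or {}).items():
        if column == btwn:
            low, high = values[0], values[1]  # only the first two values are used
            if not (low <= record[column] <= high):  # inclusive bounds: an assumption
                return False
        elif record[column] not in values:  # otherwise: plain membership test
            return False
    return True


records = [{"col1": date(2024, 1, d)} for d in (1, 2, 3)]
filters = {"col1": [date(2024, 1, 1), date(2024, 1, 3)]}

as_list = [r["col1"].day for r in records if matches(r, filters)]
as_range = [r["col1"].day for r in records if matches(r, filters, btwn="col1")]
print(as_list)   # membership: only the listed dates
print(as_range)  # between: every date within the bounds
```

With the same `filters` dictionary, the membership reading keeps only the two listed dates, while the between reading also keeps the date that lies between them.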

### Read Partition
### polars_parquet.rd_partition() 
**polars_parquet.rd_partition(**  
          _output_path_: str,  
          _columns_: array | string = "*",  
          _filters_: dict = None,  
          _btwn_: str = None  
**)** → LazyFrame  

#### Parameters
**output_path**  
          Path to the parquet file or to the partitions folder  
**columns**  
          Array of columns to return. Defaults to `"*"`.  
**filters**  
          Dictionary where each key is a column name and each value is an array of values to filter on  
**btwn**  
          Works in conjunction with **filters**. Takes the **column name** to which a **between** filter is applied; the bounds are the first two values of that column's array in **filters**.  

## How to use (example)
``` python
import polars_partitions as plp
from datetime import date
import polars as pl

# Create a test dataset (all columns must have the same length)
df = pl.DataFrame({
    'col1': [date(2024,1,1), date(2024,1,1), date(2024,1,2), date(2024,1,2),
             date(2024,1,2), date(2024,1,3), date(2024,1,3), date(2024,1,3)],
    'col2': ['A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'B2', 'B2'],
    'col3': [1, 2, 3, 4, 5, 6, 7, 8],
})

output_path = 'your_path/folder_name_where_to_save'
# Which columns are partitioned by
columns = ['col1', 'col2'] 

pp = plp.polars_partitions()

# Write the partitions
pp.wr_partition(df, columns, output_path)

# Read TOC
# print(pp.rd_toc(output_path))

# Read partitions and apply filters
# filters = {'col1':[date(2024,1,1),date(2024,1,3)]}
# df = pp.rd_partition(output_path, filters=filters, btwn='col1', columns=['col1', 'col3']) 
# print(df.collect())
```

            
