# Partitioneer
Partitioneer is a Python library for managing data files in a date-partitioned layout. It provides functions for writing data to partitions, reading partitions back with optional filters and a date range, and looking up the first and latest partition dates.
## Installation
You can install Partitioneer using pip:
```sh
pip install partitioneer
```
## Usage
### Writing Data to Partitions
To write data to partitioned Parquet files:
```python
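import pandas as pd

from partitioneer import write_data_to_partitions

# Example data; replace with your own DataFrame.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "value": [1, 2],
})

write_data_to_partitions(
    df,
    base_path="/path/to/data",
    date_col="date_column",  # column that holds each row's partition date
    override_existing=False,
)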
```
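To make the idea of a date-partitioned layout concrete, here is a minimal standard-library sketch that groups rows by a date column and writes one file per date directory. It does not show how Partitioneer works internally: the `write_rows_by_date` helper, the `date=YYYY-MM-DD` directory naming, and the use of CSV instead of Parquet are all assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_rows_by_date(rows, base_path, date_col):
    """Group rows by their date value and write one CSV file per date directory.

    Illustrative only: the `date=<value>/data.csv` layout is an assumption,
    not Partitioneer's actual on-disk format.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[date_col]].append(row)

    written = []
    for date_value, group in sorted(groups.items()):
        partition_dir = Path(base_path) / f"date={date_value}"
        partition_dir.mkdir(parents=True, exist_ok=True)
        target = partition_dir / "data.csv"
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(target)
    return written
```

Each distinct date ends up in its own directory, so a reader can skip whole directories that fall outside a requested date range.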
### Reading Data from Partitions
To read data from partitioned Parquet files:
```python
from partitioneer import read_data_from_partitions, PartitionFilter

df = read_data_from_partitions(
    base_path="/path/to/data",
    filters=[
        PartitionFilter("category", "in", ["A", "B"]),
        PartitionFilter("value", "greater_than", 100),
    ],
    add_partition_date=True,
    start_date="2024-01-01",
    end_date="2024-12-31",
)
```
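The filters in the call above can be read as (column, operator, operand) triples. The sketch below shows one plausible way such triples could be evaluated against a row, plus a date-range check. It is not Partitioneer's implementation: the `OPERATORS` table, `row_matches`, and `in_date_range` are hypothetical names, and both the AND-combination of filters and the inclusive date bounds are assumptions.

```python
import operator

# Hypothetical operator table. Only "in" and "greater_than" appear in the
# README example; how Partitioneer evaluates them is assumed here.
OPERATORS = {
    "in": lambda value, allowed: value in allowed,
    "greater_than": operator.gt,
}


def row_matches(row, filters):
    """Return True when the row satisfies every (column, op, operand) filter."""
    return all(
        OPERATORS[op](row[column], operand) for column, op, operand in filters
    )


def in_date_range(partition_date, start_date, end_date):
    """Inclusive range check; ISO 8601 date strings sort correctly as text."""
    return start_date <= partition_date <= end_date


filters = [("category", "in", ["A", "B"]), ("value", "greater_than", 100)]
rows = [
    {"category": "A", "value": 150},
    {"category": "C", "value": 150},
    {"category": "B", "value": 100},
]
matching = [row for row in rows if row_matches(row, filters)]
```

Under these assumptions only the first row is kept: the second fails the `in` filter and the third fails `greater_than` because 100 is not strictly greater than 100.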
### Getting Partition Date Information
To get the latest or first partition date:
```python
from partitioneer import get_latest_partition_date, get_first_partition_date
latest_date = get_latest_partition_date("/path/to/data")
first_date = get_first_partition_date("/path/to/data")
```
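Conceptually, finding the first or latest partition date amounts to listing the partitions and taking the minimum or maximum date. The following sketch does this with the standard library only. The `list_partition_dates` helper and the `date=YYYY-MM-DD` directory naming are assumptions for illustration and may not match how Partitioneer names or scans its partitions.

```python
from datetime import date
from pathlib import Path


def list_partition_dates(base_path, prefix="date="):
    """Collect the dates encoded in partition directory names under base_path.

    Assumes directories are named `<prefix>YYYY-MM-DD`; anything else is skipped.
    """
    dates = []
    for entry in Path(base_path).iterdir():
        if not entry.is_dir() or not entry.name.startswith(prefix):
            continue
        try:
            dates.append(date.fromisoformat(entry.name[len(prefix):]))
        except ValueError:
            continue  # directory suffix is not a valid ISO date
    return sorted(dates)
```

With the sorted list in hand, `dates[0]` is the first partition date and `dates[-1]` is the latest; an empty list means no partitions were found.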
## Build Instructions
To build the package:
```sh
python setup.py sdist bdist_wheel
```
To upload to PyPI:
```sh
pip install twine
twine upload dist/*
```
To build, upload, and then clean up the build artifacts in one script:
```shell
python setup.py sdist bdist_wheel
pip install twine
twine upload dist/* --username __token__ --password "<add_pypi_token_here>"
rm -r ./build
rm -r ./dist
rm -r ./partitioneer.egg-info
```
## License
[MIT License](LICENSE)