partitioneer

| Field | Value |
|---|---|
| Name | partitioneer |
| Version | 0.2.13 |
| Home page | https://github.com/lachlancahill/partitioneer |
| Summary | A library for reading and writing partitioned data |
| Upload time | 2024-09-10 21:00:36 |
| Requires Python | >=3.6 |
| Requirements | No requirements were recorded. |

# Partitioneer

Partitioneer is a Python library that provides utilities for managing data files in a date-partitioned format. It offers functions for writing data to partitions, reading data from partitions with filtering capabilities, and retrieving partition date information.

## Installation

You can install Partitioneer using pip:

```sh
pip install partitioneer
```

## Usage

### Writing Data to Partitions

To write data to partitioned Parquet files:

```python
from partitioneer import write_data_to_partitions
import pandas as pd

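# Example data; replace with your own DataFrame.
# The date column is assumed here to hold datetime values.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "category": ["A", "B"],
    "value": [150, 50],
})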
write_data_to_partitions(
    df,
    base_path="/path/to/data",
    date_col="date_column",
    override_existing=False
)
```
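
To make the idea of date partitioning concrete, here is a minimal standard-library sketch that groups rows by date and derives one target file per date. The `date=YYYY-MM-DD` directory naming, the `part.parquet` file name, and the helper `plan_partitions` are assumptions made for illustration. They are not taken from Partitioneer's source, and the library's actual on-disk layout may differ.

```python
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath


def plan_partitions(rows, base_path, date_col):
    """Group rows by their date value and map each date to a target file path.

    Hypothetical helper: the Hive-style ``date=YYYY-MM-DD`` directory name
    is an assumption, not Partitioneer's documented layout.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[date_col]].append(row)

    plan = {}
    for day, day_rows in sorted(grouped.items()):
        target = PurePosixPath(base_path) / f"date={day.isoformat()}" / "part.parquet"
        plan[str(target)] = day_rows
    return plan


rows = [
    {"date_column": date(2024, 1, 1), "value": 10},
    {"date_column": date(2024, 1, 2), "value": 20},
    {"date_column": date(2024, 1, 1), "value": 30},
]
plan = plan_partitions(rows, "/path/to/data", "date_column")
for path, day_rows in plan.items():
    print(path, len(day_rows))
```

Each distinct date ends up in its own directory, which is what lets a reader skip whole directories when only a date range is requested.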

### Reading Data from Partitions

To read data from partitioned Parquet files:

```python
from partitioneer import read_data_from_partitions, PartitionFilter

df = read_data_from_partitions(
    base_path="/path/to/data",
    filters=[
        PartitionFilter("category", "in", ["A", "B"]),
        PartitionFilter("value", "greater_than", 100)
    ],
    add_partition_date=True,
    start_date="2024-01-01",
    end_date="2024-12-31"
)
```
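
The call above combines a date range with column filters. As a rough mental model only, the sketch below applies the two operators that appear in the example, `in` and `greater_than`, to plain dictionaries. The `RowFilter` tuple, the `matches` and `select_rows` helpers, the inclusive date bounds, and the assumption that multiple filters are combined with AND are all illustrative guesses, not Partitioneer's documented behaviour.

```python
from datetime import date
from typing import Any, NamedTuple


class RowFilter(NamedTuple):
    """Hypothetical stand-in for a (column, operator, value) filter."""
    column: str
    op: str
    value: Any


def matches(row, filters):
    """Return True if the row satisfies every filter (AND semantics assumed)."""
    for f in filters:
        cell = row[f.column]
        if f.op == "in":
            ok = cell in f.value
        elif f.op == "greater_than":
            ok = cell > f.value
        else:
            raise ValueError(f"unsupported operator: {f.op}")
        if not ok:
            return False
    return True


def select_rows(partitions, filters, start_date, end_date):
    """Keep matching rows from partitions dated within [start_date, end_date].

    Inclusive bounds are an assumption of this sketch.
    """
    selected = []
    for day, rows in sorted(partitions.items()):
        if start_date <= day <= end_date:
            selected.extend(r for r in rows if matches(r, filters))
    return selected


partitions = {
    date(2023, 12, 31): [{"category": "A", "value": 500}],
    date(2024, 3, 1): [
        {"category": "A", "value": 150},
        {"category": "C", "value": 900},
        {"category": "B", "value": 50},
    ],
}
filters = [
    RowFilter("category", "in", ["A", "B"]),
    RowFilter("value", "greater_than", 100),
]
result = select_rows(partitions, filters, date(2024, 1, 1), date(2024, 12, 31))
print(result)
```

In this sketch the date range prunes whole partitions before any row is inspected, and the column filters are applied only to rows in the partitions that remain.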

### Getting Partition Date Information

To get the latest or first partition date:

```python
from partitioneer import get_latest_partition_date, get_first_partition_date

latest_date = get_latest_partition_date("/path/to/data")
first_date = get_first_partition_date("/path/to/data")
```
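
For intuition, finding the first and latest partition dates can be thought of as scanning the partition directories and taking the minimum and maximum date. The sketch below does this with the standard library only. The `date=YYYY-MM-DD` directory naming and the helper `list_partition_dates` are assumptions for illustration and may not match how Partitioneer stores or discovers partitions.

```python
import tempfile
from datetime import date
from pathlib import Path


def list_partition_dates(base_path):
    """Parse dates from sub-directory names of the form ``date=YYYY-MM-DD``.

    The naming scheme is an assumption for this sketch; directories that do
    not match it are ignored.
    """
    prefix = "date="
    dates = []
    for child in Path(base_path).iterdir():
        if child.is_dir() and child.name.startswith(prefix):
            try:
                dates.append(date.fromisoformat(child.name[len(prefix):]))
            except ValueError:
                continue  # skip names that are not valid ISO dates
    return sorted(dates)


# Build a throwaway directory tree to demonstrate the scan.
with tempfile.TemporaryDirectory() as base:
    for name in ("date=2024-02-01", "date=2024-01-15", "date=2024-03-10", "tmp"):
        (Path(base) / name).mkdir()
    dates = list_partition_dates(base)
    first_date, latest_date = dates[0], dates[-1]

print(first_date, latest_date)
```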

## Build Instructions

To build the package:

```sh
python setup.py sdist bdist_wheel
```

To upload to PyPI:

```sh
pip install twine
twine upload dist/*
```

Automated build and publish script:

```shell
python setup.py sdist bdist_wheel
pip install twine
twine upload dist/* --password <add_pypi_token_here>
rm -r ./build
rm -r ./dist
rm -r ./partitioneer.egg-info
```

## License

[MIT License](LICENSE)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/lachlancahill/partitioneer",
    "name": "partitioneer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/8b/a2/01cd33f9e34583949fd0d25ffe54eb37877d1f7577550eb91a97e8bdff4e/partitioneer-0.2.13.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library for reading and writing partitioned data",
    "version": "0.2.13",
    "project_urls": {
        "Homepage": "https://github.com/lachlancahill/partitioneer"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fd329341b31c398cd1e82a66c065c2bb55d2e995dd3eaea54c8e549049bc14a6",
                "md5": "ad43edb4325c0f73a7f927af0226d9c3",
                "sha256": "a1838208e573d7d26326a1bf7e862db3e6c854ac2fca8165e855116041c9edc2"
            },
            "downloads": -1,
            "filename": "partitioneer-0.2.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad43edb4325c0f73a7f927af0226d9c3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 6787,
            "upload_time": "2024-09-10T21:00:34",
            "upload_time_iso_8601": "2024-09-10T21:00:34.666089Z",
            "url": "https://files.pythonhosted.org/packages/fd/32/9341b31c398cd1e82a66c065c2bb55d2e995dd3eaea54c8e549049bc14a6/partitioneer-0.2.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8ba201cd33f9e34583949fd0d25ffe54eb37877d1f7577550eb91a97e8bdff4e",
                "md5": "4c9b06d9d342fc45c6b7f7ed64d9615b",
                "sha256": "6cd23fcc6a958bb6ffd2a2b20cf199111f2049c16dc437a69ee58527180d44be"
            },
            "downloads": -1,
            "filename": "partitioneer-0.2.13.tar.gz",
            "has_sig": false,
            "md5_digest": "4c9b06d9d342fc45c6b7f7ed64d9615b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 7047,
            "upload_time": "2024-09-10T21:00:36",
            "upload_time_iso_8601": "2024-09-10T21:00:36.028525Z",
            "url": "https://files.pythonhosted.org/packages/8b/a2/01cd33f9e34583949fd0d25ffe54eb37877d1f7577550eb91a97e8bdff4e/partitioneer-0.2.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-10 21:00:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lachlancahill",
    "github_project": "partitioneer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "partitioneer"
}
        