hyper-python-utils 0.1.2

AWS S3 and Athena utilities for data processing with Polars.

- Homepage: https://github.com/NHNAD-wooyeon/hyper-python-utils
- Upload time: 2025-08-12 05:32:29
- Requires Python: >=3.8
- License: MIT
- Keywords: aws, s3, athena, polars, data, utilities
# Hyper Python Utils

AWS S3 and Athena utilities for data processing with Polars.

## Installation

```bash
pip install hyper-python-utils
```

## Features

- **FileHandler**: S3 file operations with Polars DataFrames
  - Upload/download CSV and Parquet files
  - Parallel loading of multiple files
  - Partitioned uploads by range or date
  - Support for compressed formats

- **QueryManager**: Athena query execution and management
  - Execute queries with result monitoring
  - Clean up query result files
  - Error handling and timeouts

## Quick Start

### FileHandler Usage

```python
from hyper_python_utils import FileHandler
import polars as pl

# Initialize FileHandler
handler = FileHandler(bucket="my-s3-bucket", region="ap-northeast-2")

# Read a file from S3
df = handler.get_object("data/sample.parquet")

# Upload a DataFrame to S3
sample_df = pl.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
handler.upload_dataframe(sample_df, "output/result.parquet", "parquet")

# Upload with partitioning by range
handler.upload_dataframe_partitioned_by_range(
    df, "partitioned_data/", partition_size=50000
)

# Load all files from a prefix in parallel
combined_df = handler.load_all_objects_parallel("data/batch_*/", max_workers=4)
```
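The feature list also mentions CSV files and date-based partitioned uploads, neither of which appears in the example above. The sketch below is an assumption, not confirmed API: it reuses `upload_dataframe` with a `"csv"` format argument (mirroring the parquet call) and a hypothetical `upload_dataframe_partitioned_by_date` counterpart to the range-partitioning method.

```python
# Assumed usage based on the feature list; anything beyond upload_dataframe,
# get_object, and upload_dataframe_partitioned_by_range is hypothetical.
import polars as pl
from hyper_python_utils import FileHandler

handler = FileHandler(bucket="my-s3-bucket", region="ap-northeast-2")

# CSV round trip, assuming the same (df, key, format) signature as the
# parquet example above
events = pl.DataFrame({"ts": ["2025-08-01", "2025-08-02"], "value": [10, 20]})
handler.upload_dataframe(events, "output/events.csv", "csv")
events_back = handler.get_object("output/events.csv")

# Hypothetical date-partitioned upload, named by analogy with
# upload_dataframe_partitioned_by_range
handler.upload_dataframe_partitioned_by_date(
    events, "partitioned_by_day/", date_column="ts"
)
```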

### QueryManager Usage

```python
from hyper_python_utils import QueryManager, EmptyResultError
import polars as pl

# Initialize QueryManager with custom result prefix and auto cleanup
query_manager = QueryManager(
    bucket="my-athena-results",
    result_prefix="custom/query_results/",
    auto_cleanup=True  # Default: True - automatically delete query result files after reading
)

# Method 1: Execute query and get DataFrame result directly (recommended)
query = "SELECT * FROM my_table LIMIT 100"
try:
    df = query_manager.query(query, database="my_database")
    print(df)
except EmptyResultError:
    print("Query returned no results")

# Method 2: Manual query execution with result retrieval
query_id = query_manager.execute(query, database="my_database")
result_location = query_manager.wait_for_completion(query_id)
df = query_manager.get_result(query_id)  # Auto cleanup based on QueryManager setting

# Method 2b: Override auto cleanup for specific query
df_no_cleanup = query_manager.get_result(query_id, auto_cleanup=False)  # Keep result file

# Method 3: Execute UNLOAD query and get list of output files
unload_query = """
UNLOAD (SELECT * FROM my_large_table)
TO 's3://my-bucket/unloaded-data/'
WITH (format = 'PARQUET', compression = 'SNAPPY')
"""
output_files = query_manager.unload(unload_query, database="my_database")
print(f"Unloaded files: {output_files}")

# Manual cleanup of old query results (if auto_cleanup is disabled)
query_manager.delete_query_results_by_prefix("s3://my-bucket/old-results/")

# Disable auto cleanup for all queries
query_manager_no_cleanup = QueryManager(
    bucket="my-athena-results",
    auto_cleanup=False
)
```
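The feature list mentions timeouts, but the examples above never set one. A minimal sketch of how that might look, assuming `wait_for_completion` accepts a `timeout` keyword and signals expiry with `TimeoutError` (both are assumptions, not documented API):

```python
# Hypothetical timeout handling; the timeout parameter and the exception
# type are assumptions, not confirmed by the README.
from hyper_python_utils import QueryManager

query_manager = QueryManager(bucket="my-athena-results")
query_id = query_manager.execute(
    "SELECT * FROM my_table LIMIT 100", database="my_database"
)
try:
    query_manager.wait_for_completion(query_id, timeout=300)  # seconds (assumed)
    df = query_manager.get_result(query_id)
except TimeoutError:
    print(f"Query {query_id} did not finish within 5 minutes")
```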

## Requirements

- Python >= 3.8
- boto3 >= 1.26.0
- polars >= 0.18.0

## AWS Configuration

Make sure your AWS credentials are configured through one of the following:
- AWS CLI (`aws configure`)
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- IAM roles (when running on EC2)

Required permissions:
- S3: `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, `s3:DeleteObject`
- Athena: `athena:StartQueryExecution`, `athena:GetQueryExecution`
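
To confirm that credentials are picked up before calling these utilities, you can run a standard boto3 check (this is plain boto3, not part of this package):

```python
# Sanity check that boto3 can resolve credentials; STS GetCallerIdentity
# requires no extra IAM permissions.
import boto3
from botocore.exceptions import ClientError, NoCredentialsError

try:
    identity = boto3.client("sts").get_caller_identity()
    print(f"Authenticated as {identity['Arn']}")
except (NoCredentialsError, ClientError) as exc:
    print(f"AWS credentials are missing or invalid: {exc}")
```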

## License

MIT License

            
