| Field | Value |
|---|---|
| Name | hyper-python-utils |
| Version | 0.1.2 |
| home_page | None |
| Summary | AWS S3 and Athena utilities for data processing with Polars |
| upload_time | 2025-08-12 05:32:29 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | aws, s3, athena, polars, data, utilities |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Hyper Python Utils
AWS S3 and Athena utilities for data processing with Polars.
## Installation
```bash
pip install hyper-python-utils
```
## Features
- **FileHandler**: S3 file operations with Polars DataFrames
  - Upload/download CSV and Parquet files
  - Parallel loading of multiple files
  - Partitioned uploads by range or date
  - Support for compressed formats
- **QueryManager**: Athena query execution and management
  - Execute queries with result monitoring
  - Clean up query result files
  - Error handling and timeouts
## Quick Start
### FileHandler Usage
```python
from hyper_python_utils import FileHandler
import polars as pl

# Initialize FileHandler
handler = FileHandler(bucket="my-s3-bucket", region="ap-northeast-2")

# Read a file from S3
df = handler.get_object("data/sample.parquet")

# Upload a DataFrame to S3
sample_df = pl.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
handler.upload_dataframe(sample_df, "output/result.parquet", "parquet")

# Upload with partitioning by range
handler.upload_dataframe_partitioned_by_range(
    df, "partitioned_data/", partition_size=50000
)

# Load all files from a prefix in parallel
combined_df = handler.load_all_objects_parallel("data/batch_*/", max_workers=4)
```
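The feature list also mentions CSV and compressed formats. Below is a minimal sketch of a CSV round trip, assuming `upload_dataframe` accepts `"csv"` as its format argument and `get_object` picks the reader based on the key's extension; neither assumption is confirmed by the examples above.

```python
from hyper_python_utils import FileHandler
import polars as pl

handler = FileHandler(bucket="my-s3-bucket", region="ap-northeast-2")

# Assumed behavior: the "csv" format string and extension-based reading are
# illustrative guesses, not documented API guarantees.
events = pl.DataFrame({"id": [1, 2, 3], "event": ["click", "view", "click"]})
handler.upload_dataframe(events, "output/events.csv", "csv")
events_back = handler.get_object("output/events.csv")
print(events_back.shape)
```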
### QueryManager Usage
```python
from hyper_python_utils import QueryManager, EmptyResultError
import polars as pl

# Initialize QueryManager with custom result prefix and auto cleanup
query_manager = QueryManager(
    bucket="my-athena-results",
    result_prefix="custom/query_results/",
    auto_cleanup=True  # Default: True - automatically delete query result files after reading
)

# Method 1: Execute query and get DataFrame result directly (recommended)
query = "SELECT * FROM my_table LIMIT 100"
try:
    df = query_manager.query(query, database="my_database")
    print(df)
except EmptyResultError:
    print("Query returned no results")

# Method 2: Manual query execution with result retrieval
query_id = query_manager.execute(query, database="my_database")
result_location = query_manager.wait_for_completion(query_id)
df = query_manager.get_result(query_id)  # Auto cleanup based on QueryManager setting

# Method 2b: Override auto cleanup for a specific query
df_no_cleanup = query_manager.get_result(query_id, auto_cleanup=False)  # Keep result file

# Method 3: Execute UNLOAD query and get list of output files
unload_query = """
UNLOAD (SELECT * FROM my_large_table)
TO 's3://my-bucket/unloaded-data/'
WITH (format = 'PARQUET', compression = 'SNAPPY')
"""
output_files = query_manager.unload(unload_query, database="my_database")
print(f"Unloaded files: {output_files}")

# Manual cleanup of old query results (if auto_cleanup is disabled)
query_manager.delete_query_results_by_prefix("s3://my-bucket/old-results/")

# Disable auto cleanup for all queries
query_manager_no_cleanup = QueryManager(
    bucket="my-athena-results",
    auto_cleanup=False
)
```
## Requirements
- Python >= 3.8
- boto3 >= 1.26.0
- polars >= 0.18.0
## AWS Configuration
Make sure your AWS credentials are configured through one of the following:
- AWS CLI (`aws configure`)
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- IAM roles (when running on EC2)
Required permissions:
- S3: `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, `s3:DeleteObject`
- Athena: `athena:StartQueryExecution`, `athena:GetQueryExecution`
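If you are unsure which credentials will be picked up, a quick check with the STS API (plain boto3, independent of this package) shows the active identity before running any S3 or Athena calls. This sketch assumes the package relies on boto3's default credential chain, which the options above suggest but do not state outright:

```python
import boto3

# boto3 resolves credentials from env vars, ~/.aws/credentials, or an IAM role.
# get_caller_identity() fails fast if none of these are configured correctly.
sts = boto3.client("sts", region_name="ap-northeast-2")
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])
```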
## License
MIT License
Raw data

````json
{
"_id": null,
"home_page": null,
"name": "hyper-python-utils",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "aws, s3, athena, polars, data, utilities",
"author": null,
"author_email": "jaeyoung_lim <limjyjustin@naver.com>",
"download_url": "https://files.pythonhosted.org/packages/c9/36/bb303b5387e7d168b0542666b8834cd95346e1c6bc0e7c6e38e12e21e260/hyper_python_utils-0.1.2.tar.gz",
"platform": null,
"description": "# Hyper Python Utils\n\nAWS S3 and Athena utilities for data processing with Polars.\n\n## Installation\n\n```bash\npip install hyper-python-utils\n```\n\n## Features\n\n- **FileHandler**: S3 file operations with Polars DataFrames\n - Upload/download CSV and Parquet files\n - Parallel loading of multiple files\n - Partitioned uploads by range or date\n - Support for compressed formats\n\n- **QueryManager**: Athena query execution and management\n - Execute queries with result monitoring\n - Clean up query result files\n - Error handling and timeouts\n\n## Quick Start\n\n### FileHandler Usage\n\n```python\nfrom hyper_python_utils import FileHandler\nimport polars as pl\n\n# Initialize FileHandler\nhandler = FileHandler(bucket=\"my-s3-bucket\", region=\"ap-northeast-2\")\n\n# Read a file from S3\ndf = handler.get_object(\"data/sample.parquet\")\n\n# Upload a DataFrame to S3\nsample_df = pl.DataFrame({\"col1\": [1, 2, 3], \"col2\": [\"a\", \"b\", \"c\"]})\nhandler.upload_dataframe(sample_df, \"output/result.parquet\", \"parquet\")\n\n# Upload with partitioning by range\nhandler.upload_dataframe_partitioned_by_range(\n df, \"partitioned_data/\", partition_size=50000\n)\n\n# Load all files from a prefix in parallel\ncombined_df = handler.load_all_objects_parallel(\"data/batch_*/\", max_workers=4)\n```\n\n### QueryManager Usage\n\n```python\nfrom hyper_python_utils import QueryManager, EmptyResultError\nimport polars as pl\n\n# Initialize QueryManager with custom result prefix and auto cleanup\nquery_manager = QueryManager(\n bucket=\"my-athena-results\",\n result_prefix=\"custom/query_results/\",\n auto_cleanup=True # Default: True - automatically delete query result files after reading\n)\n\n# Method 1: Execute query and get DataFrame result directly (recommended)\nquery = \"SELECT * FROM my_table LIMIT 100\"\ntry:\n df = query_manager.query(query, database=\"my_database\")\n print(df)\nexcept EmptyResultError:\n print(\"Query returned no results\")\n\n# Method 2: Manual query execution with result retrieval\nquery_id = query_manager.execute(query, database=\"my_database\")\nresult_location = query_manager.wait_for_completion(query_id)\ndf = query_manager.get_result(query_id) # Auto cleanup based on QueryManager setting\n\n# Method 2b: Override auto cleanup for specific query\ndf_no_cleanup = query_manager.get_result(query_id, auto_cleanup=False) # Keep result file\n\n# Method 3: Execute UNLOAD query and get list of output files\nunload_query = \"\"\"\nUNLOAD (SELECT * FROM my_large_table)\nTO 's3://my-bucket/unloaded-data/'\nWITH (format = 'PARQUET', compression = 'SNAPPY')\n\"\"\"\noutput_files = query_manager.unload(unload_query, database=\"my_database\")\nprint(f\"Unloaded files: {output_files}\")\n\n# Manual cleanup of old query results (if auto_cleanup is disabled)\nquery_manager.delete_query_results_by_prefix(\"s3://my-bucket/old-results/\")\n\n# Disable auto cleanup for all queries\nquery_manager_no_cleanup = QueryManager(\n bucket=\"my-athena-results\",\n auto_cleanup=False\n)\n```\n\n## Requirements\n\n- Python >= 3.8\n- boto3 >= 1.26.0\n- polars >= 0.18.0\n\n## AWS Configuration\n\nMake sure your AWS credentials are configured either through:\n- AWS CLI (`aws configure`)\n- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)\n- IAM roles (when running on EC2)\n\nRequired permissions:\n- S3: `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, `s3:DeleteObject`\n- Athena: `athena:StartQueryExecution`, `athena:GetQueryExecution`\n\n## License\n\nMIT 
License\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "AWS S3 and Athena utilities for data processing with Polars",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/NHNAD-wooyeon/hyper-python-utils/issues",
"Documentation": "https://github.com/NHNAD-wooyeon/hyper-python-utils#readme",
"Homepage": "https://github.com/NHNAD-wooyeon/hyper-python-utils",
"Repository": "https://github.com/NHNAD-wooyeon/hyper-python-utils"
},
"split_keywords": [
"aws",
" s3",
" athena",
" polars",
" data",
" utilities"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "1e11f29c56c9f1a1abadcde973bac29c74f0ab4bcf6d66dd475206c4eef91ca4",
"md5": "76c5d1ad970b5241e2bb6df6317977c6",
"sha256": "ebfbe5d341e068aa98d4faabd4fb7d561a1ea6513f46024cad690343e49a94c6"
},
"downloads": -1,
"filename": "hyper_python_utils-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "76c5d1ad970b5241e2bb6df6317977c6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 7982,
"upload_time": "2025-08-12T05:32:28",
"upload_time_iso_8601": "2025-08-12T05:32:28.461338Z",
"url": "https://files.pythonhosted.org/packages/1e/11/f29c56c9f1a1abadcde973bac29c74f0ab4bcf6d66dd475206c4eef91ca4/hyper_python_utils-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c936bb303b5387e7d168b0542666b8834cd95346e1c6bc0e7c6e38e12e21e260",
"md5": "3d96e0a4dd6ed6300eca6ac669227435",
"sha256": "4bb22f5916b4411f01c2d98e81870261638489190d1f4f53ac4504adbeb555ad"
},
"downloads": -1,
"filename": "hyper_python_utils-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "3d96e0a4dd6ed6300eca6ac669227435",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 8754,
"upload_time": "2025-08-12T05:32:29",
"upload_time_iso_8601": "2025-08-12T05:32:29.301826Z",
"url": "https://files.pythonhosted.org/packages/c9/36/bb303b5387e7d168b0542666b8834cd95346e1c6bc0e7c6e38e12e21e260/hyper_python_utils-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-12 05:32:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NHNAD-wooyeon",
"github_project": "hyper-python-utils",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "hyper-python-utils"
}
````