| Name | s3-datakit |
| Version | 0.3.9 |
| Summary | A Python toolkit to simplify common operations between S3 and Pandas. |
| upload_time | 2025-08-10 19:17:56 |
| maintainer | None |
| docs_url | None |
| author | None |
| home_page | None |
| requires_python | >=3.8 |
| license | MIT License |
# S3 DataKit 🧰
A Python toolkit to simplify common operations between Amazon S3 and Pandas DataFrames.
## Key Features
* **List** files in an S3 bucket.
* **Upload** local files to S3.
* **Download** files from S3 directly to a local path or a Pandas DataFrame.
* Supports **CSV** and **Stata (.dta)** when reading into DataFrames.
## Installation
```bash
pip install s3-datakit
```
or
```bash
uv add s3-datakit
```
## Credential Configuration
This package uses `boto3` to interact with AWS. `boto3` will automatically search for credentials in the following order:
1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.).
2. The AWS CLI credentials file (`~/.aws/credentials`).
3. IAM roles (if running on an EC2 instance or ECS container).
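The lookup order above can be sketched with a small stdlib-only helper. This is illustrative only (the function name is ours, and boto3's real resolution chain checks more sources, e.g. `AWS_PROFILE` and SSO config):

```python
import os
from pathlib import Path

def guess_credential_source() -> str:
    """Rough approximation of boto3's credential lookup order.

    Illustrative only: boto3 performs its own, more thorough search.
    """
    # 1. Environment variables
    if os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. Shared credentials file written by the AWS CLI
    if (Path.home() / ".aws" / "credentials").exists():
        return "shared credentials file"
    # 3. Otherwise boto3 falls back to instance/container IAM roles
    return "IAM role or none"
```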
For local development, the easiest method is to use a `.env` file.
**1. Install `python-dotenv` in your project (not as a library dependency):**
```bash
pip install python-dotenv
```
**2. Create a `.env` file in your project's root:**
```
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
AWS_DEFAULT_REGION=your-region # e.g., us-east-1
```
**3. Load the variables in your script *before* using `s3datakit`:**
```python
from dotenv import load_dotenv
import s3datakit as s3dk
# Load environment variables from .env
load_dotenv()
# Now you can use the package's functions
s3dk.list_s3_files(bucket="my-bucket")
```
## Usage
### List Files
```python
import s3datakit as s3dk
file_list = s3dk.list_s3_files(bucket="my-data-bucket")
if file_list:
    print(file_list)
```
### Upload a File
You can specify the full destination path in S3. If `s3_path` is not provided, the original filename from `local_path` is used as the S3 object key.
```python
import s3datakit as s3dk
# Upload with a specific S3 path
s3dk.upload_s3_file(
    local_path="reports/report.csv",
    bucket="my-data-bucket",
    s3_path="final-reports/report_2025.csv"
)

# Upload using the local filename as the S3 key
# This will upload 'reports/report.csv' to 's3://my-data-bucket/report.csv'
s3dk.upload_s3_file(
    local_path="reports/report.csv",
    bucket="my-data-bucket"
)
```
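The default-key behavior presumably amounts to taking the basename of `local_path`. A minimal stdlib sketch (`default_s3_key` is a hypothetical helper for illustration, not part of the package):

```python
from pathlib import Path

def default_s3_key(local_path: str) -> str:
    # Mirrors the documented default: when s3_path is omitted,
    # the local filename alone becomes the S3 object key.
    return Path(local_path).name

print(default_s3_key("reports/report.csv"))  # → report.csv
```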
### Download a File
The `download_s3_file` function is versatile. You can download a file to a local path or load it directly into a Pandas DataFrame.
The `download_s3_file` function accepts the following parameters:

* `bucket` (str, required): The name of the S3 bucket where the file is located.
* `s3_path` (str, required): The full path (key) of the file within the bucket.
* `local_path` (str, optional): The local path where the file will be saved. If not provided, the file is saved to a `data/` directory in your current working folder, using its original S3 filename.
* `to_df` (bool, optional, default `False`): If `True`, the function attempts to read the downloaded file into a Pandas DataFrame. Useful for `.csv` and Stata `.dta` files.
* `replace` (bool, optional, default `False`): If `True`, overwrites the local file if it already exists. By default, the download is skipped when the file is already present, saving time and bandwidth.
* `low_memory` (bool, optional, default `True`): When reading a CSV into a DataFrame (`to_df=True`), this is passed to `pandas.read_csv` to process the file in chunks, which can reduce memory usage for large files.
* `sep` (str, optional, default `","`): The separator or delimiter to use when reading a CSV file into a DataFrame. For example, use `'\t'` for tab-separated files.
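Since `sep` and `low_memory` are forwarded to `pandas.read_csv`, their effect can be previewed by calling `read_csv` directly on an in-memory tab-separated sample (standing in for a downloaded file):

```python
import io
import pandas as pd

# Small tab-separated sample, standing in for a file downloaded from S3.
tsv = "id\tname\n1\tAda\n2\tGrace\n"

# These keyword arguments match what download_s3_file would pass
# through to pandas.read_csv when to_df=True.
df = pd.read_csv(io.StringIO(tsv), sep="\t", low_memory=True)
print(df.shape)  # → (2, 2)
```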
**Option 1: Download to a local path**
By default, if `local_path` is not provided, files are saved to a `data/` directory in the current working directory.
```python
import s3datakit as s3dk
# Download to a specific path
local_file = s3dk.download_s3_file(
    bucket="my-data-bucket",
    s3_path="final-reports/report_2025.csv",
    local_path="downloads/report.csv"
)
print(f"File downloaded to: {local_file}")

# Download to the default 'data/' directory, overwriting if it exists
s3dk.download_s3_file(
    bucket="my-data-bucket",
    s3_path="final-reports/report_2025.csv",
    replace=True
)
```
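The default destination described above presumably combines `data/` with the S3 filename. A stdlib sketch (`default_local_path` is a hypothetical helper for illustration, not part of the package):

```python
from pathlib import Path

def default_local_path(s3_path: str) -> Path:
    # Mirrors the documented default: when local_path is omitted,
    # the file lands in ./data/ under its original S3 filename.
    return Path("data") / Path(s3_path).name

print(default_local_path("final-reports/report_2025.csv").as_posix())
# → data/report_2025.csv
```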
**Option 2: Download directly to a Pandas DataFrame**
```python
import s3datakit as s3dk
df = s3dk.download_s3_file(
    bucket="my-data-bucket",
    s3_path="stata-data/survey.dta",
    to_df=True
)
print(df.head())
```