pg-helpers


Namepg-helpers JSON
Version 1.3.3 PyPI version JSON
download
home_pagehttps://github.com/lenwood/pg_helpers
SummaryPostgreSQL helper functions for data analysis with enterprise-grade security
upload_time2025-10-10 20:15:01
maintainerNone
docs_urlNone
authorChris Leonard
requires_python>=3.8
licenseNone
keywords postgresql postgres database sql pandas data-analysis ssl security retry-logic sqlalchemy data-science
VCS
bugtrack_url
requirements pandas psycopg2-binary sqlalchemy python-dotenv
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # PostgreSQL Helper Functions

A Python package providing robust utilities for PostgreSQL database operations, query management, and data analysis workflows with enterprise-grade security.

## Features

- ๐Ÿ”„ **Automatic retry logic** with exponential backoff for unreliable connections
- ๐Ÿ“Š **Seamless pandas integration** for data analysis
- ๐Ÿ”ง **SQL query templating** and parameter substitution
- ๐Ÿ”Š **Cross-platform notifications** when long queries complete
- โš™๏ธ **Environment-based configuration** for secure credential management
- ๐Ÿ›ก๏ธ **Comprehensive error handling** and logging
- ๐Ÿ”’ **SSL/TLS encryption** with CA certificate verification for AWS RDS and other cloud databases
- **Updated in v1.3.3**
  - โœ… **Recommended install via PyPi**: Use 'pip install pg_helpers' to install
- **Enhanced in v1.3.2**
  - ๐Ÿ“ **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
  - ๐Ÿ”„ **Enhanced configuration**: Specify custom .env file paths for different projects and environments
  - ๐Ÿข **Centralized credentials**: Share credential directories across multiple projects
- **Enhanced in v1.3.1:**
  - ๐Ÿงช **Comprehensive test suite** with 40+ test cases covering all functionality
  - โœ… **Cross-platform validation** ensuring reliability on Windows, macOS, and Linux
  - ๐Ÿ” **Function name improvements** for better API clarity
- **Previous in v1.2.0:**
  - ๐Ÿ” **Full SSL support** with optional CA certificate verification
  - ๐Ÿ›ก๏ธ **Man-in-the-middle attack prevention** for production environments
  - โœ… **SSL connection testing** and diagnostics
- **Previous in v1.1.0:**
  - ๐Ÿš€ **Advanced fallback methods** for SQLAlchemy/pandas compatibility issues
  - ๐Ÿ” **Enhanced debugging** and diagnostic capabilities

## What's New in Version 1.3.2 ๐Ÿงช

### Flexible Credential File Configuration
- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory
- **Environment-specific files**: Use different credential files for dev/staging/production
- **Fully backwards compatible**: Existing projects continue to work without any changes
```python
import os

# Use centralized credentials directory
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads C:\Documents\Credentials\.env

# Use custom filename for database credentials
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads ./.env.database

# Combine both for full flexibility
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads C:\Documents\Project\Assets\.env.production
```

## Installation

### From PyPi (Recommended)
```bash
pip install pg_helpers
```

### From Source (Development)
```bash
git clone https://github.com/lenwood/pg_helpers.git
cd pg_helpers
pip install -e .
```

### From GitHub (Specific Version)
```bash
pip install git+https://github.com/lenwood/pg_helpers.git@v1.3.3
```

### Dependencies
```bash
pip install pandas psycopg2-binary sqlalchemy python-dotenv
```

## Quick Start

### 1. Environment Setup

Create a `.env` file in your project root (NOT in this repository):

```env
# Required database credentials
DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=your_host
DB_PORT=5432
DB_NAME=your_database

# SSL Configuration (optional)
# SSL Mode options: disable, allow, prefer, require, verify-ca, verify-full
DB_SSL_MODE=require

# Optional: Path to CA certificate for maximum security
# DB_SSL_CA_CERT=/path/to/rds-ca-2019-root.pem

# For AWS RDS with maximum security:
# DB_SSL_MODE=verify-full
# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem
```

#### Flexible Credential Location Setup **NEW in v1.3.2**

You can now specify custom locations for your credential files using environment variables:

**Option 1: Centralized credential directory**
```python
import os
# Point to a shared credentials folder
os.environ['CREDENTIALS_DIR'] = r'C:\Users\Me\Documents\Credentials'
# Will load: C:\Users\Me\Documents\Credentials\.env

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```

**Option 2: Custom credential filename**
```python
import os
# Use a specific credential file name in current directory
os.environ['CREDENTIALS_FILE'] = '.env.database'
# Will load: ./.env.database

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```

**Option 3: Both custom directory and filename**
```python
import os
# Full custom path
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
# Will load: C:\Documents\Project\Assets\.env.production

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```

**Configuration Priority:**
- Both `CREDENTIALS_DIR` and `CREDENTIALS_FILE` set โ†’ `{CREDENTIALS_DIR}/{CREDENTIALS_FILE}`
- Only `CREDENTIALS_DIR` set โ†’ `{CREDENTIALS_DIR}/.env`
- Only `CREDENTIALS_FILE` set โ†’ `./{CREDENTIALS_FILE}`
- Neither set (default) โ†’ `./.env`

**Benefits of flexible credentials:**
- ๐Ÿ” **Centralized security**: Keep all credentials in one secure location
- ๐Ÿข **Team workflows**: Share credential directories across team members
- ๐ŸŒ **Environment-specific**: Use `.env.production`, `.env.staging`, `.env.dev`
- ๐Ÿ”„ **Multi-project**: Reuse credentials across multiple analysis projects
- โœ… **Backwards compatible**: Existing code works without changes

### 2. Basic Usage

```python
from pg_helpers import createPostgresqlEngine, dataGrabber, queryCleaner, check_ssl_connection

# Create secure database connection (uses SSL by default)
engine = createPostgresqlEngine()

# Verify SSL connection
ssl_info = check_ssl_connection(engine)
print(f"SSL Active: {ssl_info.get('ssl_active', 'Unknown')}")

# Execute a simple query
data = dataGrabber("SELECT * FROM users LIMIT 10", engine)
print(data.head())

# Enable debugging for troubleshooting
data = dataGrabber("SELECT * FROM complex_view", engine, debug=True)

# Use query templates
query = queryCleaner(
    'queries/user_analysis.sql',
    list1=[100, 200, 300],
    varString1='$USER_IDS',
    startDate='2023-01-01',
    endDate='2023-12-31'
)
results = dataGrabber(query, engine)
```

### 3. Custom SSL Configuration

```python
from pg_helpers import createPostgresqlEngineWithCustomSSL

# Maximum security for production
engine = createPostgresqlEngineWithCustomSSL(
    ssl_ca_cert="/path/to/rds-ca-2019-root.pem",
    ssl_mode="verify-full"
)

# Basic SSL without certificate verification
engine = createPostgresqlEngineWithCustomSSL(
    ssl_mode="require"
)

# Disable SSL (not recommended for production)
engine = createPostgresqlEngineWithCustomSSL(
    ssl_mode="disable"
)
```

## Function Reference

### Database Operations

#### `createPostgresqlEngine()`
Creates a SQLAlchemy engine for PostgreSQL connections using environment variables with SSL support.

**Returns:** `sqlalchemy.Engine`

**Environment variables:**
- **Required:** `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`
- **Optional:** `DB_PORT` (default: 5432)
- **SSL Options:**
  - `DB_SSL_MODE` (default: "require")
  - `DB_SSL_CA_CERT` (path to CA certificate)
  - `DB_SSL_CERT` (client certificate)
  - `DB_SSL_KEY` (client key)

#### `createPostgresqlEngineWithCustomSSL(ssl_ca_cert=None, ssl_mode='require', ssl_cert=None, ssl_key=None)` **v1.2.0**
Creates a SQLAlchemy engine with custom SSL configuration, overriding environment variables.

**Parameters:**
- `ssl_ca_cert` (str, optional): Path to CA certificate file
- `ssl_mode` (str): SSL mode - "disable", "allow", "prefer", "require", "verify-ca", "verify-full"
- `ssl_cert` (str, optional): Path to client certificate file
- `ssl_key` (str, optional): Path to client key file

**Returns:** `sqlalchemy.Engine`

**SSL Modes Explained:**
- `disable`: No SSL connection
- `allow`: Try SSL, fallback to non-SSL
- `prefer`: Try SSL first, fallback to non-SSL
- `require`: **Default** - Require SSL, fail if unavailable
- `verify-ca`: Require SSL and verify server certificate
- `verify-full`: **Most secure** - Require SSL, verify certificate and hostname

#### `check_ssl_connection(engine=None)` **v1.2.0, improved v1.3.0**
Tests SSL connection and displays SSL information. **(Renamed from `test_ssl_connection` in v1.3.0 for API clarity)**

**Parameters:**
- `engine` (optional): SQLAlchemy engine (creates one if not provided)

**Returns:** `dict` with SSL connection details including:
- `ssl_active`: Whether SSL is active
- `ssl_cipher`: Encryption cipher used
- `ssl_version`: SSL/TLS version
- `client_cert_present`: Whether client certificate is present

#### `dataGrabber(query, engine, limit='None', debug=False)` **ENHANCED in v1.1.0**
Executes SQL queries and returns pandas DataFrames with robust error handling.

**Parameters:**
- `query` (str): SQL query to execute
- `engine`: SQLAlchemy engine
- `limit` (str, optional): Row limit for results
- `debug` (bool, optional): Enable detailed logging and debugging output

**Returns:** `pandas.DataFrame`

**Features:**
- Multiple fallback methods for compatibility issues
- Automatic detection and handling of metadata interpretation errors
- Comprehensive error logging and debugging
- Execution timing display
- Cross-platform sound notifications
- Automatic error propagation for retry logic

#### `recursiveDataGrabber(query_dict, results_dict, n=1, max_attempts=50)`
Executes multiple queries with automatic retry and exponential backoff.

**Parameters:**
- `query_dict` (dict): Dictionary of {query_name: sql_string}
- `results_dict` (dict): Dictionary to store results
- `n` (int): Current attempt number
- `max_attempts` (int): Maximum retry attempts

**Returns:** `dict` with DataFrames or None for failed queries

**Use case:** Perfect for running multiple large queries overnight with unreliable connections.

#### `diagnose_connection_and_query(engine, query, limit=10)` **ENHANCED in v1.1.0**
Diagnostic function to help troubleshoot SQLAlchemy/pandas compatibility issues.

**Parameters:**
- `engine`: SQLAlchemy engine
- `query` (str): SQL query to diagnose
- `limit` (int): Number of rows to test with

**Returns:** `dict` with diagnostic information

### Query Utilities

#### `queryCleaner(file, list1='empty', varString1='empty', ...)`
Loads SQL files and substitutes parameters for dynamic queries.

**Parameters:**
- `file` (str): Path to SQL file
- `list1`: List to substitute (converts to comma-separated string)
- `varString1` (str): Placeholder string in SQL file
- `startDate/endDate`: Date range parameters

**Returns:** `str` - Processed SQL query

#### `listPrep(iList)`
Converts Python lists to SQL-compatible comma-separated strings.

**Parameters:**
- `iList`: List of integers, floats, strings, or single value

**Returns:** `str` - SQL-formatted string

## Testing and Quality Assurance **NEW in v1.3.0**

### Running Tests

The package includes a comprehensive test suite with 40+ test cases covering all functionality:

```bash
# Install test dependencies
pip install pytest pytest-cov

# Run all tests
python -m pytest tests/ -v

# Run tests with coverage report
python -m pytest tests/ --cov=pg_helpers --cov-report=html --cov-report=term-missing

# Run specific test categories
python -m pytest tests/test_database.py::TestQueryUtils -v      # Query utilities
python -m pytest tests/test_database.py::TestConfig -v         # Configuration
python -m pytest tests/test_database.py::TestDatabase -v       # Database operations
python -m pytest tests/test_database.py::TestNotifications -v  # Cross-platform notifications
```

### What the Tests Validate

#### **Core Functionality Tests**
- โœ… **Database engine creation** with various SSL configurations
- โœ… **Query execution** with multiple fallback methods
- โœ… **SSL connection testing** and diagnostics
- โœ… **Configuration validation** and environment variable handling
- โœ… **Query template processing** with parameter substitution
- โœ… **Error handling** for network issues and compatibility problems

#### **Cross-Platform Compatibility**
- โœ… **File operations** on Windows, macOS, and Linux
- โœ… **Sound notifications** across different operating systems
- โœ… **Environment variable handling** with different shell environments
- โœ… **Temporary file management** and cleanup

#### **Error Condition Testing**
- โœ… **Missing environment variables** and configuration errors
- โœ… **SSL certificate validation** failures and missing files
- โœ… **Database connection failures** and retry logic
- โœ… **Pandas/SQLAlchemy compatibility** issues and fallback methods
- โœ… **Invalid query parameters** and malformed SQL

#### **Integration Testing**
- โœ… **End-to-end workflows** combining multiple functions
- โœ… **Configuration to engine creation** pipelines
- โœ… **Query templating to execution** workflows

### Test Output Example

```bash
================================================= test session starts =================================================
platform win32 -- Python 3.12.9, pytest-8.4.1, pluggy-1.5.0
collected 40 items

tests/test_database.py::TestQueryUtils::test_listPrep_empty_list PASSED                                          [  2%]
tests/test_database.py::TestQueryUtils::test_listPrep_floats PASSED                                              [  5%]
tests/test_database.py::TestQueryUtils::test_listPrep_integers PASSED                                            [  7%]
tests/test_database.py::TestQueryUtils::test_listPrep_single_value PASSED                                        [ 10%]
tests/test_database.py::TestQueryUtils::test_listPrep_strings PASSED                                             [ 12%]
... (34 more tests) ...
tests/test_database.py::TestNotifications::test_play_notification_sound_windows PASSED                           [100%]

================================================= 40 passed in 1.03s ==================================================
```

### Flexible Credential File Configuration **NEW in v1.3.2**

- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory
- **Environment-specific files**: Use different credential files for dev/staging/production
- **Fully backwards compatible**: Existing projects continue to work without any changes

```python
import os

# Use centralized credentials directory
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads C:\Documents\Credentials\.env

# Use custom filename for database credentials
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads ./.env.database

# Combine both for full flexibility
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()  # Loads C:\Documents\Project\Assets\.env.production
```

### Continuous Integration Setup **NEW in v1.3.0**

For automated testing across multiple environments, add this GitHub Actions workflow:

```yaml
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: [3.8, 3.9, "3.10", "3.11", "3.12"]
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest pytest-cov
        pip install -e .
    - name: Run tests
      run: python -m pytest tests/ -v --tb=short
```

### Test Coverage Goals

- **Target Coverage**: >90% (currently achieved)
- **Critical Functions**: 100% coverage for database operations
- **Error Paths**: All exception handling paths tested
- **Edge Cases**: Empty lists, missing files, invalid configurations

## Advanced Usage

### SSL Security Configuration **v1.2.0**

#### For AWS RDS (Recommended)
```python
from pg_helpers import createPostgresqlEngineWithCustomSSL, check_ssl_connection

# Download RDS CA certificate first
# wget https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem

# Maximum security configuration
engine = createPostgresqlEngineWithCustomSSL(
    ssl_ca_cert="./certs/rds-ca-2019-root.pem",
    ssl_mode="verify-full"
)

# Test the secure connection
ssl_info = check_ssl_connection(engine)
print(f"SSL Cipher: {ssl_info.get('ssl_cipher')}")
print(f"SSL Version: {ssl_info.get('ssl_version')}")
```

#### Environment-based SSL Configuration
```python
# Set in .env file:
# DB_SSL_MODE=verify-full
# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem

from pg_helpers import createPostgresqlEngine, check_ssl_connection

# Uses environment variables
engine = createPostgresqlEngine()

# Verify security
ssl_info = check_ssl_connection(engine)
if ssl_info.get('ssl_active'):
    print("โœ… Secure SSL connection established")
else:
    print("โš ๏ธ SSL not active - check configuration")
```

#### SSL Mode Comparison
```python
# Different security levels for different environments

# Development (basic encryption)
dev_engine = createPostgresqlEngineWithCustomSSL(ssl_mode="require")

# Staging (certificate verification)
staging_engine = createPostgresqlEngineWithCustomSSL(
    ssl_mode="verify-ca",
    ssl_ca_cert="./certs/staging-ca.pem"
)

# Production (maximum security)
prod_engine = createPostgresqlEngineWithCustomSSL(
    ssl_mode="verify-full",
    ssl_ca_cert="./certs/rds-ca-2019-root.pem"
)
```

### Debugging and Troubleshooting
```python
from pg_helpers import createPostgresqlEngine, dataGrabber, diagnose_connection_and_query

engine = createPostgresqlEngine()

# Enable detailed debugging for problematic queries
data = dataGrabber("""
    SELECT * FROM complex_view 
    WHERE date_column > '2023-01-01'
""", engine, debug=True)

# Diagnose specific issues
diagnostics = diagnose_connection_and_query(engine, "SELECT * FROM problematic_table")
print(f"Engine info: {diagnostics['engine_info']}")
print(f"Test results: {diagnostics['test_results']}")
print(f"Recommendations: {diagnostics['recommendations']}")
```

### Batch Query Processing
```python
from pg_helpers import recursiveDataGrabber, queryCleaner

# Prepare multiple queries
queries = {
    'user_stats': queryCleaner('sql/user_stats.sql', startDate='2023-01-01'),
    'sales_data': queryCleaner('sql/sales.sql', list1=[1,2,3], varString1='$REGIONS'),
    'inventory': 'SELECT * FROM inventory WHERE status = "active"'
}

# Execute with automatic retry
results = {}
final_results = recursiveDataGrabber(queries, results)

# Access individual results
user_df = final_results['user_stats']
sales_df = final_results['sales_data']
```

## Credential Management Strategies **NEW in v1.3.2**

### Strategy 1: Centralized Credentials for Multiple Projects

Keep one credential directory for all your projects:

```
C:\Users\Me\Documents\Credentials\
โ”œโ”€โ”€ .env                    # General/shared credentials
โ”œโ”€โ”€ .env.database          # Database-specific credentials  
โ”œโ”€โ”€ .env.api               # API credentials
โ””โ”€โ”€ .env.production        # Production environment

Project1\
โ”œโ”€โ”€ analysis.py
โ””โ”€โ”€ (no .env file needed)

Project2\
โ”œโ”€โ”€ dashboard.py
โ””โ”€โ”€ (no .env file needed)
```

**Setup in each project:**
```python
import os
os.environ['CREDENTIALS_DIR'] = r'C:\Users\Me\Documents\Credentials'
os.environ['CREDENTIALS_FILE'] = '.env.database'

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```

### Strategy 2: Environment-Specific Credentials

Use different credential files for different environments:

```python
import os

# Detect environment (from ENV variable, config file, etc.)
environment = os.getenv('APP_ENV', 'development')

# Set credential file based on environment
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
os.environ['CREDENTIALS_FILE'] = f'.env.{environment}'

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
# Loads .env.development, .env.staging, or .env.production
```

### Strategy 3: Project-Specific Overrides

Use centralized credentials with project-specific overrides:

```python
import os
from pathlib import Path

# Try project-local first, fall back to centralized
if Path('.env.local').exists():
    os.environ['CREDENTIALS_FILE'] = '.env.local'
else:
    os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
    os.environ['CREDENTIALS_FILE'] = '.env.database'

from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```

### Best Practices for Credential Files

**Security:**
- โœ… Store credential directories outside of project repositories
- โœ… Use restrictive file permissions (e.g., `chmod 600` on Linux/macOS)
- โœ… Never commit credential file paths to version control
- โœ… Document credential file locations in team wikis, not in code

**Organization:**
- ๐Ÿ“ Use descriptive filenames: `.env.database`, `.env.api`, `.env.aws`
- ๐Ÿ“ Group by environment: `.env.production`, `.env.staging`
- ๐Ÿ“ Keep a `.env.template` in your repository with dummy values

**Team Workflows:**
```python
# config.py (committed to repository)
import os
from pathlib import Path

def setup_credentials():
    """Load credentials from standard team location or local override"""
    # Check for local override first (for development)
    if Path('.env.local').exists():
        os.environ['CREDENTIALS_FILE'] = '.env.local'
        return
    
    # Use team shared credentials location
    team_creds = Path.home() / 'TeamShared' / 'Credentials'
    if team_creds.exists():
        os.environ['CREDENTIALS_DIR'] = str(team_creds)
        os.environ['CREDENTIALS_FILE'] = '.env.database'
        return
    
    # Fallback to default behavior
    print("โš ๏ธ No credentials found - using .env in current directory")

# Call this before importing pg_helpers
setup_credentials()

## Security Best Practices

### SSL/TLS Configuration
- **Always use SSL in production** - Set `DB_SSL_MODE=require` minimum
- **Use CA certificate verification** for cloud databases (AWS RDS, GCP Cloud SQL)
- **Download official CA certificates** from your cloud provider
- **Use `verify-full` mode** for maximum security in production
- **Test SSL connections** regularly with `check_ssl_connection()`

### AWS RDS Security Setup
```bash
# Download AWS RDS CA certificate
mkdir -p ./certs
wget -O ./certs/rds-ca-2019-root.pem \
  https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem

# Set environment variables
echo "DB_SSL_MODE=verify-full" >> .env
echo "DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem" >> .env
```

### General Security
- **Never commit `.env` files** - add to `.gitignore`
- **Use environment variables** for all credentials
- **Rotate database passwords** regularly
- **Use database connection pooling** for production
- **Monitor SSL certificate expiration** dates

## File Structure for Your Projects

```
your_project/
โ”œโ”€โ”€ .env                 # Database credentials (DO NOT COMMIT)
โ”œโ”€โ”€ .gitignore          # Include .env and certs/
โ”œโ”€โ”€ main.py             # Your analysis code
โ”œโ”€โ”€ certs/              # SSL certificates (DO NOT COMMIT private keys)
โ”‚   โ”œโ”€โ”€ rds-ca-2019-root.pem
โ”‚   โ””โ”€โ”€ custom-ca.pem
โ”œโ”€โ”€ queries/            # SQL template files
โ”‚   โ”œโ”€โ”€ user_analysis.sql
โ”‚   โ”œโ”€โ”€ sales_report.sql
โ”‚   โ””โ”€โ”€ inventory.sql
โ”œโ”€โ”€ data/              # Output data files
โ”‚   โ””โ”€โ”€ results.pkl
โ””โ”€โ”€ tests/             # Test files (if contributing)
    โ””โ”€โ”€ test_database.py
```

## Troubleshooting

### Common Issues

**SSL certificate verification failed**
- Ensure CA certificate file exists and is readable
- Download the latest CA certificate from your provider
- Check that `ssl_mode` is set correctly
- Use `check_ssl_connection()` to diagnose SSL issues

**"SSL connection required"**
- Check that your database server supports SSL
- Verify `DB_SSL_MODE` is set correctly
- For development, you can temporarily use `ssl_mode="disable"`

**"Missing required environment variables"**
- Ensure `.env` file exists in your project root
- Check that all required variables are set: `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`

**Connection timeouts**
- The `recursiveDataGrabber` will automatically retry with exponential backoff
- Check network connectivity and database server status
- Consider increasing `max_attempts` for very unreliable connections

**SQLAlchemy/pandas compatibility errors** **IMPROVED in v1.1.0**
- The package now automatically handles most compatibility issues
- Enable `debug=True` to see which fallback method succeeds
- Use `diagnose_connection_and_query()` for detailed troubleshooting

**"immutabledict" or metadata interpretation errors** **FIXED in v1.1.0**
- These errors are now automatically detected and handled
- The package will try alternative pandas parameters and manual DataFrame construction
- No action needed from users - fallback methods handle this automatically

**Sound notifications not working**
- This is normal and won't affect functionality
- Ensure system sound is enabled
- On Linux, you may need to install additional audio packages

**Test failures during development** **NEW in v1.3.0**
- Ensure all dependencies are installed: `pip install pytest pytest-cov`
- Check that you're running tests from the project root directory
- Use `python -m pytest tests/ -v` for detailed output
- Some tests require write access to temporary directories

## Changelog

### Version 1.3.3 (Current) ๐Ÿ“
- โœ… **Updated installation instructions**: Recommend using 'pip install pg_helpers'

### Version 1.3.2
- ๐Ÿ“ **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- ๐Ÿ”„ **Enhanced configuration**: Specify custom .env file paths for different projects and environments
- ๐Ÿข **Centralized credentials**: Share credential directories across multiple projects
- ๐ŸŒ **Environment-specific files**: Easy support for .env.production, .env.staging, .env.dev
- โœ… **Backwards compatible**: Defaults to `.env` in current directory when no variables set
- ๐Ÿ“– **Documentation**: New credential management strategies and best practices guide

### Version 1.3.1
- ๐Ÿงช **Comprehensive test suite**: 40+ test cases with >90% code coverage
- โœ… **Cross-platform validation**: Tests confirm functionality on Windows, macOS, and Linux
- ๐Ÿ”ง **API improvements**: `test_ssl_connection()` renamed to `check_ssl_connection()` for clarity
- ๐Ÿš€ **CI/CD ready**: GitHub Actions workflow for automated testing across environments
- ๐Ÿ“‹ **Enhanced documentation**: Detailed testing guide and troubleshooting section
- ๐Ÿ›ก๏ธ **Quality assurance**: All functions tested including error conditions and edge cases

### Version 1.2.0
- ๐Ÿ”’ **SSL/TLS support**: Full SSL encryption with optional CA certificate verification
- ๐Ÿ›ก๏ธ **Security enhancements**: Man-in-the-middle attack prevention for production environments  
- โœ… **SSL testing**: New SSL connection diagnostics
- ๐Ÿ”ง **Custom SSL configuration**: Programmatic SSL parameter override
- ๐Ÿ“‹ **Environment SSL config**: Optional SSL settings via environment variables
- ๐Ÿ”„ **Backward compatibility**: Existing code continues to work without changes

### Version 1.1.0
- โœจ **Enhanced error handling**: Multiple fallback methods for pandas/SQLAlchemy compatibility
- ๐Ÿ” **Improved debugging**: Comprehensive logging and diagnostic capabilities
- ๐Ÿ› ๏ธ **Better reliability**: Automatic detection and handling of metadata interpretation errors
- ๐Ÿ“Š **Manual DataFrame construction**: Fallback method for complex data type issues
- ๐Ÿ”ง **Alternative parameter testing**: Tries different pandas configurations automatically

### Version 1.0.0
- Initial release with core functionality
- Basic retry logic and PostgreSQL integration
- Query templating and notification system

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Add tests for new functionality: `python -m pytest tests/ -v`
4. Ensure all tests pass and maintain >90% coverage
5. Run the full test suite: `python -m pytest tests/ --cov=pg_helpers`
6. Submit a pull request

### Development Setup
```bash
git clone https://github.com/lenwood/pg_helpers.git
cd pg_helpers
pip install -e .
pip install pytest pytest-cov
python -m pytest tests/ -v
```

## License

MIT License - feel free to use in your projects!

## Author

Chris Leonard

---

*This package was designed for data analysts and engineers who work with PostgreSQL databases and need reliable, automated query execution with enterprise-grade security. Version 1.3.2 adds flexible credential management while ensuring this reliability through comprehensive testing across multiple platforms and Python versions.*

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lenwood/pg_helpers",
    "name": "pg-helpers",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "postgresql, postgres, database, sql, pandas, data-analysis, ssl, security, retry-logic, sqlalchemy, data-science",
    "author": "Chris Leonard",
    "author_email": "lenwood@duck.com",
    "download_url": "https://files.pythonhosted.org/packages/c4/88/2cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e/pg_helpers-1.3.3.tar.gz",
    "platform": null,
    "description": "# PostgreSQL Helper Functions\r\n\r\nA Python package providing robust utilities for PostgreSQL database operations, query management, and data analysis workflows with enterprise-grade security.\r\n\r\n## Features\r\n\r\n- \ud83d\udd04 **Automatic retry logic** with exponential backoff for unreliable connections\r\n- \ud83d\udcca **Seamless pandas integration** for data analysis\r\n- \ud83d\udd27 **SQL query templating** and parameter substitution\r\n- \ud83d\udd0a **Cross-platform notifications** when long queries complete\r\n- \u2699\ufe0f **Environment-based configuration** for secure credential management\r\n- \ud83d\udee1\ufe0f **Comprehensive error handling** and logging\r\n- \ud83d\udd12 **SSL/TLS encryption** with CA certificate verification for AWS RDS and other cloud databases\r\n- **Updated in v1.3.3**\r\n  - \u2705 **Recommended install via PyPi**: Use 'pip install pg_helpers' to install\r\n- **Enhanced in v1.3.2**\r\n  - \ud83d\udcc1 **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n  - \ud83d\udd04 **Enhanced configuration**: Specify custom .env file paths for different projects and environments\r\n  - \ud83c\udfe2 **Centralized credentials**: Share credential directories across multiple projects\r\n- **Enhanced in v1.3.1:**\r\n  - \ud83e\uddea **Comprehensive test suite** with 40+ test cases covering all functionality\r\n  - \u2705 **Cross-platform validation** ensuring reliability on Windows, macOS, and Linux\r\n  - \ud83d\udd0d **Function name improvements** for better API clarity\r\n- **Previous in v1.2.0:**\r\n  - \ud83d\udd10 **Full SSL support** with optional CA certificate verification\r\n  - \ud83d\udee1\ufe0f **Man-in-the-middle attack prevention** for production environments\r\n  - \u2705 **SSL connection testing** and diagnostics\r\n- **Previous in v1.1.0:**\r\n  - \ud83d\ude80 **Advanced fallback methods** for SQLAlchemy/pandas compatibility issues\r\n  - \ud83d\udd0d **Enhanced debugging** and diagnostic capabilities\r\n\r\n## What's New in Version 1.3.2 \ud83e\uddea\r\n\r\n### Flexible Credential File Configuration\r\n- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory\r\n- **Environment-specific files**: Use different credential files for dev/staging/production\r\n- **Fully backwards compatible**: Existing projects continue to work without any changes\r\n```python\r\nimport os\r\n\r\n# Use centralized credentials directory\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads C:\\Documents\\Credentials\\.env\r\n\r\n# Use custom filename for database credentials\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads ./.env.database\r\n\r\n# Combine both for full flexibility\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads C:\\Documents\\Project\\Assets\\.env.production\r\n```\r\n\r\n## Installation\r\n\r\n### From PyPi (Recommended)\r\n```bash\r\npip install pg_helpers\r\n```\r\n\r\n### From Source (Development)\r\n```bash\r\ngit clone https://github.com/lenwood/pg_helpers.git\r\ncd pg_helpers\r\npip install -e .\r\n```\r\n\r\n### From GitHub (Specific Version)\r\n```bash\r\npip install git+https://github.com/lenwood/pg_helpers.git@v1.3.3\r\n```\r\n\r\n### Dependencies\r\n```bash\r\npip install pandas psycopg2-binary sqlalchemy python-dotenv\r\n```\r\n\r\n## Quick Start\r\n\r\n### 1. Environment Setup\r\n\r\nCreate a `.env` file in your project root (NOT in this repository):\r\n\r\n```env\r\n# Required database credentials\r\nDB_USER=your_username\r\nDB_PASSWORD=your_password\r\nDB_HOST=your_host\r\nDB_PORT=5432\r\nDB_NAME=your_database\r\n\r\n# SSL Configuration (optional)\r\n# SSL Mode options: disable, allow, prefer, require, verify-ca, verify-full\r\nDB_SSL_MODE=require\r\n\r\n# Optional: Path to CA certificate for maximum security\r\n# DB_SSL_CA_CERT=/path/to/rds-ca-2019-root.pem\r\n\r\n# For AWS RDS with maximum security:\r\n# DB_SSL_MODE=verify-full\r\n# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\r\n```\r\n\r\n#### Flexible Credential Location Setup **NEW in v1.3.2**\r\n\r\nYou can now specify custom locations for your credential files using environment variables:\r\n\r\n**Option 1: Centralized credential directory**\r\n```python\r\nimport os\r\n# Point to a shared credentials folder\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Users\\Me\\Documents\\Credentials'\r\n# Will load: C:\\Users\\Me\\Documents\\Credentials\\.env\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Option 2: Custom credential filename**\r\n```python\r\nimport os\r\n# Use a specific credential file name in current directory\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\n# Will load: ./.env.database\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Option 3: Both custom directory and filename**\r\n```python\r\nimport os\r\n# Full custom path\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\n# Will load: C:\\Documents\\Project\\Assets\\.env.production\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Configuration Priority:**\r\n- Both `CREDENTIALS_DIR` and `CREDENTIALS_FILE` set \u2192 `{CREDENTIALS_DIR}/{CREDENTIALS_FILE}`\r\n- Only `CREDENTIALS_DIR` set \u2192 `{CREDENTIALS_DIR}/.env`\r\n- Only `CREDENTIALS_FILE` set \u2192 `./{CREDENTIALS_FILE}`\r\n- Neither set (default) \u2192 `./.env`\r\n\r\n**Benefits of flexible credentials:**\r\n- \ud83d\udd10 **Centralized security**: Keep all credentials in one secure location\r\n- \ud83c\udfe2 **Team workflows**: Share credential directories across team members\r\n- \ud83c\udf0d **Environment-specific**: Use `.env.production`, `.env.staging`, `.env.dev`\r\n- \ud83d\udd04 **Multi-project**: Reuse credentials across multiple analysis projects\r\n- \u2705 **Backwards compatible**: Existing code works without changes\r\n\r\n### 2. Basic Usage\r\n\r\n```python\r\nfrom pg_helpers import createPostgresqlEngine, dataGrabber, queryCleaner, check_ssl_connection\r\n\r\n# Create secure database connection (uses SSL by default)\r\nengine = createPostgresqlEngine()\r\n\r\n# Verify SSL connection\r\nssl_info = check_ssl_connection(engine)\r\nprint(f\"SSL Active: {ssl_info.get('ssl_active', 'Unknown')}\")\r\n\r\n# Execute a simple query\r\ndata = dataGrabber(\"SELECT * FROM users LIMIT 10\", engine)\r\nprint(data.head())\r\n\r\n# Enable debugging for troubleshooting\r\ndata = dataGrabber(\"SELECT * FROM complex_view\", engine, debug=True)\r\n\r\n# Use query templates\r\nquery = queryCleaner(\r\n    'queries/user_analysis.sql',\r\n    list1=[100, 200, 300],\r\n    varString1='$USER_IDS',\r\n    startDate='2023-01-01',\r\n    endDate='2023-12-31'\r\n)\r\nresults = dataGrabber(query, engine)\r\n```\r\n\r\n### 3. Custom SSL Configuration\r\n\r\n```python\r\nfrom pg_helpers import createPostgresqlEngineWithCustomSSL\r\n\r\n# Maximum security for production\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_ca_cert=\"/path/to/rds-ca-2019-root.pem\",\r\n    ssl_mode=\"verify-full\"\r\n)\r\n\r\n# Basic SSL without certificate verification\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_mode=\"require\"\r\n)\r\n\r\n# Disable SSL (not recommended for production)\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_mode=\"disable\"\r\n)\r\n```\r\n\r\n## Function Reference\r\n\r\n### Database Operations\r\n\r\n#### `createPostgresqlEngine()`\r\nCreates a SQLAlchemy engine for PostgreSQL connections using environment variables with SSL support.\r\n\r\n**Returns:** `sqlalchemy.Engine`\r\n\r\n**Environment variables:**\r\n- **Required:** `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`\r\n- **Optional:** `DB_PORT` (default: 5432)\r\n- **SSL Options:**\r\n  - `DB_SSL_MODE` (default: \"require\")\r\n  - `DB_SSL_CA_CERT` (path to CA certificate)\r\n  - `DB_SSL_CERT` (client certificate)\r\n  - `DB_SSL_KEY` (client key)\r\n\r\n#### `createPostgresqlEngineWithCustomSSL(ssl_ca_cert=None, ssl_mode='require', ssl_cert=None, ssl_key=None)` **v1.2.0**\r\nCreates a SQLAlchemy engine with custom SSL configuration, overriding environment variables.\r\n\r\n**Parameters:**\r\n- `ssl_ca_cert` (str, optional): Path to CA certificate file\r\n- `ssl_mode` (str): SSL mode - \"disable\", \"allow\", \"prefer\", \"require\", \"verify-ca\", \"verify-full\"\r\n- `ssl_cert` (str, optional): Path to client certificate file\r\n- `ssl_key` (str, optional): Path to client key file\r\n\r\n**Returns:** `sqlalchemy.Engine`\r\n\r\n**SSL Modes Explained:**\r\n- `disable`: No SSL connection\r\n- `allow`: Try SSL, fallback to non-SSL\r\n- `prefer`: Try SSL first, fallback to non-SSL\r\n- `require`: **Default** - Require SSL, fail if unavailable\r\n- `verify-ca`: Require SSL and verify server certificate\r\n- `verify-full`: **Most secure** - Require SSL, verify certificate and hostname\r\n\r\n#### `check_ssl_connection(engine=None)` **v1.2.0, improved v1.3.0**\r\nTests SSL connection and displays SSL information. **(Renamed from `test_ssl_connection` in v1.3.0 for API clarity)**\r\n\r\n**Parameters:**\r\n- `engine` (optional): SQLAlchemy engine (creates one if not provided)\r\n\r\n**Returns:** `dict` with SSL connection details including:\r\n- `ssl_active`: Whether SSL is active\r\n- `ssl_cipher`: Encryption cipher used\r\n- `ssl_version`: SSL/TLS version\r\n- `client_cert_present`: Whether client certificate is present\r\n\r\n#### `dataGrabber(query, engine, limit='None', debug=False)` **ENHANCED in v1.1.0**\r\nExecutes SQL queries and returns pandas DataFrames with robust error handling.\r\n\r\n**Parameters:**\r\n- `query` (str): SQL query to execute\r\n- `engine`: SQLAlchemy engine\r\n- `limit` (str, optional): Row limit for results\r\n- `debug` (bool, optional): Enable detailed logging and debugging output\r\n\r\n**Returns:** `pandas.DataFrame`\r\n\r\n**Features:**\r\n- Multiple fallback methods for compatibility issues\r\n- Automatic detection and handling of metadata interpretation errors\r\n- Comprehensive error logging and debugging\r\n- Execution timing display\r\n- Cross-platform sound notifications\r\n- Automatic error propagation for retry logic\r\n\r\n#### `recursiveDataGrabber(query_dict, results_dict, n=1, max_attempts=50)`\r\nExecutes multiple queries with automatic retry and exponential backoff.\r\n\r\n**Parameters:**\r\n- `query_dict` (dict): Dictionary of {query_name: sql_string}\r\n- `results_dict` (dict): Dictionary to store results\r\n- `n` (int): Current attempt number\r\n- `max_attempts` (int): Maximum retry attempts\r\n\r\n**Returns:** `dict` with DataFrames or None for failed queries\r\n\r\n**Use case:** Perfect for running multiple large queries overnight with unreliable connections.\r\n\r\n#### `diagnose_connection_and_query(engine, query, limit=10)` **ENHANCED in v1.1.0**\r\nDiagnostic function to help troubleshoot SQLAlchemy/pandas compatibility issues.\r\n\r\n**Parameters:**\r\n- `engine`: SQLAlchemy engine\r\n- `query` (str): SQL query to diagnose\r\n- `limit` (int): Number of rows to test with\r\n\r\n**Returns:** `dict` with diagnostic information\r\n\r\n### Query Utilities\r\n\r\n#### `queryCleaner(file, list1='empty', varString1='empty', ...)`\r\nLoads SQL files and substitutes parameters for dynamic queries.\r\n\r\n**Parameters:**\r\n- `file` (str): Path to SQL file\r\n- `list1`: List to substitute (converts to comma-separated string)\r\n- `varString1` (str): Placeholder string in SQL file\r\n- `startDate/endDate`: Date range parameters\r\n\r\n**Returns:** `str` - Processed SQL query\r\n\r\n#### `listPrep(iList)`\r\nConverts Python lists to SQL-compatible comma-separated strings.\r\n\r\n**Parameters:**\r\n- `iList`: List of integers, floats, strings, or single value\r\n\r\n**Returns:** `str` - SQL-formatted string\r\n\r\n## Testing and Quality Assurance **NEW in v1.3.0**\r\n\r\n### Running Tests\r\n\r\nThe package includes a comprehensive test suite with 40+ test cases covering all functionality:\r\n\r\n```bash\r\n# Install test dependencies\r\npip install pytest pytest-cov\r\n\r\n# Run all tests\r\npython -m pytest tests/ -v\r\n\r\n# Run tests with coverage report\r\npython -m pytest tests/ --cov=pg_helpers --cov-report=html --cov-report=term-missing\r\n\r\n# Run specific test categories\r\npython -m pytest tests/test_database.py::TestQueryUtils -v      # Query utilities\r\npython -m pytest tests/test_database.py::TestConfig -v         # Configuration\r\npython -m pytest tests/test_database.py::TestDatabase -v       # Database operations\r\npython -m pytest tests/test_database.py::TestNotifications -v  # Cross-platform notifications\r\n```\r\n\r\n### What the Tests Validate\r\n\r\n#### **Core Functionality Tests**\r\n- \u2705 **Database engine creation** with various SSL configurations\r\n- \u2705 **Query execution** with multiple fallback methods\r\n- \u2705 **SSL connection testing** and diagnostics\r\n- \u2705 **Configuration validation** and environment variable handling\r\n- \u2705 **Query template processing** with parameter substitution\r\n- \u2705 **Error handling** for network issues and compatibility problems\r\n\r\n#### **Cross-Platform Compatibility**\r\n- \u2705 **File operations** on Windows, macOS, and Linux\r\n- \u2705 **Sound notifications** across different operating systems\r\n- \u2705 **Environment variable handling** with different shell environments\r\n- \u2705 **Temporary file management** and cleanup\r\n\r\n#### **Error Condition Testing**\r\n- \u2705 **Missing environment variables** and configuration errors\r\n- \u2705 **SSL certificate validation** failures and missing files\r\n- \u2705 **Database connection failures** and retry logic\r\n- \u2705 **Pandas/SQLAlchemy compatibility** issues and fallback methods\r\n- \u2705 **Invalid query parameters** and malformed SQL\r\n\r\n#### **Integration Testing**\r\n- \u2705 **End-to-end workflows** combining multiple functions\r\n- \u2705 **Configuration to engine creation** pipelines\r\n- \u2705 **Query templating to execution** workflows\r\n\r\n### Test Output Example\r\n\r\n```bash\r\n================================================= test session starts =================================================\r\nplatform win32 -- Python 3.12.9, pytest-8.4.1, pluggy-1.5.0\r\ncollected 40 items\r\n\r\ntests/test_database.py::TestQueryUtils::test_listPrep_empty_list PASSED                                          [  2%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_floats PASSED                                              [  5%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_integers PASSED                                            [  7%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_single_value PASSED                                        [ 10%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_strings PASSED                                             [ 12%]\r\n... (34 more tests) ...\r\ntests/test_database.py::TestNotifications::test_play_notification_sound_windows PASSED                           [100%]\r\n\r\n================================================= 40 passed in 1.03s ==================================================\r\n```\r\n\r\n### Flexible Credential File Configuration **NEW in v1.3.2**\r\n\r\n- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory\r\n- **Environment-specific files**: Use different credential files for dev/staging/production\r\n- **Fully backwards compatible**: Existing projects continue to work without any changes\r\n\r\n```python\r\nimport os\r\n\r\n# Use centralized credentials directory\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads C:\\Documents\\Credentials\\.env\r\n\r\n# Use custom filename for database credentials\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads ./.env.database\r\n\r\n# Combine both for full flexibility\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()  # Loads C:\\Documents\\Project\\Assets\\.env.production\r\n```\r\n\r\n### Continuous Integration Setup **NEW in v1.3.0**\r\n\r\nFor automated testing across multiple environments, add this GitHub Actions workflow:\r\n\r\n```yaml\r\n# .github/workflows/test.yml\r\nname: Tests\r\non: [push, pull_request]\r\njobs:\r\n  test:\r\n    runs-on: ${{ matrix.os }}\r\n    strategy:\r\n      matrix:\r\n        os: [ubuntu-latest, windows-latest, macos-latest]\r\n        python-version: [3.8, 3.9, \"3.10\", \"3.11\", \"3.12\"]\r\n    steps:\r\n    - uses: actions/checkout@v4\r\n    - name: Set up Python ${{ matrix.python-version }}\r\n      uses: actions/setup-python@v4\r\n      with:\r\n        python-version: ${{ matrix.python-version }}\r\n    - name: Install dependencies\r\n      run: |\r\n        python -m pip install --upgrade pip\r\n        pip install pytest pytest-cov\r\n        pip install -e .\r\n    - name: Run tests\r\n      run: python -m pytest tests/ -v --tb=short\r\n```\r\n\r\n### Test Coverage Goals\r\n\r\n- **Target Coverage**: >90% (currently achieved)\r\n- **Critical Functions**: 100% coverage for database operations\r\n- **Error Paths**: All exception handling paths tested\r\n- **Edge Cases**: Empty lists, missing files, invalid configurations\r\n\r\n## Advanced Usage\r\n\r\n### SSL Security Configuration **v1.2.0**\r\n\r\n#### For AWS RDS (Recommended)\r\n```python\r\nfrom pg_helpers import createPostgresqlEngineWithCustomSSL, check_ssl_connection\r\n\r\n# Download RDS CA certificate first\r\n# wget https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem\r\n\r\n# Maximum security configuration\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_ca_cert=\"./certs/rds-ca-2019-root.pem\",\r\n    ssl_mode=\"verify-full\"\r\n)\r\n\r\n# Test the secure connection\r\nssl_info = check_ssl_connection(engine)\r\nprint(f\"SSL Cipher: {ssl_info.get('ssl_cipher')}\")\r\nprint(f\"SSL Version: {ssl_info.get('ssl_version')}\")\r\n```\r\n\r\n#### Environment-based SSL Configuration\r\n```python\r\n# Set in .env file:\r\n# DB_SSL_MODE=verify-full\r\n# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\r\n\r\nfrom pg_helpers import createPostgresqlEngine, check_ssl_connection\r\n\r\n# Uses environment variables\r\nengine = createPostgresqlEngine()\r\n\r\n# Verify security\r\nssl_info = check_ssl_connection(engine)\r\nif ssl_info.get('ssl_active'):\r\n    print(\"\u2705 Secure SSL connection established\")\r\nelse:\r\n    print(\"\u26a0\ufe0f SSL not active - check configuration\")\r\n```\r\n\r\n#### SSL Mode Comparison\r\n```python\r\n# Different security levels for different environments\r\n\r\n# Development (basic encryption)\r\ndev_engine = createPostgresqlEngineWithCustomSSL(ssl_mode=\"require\")\r\n\r\n# Staging (certificate verification)\r\nstaging_engine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_mode=\"verify-ca\",\r\n    ssl_ca_cert=\"./certs/staging-ca.pem\"\r\n)\r\n\r\n# Production (maximum security)\r\nprod_engine = createPostgresqlEngineWithCustomSSL(\r\n    ssl_mode=\"verify-full\",\r\n    ssl_ca_cert=\"./certs/rds-ca-2019-root.pem\"\r\n)\r\n```\r\n\r\n### Debugging and Troubleshooting\r\n```python\r\nfrom pg_helpers import createPostgresqlEngine, dataGrabber, diagnose_connection_and_query\r\n\r\nengine = createPostgresqlEngine()\r\n\r\n# Enable detailed debugging for problematic queries\r\ndata = dataGrabber(\"\"\"\r\n    SELECT * FROM complex_view \r\n    WHERE date_column > '2023-01-01'\r\n\"\"\", engine, debug=True)\r\n\r\n# Diagnose specific issues\r\ndiagnostics = diagnose_connection_and_query(engine, \"SELECT * FROM problematic_table\")\r\nprint(f\"Engine info: {diagnostics['engine_info']}\")\r\nprint(f\"Test results: {diagnostics['test_results']}\")\r\nprint(f\"Recommendations: {diagnostics['recommendations']}\")\r\n```\r\n\r\n### Batch Query Processing\r\n```python\r\nfrom pg_helpers import recursiveDataGrabber, queryCleaner\r\n\r\n# Prepare multiple queries\r\nqueries = {\r\n    'user_stats': queryCleaner('sql/user_stats.sql', startDate='2023-01-01'),\r\n    'sales_data': queryCleaner('sql/sales.sql', list1=[1,2,3], varString1='$REGIONS'),\r\n    'inventory': 'SELECT * FROM inventory WHERE status = \"active\"'\r\n}\r\n\r\n# Execute with automatic retry\r\nresults = {}\r\nfinal_results = recursiveDataGrabber(queries, results)\r\n\r\n# Access individual results\r\nuser_df = final_results['user_stats']\r\nsales_df = final_results['sales_data']\r\n```\r\n\r\n## Credential Management Strategies **NEW in v1.3.2**\r\n\r\n### Strategy 1: Centralized Credentials for Multiple Projects\r\n\r\nKeep one credential directory for all your projects:\r\n\r\n```\r\nC:\\Users\\Me\\Documents\\Credentials\\\r\n\u251c\u2500\u2500 .env                    # General/shared credentials\r\n\u251c\u2500\u2500 .env.database          # Database-specific credentials  \r\n\u251c\u2500\u2500 .env.api               # API credentials\r\n\u2514\u2500\u2500 .env.production        # Production environment\r\n\r\nProject1\\\r\n\u251c\u2500\u2500 analysis.py\r\n\u2514\u2500\u2500 (no .env file needed)\r\n\r\nProject2\\\r\n\u251c\u2500\u2500 dashboard.py\r\n\u2514\u2500\u2500 (no .env file needed)\r\n```\r\n\r\n**Setup in each project:**\r\n```python\r\nimport os\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Users\\Me\\Documents\\Credentials'\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n### Strategy 2: Environment-Specific Credentials\r\n\r\nUse different credential files for different environments:\r\n\r\n```python\r\nimport os\r\n\r\n# Detect environment (from ENV variable, config file, etc.)\r\nenvironment = os.getenv('APP_ENV', 'development')\r\n\r\n# Set credential file based on environment\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nos.environ['CREDENTIALS_FILE'] = f'.env.{environment}'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n# Loads .env.development, .env.staging, or .env.production\r\n```\r\n\r\n### Strategy 3: Project-Specific Overrides\r\n\r\nUse centralized credentials with project-specific overrides:\r\n\r\n```python\r\nimport os\r\nfrom pathlib import Path\r\n\r\n# Try project-local first, fall back to centralized\r\nif Path('.env.local').exists():\r\n    os.environ['CREDENTIALS_FILE'] = '.env.local'\r\nelse:\r\n    os.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\n    os.environ['CREDENTIALS_FILE'] = '.env.database'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n### Best Practices for Credential Files\r\n\r\n**Security:**\r\n- \u2705 Store credential directories outside of project repositories\r\n- \u2705 Use restrictive file permissions (e.g., `chmod 600` on Linux/macOS)\r\n- \u2705 Never commit credential file paths to version control\r\n- \u2705 Document credential file locations in team wikis, not in code\r\n\r\n**Organization:**\r\n- \ud83d\udcc1 Use descriptive filenames: `.env.database`, `.env.api`, `.env.aws`\r\n- \ud83d\udcc1 Group by environment: `.env.production`, `.env.staging`\r\n- \ud83d\udcc1 Keep a `.env.template` in your repository with dummy values\r\n\r\n**Team Workflows:**\r\n```python\r\n# config.py (committed to repository)\r\nimport os\r\nfrom pathlib import Path\r\n\r\ndef setup_credentials():\r\n    \"\"\"Load credentials from standard team location or local override\"\"\"\r\n    # Check for local override first (for development)\r\n    if Path('.env.local').exists():\r\n        os.environ['CREDENTIALS_FILE'] = '.env.local'\r\n        return\r\n    \r\n    # Use team shared credentials location\r\n    team_creds = Path.home() / 'TeamShared' / 'Credentials'\r\n    if team_creds.exists():\r\n        os.environ['CREDENTIALS_DIR'] = str(team_creds)\r\n        os.environ['CREDENTIALS_FILE'] = '.env.database'\r\n        return\r\n    \r\n    # Fallback to default behavior\r\n    print(\"\u26a0\ufe0f No credentials found - using .env in current directory\")\r\n\r\n# Call this before importing pg_helpers\r\nsetup_credentials()\r\n\r\n## Security Best Practices\r\n\r\n### SSL/TLS Configuration\r\n- **Always use SSL in production** - Set `DB_SSL_MODE=require` minimum\r\n- **Use CA certificate verification** for cloud databases (AWS RDS, GCP Cloud SQL)\r\n- **Download official CA certificates** from your cloud provider\r\n- **Use `verify-full` mode** for maximum security in production\r\n- **Test SSL connections** regularly with `check_ssl_connection()`\r\n\r\n### AWS RDS Security Setup\r\n```bash\r\n# Download AWS RDS CA certificate\r\nmkdir -p ./certs\r\nwget -O ./certs/rds-ca-2019-root.pem \\\r\n  https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem\r\n\r\n# Set environment variables\r\necho \"DB_SSL_MODE=verify-full\" >> .env\r\necho \"DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\" >> .env\r\n```\r\n\r\n### General Security\r\n- **Never commit `.env` files** - add to `.gitignore`\r\n- **Use environment variables** for all credentials\r\n- **Rotate database passwords** regularly\r\n- **Use database connection pooling** for production\r\n- **Monitor SSL certificate expiration** dates\r\n\r\n## File Structure for Your Projects\r\n\r\n```\r\nyour_project/\r\n\u251c\u2500\u2500 .env                 # Database credentials (DO NOT COMMIT)\r\n\u251c\u2500\u2500 .gitignore          # Include .env and certs/\r\n\u251c\u2500\u2500 main.py             # Your analysis code\r\n\u251c\u2500\u2500 certs/              # SSL certificates (DO NOT COMMIT private keys)\r\n\u2502   \u251c\u2500\u2500 rds-ca-2019-root.pem\r\n\u2502   \u2514\u2500\u2500 custom-ca.pem\r\n\u251c\u2500\u2500 queries/            # SQL template files\r\n\u2502   \u251c\u2500\u2500 user_analysis.sql\r\n\u2502   \u251c\u2500\u2500 sales_report.sql\r\n\u2502   \u2514\u2500\u2500 inventory.sql\r\n\u251c\u2500\u2500 data/              # Output data files\r\n\u2502   \u2514\u2500\u2500 results.pkl\r\n\u2514\u2500\u2500 tests/             # Test files (if contributing)\r\n    \u2514\u2500\u2500 test_database.py\r\n```\r\n\r\n## Troubleshooting\r\n\r\n### Common Issues\r\n\r\n**SSL certificate verification failed**\r\n- Ensure CA certificate file exists and is readable\r\n- Download the latest CA certificate from your provider\r\n- Check that `ssl_mode` is set correctly\r\n- Use `check_ssl_connection()` to diagnose SSL issues\r\n\r\n**\"SSL connection required\"**\r\n- Check that your database server supports SSL\r\n- Verify `DB_SSL_MODE` is set correctly\r\n- For development, you can temporarily use `ssl_mode=\"disable\"`\r\n\r\n**\"Missing required environment variables\"**\r\n- Ensure `.env` file exists in your project root\r\n- Check that all required variables are set: `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`\r\n\r\n**Connection timeouts**\r\n- The `recursiveDataGrabber` will automatically retry with exponential backoff\r\n- Check network connectivity and database server status\r\n- Consider increasing `max_attempts` for very unreliable connections\r\n\r\n**SQLAlchemy/pandas compatibility errors** **IMPROVED in v1.1.0**\r\n- The package now automatically handles most compatibility issues\r\n- Enable `debug=True` to see which fallback method succeeds\r\n- Use `diagnose_connection_and_query()` for detailed troubleshooting\r\n\r\n**\"immutabledict\" or metadata interpretation errors** **FIXED in v1.1.0**\r\n- These errors are now automatically detected and handled\r\n- The package will try alternative pandas parameters and manual DataFrame construction\r\n- No action needed from users - fallback methods handle this automatically\r\n\r\n**Sound notifications not working**\r\n- This is normal and won't affect functionality\r\n- Ensure system sound is enabled\r\n- On Linux, you may need to install additional audio packages\r\n\r\n**Test failures during development** **NEW in v1.3.0**\r\n- Ensure all dependencies are installed: `pip install pytest pytest-cov`\r\n- Check that you're running tests from the project root directory\r\n- Use `python -m pytest tests/ -v` for detailed output\r\n- Some tests require write access to temporary directories\r\n\r\n## Changelog\r\n\r\n### Version 1.3.3 (Current) \ud83d\udcc1\r\n- \u2705 **Updated installation instructions**: Recommend using 'pip install pg_helpers'\r\n\r\n### Version 1.3.2\r\n- \ud83d\udcc1 **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- \ud83d\udd04 **Enhanced configuration**: Specify custom .env file paths for different projects and environments\r\n- \ud83c\udfe2 **Centralized credentials**: Share credential directories across multiple projects\r\n- \ud83c\udf0d **Environment-specific files**: Easy support for .env.production, .env.staging, .env.dev\r\n- \u2705 **Backwards compatible**: Defaults to `.env` in current directory when no variables set\r\n- \ud83d\udcd6 **Documentation**: New credential management strategies and best practices guide\r\n\r\n### Version 1.3.1\r\n- \ud83e\uddea **Comprehensive test suite**: 40+ test cases with >90% code coverage\r\n- \u2705 **Cross-platform validation**: Tests confirm functionality on Windows, macOS, and Linux\r\n- \ud83d\udd27 **API improvements**: `test_ssl_connection()` renamed to `check_ssl_connection()` for clarity\r\n- \ud83d\ude80 **CI/CD ready**: GitHub Actions workflow for automated testing across environments\r\n- \ud83d\udccb **Enhanced documentation**: Detailed testing guide and troubleshooting section\r\n- \ud83d\udee1\ufe0f **Quality assurance**: All functions tested including error conditions and edge cases\r\n\r\n### Version 1.2.0\r\n- \ud83d\udd12 **SSL/TLS support**: Full SSL encryption with optional CA certificate verification\r\n- \ud83d\udee1\ufe0f **Security enhancements**: Man-in-the-middle attack prevention for production environments  \r\n- \u2705 **SSL testing**: New SSL connection diagnostics\r\n- \ud83d\udd27 **Custom SSL configuration**: Programmatic SSL parameter override\r\n- \ud83d\udccb **Environment SSL config**: Optional SSL settings via environment variables\r\n- \ud83d\udd04 **Backward compatibility**: Existing code continues to work without changes\r\n\r\n### Version 1.1.0\r\n- \u2728 **Enhanced error handling**: Multiple fallback methods for pandas/SQLAlchemy compatibility\r\n- \ud83d\udd0d **Improved debugging**: Comprehensive logging and diagnostic capabilities\r\n- \ud83d\udee0\ufe0f **Better reliability**: Automatic detection and handling of metadata interpretation errors\r\n- \ud83d\udcca **Manual DataFrame construction**: Fallback method for complex data type issues\r\n- \ud83d\udd27 **Alternative parameter testing**: Tries different pandas configurations automatically\r\n\r\n### Version 1.0.0\r\n- Initial release with core functionality\r\n- Basic retry logic and PostgreSQL integration\r\n- Query templating and notification system\r\n\r\n## Contributing\r\n\r\n1. Fork the repository\r\n2. Create a feature branch: `git checkout -b feature-name`\r\n3. Add tests for new functionality: `python -m pytest tests/ -v`\r\n4. Ensure all tests pass and maintain >90% coverage\r\n5. Run the full test suite: `python -m pytest tests/ --cov=pg_helpers`\r\n6. Submit a pull request\r\n\r\n### Development Setup\r\n```bash\r\ngit clone https://github.com/lenwood/pg_helpers.git\r\ncd pg_helpers\r\npip install -e .\r\npip install pytest pytest-cov\r\npython -m pytest tests/ -v\r\n```\r\n\r\n## License\r\n\r\nMIT License - feel free to use in your projects!\r\n\r\n## Author\r\n\r\nChris Leonard\r\n\r\n---\r\n\r\n*This package was designed for data analysts and engineers who work with PostgreSQL databases and need reliable, automated query execution with enterprise-grade security. Version 1.3.2 adds flexible credential management while ensuring this reliability through comprehensive testing across multiple platforms and Python versions.*\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "PostgreSQL helper functions for data analysis with enterprise-grade security",
    "version": "1.3.3",
    "project_urls": {
        "Bug Reports": "https://github.com/lenwood/pg_helpers/issues",
        "Documentation": "https://github.com/lenwood/pg_helpers#readme",
        "Homepage": "https://github.com/lenwood/pg_helpers",
        "Source": "https://github.com/lenwood/pg_helpers"
    },
    "split_keywords": [
        "postgresql",
        " postgres",
        " database",
        " sql",
        " pandas",
        " data-analysis",
        " ssl",
        " security",
        " retry-logic",
        " sqlalchemy",
        " data-science"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dc751eed47b6154a22ef3cbc45ffb73a97a70e81d78b1b5d588a1aa7e1fb8a63",
                "md5": "529ae981de3cc8d28e143b7953443fc7",
                "sha256": "160d4343c2c8bd4dbbb2927cc23b4a5ffb429cb64710868aac36186af2b095cc"
            },
            "downloads": -1,
            "filename": "pg_helpers-1.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "529ae981de3cc8d28e143b7953443fc7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 25930,
            "upload_time": "2025-10-10T20:15:00",
            "upload_time_iso_8601": "2025-10-10T20:15:00.147935Z",
            "url": "https://files.pythonhosted.org/packages/dc/75/1eed47b6154a22ef3cbc45ffb73a97a70e81d78b1b5d588a1aa7e1fb8a63/pg_helpers-1.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c4882cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e",
                "md5": "52fd348fa8262a8d7607ee0a00a744a9",
                "sha256": "1a828c0b393e469df5dbba5c00388b6b8d69d9d86999d2ea54904acefaa46607"
            },
            "downloads": -1,
            "filename": "pg_helpers-1.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "52fd348fa8262a8d7607ee0a00a744a9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 33823,
            "upload_time": "2025-10-10T20:15:01",
            "upload_time_iso_8601": "2025-10-10T20:15:01.365676Z",
            "url": "https://files.pythonhosted.org/packages/c4/88/2cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e/pg_helpers-1.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-10 20:15:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lenwood",
    "github_project": "pg_helpers",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "psycopg2-binary",
            "specs": [
                [
                    ">=",
                    "2.9.0"
                ]
            ]
        },
        {
            "name": "sqlalchemy",
            "specs": [
                [
                    ">=",
                    "1.4.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "0.19.0"
                ]
            ]
        }
    ],
    "lcname": "pg-helpers"
}
        
Elapsed time: 2.55974s