# PostgreSQL Helper Functions
A Python package providing robust utilities for PostgreSQL database operations, query management, and data analysis workflows with enterprise-grade security.
## Features
- ๐ **Automatic retry logic** with exponential backoff for unreliable connections
- ๐ **Seamless pandas integration** for data analysis
- ๐ง **SQL query templating** and parameter substitution
- ๐ **Cross-platform notifications** when long queries complete
- โ๏ธ **Environment-based configuration** for secure credential management
- ๐ก๏ธ **Comprehensive error handling** and logging
- ๐ **SSL/TLS encryption** with CA certificate verification for AWS RDS and other cloud databases
- **Updated in v1.3.3**
- โ
**Recommended install via PyPi**: Use 'pip install pg_helpers' to install
- **Enhanced in v1.3.2**
- ๐ **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- ๐ **Enhanced configuration**: Specify custom .env file paths for different projects and environments
- ๐ข **Centralized credentials**: Share credential directories across multiple projects
- **Enhanced in v1.3.1:**
- ๐งช **Comprehensive test suite** with 40+ test cases covering all functionality
- โ
**Cross-platform validation** ensuring reliability on Windows, macOS, and Linux
- ๐ **Function name improvements** for better API clarity
- **Previous in v1.2.0:**
- ๐ **Full SSL support** with optional CA certificate verification
- ๐ก๏ธ **Man-in-the-middle attack prevention** for production environments
- โ
**SSL connection testing** and diagnostics
- **Previous in v1.1.0:**
- ๐ **Advanced fallback methods** for SQLAlchemy/pandas compatibility issues
- ๐ **Enhanced debugging** and diagnostic capabilities
## What's New in Version 1.3.2 ๐งช
### Flexible Credential File Configuration
- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory
- **Environment-specific files**: Use different credential files for dev/staging/production
- **Fully backwards compatible**: Existing projects continue to work without any changes
```python
import os
# Use centralized credentials directory
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads C:\Documents\Credentials\.env
# Use custom filename for database credentials
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads ./.env.database
# Combine both for full flexibility
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads C:\Documents\Project\Assets\.env.production
```
## Installation
### From PyPi (Recommended)
```bash
pip install pg_helpers
```
### From Source (Development)
```bash
git clone https://github.com/lenwood/pg_helpers.git
cd pg_helpers
pip install -e .
```
### From GitHub (Specific Version)
```bash
pip install git+https://github.com/lenwood/pg_helpers.git@v1.3.3
```
### Dependencies
```bash
pip install pandas psycopg2-binary sqlalchemy python-dotenv
```
## Quick Start
### 1. Environment Setup
Create a `.env` file in your project root (NOT in this repository):
```env
# Required database credentials
DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=your_host
DB_PORT=5432
DB_NAME=your_database
# SSL Configuration (optional)
# SSL Mode options: disable, allow, prefer, require, verify-ca, verify-full
DB_SSL_MODE=require
# Optional: Path to CA certificate for maximum security
# DB_SSL_CA_CERT=/path/to/rds-ca-2019-root.pem
# For AWS RDS with maximum security:
# DB_SSL_MODE=verify-full
# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem
```
#### Flexible Credential Location Setup **NEW in v1.3.2**
You can now specify custom locations for your credential files using environment variables:
**Option 1: Centralized credential directory**
```python
import os
# Point to a shared credentials folder
os.environ['CREDENTIALS_DIR'] = r'C:\Users\Me\Documents\Credentials'
# Will load: C:\Users\Me\Documents\Credentials\.env
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```
**Option 2: Custom credential filename**
```python
import os
# Use a specific credential file name in current directory
os.environ['CREDENTIALS_FILE'] = '.env.database'
# Will load: ./.env.database
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```
**Option 3: Both custom directory and filename**
```python
import os
# Full custom path
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
# Will load: C:\Documents\Project\Assets\.env.production
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```
**Configuration Priority:**
- Both `CREDENTIALS_DIR` and `CREDENTIALS_FILE` set โ `{CREDENTIALS_DIR}/{CREDENTIALS_FILE}`
- Only `CREDENTIALS_DIR` set โ `{CREDENTIALS_DIR}/.env`
- Only `CREDENTIALS_FILE` set โ `./{CREDENTIALS_FILE}`
- Neither set (default) โ `./.env`
**Benefits of flexible credentials:**
- ๐ **Centralized security**: Keep all credentials in one secure location
- ๐ข **Team workflows**: Share credential directories across team members
- ๐ **Environment-specific**: Use `.env.production`, `.env.staging`, `.env.dev`
- ๐ **Multi-project**: Reuse credentials across multiple analysis projects
- โ
**Backwards compatible**: Existing code works without changes
### 2. Basic Usage
```python
from pg_helpers import createPostgresqlEngine, dataGrabber, queryCleaner, check_ssl_connection
# Create secure database connection (uses SSL by default)
engine = createPostgresqlEngine()
# Verify SSL connection
ssl_info = check_ssl_connection(engine)
print(f"SSL Active: {ssl_info.get('ssl_active', 'Unknown')}")
# Execute a simple query
data = dataGrabber("SELECT * FROM users LIMIT 10", engine)
print(data.head())
# Enable debugging for troubleshooting
data = dataGrabber("SELECT * FROM complex_view", engine, debug=True)
# Use query templates
query = queryCleaner(
'queries/user_analysis.sql',
list1=[100, 200, 300],
varString1='$USER_IDS',
startDate='2023-01-01',
endDate='2023-12-31'
)
results = dataGrabber(query, engine)
```
### 3. Custom SSL Configuration
```python
from pg_helpers import createPostgresqlEngineWithCustomSSL
# Maximum security for production
engine = createPostgresqlEngineWithCustomSSL(
ssl_ca_cert="/path/to/rds-ca-2019-root.pem",
ssl_mode="verify-full"
)
# Basic SSL without certificate verification
engine = createPostgresqlEngineWithCustomSSL(
ssl_mode="require"
)
# Disable SSL (not recommended for production)
engine = createPostgresqlEngineWithCustomSSL(
ssl_mode="disable"
)
```
## Function Reference
### Database Operations
#### `createPostgresqlEngine()`
Creates a SQLAlchemy engine for PostgreSQL connections using environment variables with SSL support.
**Returns:** `sqlalchemy.Engine`
**Environment variables:**
- **Required:** `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`
- **Optional:** `DB_PORT` (default: 5432)
- **SSL Options:**
- `DB_SSL_MODE` (default: "require")
- `DB_SSL_CA_CERT` (path to CA certificate)
- `DB_SSL_CERT` (client certificate)
- `DB_SSL_KEY` (client key)
#### `createPostgresqlEngineWithCustomSSL(ssl_ca_cert=None, ssl_mode='require', ssl_cert=None, ssl_key=None)` **v1.2.0**
Creates a SQLAlchemy engine with custom SSL configuration, overriding environment variables.
**Parameters:**
- `ssl_ca_cert` (str, optional): Path to CA certificate file
- `ssl_mode` (str): SSL mode - "disable", "allow", "prefer", "require", "verify-ca", "verify-full"
- `ssl_cert` (str, optional): Path to client certificate file
- `ssl_key` (str, optional): Path to client key file
**Returns:** `sqlalchemy.Engine`
**SSL Modes Explained:**
- `disable`: No SSL connection
- `allow`: Try SSL, fallback to non-SSL
- `prefer`: Try SSL first, fallback to non-SSL
- `require`: **Default** - Require SSL, fail if unavailable
- `verify-ca`: Require SSL and verify server certificate
- `verify-full`: **Most secure** - Require SSL, verify certificate and hostname
#### `check_ssl_connection(engine=None)` **v1.2.0, improved v1.3.0**
Tests SSL connection and displays SSL information. **(Renamed from `test_ssl_connection` in v1.3.0 for API clarity)**
**Parameters:**
- `engine` (optional): SQLAlchemy engine (creates one if not provided)
**Returns:** `dict` with SSL connection details including:
- `ssl_active`: Whether SSL is active
- `ssl_cipher`: Encryption cipher used
- `ssl_version`: SSL/TLS version
- `client_cert_present`: Whether client certificate is present
#### `dataGrabber(query, engine, limit='None', debug=False)` **ENHANCED in v1.1.0**
Executes SQL queries and returns pandas DataFrames with robust error handling.
**Parameters:**
- `query` (str): SQL query to execute
- `engine`: SQLAlchemy engine
- `limit` (str, optional): Row limit for results
- `debug` (bool, optional): Enable detailed logging and debugging output
**Returns:** `pandas.DataFrame`
**Features:**
- Multiple fallback methods for compatibility issues
- Automatic detection and handling of metadata interpretation errors
- Comprehensive error logging and debugging
- Execution timing display
- Cross-platform sound notifications
- Automatic error propagation for retry logic
#### `recursiveDataGrabber(query_dict, results_dict, n=1, max_attempts=50)`
Executes multiple queries with automatic retry and exponential backoff.
**Parameters:**
- `query_dict` (dict): Dictionary of {query_name: sql_string}
- `results_dict` (dict): Dictionary to store results
- `n` (int): Current attempt number
- `max_attempts` (int): Maximum retry attempts
**Returns:** `dict` with DataFrames or None for failed queries
**Use case:** Perfect for running multiple large queries overnight with unreliable connections.
#### `diagnose_connection_and_query(engine, query, limit=10)` **ENHANCED in v1.1.0**
Diagnostic function to help troubleshoot SQLAlchemy/pandas compatibility issues.
**Parameters:**
- `engine`: SQLAlchemy engine
- `query` (str): SQL query to diagnose
- `limit` (int): Number of rows to test with
**Returns:** `dict` with diagnostic information
### Query Utilities
#### `queryCleaner(file, list1='empty', varString1='empty', ...)`
Loads SQL files and substitutes parameters for dynamic queries.
**Parameters:**
- `file` (str): Path to SQL file
- `list1`: List to substitute (converts to comma-separated string)
- `varString1` (str): Placeholder string in SQL file
- `startDate/endDate`: Date range parameters
**Returns:** `str` - Processed SQL query
#### `listPrep(iList)`
Converts Python lists to SQL-compatible comma-separated strings.
**Parameters:**
- `iList`: List of integers, floats, strings, or single value
**Returns:** `str` - SQL-formatted string
## Testing and Quality Assurance **NEW in v1.3.0**
### Running Tests
The package includes a comprehensive test suite with 40+ test cases covering all functionality:
```bash
# Install test dependencies
pip install pytest pytest-cov
# Run all tests
python -m pytest tests/ -v
# Run tests with coverage report
python -m pytest tests/ --cov=pg_helpers --cov-report=html --cov-report=term-missing
# Run specific test categories
python -m pytest tests/test_database.py::TestQueryUtils -v # Query utilities
python -m pytest tests/test_database.py::TestConfig -v # Configuration
python -m pytest tests/test_database.py::TestDatabase -v # Database operations
python -m pytest tests/test_database.py::TestNotifications -v # Cross-platform notifications
```
### What the Tests Validate
#### **Core Functionality Tests**
- โ
**Database engine creation** with various SSL configurations
- โ
**Query execution** with multiple fallback methods
- โ
**SSL connection testing** and diagnostics
- โ
**Configuration validation** and environment variable handling
- โ
**Query template processing** with parameter substitution
- โ
**Error handling** for network issues and compatibility problems
#### **Cross-Platform Compatibility**
- โ
**File operations** on Windows, macOS, and Linux
- โ
**Sound notifications** across different operating systems
- โ
**Environment variable handling** with different shell environments
- โ
**Temporary file management** and cleanup
#### **Error Condition Testing**
- โ
**Missing environment variables** and configuration errors
- โ
**SSL certificate validation** failures and missing files
- โ
**Database connection failures** and retry logic
- โ
**Pandas/SQLAlchemy compatibility** issues and fallback methods
- โ
**Invalid query parameters** and malformed SQL
#### **Integration Testing**
- โ
**End-to-end workflows** combining multiple functions
- โ
**Configuration to engine creation** pipelines
- โ
**Query templating to execution** workflows
### Test Output Example
```bash
================================================= test session starts =================================================
platform win32 -- Python 3.12.9, pytest-8.4.1, pluggy-1.5.0
collected 40 items
tests/test_database.py::TestQueryUtils::test_listPrep_empty_list PASSED [ 2%]
tests/test_database.py::TestQueryUtils::test_listPrep_floats PASSED [ 5%]
tests/test_database.py::TestQueryUtils::test_listPrep_integers PASSED [ 7%]
tests/test_database.py::TestQueryUtils::test_listPrep_single_value PASSED [ 10%]
tests/test_database.py::TestQueryUtils::test_listPrep_strings PASSED [ 12%]
... (34 more tests) ...
tests/test_database.py::TestNotifications::test_play_notification_sound_windows PASSED [100%]
================================================= 40 passed in 1.03s ==================================================
```
### Flexible Credential File Configuration **NEW in v1.3.2**
- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory
- **Environment-specific files**: Use different credential files for dev/staging/production
- **Fully backwards compatible**: Existing projects continue to work without any changes
```python
import os
# Use centralized credentials directory
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads C:\Documents\Credentials\.env
# Use custom filename for database credentials
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads ./.env.database
# Combine both for full flexibility
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Project\Assets'
os.environ['CREDENTIALS_FILE'] = '.env.production'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine() # Loads C:\Documents\Project\Assets\.env.production
```
### Continuous Integration Setup **NEW in v1.3.0**
For automated testing across multiple environments, add this GitHub Actions workflow:
```yaml
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
test:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
python-version: [3.8, 3.9, "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pytest pytest-cov
pip install -e .
- name: Run tests
run: python -m pytest tests/ -v --tb=short
```
### Test Coverage Goals
- **Target Coverage**: >90% (currently achieved)
- **Critical Functions**: 100% coverage for database operations
- **Error Paths**: All exception handling paths tested
- **Edge Cases**: Empty lists, missing files, invalid configurations
## Advanced Usage
### SSL Security Configuration **v1.2.0**
#### For AWS RDS (Recommended)
```python
from pg_helpers import createPostgresqlEngineWithCustomSSL, check_ssl_connection
# Download RDS CA certificate first
# wget https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem
# Maximum security configuration
engine = createPostgresqlEngineWithCustomSSL(
ssl_ca_cert="./certs/rds-ca-2019-root.pem",
ssl_mode="verify-full"
)
# Test the secure connection
ssl_info = check_ssl_connection(engine)
print(f"SSL Cipher: {ssl_info.get('ssl_cipher')}")
print(f"SSL Version: {ssl_info.get('ssl_version')}")
```
#### Environment-based SSL Configuration
```python
# Set in .env file:
# DB_SSL_MODE=verify-full
# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem
from pg_helpers import createPostgresqlEngine, check_ssl_connection
# Uses environment variables
engine = createPostgresqlEngine()
# Verify security
ssl_info = check_ssl_connection(engine)
if ssl_info.get('ssl_active'):
print("โ
Secure SSL connection established")
else:
print("โ ๏ธ SSL not active - check configuration")
```
#### SSL Mode Comparison
```python
# Different security levels for different environments
# Development (basic encryption)
dev_engine = createPostgresqlEngineWithCustomSSL(ssl_mode="require")
# Staging (certificate verification)
staging_engine = createPostgresqlEngineWithCustomSSL(
ssl_mode="verify-ca",
ssl_ca_cert="./certs/staging-ca.pem"
)
# Production (maximum security)
prod_engine = createPostgresqlEngineWithCustomSSL(
ssl_mode="verify-full",
ssl_ca_cert="./certs/rds-ca-2019-root.pem"
)
```
### Debugging and Troubleshooting
```python
from pg_helpers import createPostgresqlEngine, dataGrabber, diagnose_connection_and_query
engine = createPostgresqlEngine()
# Enable detailed debugging for problematic queries
data = dataGrabber("""
SELECT * FROM complex_view
WHERE date_column > '2023-01-01'
""", engine, debug=True)
# Diagnose specific issues
diagnostics = diagnose_connection_and_query(engine, "SELECT * FROM problematic_table")
print(f"Engine info: {diagnostics['engine_info']}")
print(f"Test results: {diagnostics['test_results']}")
print(f"Recommendations: {diagnostics['recommendations']}")
```
### Batch Query Processing
```python
from pg_helpers import recursiveDataGrabber, queryCleaner
# Prepare multiple queries
queries = {
'user_stats': queryCleaner('sql/user_stats.sql', startDate='2023-01-01'),
'sales_data': queryCleaner('sql/sales.sql', list1=[1,2,3], varString1='$REGIONS'),
'inventory': 'SELECT * FROM inventory WHERE status = "active"'
}
# Execute with automatic retry
results = {}
final_results = recursiveDataGrabber(queries, results)
# Access individual results
user_df = final_results['user_stats']
sales_df = final_results['sales_data']
```
## Credential Management Strategies **NEW in v1.3.2**
### Strategy 1: Centralized Credentials for Multiple Projects
Keep one credential directory for all your projects:
```
C:\Users\Me\Documents\Credentials\
โโโ .env # General/shared credentials
โโโ .env.database # Database-specific credentials
โโโ .env.api # API credentials
โโโ .env.production # Production environment
Project1\
โโโ analysis.py
โโโ (no .env file needed)
Project2\
โโโ dashboard.py
โโโ (no .env file needed)
```
**Setup in each project:**
```python
import os
os.environ['CREDENTIALS_DIR'] = r'C:\Users\Me\Documents\Credentials'
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```
### Strategy 2: Environment-Specific Credentials
Use different credential files for different environments:
```python
import os
# Detect environment (from ENV variable, config file, etc.)
environment = os.getenv('APP_ENV', 'development')
# Set credential file based on environment
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
os.environ['CREDENTIALS_FILE'] = f'.env.{environment}'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
# Loads .env.development, .env.staging, or .env.production
```
### Strategy 3: Project-Specific Overrides
Use centralized credentials with project-specific overrides:
```python
import os
from pathlib import Path
# Try project-local first, fall back to centralized
if Path('.env.local').exists():
os.environ['CREDENTIALS_FILE'] = '.env.local'
else:
os.environ['CREDENTIALS_DIR'] = r'C:\Documents\Credentials'
os.environ['CREDENTIALS_FILE'] = '.env.database'
from pg_helpers import createPostgresqlEngine
engine = createPostgresqlEngine()
```
### Best Practices for Credential Files
**Security:**
- โ
Store credential directories outside of project repositories
- โ
Use restrictive file permissions (e.g., `chmod 600` on Linux/macOS)
- โ
Never commit credential file paths to version control
- โ
Document credential file locations in team wikis, not in code
**Organization:**
- ๐ Use descriptive filenames: `.env.database`, `.env.api`, `.env.aws`
- ๐ Group by environment: `.env.production`, `.env.staging`
- ๐ Keep a `.env.template` in your repository with dummy values
**Team Workflows:**
```python
# config.py (committed to repository)
import os
from pathlib import Path
def setup_credentials():
"""Load credentials from standard team location or local override"""
# Check for local override first (for development)
if Path('.env.local').exists():
os.environ['CREDENTIALS_FILE'] = '.env.local'
return
# Use team shared credentials location
team_creds = Path.home() / 'TeamShared' / 'Credentials'
if team_creds.exists():
os.environ['CREDENTIALS_DIR'] = str(team_creds)
os.environ['CREDENTIALS_FILE'] = '.env.database'
return
# Fallback to default behavior
print("โ ๏ธ No credentials found - using .env in current directory")
# Call this before importing pg_helpers
setup_credentials()
## Security Best Practices
### SSL/TLS Configuration
- **Always use SSL in production** - Set `DB_SSL_MODE=require` minimum
- **Use CA certificate verification** for cloud databases (AWS RDS, GCP Cloud SQL)
- **Download official CA certificates** from your cloud provider
- **Use `verify-full` mode** for maximum security in production
- **Test SSL connections** regularly with `check_ssl_connection()`
### AWS RDS Security Setup
```bash
# Download AWS RDS CA certificate
mkdir -p ./certs
wget -O ./certs/rds-ca-2019-root.pem \
https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem
# Set environment variables
echo "DB_SSL_MODE=verify-full" >> .env
echo "DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem" >> .env
```
### General Security
- **Never commit `.env` files** - add to `.gitignore`
- **Use environment variables** for all credentials
- **Rotate database passwords** regularly
- **Use database connection pooling** for production
- **Monitor SSL certificate expiration** dates
## File Structure for Your Projects
```
your_project/
โโโ .env # Database credentials (DO NOT COMMIT)
โโโ .gitignore # Include .env and certs/
โโโ main.py # Your analysis code
โโโ certs/ # SSL certificates (DO NOT COMMIT private keys)
โ โโโ rds-ca-2019-root.pem
โ โโโ custom-ca.pem
โโโ queries/ # SQL template files
โ โโโ user_analysis.sql
โ โโโ sales_report.sql
โ โโโ inventory.sql
โโโ data/ # Output data files
โ โโโ results.pkl
โโโ tests/ # Test files (if contributing)
โโโ test_database.py
```
## Troubleshooting
### Common Issues
**SSL certificate verification failed**
- Ensure CA certificate file exists and is readable
- Download the latest CA certificate from your provider
- Check that `ssl_mode` is set correctly
- Use `check_ssl_connection()` to diagnose SSL issues
**"SSL connection required"**
- Check that your database server supports SSL
- Verify `DB_SSL_MODE` is set correctly
- For development, you can temporarily use `ssl_mode="disable"`
**"Missing required environment variables"**
- Ensure `.env` file exists in your project root
- Check that all required variables are set: `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`
**Connection timeouts**
- The `recursiveDataGrabber` will automatically retry with exponential backoff
- Check network connectivity and database server status
- Consider increasing `max_attempts` for very unreliable connections
**SQLAlchemy/pandas compatibility errors** **IMPROVED in v1.1.0**
- The package now automatically handles most compatibility issues
- Enable `debug=True` to see which fallback method succeeds
- Use `diagnose_connection_and_query()` for detailed troubleshooting
**"immutabledict" or metadata interpretation errors** **FIXED in v1.1.0**
- These errors are now automatically detected and handled
- The package will try alternative pandas parameters and manual DataFrame construction
- No action needed from users - fallback methods handle this automatically
**Sound notifications not working**
- This is normal and won't affect functionality
- Ensure system sound is enabled
- On Linux, you may need to install additional audio packages
**Test failures during development** **NEW in v1.3.0**
- Ensure all dependencies are installed: `pip install pytest pytest-cov`
- Check that you're running tests from the project root directory
- Use `python -m pytest tests/ -v` for detailed output
- Some tests require write access to temporary directories
## Changelog
### Version 1.3.3 (Current) ๐
- โ
**Updated installation instructions**: Recommend using 'pip install pg_helpers'
### Version 1.3.2
- ๐ **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables
- ๐ **Enhanced configuration**: Specify custom .env file paths for different projects and environments
- ๐ข **Centralized credentials**: Share credential directories across multiple projects
- ๐ **Environment-specific files**: Easy support for .env.production, .env.staging, .env.dev
- โ
**Backwards compatible**: Defaults to `.env` in current directory when no variables set
- ๐ **Documentation**: New credential management strategies and best practices guide
### Version 1.3.1
- ๐งช **Comprehensive test suite**: 40+ test cases with >90% code coverage
- โ
**Cross-platform validation**: Tests confirm functionality on Windows, macOS, and Linux
- ๐ง **API improvements**: `test_ssl_connection()` renamed to `check_ssl_connection()` for clarity
- ๐ **CI/CD ready**: GitHub Actions workflow for automated testing across environments
- ๐ **Enhanced documentation**: Detailed testing guide and troubleshooting section
- ๐ก๏ธ **Quality assurance**: All functions tested including error conditions and edge cases
### Version 1.2.0
- ๐ **SSL/TLS support**: Full SSL encryption with optional CA certificate verification
- ๐ก๏ธ **Security enhancements**: Man-in-the-middle attack prevention for production environments
- โ
**SSL testing**: New SSL connection diagnostics
- ๐ง **Custom SSL configuration**: Programmatic SSL parameter override
- ๐ **Environment SSL config**: Optional SSL settings via environment variables
- ๐ **Backward compatibility**: Existing code continues to work without changes
### Version 1.1.0
- โจ **Enhanced error handling**: Multiple fallback methods for pandas/SQLAlchemy compatibility
- ๐ **Improved debugging**: Comprehensive logging and diagnostic capabilities
- ๐ ๏ธ **Better reliability**: Automatic detection and handling of metadata interpretation errors
- ๐ **Manual DataFrame construction**: Fallback method for complex data type issues
- ๐ง **Alternative parameter testing**: Tries different pandas configurations automatically
### Version 1.0.0
- Initial release with core functionality
- Basic retry logic and PostgreSQL integration
- Query templating and notification system
## Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Add tests for new functionality: `python -m pytest tests/ -v`
4. Ensure all tests pass and maintain >90% coverage
5. Run the full test suite: `python -m pytest tests/ --cov=pg_helpers`
6. Submit a pull request
### Development Setup
```bash
git clone https://github.com/lenwood/pg_helpers.git
cd pg_helpers
pip install -e .
pip install pytest pytest-cov
python -m pytest tests/ -v
```
## License
MIT License - feel free to use in your projects!
## Author
Chris Leonard
---
*This package was designed for data analysts and engineers who work with PostgreSQL databases and need reliable, automated query execution with enterprise-grade security. Version 1.3.2 adds flexible credential management while ensuring this reliability through comprehensive testing across multiple platforms and Python versions.*
Raw data
{
"_id": null,
"home_page": "https://github.com/lenwood/pg_helpers",
"name": "pg-helpers",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "postgresql, postgres, database, sql, pandas, data-analysis, ssl, security, retry-logic, sqlalchemy, data-science",
"author": "Chris Leonard",
"author_email": "lenwood@duck.com",
"download_url": "https://files.pythonhosted.org/packages/c4/88/2cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e/pg_helpers-1.3.3.tar.gz",
"platform": null,
"description": "# PostgreSQL Helper Functions\r\n\r\nA Python package providing robust utilities for PostgreSQL database operations, query management, and data analysis workflows with enterprise-grade security.\r\n\r\n## Features\r\n\r\n- \ud83d\udd04 **Automatic retry logic** with exponential backoff for unreliable connections\r\n- \ud83d\udcca **Seamless pandas integration** for data analysis\r\n- \ud83d\udd27 **SQL query templating** and parameter substitution\r\n- \ud83d\udd0a **Cross-platform notifications** when long queries complete\r\n- \u2699\ufe0f **Environment-based configuration** for secure credential management\r\n- \ud83d\udee1\ufe0f **Comprehensive error handling** and logging\r\n- \ud83d\udd12 **SSL/TLS encryption** with CA certificate verification for AWS RDS and other cloud databases\r\n- **Updated in v1.3.3**\r\n - \u2705 **Recommended install via PyPi**: Use 'pip install pg_helpers' to install\r\n- **Enhanced in v1.3.2**\r\n - \ud83d\udcc1 **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n - \ud83d\udd04 **Enhanced configuration**: Specify custom .env file paths for different projects and environments\r\n - \ud83c\udfe2 **Centralized credentials**: Share credential directories across multiple projects\r\n- **Enhanced in v1.3.1:**\r\n - \ud83e\uddea **Comprehensive test suite** with 40+ test cases covering all functionality\r\n - \u2705 **Cross-platform validation** ensuring reliability on Windows, macOS, and Linux\r\n - \ud83d\udd0d **Function name improvements** for better API clarity\r\n- **Previous in v1.2.0:**\r\n - \ud83d\udd10 **Full SSL support** with optional CA certificate verification\r\n - \ud83d\udee1\ufe0f **Man-in-the-middle attack prevention** for production environments\r\n - \u2705 **SSL connection testing** and diagnostics\r\n- **Previous in v1.1.0:**\r\n - \ud83d\ude80 **Advanced fallback methods** for SQLAlchemy/pandas compatibility issues\r\n - \ud83d\udd0d **Enhanced debugging** and diagnostic capabilities\r\n\r\n## What's New in Version 1.3.2 \ud83e\uddea\r\n\r\n### Flexible Credential File Configuration\r\n- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory\r\n- **Environment-specific files**: Use different credential files for dev/staging/production\r\n- **Fully backwards compatible**: Existing projects continue to work without any changes\r\n```python\r\nimport os\r\n\r\n# Use centralized credentials directory\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads C:\\Documents\\Credentials\\.env\r\n\r\n# Use custom filename for database credentials\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads ./.env.database\r\n\r\n# Combine both for full flexibility\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads C:\\Documents\\Project\\Assets\\.env.production\r\n```\r\n\r\n## Installation\r\n\r\n### From PyPi (Recommended)\r\n```bash\r\npip install pg_helpers\r\n```\r\n\r\n### From Source (Development)\r\n```bash\r\ngit clone https://github.com/lenwood/pg_helpers.git\r\ncd pg_helpers\r\npip install -e .\r\n```\r\n\r\n### From GitHub (Specific Version)\r\n```bash\r\npip install git+https://github.com/lenwood/pg_helpers.git@v1.3.3\r\n```\r\n\r\n### Dependencies\r\n```bash\r\npip install pandas psycopg2-binary sqlalchemy python-dotenv\r\n```\r\n\r\n## Quick Start\r\n\r\n### 1. Environment Setup\r\n\r\nCreate a `.env` file in your project root (NOT in this repository):\r\n\r\n```env\r\n# Required database credentials\r\nDB_USER=your_username\r\nDB_PASSWORD=your_password\r\nDB_HOST=your_host\r\nDB_PORT=5432\r\nDB_NAME=your_database\r\n\r\n# SSL Configuration (optional)\r\n# SSL Mode options: disable, allow, prefer, require, verify-ca, verify-full\r\nDB_SSL_MODE=require\r\n\r\n# Optional: Path to CA certificate for maximum security\r\n# DB_SSL_CA_CERT=/path/to/rds-ca-2019-root.pem\r\n\r\n# For AWS RDS with maximum security:\r\n# DB_SSL_MODE=verify-full\r\n# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\r\n```\r\n\r\n#### Flexible Credential Location Setup **NEW in v1.3.2**\r\n\r\nYou can now specify custom locations for your credential files using environment variables:\r\n\r\n**Option 1: Centralized credential directory**\r\n```python\r\nimport os\r\n# Point to a shared credentials folder\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Users\\Me\\Documents\\Credentials'\r\n# Will load: C:\\Users\\Me\\Documents\\Credentials\\.env\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Option 2: Custom credential filename**\r\n```python\r\nimport os\r\n# Use a specific credential file name in current directory\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\n# Will load: ./.env.database\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Option 3: Both custom directory and filename**\r\n```python\r\nimport os\r\n# Full custom path\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\n# Will load: C:\\Documents\\Project\\Assets\\.env.production\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n**Configuration Priority:**\r\n- Both `CREDENTIALS_DIR` and `CREDENTIALS_FILE` set \u2192 `{CREDENTIALS_DIR}/{CREDENTIALS_FILE}`\r\n- Only `CREDENTIALS_DIR` set \u2192 `{CREDENTIALS_DIR}/.env`\r\n- Only `CREDENTIALS_FILE` set \u2192 `./{CREDENTIALS_FILE}`\r\n- Neither set (default) \u2192 `./.env`\r\n\r\n**Benefits of flexible credentials:**\r\n- \ud83d\udd10 **Centralized security**: Keep all credentials in one secure location\r\n- \ud83c\udfe2 **Team workflows**: Share credential directories across team members\r\n- \ud83c\udf0d **Environment-specific**: Use `.env.production`, `.env.staging`, `.env.dev`\r\n- \ud83d\udd04 **Multi-project**: Reuse credentials across multiple analysis projects\r\n- \u2705 **Backwards compatible**: Existing code works without changes\r\n\r\n### 2. Basic Usage\r\n\r\n```python\r\nfrom pg_helpers import createPostgresqlEngine, dataGrabber, queryCleaner, check_ssl_connection\r\n\r\n# Create secure database connection (uses SSL by default)\r\nengine = createPostgresqlEngine()\r\n\r\n# Verify SSL connection\r\nssl_info = check_ssl_connection(engine)\r\nprint(f\"SSL Active: {ssl_info.get('ssl_active', 'Unknown')}\")\r\n\r\n# Execute a simple query\r\ndata = dataGrabber(\"SELECT * FROM users LIMIT 10\", engine)\r\nprint(data.head())\r\n\r\n# Enable debugging for troubleshooting\r\ndata = dataGrabber(\"SELECT * FROM complex_view\", engine, debug=True)\r\n\r\n# Use query templates\r\nquery = queryCleaner(\r\n 'queries/user_analysis.sql',\r\n list1=[100, 200, 300],\r\n varString1='$USER_IDS',\r\n startDate='2023-01-01',\r\n endDate='2023-12-31'\r\n)\r\nresults = dataGrabber(query, engine)\r\n```\r\n\r\n### 3. Custom SSL Configuration\r\n\r\n```python\r\nfrom pg_helpers import createPostgresqlEngineWithCustomSSL\r\n\r\n# Maximum security for production\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n ssl_ca_cert=\"/path/to/rds-ca-2019-root.pem\",\r\n ssl_mode=\"verify-full\"\r\n)\r\n\r\n# Basic SSL without certificate verification\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n ssl_mode=\"require\"\r\n)\r\n\r\n# Disable SSL (not recommended for production)\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n ssl_mode=\"disable\"\r\n)\r\n```\r\n\r\n## Function Reference\r\n\r\n### Database Operations\r\n\r\n#### `createPostgresqlEngine()`\r\nCreates a SQLAlchemy engine for PostgreSQL connections using environment variables with SSL support.\r\n\r\n**Returns:** `sqlalchemy.Engine`\r\n\r\n**Environment variables:**\r\n- **Required:** `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`\r\n- **Optional:** `DB_PORT` (default: 5432)\r\n- **SSL Options:**\r\n - `DB_SSL_MODE` (default: \"require\")\r\n - `DB_SSL_CA_CERT` (path to CA certificate)\r\n - `DB_SSL_CERT` (client certificate)\r\n - `DB_SSL_KEY` (client key)\r\n\r\n#### `createPostgresqlEngineWithCustomSSL(ssl_ca_cert=None, ssl_mode='require', ssl_cert=None, ssl_key=None)` **v1.2.0**\r\nCreates a SQLAlchemy engine with custom SSL configuration, overriding environment variables.\r\n\r\n**Parameters:**\r\n- `ssl_ca_cert` (str, optional): Path to CA certificate file\r\n- `ssl_mode` (str): SSL mode - \"disable\", \"allow\", \"prefer\", \"require\", \"verify-ca\", \"verify-full\"\r\n- `ssl_cert` (str, optional): Path to client certificate file\r\n- `ssl_key` (str, optional): Path to client key file\r\n\r\n**Returns:** `sqlalchemy.Engine`\r\n\r\n**SSL Modes Explained:**\r\n- `disable`: No SSL connection\r\n- `allow`: Try SSL, fallback to non-SSL\r\n- `prefer`: Try SSL first, fallback to non-SSL\r\n- `require`: **Default** - Require SSL, fail if unavailable\r\n- `verify-ca`: Require SSL and verify server certificate\r\n- `verify-full`: **Most secure** - Require SSL, verify certificate and hostname\r\n\r\n#### `check_ssl_connection(engine=None)` **v1.2.0, improved v1.3.0**\r\nTests SSL connection and displays SSL information. **(Renamed from `test_ssl_connection` in v1.3.0 for API clarity)**\r\n\r\n**Parameters:**\r\n- `engine` (optional): SQLAlchemy engine (creates one if not provided)\r\n\r\n**Returns:** `dict` with SSL connection details including:\r\n- `ssl_active`: Whether SSL is active\r\n- `ssl_cipher`: Encryption cipher used\r\n- `ssl_version`: SSL/TLS version\r\n- `client_cert_present`: Whether client certificate is present\r\n\r\n#### `dataGrabber(query, engine, limit='None', debug=False)` **ENHANCED in v1.1.0**\r\nExecutes SQL queries and returns pandas DataFrames with robust error handling.\r\n\r\n**Parameters:**\r\n- `query` (str): SQL query to execute\r\n- `engine`: SQLAlchemy engine\r\n- `limit` (str, optional): Row limit for results\r\n- `debug` (bool, optional): Enable detailed logging and debugging output\r\n\r\n**Returns:** `pandas.DataFrame`\r\n\r\n**Features:**\r\n- Multiple fallback methods for compatibility issues\r\n- Automatic detection and handling of metadata interpretation errors\r\n- Comprehensive error logging and debugging\r\n- Execution timing display\r\n- Cross-platform sound notifications\r\n- Automatic error propagation for retry logic\r\n\r\n#### `recursiveDataGrabber(query_dict, results_dict, n=1, max_attempts=50)`\r\nExecutes multiple queries with automatic retry and exponential backoff.\r\n\r\n**Parameters:**\r\n- `query_dict` (dict): Dictionary of {query_name: sql_string}\r\n- `results_dict` (dict): Dictionary to store results\r\n- `n` (int): Current attempt number\r\n- `max_attempts` (int): Maximum retry attempts\r\n\r\n**Returns:** `dict` with DataFrames or None for failed queries\r\n\r\n**Use case:** Perfect for running multiple large queries overnight with unreliable connections.\r\n\r\n#### `diagnose_connection_and_query(engine, query, limit=10)` **ENHANCED in v1.1.0**\r\nDiagnostic function to help troubleshoot SQLAlchemy/pandas compatibility issues.\r\n\r\n**Parameters:**\r\n- `engine`: SQLAlchemy engine\r\n- `query` (str): SQL query to diagnose\r\n- `limit` (int): Number of rows to test with\r\n\r\n**Returns:** `dict` with diagnostic information\r\n\r\n### Query Utilities\r\n\r\n#### `queryCleaner(file, list1='empty', varString1='empty', ...)`\r\nLoads SQL files and substitutes parameters for dynamic queries.\r\n\r\n**Parameters:**\r\n- `file` (str): Path to SQL file\r\n- `list1`: List to substitute (converts to comma-separated string)\r\n- `varString1` (str): Placeholder string in SQL file\r\n- `startDate/endDate`: Date range parameters\r\n\r\n**Returns:** `str` - Processed SQL query\r\n\r\n#### `listPrep(iList)`\r\nConverts Python lists to SQL-compatible comma-separated strings.\r\n\r\n**Parameters:**\r\n- `iList`: List of integers, floats, strings, or single value\r\n\r\n**Returns:** `str` - SQL-formatted string\r\n\r\n## Testing and Quality Assurance **NEW in v1.3.0**\r\n\r\n### Running Tests\r\n\r\nThe package includes a comprehensive test suite with 40+ test cases covering all functionality:\r\n\r\n```bash\r\n# Install test dependencies\r\npip install pytest pytest-cov\r\n\r\n# Run all tests\r\npython -m pytest tests/ -v\r\n\r\n# Run tests with coverage report\r\npython -m pytest tests/ --cov=pg_helpers --cov-report=html --cov-report=term-missing\r\n\r\n# Run specific test categories\r\npython -m pytest tests/test_database.py::TestQueryUtils -v # Query utilities\r\npython -m pytest tests/test_database.py::TestConfig -v # Configuration\r\npython -m pytest tests/test_database.py::TestDatabase -v # Database operations\r\npython -m pytest tests/test_database.py::TestNotifications -v # Cross-platform notifications\r\n```\r\n\r\n### What the Tests Validate\r\n\r\n#### **Core Functionality Tests**\r\n- \u2705 **Database engine creation** with various SSL configurations\r\n- \u2705 **Query execution** with multiple fallback methods\r\n- \u2705 **SSL connection testing** and diagnostics\r\n- \u2705 **Configuration validation** and environment variable handling\r\n- \u2705 **Query template processing** with parameter substitution\r\n- \u2705 **Error handling** for network issues and compatibility problems\r\n\r\n#### **Cross-Platform Compatibility**\r\n- \u2705 **File operations** on Windows, macOS, and Linux\r\n- \u2705 **Sound notifications** across different operating systems\r\n- \u2705 **Environment variable handling** with different shell environments\r\n- \u2705 **Temporary file management** and cleanup\r\n\r\n#### **Error Condition Testing**\r\n- \u2705 **Missing environment variables** and configuration errors\r\n- \u2705 **SSL certificate validation** failures and missing files\r\n- \u2705 **Database connection failures** and retry logic\r\n- \u2705 **Pandas/SQLAlchemy compatibility** issues and fallback methods\r\n- \u2705 **Invalid query parameters** and malformed SQL\r\n\r\n#### **Integration Testing**\r\n- \u2705 **End-to-end workflows** combining multiple functions\r\n- \u2705 **Configuration to engine creation** pipelines\r\n- \u2705 **Query templating to execution** workflows\r\n\r\n### Test Output Example\r\n\r\n```bash\r\n================================================= test session starts =================================================\r\nplatform win32 -- Python 3.12.9, pytest-8.4.1, pluggy-1.5.0\r\ncollected 40 items\r\n\r\ntests/test_database.py::TestQueryUtils::test_listPrep_empty_list PASSED [ 2%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_floats PASSED [ 5%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_integers PASSED [ 7%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_single_value PASSED [ 10%]\r\ntests/test_database.py::TestQueryUtils::test_listPrep_strings PASSED [ 12%]\r\n... (34 more tests) ...\r\ntests/test_database.py::TestNotifications::test_play_notification_sound_windows PASSED [100%]\r\n\r\n================================================= 40 passed in 1.03s ==================================================\r\n```\r\n\r\n### Flexible Credential File Configuration **NEW in v1.3.2**\r\n\r\n- **Custom credential locations**: Use `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- **Perfect for shared credentials**: Point multiple projects to a centralized credential directory\r\n- **Environment-specific files**: Use different credential files for dev/staging/production\r\n- **Fully backwards compatible**: Existing projects continue to work without any changes\r\n\r\n```python\r\nimport os\r\n\r\n# Use centralized credentials directory\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads C:\\Documents\\Credentials\\.env\r\n\r\n# Use custom filename for database credentials\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads ./.env.database\r\n\r\n# Combine both for full flexibility\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Project\\Assets'\r\nos.environ['CREDENTIALS_FILE'] = '.env.production'\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine() # Loads C:\\Documents\\Project\\Assets\\.env.production\r\n```\r\n\r\n### Continuous Integration Setup **NEW in v1.3.0**\r\n\r\nFor automated testing across multiple environments, add this GitHub Actions workflow:\r\n\r\n```yaml\r\n# .github/workflows/test.yml\r\nname: Tests\r\non: [push, pull_request]\r\njobs:\r\n test:\r\n runs-on: ${{ matrix.os }}\r\n strategy:\r\n matrix:\r\n os: [ubuntu-latest, windows-latest, macos-latest]\r\n python-version: [3.8, 3.9, \"3.10\", \"3.11\", \"3.12\"]\r\n steps:\r\n - uses: actions/checkout@v4\r\n - name: Set up Python ${{ matrix.python-version }}\r\n uses: actions/setup-python@v4\r\n with:\r\n python-version: ${{ matrix.python-version }}\r\n - name: Install dependencies\r\n run: |\r\n python -m pip install --upgrade pip\r\n pip install pytest pytest-cov\r\n pip install -e .\r\n - name: Run tests\r\n run: python -m pytest tests/ -v --tb=short\r\n```\r\n\r\n### Test Coverage Goals\r\n\r\n- **Target Coverage**: >90% (currently achieved)\r\n- **Critical Functions**: 100% coverage for database operations\r\n- **Error Paths**: All exception handling paths tested\r\n- **Edge Cases**: Empty lists, missing files, invalid configurations\r\n\r\n## Advanced Usage\r\n\r\n### SSL Security Configuration **v1.2.0**\r\n\r\n#### For AWS RDS (Recommended)\r\n```python\r\nfrom pg_helpers import createPostgresqlEngineWithCustomSSL, check_ssl_connection\r\n\r\n# Download RDS CA certificate first\r\n# wget https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem\r\n\r\n# Maximum security configuration\r\nengine = createPostgresqlEngineWithCustomSSL(\r\n ssl_ca_cert=\"./certs/rds-ca-2019-root.pem\",\r\n ssl_mode=\"verify-full\"\r\n)\r\n\r\n# Test the secure connection\r\nssl_info = check_ssl_connection(engine)\r\nprint(f\"SSL Cipher: {ssl_info.get('ssl_cipher')}\")\r\nprint(f\"SSL Version: {ssl_info.get('ssl_version')}\")\r\n```\r\n\r\n#### Environment-based SSL Configuration\r\n```python\r\n# Set in .env file:\r\n# DB_SSL_MODE=verify-full\r\n# DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\r\n\r\nfrom pg_helpers import createPostgresqlEngine, check_ssl_connection\r\n\r\n# Uses environment variables\r\nengine = createPostgresqlEngine()\r\n\r\n# Verify security\r\nssl_info = check_ssl_connection(engine)\r\nif ssl_info.get('ssl_active'):\r\n print(\"\u2705 Secure SSL connection established\")\r\nelse:\r\n print(\"\u26a0\ufe0f SSL not active - check configuration\")\r\n```\r\n\r\n#### SSL Mode Comparison\r\n```python\r\n# Different security levels for different environments\r\n\r\n# Development (basic encryption)\r\ndev_engine = createPostgresqlEngineWithCustomSSL(ssl_mode=\"require\")\r\n\r\n# Staging (certificate verification)\r\nstaging_engine = createPostgresqlEngineWithCustomSSL(\r\n ssl_mode=\"verify-ca\",\r\n ssl_ca_cert=\"./certs/staging-ca.pem\"\r\n)\r\n\r\n# Production (maximum security)\r\nprod_engine = createPostgresqlEngineWithCustomSSL(\r\n ssl_mode=\"verify-full\",\r\n ssl_ca_cert=\"./certs/rds-ca-2019-root.pem\"\r\n)\r\n```\r\n\r\n### Debugging and Troubleshooting\r\n```python\r\nfrom pg_helpers import createPostgresqlEngine, dataGrabber, diagnose_connection_and_query\r\n\r\nengine = createPostgresqlEngine()\r\n\r\n# Enable detailed debugging for problematic queries\r\ndata = dataGrabber(\"\"\"\r\n SELECT * FROM complex_view \r\n WHERE date_column > '2023-01-01'\r\n\"\"\", engine, debug=True)\r\n\r\n# Diagnose specific issues\r\ndiagnostics = diagnose_connection_and_query(engine, \"SELECT * FROM problematic_table\")\r\nprint(f\"Engine info: {diagnostics['engine_info']}\")\r\nprint(f\"Test results: {diagnostics['test_results']}\")\r\nprint(f\"Recommendations: {diagnostics['recommendations']}\")\r\n```\r\n\r\n### Batch Query Processing\r\n```python\r\nfrom pg_helpers import recursiveDataGrabber, queryCleaner\r\n\r\n# Prepare multiple queries\r\nqueries = {\r\n 'user_stats': queryCleaner('sql/user_stats.sql', startDate='2023-01-01'),\r\n 'sales_data': queryCleaner('sql/sales.sql', list1=[1,2,3], varString1='$REGIONS'),\r\n 'inventory': 'SELECT * FROM inventory WHERE status = \"active\"'\r\n}\r\n\r\n# Execute with automatic retry\r\nresults = {}\r\nfinal_results = recursiveDataGrabber(queries, results)\r\n\r\n# Access individual results\r\nuser_df = final_results['user_stats']\r\nsales_df = final_results['sales_data']\r\n```\r\n\r\n## Credential Management Strategies **NEW in v1.3.2**\r\n\r\n### Strategy 1: Centralized Credentials for Multiple Projects\r\n\r\nKeep one credential directory for all your projects:\r\n\r\n```\r\nC:\\Users\\Me\\Documents\\Credentials\\\r\n\u251c\u2500\u2500 .env # General/shared credentials\r\n\u251c\u2500\u2500 .env.database # Database-specific credentials \r\n\u251c\u2500\u2500 .env.api # API credentials\r\n\u2514\u2500\u2500 .env.production # Production environment\r\n\r\nProject1\\\r\n\u251c\u2500\u2500 analysis.py\r\n\u2514\u2500\u2500 (no .env file needed)\r\n\r\nProject2\\\r\n\u251c\u2500\u2500 dashboard.py\r\n\u2514\u2500\u2500 (no .env file needed)\r\n```\r\n\r\n**Setup in each project:**\r\n```python\r\nimport os\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Users\\Me\\Documents\\Credentials'\r\nos.environ['CREDENTIALS_FILE'] = '.env.database'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n### Strategy 2: Environment-Specific Credentials\r\n\r\nUse different credential files for different environments:\r\n\r\n```python\r\nimport os\r\n\r\n# Detect environment (from ENV variable, config file, etc.)\r\nenvironment = os.getenv('APP_ENV', 'development')\r\n\r\n# Set credential file based on environment\r\nos.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\nos.environ['CREDENTIALS_FILE'] = f'.env.{environment}'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n# Loads .env.development, .env.staging, or .env.production\r\n```\r\n\r\n### Strategy 3: Project-Specific Overrides\r\n\r\nUse centralized credentials with project-specific overrides:\r\n\r\n```python\r\nimport os\r\nfrom pathlib import Path\r\n\r\n# Try project-local first, fall back to centralized\r\nif Path('.env.local').exists():\r\n os.environ['CREDENTIALS_FILE'] = '.env.local'\r\nelse:\r\n os.environ['CREDENTIALS_DIR'] = r'C:\\Documents\\Credentials'\r\n os.environ['CREDENTIALS_FILE'] = '.env.database'\r\n\r\nfrom pg_helpers import createPostgresqlEngine\r\nengine = createPostgresqlEngine()\r\n```\r\n\r\n### Best Practices for Credential Files\r\n\r\n**Security:**\r\n- \u2705 Store credential directories outside of project repositories\r\n- \u2705 Use restrictive file permissions (e.g., `chmod 600` on Linux/macOS)\r\n- \u2705 Never commit credential file paths to version control\r\n- \u2705 Document credential file locations in team wikis, not in code\r\n\r\n**Organization:**\r\n- \ud83d\udcc1 Use descriptive filenames: `.env.database`, `.env.api`, `.env.aws`\r\n- \ud83d\udcc1 Group by environment: `.env.production`, `.env.staging`\r\n- \ud83d\udcc1 Keep a `.env.template` in your repository with dummy values\r\n\r\n**Team Workflows:**\r\n```python\r\n# config.py (committed to repository)\r\nimport os\r\nfrom pathlib import Path\r\n\r\ndef setup_credentials():\r\n \"\"\"Load credentials from standard team location or local override\"\"\"\r\n # Check for local override first (for development)\r\n if Path('.env.local').exists():\r\n os.environ['CREDENTIALS_FILE'] = '.env.local'\r\n return\r\n \r\n # Use team shared credentials location\r\n team_creds = Path.home() / 'TeamShared' / 'Credentials'\r\n if team_creds.exists():\r\n os.environ['CREDENTIALS_DIR'] = str(team_creds)\r\n os.environ['CREDENTIALS_FILE'] = '.env.database'\r\n return\r\n \r\n # Fallback to default behavior\r\n print(\"\u26a0\ufe0f No credentials found - using .env in current directory\")\r\n\r\n# Call this before importing pg_helpers\r\nsetup_credentials()\r\n\r\n## Security Best Practices\r\n\r\n### SSL/TLS Configuration\r\n- **Always use SSL in production** - Set `DB_SSL_MODE=require` minimum\r\n- **Use CA certificate verification** for cloud databases (AWS RDS, GCP Cloud SQL)\r\n- **Download official CA certificates** from your cloud provider\r\n- **Use `verify-full` mode** for maximum security in production\r\n- **Test SSL connections** regularly with `check_ssl_connection()`\r\n\r\n### AWS RDS Security Setup\r\n```bash\r\n# Download AWS RDS CA certificate\r\nmkdir -p ./certs\r\nwget -O ./certs/rds-ca-2019-root.pem \\\r\n https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem\r\n\r\n# Set environment variables\r\necho \"DB_SSL_MODE=verify-full\" >> .env\r\necho \"DB_SSL_CA_CERT=./certs/rds-ca-2019-root.pem\" >> .env\r\n```\r\n\r\n### General Security\r\n- **Never commit `.env` files** - add to `.gitignore`\r\n- **Use environment variables** for all credentials\r\n- **Rotate database passwords** regularly\r\n- **Use database connection pooling** for production\r\n- **Monitor SSL certificate expiration** dates\r\n\r\n## File Structure for Your Projects\r\n\r\n```\r\nyour_project/\r\n\u251c\u2500\u2500 .env # Database credentials (DO NOT COMMIT)\r\n\u251c\u2500\u2500 .gitignore # Include .env and certs/\r\n\u251c\u2500\u2500 main.py # Your analysis code\r\n\u251c\u2500\u2500 certs/ # SSL certificates (DO NOT COMMIT private keys)\r\n\u2502 \u251c\u2500\u2500 rds-ca-2019-root.pem\r\n\u2502 \u2514\u2500\u2500 custom-ca.pem\r\n\u251c\u2500\u2500 queries/ # SQL template files\r\n\u2502 \u251c\u2500\u2500 user_analysis.sql\r\n\u2502 \u251c\u2500\u2500 sales_report.sql\r\n\u2502 \u2514\u2500\u2500 inventory.sql\r\n\u251c\u2500\u2500 data/ # Output data files\r\n\u2502 \u2514\u2500\u2500 results.pkl\r\n\u2514\u2500\u2500 tests/ # Test files (if contributing)\r\n \u2514\u2500\u2500 test_database.py\r\n```\r\n\r\n## Troubleshooting\r\n\r\n### Common Issues\r\n\r\n**SSL certificate verification failed**\r\n- Ensure CA certificate file exists and is readable\r\n- Download the latest CA certificate from your provider\r\n- Check that `ssl_mode` is set correctly\r\n- Use `check_ssl_connection()` to diagnose SSL issues\r\n\r\n**\"SSL connection required\"**\r\n- Check that your database server supports SSL\r\n- Verify `DB_SSL_MODE` is set correctly\r\n- For development, you can temporarily use `ssl_mode=\"disable\"`\r\n\r\n**\"Missing required environment variables\"**\r\n- Ensure `.env` file exists in your project root\r\n- Check that all required variables are set: `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`\r\n\r\n**Connection timeouts**\r\n- The `recursiveDataGrabber` will automatically retry with exponential backoff\r\n- Check network connectivity and database server status\r\n- Consider increasing `max_attempts` for very unreliable connections\r\n\r\n**SQLAlchemy/pandas compatibility errors** **IMPROVED in v1.1.0**\r\n- The package now automatically handles most compatibility issues\r\n- Enable `debug=True` to see which fallback method succeeds\r\n- Use `diagnose_connection_and_query()` for detailed troubleshooting\r\n\r\n**\"immutabledict\" or metadata interpretation errors** **FIXED in v1.1.0**\r\n- These errors are now automatically detected and handled\r\n- The package will try alternative pandas parameters and manual DataFrame construction\r\n- No action needed from users - fallback methods handle this automatically\r\n\r\n**Sound notifications not working**\r\n- This is normal and won't affect functionality\r\n- Ensure system sound is enabled\r\n- On Linux, you may need to install additional audio packages\r\n\r\n**Test failures during development** **NEW in v1.3.0**\r\n- Ensure all dependencies are installed: `pip install pytest pytest-cov`\r\n- Check that you're running tests from the project root directory\r\n- Use `python -m pytest tests/ -v` for detailed output\r\n- Some tests require write access to temporary directories\r\n\r\n## Changelog\r\n\r\n### Version 1.3.3 (Current) \ud83d\udcc1\r\n- \u2705 **Updated installation instructions**: Recommend using 'pip install pg_helpers'\r\n\r\n### Version 1.3.2\r\n- \ud83d\udcc1 **Flexible credential file location**: Support for `CREDENTIALS_DIR` and `CREDENTIALS_FILE` environment variables\r\n- \ud83d\udd04 **Enhanced configuration**: Specify custom .env file paths for different projects and environments\r\n- \ud83c\udfe2 **Centralized credentials**: Share credential directories across multiple projects\r\n- \ud83c\udf0d **Environment-specific files**: Easy support for .env.production, .env.staging, .env.dev\r\n- \u2705 **Backwards compatible**: Defaults to `.env` in current directory when no variables set\r\n- \ud83d\udcd6 **Documentation**: New credential management strategies and best practices guide\r\n\r\n### Version 1.3.1\r\n- \ud83e\uddea **Comprehensive test suite**: 40+ test cases with >90% code coverage\r\n- \u2705 **Cross-platform validation**: Tests confirm functionality on Windows, macOS, and Linux\r\n- \ud83d\udd27 **API improvements**: `test_ssl_connection()` renamed to `check_ssl_connection()` for clarity\r\n- \ud83d\ude80 **CI/CD ready**: GitHub Actions workflow for automated testing across environments\r\n- \ud83d\udccb **Enhanced documentation**: Detailed testing guide and troubleshooting section\r\n- \ud83d\udee1\ufe0f **Quality assurance**: All functions tested including error conditions and edge cases\r\n\r\n### Version 1.2.0\r\n- \ud83d\udd12 **SSL/TLS support**: Full SSL encryption with optional CA certificate verification\r\n- \ud83d\udee1\ufe0f **Security enhancements**: Man-in-the-middle attack prevention for production environments \r\n- \u2705 **SSL testing**: New SSL connection diagnostics\r\n- \ud83d\udd27 **Custom SSL configuration**: Programmatic SSL parameter override\r\n- \ud83d\udccb **Environment SSL config**: Optional SSL settings via environment variables\r\n- \ud83d\udd04 **Backward compatibility**: Existing code continues to work without changes\r\n\r\n### Version 1.1.0\r\n- \u2728 **Enhanced error handling**: Multiple fallback methods for pandas/SQLAlchemy compatibility\r\n- \ud83d\udd0d **Improved debugging**: Comprehensive logging and diagnostic capabilities\r\n- \ud83d\udee0\ufe0f **Better reliability**: Automatic detection and handling of metadata interpretation errors\r\n- \ud83d\udcca **Manual DataFrame construction**: Fallback method for complex data type issues\r\n- \ud83d\udd27 **Alternative parameter testing**: Tries different pandas configurations automatically\r\n\r\n### Version 1.0.0\r\n- Initial release with core functionality\r\n- Basic retry logic and PostgreSQL integration\r\n- Query templating and notification system\r\n\r\n## Contributing\r\n\r\n1. Fork the repository\r\n2. Create a feature branch: `git checkout -b feature-name`\r\n3. Add tests for new functionality: `python -m pytest tests/ -v`\r\n4. Ensure all tests pass and maintain >90% coverage\r\n5. Run the full test suite: `python -m pytest tests/ --cov=pg_helpers`\r\n6. Submit a pull request\r\n\r\n### Development Setup\r\n```bash\r\ngit clone https://github.com/lenwood/pg_helpers.git\r\ncd pg_helpers\r\npip install -e .\r\npip install pytest pytest-cov\r\npython -m pytest tests/ -v\r\n```\r\n\r\n## License\r\n\r\nMIT License - feel free to use in your projects!\r\n\r\n## Author\r\n\r\nChris Leonard\r\n\r\n---\r\n\r\n*This package was designed for data analysts and engineers who work with PostgreSQL databases and need reliable, automated query execution with enterprise-grade security. Version 1.3.2 adds flexible credential management while ensuring this reliability through comprehensive testing across multiple platforms and Python versions.*\r\n",
"bugtrack_url": null,
"license": null,
"summary": "PostgreSQL helper functions for data analysis with enterprise-grade security",
"version": "1.3.3",
"project_urls": {
"Bug Reports": "https://github.com/lenwood/pg_helpers/issues",
"Documentation": "https://github.com/lenwood/pg_helpers#readme",
"Homepage": "https://github.com/lenwood/pg_helpers",
"Source": "https://github.com/lenwood/pg_helpers"
},
"split_keywords": [
"postgresql",
" postgres",
" database",
" sql",
" pandas",
" data-analysis",
" ssl",
" security",
" retry-logic",
" sqlalchemy",
" data-science"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "dc751eed47b6154a22ef3cbc45ffb73a97a70e81d78b1b5d588a1aa7e1fb8a63",
"md5": "529ae981de3cc8d28e143b7953443fc7",
"sha256": "160d4343c2c8bd4dbbb2927cc23b4a5ffb429cb64710868aac36186af2b095cc"
},
"downloads": -1,
"filename": "pg_helpers-1.3.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "529ae981de3cc8d28e143b7953443fc7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 25930,
"upload_time": "2025-10-10T20:15:00",
"upload_time_iso_8601": "2025-10-10T20:15:00.147935Z",
"url": "https://files.pythonhosted.org/packages/dc/75/1eed47b6154a22ef3cbc45ffb73a97a70e81d78b1b5d588a1aa7e1fb8a63/pg_helpers-1.3.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c4882cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e",
"md5": "52fd348fa8262a8d7607ee0a00a744a9",
"sha256": "1a828c0b393e469df5dbba5c00388b6b8d69d9d86999d2ea54904acefaa46607"
},
"downloads": -1,
"filename": "pg_helpers-1.3.3.tar.gz",
"has_sig": false,
"md5_digest": "52fd348fa8262a8d7607ee0a00a744a9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 33823,
"upload_time": "2025-10-10T20:15:01",
"upload_time_iso_8601": "2025-10-10T20:15:01.365676Z",
"url": "https://files.pythonhosted.org/packages/c4/88/2cfdbcceb6cf97e902e78db66b4e2717665d5c4e461648a8ba33b3bd473e/pg_helpers-1.3.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-10 20:15:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lenwood",
"github_project": "pg_helpers",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "psycopg2-binary",
"specs": [
[
">=",
"2.9.0"
]
]
},
{
"name": "sqlalchemy",
"specs": [
[
">=",
"1.4.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"0.19.0"
]
]
}
],
"lcname": "pg-helpers"
}