# SQLAlchemy PostgreSQL Upsert
[PyPI](https://pypi.org/project/sqlalchemy-psql-upsert/)
[License](https://github.com/machado000/sqlalchemy-psql-upsert/blob/main/LICENSE)
[Issues](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
[Commits](https://github.com/machado000/sqlalchemy-psql-upsert/commits/main)
A high-performance Python library for PostgreSQL UPSERT operations with intelligent conflict resolution, using temporary tables and an advanced CTE algorithm.
## 🚀 Features
- **Temporary Table Strategy**: Uses PostgreSQL temporary tables for efficient staging and conflict analysis
- **Advanced CTE Logic**: Sophisticated Common Table Expression queries for multi-constraint conflict resolution
- **Atomic MERGE Operations**: Single-transaction upserts using PostgreSQL 15+ MERGE statements
- **Multi-constraint Support**: Handles primary keys, unique constraints, and composite constraints simultaneously
- **Intelligent Conflict Resolution**: Automatically filters ambiguous conflicts and deduplicates data
- **Automatic NaN to NULL Conversion**: Seamlessly converts pandas NaN values to PostgreSQL NULL values
- **Schema Validation**: Automatic table and column validation before operations
- **Comprehensive Logging**: Detailed debug information and progress tracking
## 📦 Installation
### Using Poetry (Recommended)
```bash
poetry add sqlalchemy-psql-upsert
```
### Using pip
```bash
pip install sqlalchemy-psql-upsert
```
## ⚙️ Configuration
### Database Privileges Requirements
Besides SELECT, INSERT, and UPDATE permissions on target tables, this library requires the PostgreSQL `TEMPORARY` privilege to function properly:
**Why Temporary Tables?**
- **Isolation**: Staging data doesn't interfere with production tables during analysis
- **Performance**: Bulk operations are faster on temporary tables
- **Safety**: Failed operations don't leave partial data in target tables
- **Atomicity**: Entire upsert operation happens in a single transaction
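To confirm up front that your role can create temporary tables, you can run a quick check with PostgreSQL's built-in `has_database_privilege` function (a standalone query, independent of this library):

```sql
-- Returns true if the current role may create temporary tables in this database
SELECT has_database_privilege(current_user, current_database(), 'TEMP');
```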
### Environment Variables
Create a `.env` file or set the following environment variables:
```bash
# PostgreSQL Configuration
PGHOST=localhost
PGPORT=5432
PGDATABASE=your_database
PGUSER=your_username
PGPASSWORD=your_password
```
### Configuration Class
```python
from sqlalchemy_psql_upsert import PgConfig
# Default configuration from environment
config = PgConfig()
# Manual configuration
config = PgConfig(
    host="localhost",
    port="5432",
    user="myuser",
    password="mypass",
    dbname="mydb"
)
print(config.uri()) # postgresql+psycopg2://myuser:mypass@localhost:5432/mydb
```
## 🛠️ Quick Start
### Connection Testing
```python
from sqlalchemy_psql_upsert import test_connection
# Test default connection
success, message = test_connection()
if success:
    print("✅ Database connection OK")
else:
    print(f"❌ Connection failed: {message}")
```
### Privileges Verification
**Important**: This library requires `CREATE TEMP TABLE` privileges to function properly. The client automatically verifies these privileges during initialization.
```python
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig
# Test connection and privileges
config = PgConfig()
try:
    # This will automatically test temp table privileges
    upserter = PostgresqlUpsert(config=config)
    print("✅ Connection and privileges verified successfully")
except PermissionError as e:
    print(f"❌ Privilege error: {e}")
    print("Solution: Grant temporary privileges with:")
    print("GRANT TEMPORARY ON DATABASE your_database TO your_user;")
except Exception as e:
    print(f"❌ Connection failed: {e}")
```
**Grant Required Privileges:**
```sql
-- As database administrator, grant temporary table privileges
GRANT TEMPORARY ON DATABASE your_database TO your_user;
-- Alternatively, grant more comprehensive privileges
GRANT CREATE ON DATABASE your_database TO your_user;
```
### Basic Usage
```python
import pandas as pd
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig
# Configure database connection
config = PgConfig() # Loads from environment variables
upserter = PostgresqlUpsert(config=config)
# Prepare your data
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})

# Perform upsert
success, affected_rows = upserter.upsert_dataframe(
    dataframe=df,
    table_name='users',
    schema='public'
)
print(f"✅ Upserted {affected_rows} rows successfully")
```
### Advanced Configuration
```python
from sqlalchemy import create_engine
# Using custom SQLAlchemy engine
engine = create_engine('postgresql://user:pass@localhost:5432/mydb')
upserter = PostgresqlUpsert(engine=engine, debug=True)
# Upsert with custom schema
success, affected_rows = upserter.upsert_dataframe(
    dataframe=large_df,
    table_name='products',
    schema='inventory'
)
```
### Data Type Handling
The library automatically handles pandas data type conversions:
```python
import pandas as pd
import numpy as np
# DataFrame with NaN values
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', None, 'David'],  # None values
    'score': [85.5, np.nan, 92.0, 88.1],      # NaN values
    'active': [True, False, None, True]       # Mixed types with None
})
# All NaN and None values are automatically converted to PostgreSQL NULL
upserter.upsert_dataframe(df, 'users')
# Result: NaN/None → NULL in PostgreSQL
```
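Under the hood, a conversion along these lines is enough to turn pandas missing values into SQL NULLs. This is an illustrative sketch, not necessarily the library's exact code:

```python
import pandas as pd

def nan_to_none(df: pd.DataFrame) -> pd.DataFrame:
    # Cast to object so NaN/NaT can be replaced by Python None,
    # which SQLAlchemy binds as SQL NULL
    return df.astype(object).where(pd.notnull(df), None)
```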
## 🔍 How It Works
### Overview: Temporary Table + CTE + MERGE Strategy
This library uses a sophisticated 3-stage approach for reliable, high-performance upserts:
1. **Temporary Table Staging**: Data is first loaded into a PostgreSQL temporary table
2. **CTE-based Conflict Analysis**: Advanced Common Table Expressions analyze and resolve conflicts
3. **Atomic MERGE Operation**: Single transaction applies all changes using PostgreSQL MERGE
### Stage 1: Temporary Table Creation
```sql
-- Creates a temporary table with identical structure to target table
CREATE TEMP TABLE "temp_upsert_12345678" (
    id INTEGER,
    email VARCHAR,
    doc_type VARCHAR,
    doc_number VARCHAR
);

-- Bulk insert all DataFrame data into temporary table
INSERT INTO "temp_upsert_12345678" VALUES (...);
```
**Benefits:**
- ✅ Isolates staging data from target table
- ✅ Enables complex analysis without affecting production data
- ✅ Automatic cleanup when session ends
- ✅ High-performance bulk inserts
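A minimal sketch of this staging step, assuming a plain SQLAlchemy engine, a DataFrame `df`, and the example table names used above (the library's internals may differ):

```python
import uuid
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")
temp_name = f"temp_upsert_{uuid.uuid4().hex[:8]}"

with engine.begin() as conn:
    # Clone the target table's column structure (constraints are not copied)
    conn.execute(text(f'CREATE TEMP TABLE "{temp_name}" (LIKE "public"."target_table")'))
    # Bulk-load the DataFrame into the temp table on the same connection/session
    df.to_sql(temp_name, conn, if_exists="append", index=False)
```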
### Stage 2: CTE-based Conflict Analysis
The library generates sophisticated CTEs to handle complex conflict scenarios:
```sql
WITH temp_with_conflicts AS (
    -- Analyze each temp row against ALL constraints simultaneously
    SELECT temp.*, temp.ctid,
           COUNT(DISTINCT target.id) AS conflict_targets,
           MAX(target.id) AS conflicted_target_id
    FROM "temp_upsert_12345678" temp
    LEFT JOIN "public"."target_table" target
        ON (temp.id = target.id) OR                -- PK conflict
           (temp.email = target.email) OR          -- Unique conflict
           (temp.doc_type = target.doc_type AND    -- Composite unique conflict
            temp.doc_number = target.doc_number)
    GROUP BY temp.ctid, temp.id, temp.email, temp.doc_type, temp.doc_number
),
filtered AS (
    -- Filter out rows that conflict with multiple target rows (ambiguous)
    SELECT * FROM temp_with_conflicts
    WHERE conflict_targets <= 1
),
ranked AS (
    -- Deduplicate temp rows targeting the same existing row
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY COALESCE(conflicted_target_id, id)
               ORDER BY ctid DESC
           ) AS row_rank
    FROM filtered
),
clean_rows AS (
    -- Final dataset: only the latest row per target
    SELECT id, email, doc_type, doc_number
    FROM ranked
    WHERE row_rank = 1
)
-- Ready for MERGE...
```
**CTE Logic Breakdown:**
1. **`temp_with_conflicts`**: Joins the temporary table against the target table using ALL constraints simultaneously, counting how many existing rows each temp row conflicts with.
2. **`filtered`**: Removes ambiguous rows that would conflict with multiple existing records, keeping only rows with `conflict_targets <= 1` (the _one-to-many_ conflict case). For example, a temp row whose `email` matches one existing row but whose `id` matches a different existing row has `conflict_targets = 2` and is dropped.
3. **`ranked`**: When multiple temp rows target the same existing record, ranks them by descending insertion order using the table's `ctid` (the _many-to-one_ conflict case).
4. **`clean_rows`**: Keeps only the last inserted row for each conflict, yielding the final clean dataset ready for the atomic upsert.
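The same filtering rules are easy to see on a toy dataset. This pandas sketch (illustration only, not library code) mirrors the `filtered` and `ranked`/`clean_rows` steps:

```python
import pandas as pd

# Toy conflict-analysis output: one row per temp row
temp = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "conflict_targets": [0, 2, 1, 1],              # rows matched in the target table
    "conflicted_target_id": [None, None, 42, 42],
})

# filtered: drop ambiguous rows that match more than one target row
filtered = temp[temp["conflict_targets"] <= 1]

# ranked/clean_rows: keep only the last temp row aimed at each target
key = filtered["conflicted_target_id"].fillna(filtered["id"])
clean = filtered[~key.duplicated(keep="last")]
print(clean)  # rows 1 and 4 survive; row 2 was ambiguous, row 3 was superseded
```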
### Stage 3: Atomic MERGE Operation
```sql
MERGE INTO "public"."target_table" AS tgt
USING clean_rows AS src
    ON (tgt.id = src.id) OR
       (tgt.email = src.email) OR
       (tgt.doc_type = src.doc_type AND tgt.doc_number = src.doc_number)
WHEN MATCHED THEN
    UPDATE SET email = src.email, doc_type = src.doc_type, doc_number = src.doc_number
WHEN NOT MATCHED THEN
    INSERT (id, email, doc_type, doc_number)
    VALUES (src.id, src.email, src.doc_type, src.doc_number);
```
**Benefits:**
- ✅ Single atomic transaction
- ✅ Handles all conflict types simultaneously
- ✅ Automatic INSERT or UPDATE decision
- ✅ No race conditions or partial updates
### Constraint Detection
The library automatically analyzes your target table to identify:
- **Primary key constraints**: Single or composite primary keys
- **Unique constraints**: Single column unique constraints
- **Composite unique constraints**: Multi-column unique constraints
All constraint types are handled simultaneously in a single operation.
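Constraint discovery like this can be done with SQLAlchemy's inspection API. A minimal sketch, assuming the `users` table from the earlier examples (not necessarily how the library implements it):

```python
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")
insp = inspect(engine)

# Primary key: single or composite
pk = insp.get_pk_constraint("users", schema="public")
print(pk["constrained_columns"])           # e.g. ['id']

# Unique constraints: each entry may cover one or several columns
for uc in insp.get_unique_constraints("users", schema="public"):
    print(uc["name"], uc["column_names"])  # e.g. users_email_key ['email']
```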
## 🚨 Pros, Cons & Considerations
### ✅ Advantages of Temporary Table + CTE + MERGE Approach
**Performance Benefits:**
- **Single Transaction**: Entire operation is atomic, no partial updates or race conditions
- **Bulk Operations**: High-performance bulk inserts into temporary tables
- **Efficient Joins**: PostgreSQL optimizes joins between temporary and main tables
- **Minimal Locking**: Temporary tables don't interfere with concurrent operations
**Reliability Benefits:**
- **Comprehensive Conflict Resolution**: Handles all constraint types simultaneously
- **Deterministic Results**: Same input always produces same output
- **Automatic Cleanup**: Temporary tables are automatically dropped
- **ACID Compliance**: Full transaction safety and rollback capability
**Data Integrity Benefits:**
- **Ambiguity Detection**: Automatically detects and skips problematic rows
- **Deduplication**: Handles duplicate input data intelligently
- **Constraint Validation**: PostgreSQL validates all constraints during MERGE
### ❌ Limitations and Trade-offs
**Resource Requirements:**
- **Memory Usage**: All input data is staged in temporary tables, which are buffered per session and may spill to disk for large inputs
- **Temporary Space**: Requires sufficient temporary storage for staging tables
- **Single-threaded**: No parallel processing (traded for reliability and simplicity)
**PostgreSQL Specifics:**
- **Version Dependency**: MERGE statement requires PostgreSQL 15+
- **Session-based Temp Tables**: Temporary tables are tied to database sessions
**Privilege-related Limitations:**
- **Database Administrator**: May need DBA assistance to grant `TEMPORARY` privileges
- **Shared Hosting**: Some cloud providers restrict temporary table creation
- **Security Policies**: Corporate environments may restrict temporary table usage
**Scale Considerations:**
- **Large Dataset Handling**: Very large datasets (>1M rows) may require memory tuning
- **Transaction Duration**: Entire operation happens in one transaction (longer lock times)
### 🎯 Best Practices
**Memory Management:**
```python
# For large datasets, monitor memory usage
import pandas as pd
# Consider chunking very large datasets manually if needed
def chunk_dataframe(df, chunk_size=50000):
    for start in range(0, len(df), chunk_size):
        yield df[start:start + chunk_size]

# Process in manageable chunks
for chunk in chunk_dataframe(large_df):
    success, rows = upserter.upsert_dataframe(chunk, 'target_table')
    print(f"Processed chunk: {rows} rows")
```
**Performance Optimization:**
```python
# Ensure proper indexing on conflict columns
# CREATE INDEX idx_email ON target_table(email);
# CREATE INDEX idx_composite ON target_table(doc_type, doc_number);
# Use debug mode to monitor performance
upserter = PostgresqlUpsert(engine=engine, debug=True)
```
**Error Handling:**
```python
try:
    success, affected_rows = upserter.upsert_dataframe(df, 'users')
    logger.info(f"Successfully upserted {affected_rows} rows")
except ValueError as e:
    logger.error(f"Validation error: {e}")
except Exception as e:
    logger.error(f"Upsert failed: {e}")
    # No manual rollback needed - the transaction is automatically rolled back
## 🤝 Contributing
We welcome contributions! Here's how to get started:
1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Make your changes** and add tests
4. **Run the test suite**: `pytest tests/ -v`
5. **Submit a pull request**
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙋 Support
- **Issues**: [GitHub Issues](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
- **Documentation**: Check the docstrings and test files for detailed usage examples