# SQLAlchemy PostgreSQL Upsert

[![PyPI version](https://img.shields.io/pypi/v/sqlalchemy-psql-upsert)](https://pypi.org/project/sqlalchemy-psql-upsert/)
[![License](https://img.shields.io/github/license/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/blob/main/LICENSE)
[![Issues](https://img.shields.io/github/issues/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
[![Last Commit](https://img.shields.io/github/last-commit/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/commits/main)

A high-performance Python library for PostgreSQL UPSERT operations, featuring intelligent conflict resolution built on temporary tables and an advanced CTE-based algorithm.


## 🚀 Features

- **Temporary Table Strategy**: Uses PostgreSQL temporary tables for efficient staging and conflict analysis
- **Advanced CTE Logic**: Sophisticated Common Table Expression queries for multi-constraint conflict resolution
- **Atomic MERGE Operations**: Single-transaction upserts using PostgreSQL 15+ MERGE statements
- **Multi-constraint Support**: Handles primary keys, unique constraints, and composite constraints simultaneously
- **Intelligent Conflict Resolution**: Automatically filters ambiguous conflicts and deduplicates data
- **Automatic NaN to NULL Conversion**: Seamlessly converts pandas NaN values to PostgreSQL NULL values
- **Schema Validation**: Automatic table and column validation before operations
- **Comprehensive Logging**: Detailed debug information and progress tracking


## 📦 Installation

### Using Poetry (Recommended)
```bash
poetry add sqlalchemy-psql-upsert
```

### Using pip
```bash
pip install sqlalchemy-psql-upsert
```

## ⚙️ Configuration

### Database Privileges Requirements

Besides SELECT, INSERT, and UPDATE permissions on target tables, this library requires the PostgreSQL `TEMPORARY` privilege to function properly (a quick privilege check is sketched after the list below):

**Why Temporary Tables?**
- **Isolation**: Staging data doesn't interfere with production tables during analysis
- **Performance**: Bulk operations are faster on temporary tables
- **Safety**: Failed operations don't leave partial data in target tables
- **Atomicity**: Entire upsert operation happens in a single transaction
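
For a quick sanity check before running upserts, you can ask PostgreSQL directly whether the current role holds the `TEMP` privilege. A minimal sketch (the connection URI is illustrative):

```python
from sqlalchemy import create_engine, text

# Illustrative URI; substitute your own credentials.
engine = create_engine("postgresql+psycopg2://myuser:mypass@localhost:5432/mydb")

with engine.connect() as conn:
    # has_database_privilege() is a built-in PostgreSQL access-check function.
    has_temp = conn.execute(
        text("SELECT has_database_privilege(current_user, current_database(), 'TEMP')")
    ).scalar()
    print(f"TEMPORARY privilege granted: {has_temp}")
```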

### Environment Variables

Create a `.env` file or set the following environment variables:

```bash
# PostgreSQL Configuration
PGHOST=localhost
PGPORT=5432
PGDATABASE=your_database
PGUSER=your_username
PGPASSWORD=your_password
```
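
If you keep these values in a `.env` file, they can be loaded into the process environment before building the configuration. A minimal sketch, assuming the `python-dotenv` package is installed (the library may also read the environment directly):

```python
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ
```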

### Configuration Class

```python
from sqlalchemy_psql_upsert import PgConfig

# Default configuration from environment
config = PgConfig()

# Manual configuration
config = PgConfig(
    host="localhost",
    port="5432",
    user="myuser",
    password="mypass",
    dbname="mydb"
)

print(config.uri())  # postgresql+psycopg2://myuser:mypass@localhost:5432/mydb
```

## 🛠️ Quick Start

### Connection Testing

```python
from sqlalchemy_psql_upsert import test_connection

# Test default connection
success, message = test_connection()
if success:
    print("✅ Database connection OK")
else:
    print(f"❌ Connection failed: {message}")
```

### Privileges Verification

**Important**: This library requires `CREATE TEMP TABLE` privileges to function properly. The client automatically verifies these privileges during initialization.

```python
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig
from sqlalchemy import create_engine

# Test connection and privileges
config = PgConfig()

try:
    # This will automatically test temp table privileges
    upserter = PostgresqlUpsert(config=config)
    print("✅ Connection and privileges verified successfully")
    
except PermissionError as e:
    print(f"❌ Privilege error: {e}")
    print("Solution: Grant temporary privileges with:")
    print("GRANT TEMPORARY ON DATABASE your_database TO your_user;")
    
except Exception as e:
    print(f"❌ Connection failed: {e}")
```

**Grant Required Privileges:**
```sql
-- As database administrator, grant temporary table privileges
GRANT TEMPORARY ON DATABASE your_database TO your_user;

-- Alternatively, grant more comprehensive privileges
GRANT CREATE ON DATABASE your_database TO your_user;
```

### Basic Usage

```python
import pandas as pd
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig

# Configure database connection
config = PgConfig()  # Loads from environment variables
upserter = PostgresqlUpsert(config=config)

# Prepare your data
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})

# Perform upsert
success, affected_rows = upserter.upsert_dataframe(
    dataframe=df,
    table_name='users',
    schema='public'
)

print(f"✅ Upserted {affected_rows} rows successfully")
```

### Advanced Configuration

```python
from sqlalchemy import create_engine

# Using custom SQLAlchemy engine
engine = create_engine('postgresql://user:pass@localhost:5432/mydb')
upserter = PostgresqlUpsert(engine=engine, debug=True)

# Upsert with custom schema
success, affected_rows = upserter.upsert_dataframe(
    dataframe=large_df,
    table_name='products',
    schema='inventory'
)
```

### Data Type Handling

The library automatically handles pandas data type conversions:

```python
import pandas as pd
import numpy as np

# DataFrame with NaN values
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', None, 'David'],      # None values
    'score': [85.5, np.nan, 92.0, 88.1],         # NaN values
    'active': [True, False, None, True]           # Mixed types with None
})

# All NaN and None values are automatically converted to PostgreSQL NULL
upserter.upsert_dataframe(df, 'users')
# Result: NaN/None → NULL in PostgreSQL
```
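
For intuition, the NaN-to-NULL step can be reproduced with a common pandas idiom; this is a sketch of the general technique, not necessarily the library's exact implementation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "score": [85.5, np.nan]})

# Cast to object dtype, then replace every NaN/NaT with None so the
# database driver binds a SQL NULL instead of the float 'nan'.
df_null_safe = df.astype(object).where(pd.notnull(df), None)
print(df_null_safe)
```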

## 🔍 How It Works

### Overview: Temporary Table + CTE + MERGE Strategy

This library uses a sophisticated 3-stage approach for reliable, high-performance upserts:

1. **Temporary Table Staging**: Data is first loaded into a PostgreSQL temporary table
2. **CTE-based Conflict Analysis**: Advanced Common Table Expressions analyze and resolve conflicts
3. **Atomic MERGE Operation**: Single transaction applies all changes using PostgreSQL MERGE

### Stage 1: Temporary Table Creation

```sql
-- Creates a temporary table with identical structure to target table
CREATE TEMP TABLE "temp_upsert_12345678" (
    id INTEGER,
    email VARCHAR,
    doc_type VARCHAR,
    doc_number VARCHAR
);

-- Bulk insert all DataFrame data into temporary table
INSERT INTO "temp_upsert_12345678" VALUES (...);
```

**Benefits:**
- ✅ Isolates staging data from target table
- ✅ Enables complex analysis without affecting production data
- ✅ Automatic cleanup when session ends
- ✅ High-performance bulk inserts
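
For intuition, a Python analogue of this staging stage is sketched below; the temp-table name, the `LIKE` clause, and the insert style are illustrative assumptions, not the library's internals:

```python
import uuid

import pandas as pd
from sqlalchemy import create_engine, text

# Illustrative data and connection URI.
df = pd.DataFrame({
    "id": [1],
    "email": ["alice@example.com"],
    "doc_type": ["cpf"],
    "doc_number": ["123"],
})
engine = create_engine("postgresql+psycopg2://myuser:mypass@localhost:5432/mydb")

with engine.begin() as conn:
    temp_name = f"temp_upsert_{uuid.uuid4().hex[:8]}"
    # Clone the target table's column structure into a session-local temp table.
    conn.execute(text(f'CREATE TEMP TABLE "{temp_name}" (LIKE "public"."target_table")'))
    # Bulk-load the DataFrame rows via a single executemany-style insert.
    conn.execute(
        text(
            f'INSERT INTO "{temp_name}" (id, email, doc_type, doc_number) '
            "VALUES (:id, :email, :doc_type, :doc_number)"
        ),
        df.to_dict(orient="records"),
    )
```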

### Stage 2: CTE-based Conflict Analysis

The library generates sophisticated CTEs to handle complex conflict scenarios:

```sql
WITH temp_with_conflicts AS (
    -- Analyze each temp row against ALL constraints simultaneously
    SELECT temp.*, temp.ctid,
           COUNT(DISTINCT target.id) AS conflict_targets,
           MAX(target.id) AS conflicted_target_id
    FROM "temp_upsert_12345678" temp
    LEFT JOIN "public"."target_table" target
      ON (temp.id = target.id) OR              -- PK conflict
         (temp.email = target.email) OR        -- Unique conflict
         (temp.doc_type = target.doc_type AND  -- Composite unique conflict
          temp.doc_number = target.doc_number)
    GROUP BY temp.ctid, temp.id, temp.email, temp.doc_type, temp.doc_number
),
filtered AS (
    -- Filter out rows that conflict with multiple target rows (ambiguous)
    SELECT * FROM temp_with_conflicts
    WHERE conflict_targets <= 1
),
ranked AS (
    -- Deduplicate temp rows targeting the same existing row
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY COALESCE(conflicted_target_id, id)
               ORDER BY ctid DESC
           ) AS row_rank
    FROM filtered
),
clean_rows AS (
    -- Final dataset: only the latest row per target
    SELECT id, email, doc_type, doc_number
    FROM ranked
    WHERE row_rank = 1
)
-- Ready for MERGE...
```

**CTE Logic Breakdown:**

1. **`temp_with_conflicts`**: Joins temporary table against target table using ALL constraints simultaneously, counting how many existing rows each temp row conflicts with.

2. **`filtered`**: Removes ambiguous rows that would conflict with multiple existing records, keeping only rows with `conflict_targets <= 1` (the _one-to-many_ conflict case).

3. **`ranked`**: When multiple temp rows target the same existing record, ranks them by descending insertion order using the table's `ctid` (the _many-to-one_ conflict case).

4. **`clean_rows`**: Keeps only the most recently inserted row for each conflict, yielding the final clean dataset ready for the atomic upsert. A pandas analogue of steps 2–4 is sketched below.
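
This sketch mimics the filtering and deduplication in pandas, for intuition only; the library performs all of this inside PostgreSQL:

```python
import pandas as pd

# Staged rows annotated as in temp_with_conflicts: how many target rows each
# conflicts with, and which target id it hit (if exactly one).
staged = pd.DataFrame({
    "id": [1, 1, 2],
    "email": ["a@x.com", "a2@x.com", "b@x.com"],
    "conflict_targets": [1, 1, 2],          # row 3 is ambiguous (hits 2 targets)
    "conflicted_target_id": [10, 10, None],
})

# 'filtered': drop ambiguous rows that match more than one target row.
filtered = staged[staged["conflict_targets"] <= 1]

# 'ranked' + 'clean_rows': keep only the last staged row per target.
clean = (
    filtered.assign(key=filtered["conflicted_target_id"].fillna(filtered["id"]))
            .groupby("key")
            .tail(1)
)
print(clean)  # one row per target, the most recently staged one wins
```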

### Stage 3: Atomic MERGE Operation

```sql
MERGE INTO "public"."target_table" AS tgt
USING clean_rows AS src
ON (tgt.id = src.id) OR 
   (tgt.email = src.email) OR 
   (tgt.doc_type = src.doc_type AND tgt.doc_number = src.doc_number)
WHEN MATCHED THEN
    UPDATE SET email = src.email, doc_type = src.doc_type, doc_number = src.doc_number
WHEN NOT MATCHED THEN
    INSERT (id, email, doc_type, doc_number)
    VALUES (src.id, src.email, src.doc_type, src.doc_number);
```

**Benefits:**
- ✅ Single atomic transaction
- ✅ Handles all conflict types simultaneously
- ✅ Automatic INSERT or UPDATE decision
- ✅ No race conditions or partial updates
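
From Python, the combined statement runs inside a single transaction: `engine.begin()` commits on success and rolls back on any error. A sketch, with `merge_sql` abbreviating the full CTE + MERGE statement shown above:

```python
from sqlalchemy import create_engine, text

# Abbreviated stand-in for the full CTE + MERGE statement built above.
merge_sql = """
WITH clean_rows AS (
    SELECT id, email, doc_type, doc_number FROM "temp_upsert_12345678"
)
MERGE INTO "public"."target_table" AS tgt
USING clean_rows AS src
ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET email = src.email
WHEN NOT MATCHED THEN
    INSERT (id, email, doc_type, doc_number)
    VALUES (src.id, src.email, src.doc_type, src.doc_number)
"""

engine = create_engine("postgresql+psycopg2://myuser:mypass@localhost:5432/mydb")

with engine.begin() as conn:  # one atomic transaction: commit or full rollback
    result = conn.execute(text(merge_sql))
    print(f"Rows merged: {result.rowcount}")
```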

### Constraint Detection

The library automatically analyzes your target table to identify:
- **Primary key constraints**: Single or composite primary keys
- **Unique constraints**: Single column unique constraints  
- **Composite unique constraints**: Multi-column unique constraints

All constraint types are handled simultaneously in a single operation.
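
For reference, the same constraint information is available through SQLAlchemy's inspection API; a minimal sketch of constraint discovery (the library's internal detection may differ):

```python
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql+psycopg2://myuser:mypass@localhost:5432/mydb")
insp = inspect(engine)

# Primary key (single or composite).
pk = insp.get_pk_constraint("target_table", schema="public")
print("Primary key columns:", pk["constrained_columns"])

# Unique constraints, including multi-column (composite) ones.
for uc in insp.get_unique_constraints("target_table", schema="public"):
    print(f"Unique constraint {uc['name']}: {uc['column_names']}")
```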


## 🚨 Pros, Cons & Considerations

### ✅ Advantages of Temporary Table + CTE + MERGE Approach

**Performance Benefits:**
- **Single Transaction**: Entire operation is atomic, no partial updates or race conditions
- **Bulk Operations**: High-performance bulk inserts into temporary tables
- **Efficient Joins**: PostgreSQL optimizes joins between temporary and main tables
- **Minimal Locking**: Temporary tables don't interfere with concurrent operations

**Reliability Benefits:**
- **Comprehensive Conflict Resolution**: Handles all constraint types simultaneously
- **Deterministic Results**: Same input always produces same output
- **Automatic Cleanup**: Temporary tables are automatically dropped
- **ACID Compliance**: Full transaction safety and rollback capability

**Data Integrity Benefits:**
- **Ambiguity Detection**: Automatically detects and skips problematic rows
- **Deduplication**: Handles duplicate input data intelligently
- **Constraint Validation**: PostgreSQL validates all constraints during MERGE

### ❌ Limitations and Trade-offs

**Resource Requirements:**
- **Memory Usage**: All input data is staged at once in temporary tables (buffered in memory up to `temp_buffers`, spilling to disk beyond that)
- **Temporary Space**: Requires sufficient temporary storage for staging tables
- **Single-threaded**: No parallel processing (traded for reliability and simplicity)

**PostgreSQL Specifics:**
- **Version Dependency**: The MERGE statement requires PostgreSQL 15+ (see the version-check sketch below)
- **Session-based Temp Tables**: Temporary tables are tied to database sessions
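
Because MERGE only exists from PostgreSQL 15 onward, it can be worth failing fast with a server-version check; a minimal sketch:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://myuser:mypass@localhost:5432/mydb")

with engine.connect() as conn:
    # server_version_num is e.g. 150004 for PostgreSQL 15.4.
    version_num = int(conn.execute(text("SHOW server_version_num")).scalar())
    if version_num < 150000:
        raise RuntimeError("PostgreSQL 15+ is required for MERGE-based upserts")
```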

**Privilege-related Limitations:**
- **Database Administrator**: May need DBA assistance to grant `TEMPORARY` privileges
- **Shared Hosting**: Some cloud providers restrict temporary table creation
- **Security Policies**: Corporate environments may restrict temporary table usage

**Scale Considerations:**
- **Large Dataset Handling**: Very large datasets (>1M rows) may require memory tuning
- **Transaction Duration**: Entire operation happens in one transaction (longer lock times)

### 🎯 Best Practices

**Memory Management:**
```python
# For large datasets, monitor memory usage
import pandas as pd

# Consider chunking very large datasets manually if needed
def chunk_dataframe(df, chunk_size=50000):
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

# Process in manageable chunks
for chunk in chunk_dataframe(large_df):
    success, rows = upserter.upsert_dataframe(chunk, 'target_table')
    print(f"Processed chunk: {rows} rows")
```

**Performance Optimization:**
```python
# Ensure proper indexing on conflict columns
# CREATE INDEX idx_email ON target_table(email);
# CREATE INDEX idx_composite ON target_table(doc_type, doc_number);

# Use debug mode to monitor performance
upserter = PostgresqlUpsert(engine=engine, debug=True)
```

**Error Handling:**
```python
import logging

logger = logging.getLogger(__name__)

try:
    success, affected_rows = upserter.upsert_dataframe(df, 'users')
    logger.info(f"Successfully upserted {affected_rows} rows")
except ValueError as e:
    logger.error(f"Validation error: {e}")
except Exception as e:
    logger.error(f"Upsert failed: {e}")
    # No manual rollback needed: the failed transaction is rolled back automatically
```

## 🤝 Contributing

We welcome contributions! Here's how to get started:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Make your changes** and add tests
4. **Run the test suite**: `pytest tests/ -v`
5. **Submit a pull request**


## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙋 Support

- **Issues**: [GitHub Issues](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
- **Documentation**: Check the docstrings and test files for detailed usage examples
            
