sqlalchemy-psql-upsert

Name: sqlalchemy-psql-upsert
Version: 1.2.2
Home page: https://github.com/machado000/sqlalchemy-psql-upsert
Summary: PostgreSQL upsert library with intelligent conflict resolution and multi-threaded processing
Upload time: 2025-08-28 15:58:49
Author: Joao Brito
Requires Python: <=3.13,>=3.10
License: GPL-3.0
Keywords: postgresql, upsert, sqlalchemy, database, conflict-resolution, bulk-insert, data-migration, pandas
# SQLAlchemy PostgreSQL Upsert

[![PyPI version](https://img.shields.io/pypi/v/sqlalchemy-psql-upsert)](https://pypi.org/project/sqlalchemy-psql-upsert/)
[![License](https://img.shields.io/github/license/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/blob/main/LICENSE)
[![Issues](https://img.shields.io/github/issues/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
[![Last Commit](https://img.shields.io/github/last-commit/machado000/sqlalchemy-psql-upsert)](https://github.com/machado000/sqlalchemy-psql-upsert/commits/main)
[![CI](https://github.com/machado000/sqlalchemy-psql-upsert/actions/workflows/ci.yml/badge.svg)](https://github.com/machado000/sqlalchemy-psql-upsert/actions/workflows/ci.yml)


A high-performance Python library for PostgreSQL UPSERT operations with intelligent conflict resolution using PostgreSQL temporary tables and atomic MERGE statements. Designed for reliability, data integrity, and modern Python development with comprehensive type safety.


## 🚀 Features

- **Temporary Table Staging**: Uses PostgreSQL temporary tables for efficient, isolated staging and conflict analysis
- **Atomic MERGE Operations**: Single-transaction upserts using PostgreSQL 15+ MERGE statements for reliability and performance
- **Multi-constraint Support**: Handles primary keys, unique constraints, and composite constraints simultaneously
- **Intelligent Conflict Resolution**: Automatically filters ambiguous conflicts and deduplicates data
- **Automatic NaN to NULL Conversion**: Seamlessly converts pandas NaN/None values to PostgreSQL NULL values
- **Schema Validation**: Automatic table and column validation before operations
- **Comprehensive Logging**: Detailed debug information and progress tracking
- **Type Safety**: Fully typed with Python 3.10+ type hints, 100% type coverage, and strict mypy compliance


## 📦 Installation

### Using Poetry (Recommended)
```bash
poetry add sqlalchemy-psql-upsert
```

### Using pip
```bash
pip install sqlalchemy-psql-upsert
```

## ⚙️ Configuration

### Database Privileges Requirements

In addition to SELECT, INSERT, and UPDATE permissions on target tables, this library requires the PostgreSQL `TEMPORARY` privilege on the database to function properly:

**Why Temporary Tables?**
- **Isolation**: Staging data doesn't interfere with production tables during analysis
- **Performance**: Bulk operations are faster on temporary tables
- **Safety**: Failed operations don't leave partial data in target tables
- **Atomicity**: Entire upsert operation happens in a single transaction

### Environment Variables

Create a `.env` file or set the following environment variables:

```bash
# PostgreSQL Configuration
PGHOST=localhost
PGPORT=5432
PGDATABASE=your_database
PGUSER=your_username
PGPASSWORD=your_password
```
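`PgConfig` presumably assembles its connection URI from these variables. A minimal stdlib sketch of that idea, using the variable names from the block above (`build_pg_uri` is an illustrative helper, not part of the library's API):

```python
import os

def build_pg_uri() -> str:
    """Assemble a psycopg2-style connection URI from the standard PG*
    environment variables, falling back to common defaults."""
    host = os.environ.get("PGHOST", "localhost")
    port = os.environ.get("PGPORT", "5432")
    dbname = os.environ.get("PGDATABASE", "postgres")
    user = os.environ.get("PGUSER", "postgres")
    password = os.environ.get("PGPASSWORD", "")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}"
```

With the values from the example `.env`, this yields `postgresql+psycopg2://your_username:your_password@localhost:5432/your_database`.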

### Configuration Class

```python
from sqlalchemy_psql_upsert import PgConfig

# Default configuration from environment
config = PgConfig()

# Manual configuration
config = PgConfig(
    host="localhost",
    port="5432",
    user="myuser",
    password="mypass",
    dbname="mydb"
)

print(config.uri())  # postgresql+psycopg2://myuser:mypass@localhost:5432/mydb
```

## 🛠️ Quick Start

### Connection Testing

```python
from sqlalchemy_psql_upsert import test_connection

# Test default connection
success, message = test_connection()
if success:
    print("✅ Database connection OK")
else:
    print(f"❌ Connection failed: {message}")
```

### Privileges Verification

**Important**: This library requires `CREATE TEMP TABLE` privileges to function properly. The client automatically verifies these privileges during initialization.

```python
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig

# Test connection and privileges
config = PgConfig()

try:
    # This will automatically test temp table privileges
    upserter = PostgresqlUpsert(config=config)
    print("✅ Connection and privileges verified successfully")
    
except PermissionError as e:
    print(f"❌ Privilege error: {e}")
    print("Solution: Grant temporary privileges with:")
    print("GRANT TEMPORARY ON DATABASE your_database TO your_user;")
    
except Exception as e:
    print(f"❌ Connection failed: {e}")
```

**Grant Required Privileges:**
```sql
-- As database administrator, grant temporary table privileges
GRANT TEMPORARY ON DATABASE your_database TO your_user;

-- Alternatively, grant more comprehensive privileges
GRANT CREATE ON DATABASE your_database TO your_user;
```
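The privilege check the client runs at initialization can be approximated by simply attempting to create a throwaway temporary table. A hedged sketch of such a probe (`can_create_temp_tables` is an illustrative helper; it is demonstrated here against an in-memory SQLite database so it runs self-contained, whereas the library performs the equivalent check against PostgreSQL):

```python
import sqlite3

def can_create_temp_tables(conn) -> bool:
    """Probe whether the current session may create temporary tables by
    attempting to create (and drop) a throwaway one."""
    try:
        conn.execute("CREATE TEMPORARY TABLE _priv_probe (x INTEGER)")
        conn.execute("DROP TABLE _priv_probe")
        return True
    except Exception:
        return False

# SQLite always allows temp tables; a PostgreSQL session without the
# TEMPORARY privilege would raise and return False here.
conn = sqlite3.connect(":memory:")
print(can_create_temp_tables(conn))  # True
```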

### Basic Usage

```python
import pandas as pd
from sqlalchemy_psql_upsert import PostgresqlUpsert, PgConfig

# Configure database connection
config = PgConfig()  # Loads from environment variables
upserter = PostgresqlUpsert(config=config)

# Prepare your data
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})

# Perform upsert
affected_rows = upserter.upsert_dataframe(
    dataframe=df,
    table_name='users',
    schema='public'
)

print(f"✅ Upserted {affected_rows} rows successfully")

# Get detailed information about skipped rows
affected_rows, skipped_df = upserter.upsert_dataframe(
    dataframe=df,
    table_name='users', 
    schema='public',
    return_skipped=True
)

print(f"✅ Upserted {affected_rows} rows, {len(skipped_df)} rows skipped")
if not skipped_df.empty:
    print("Skipped rows:", skipped_df['skip_reason'].value_counts())
```

### Advanced Configuration

```python
from sqlalchemy import create_engine

# Using custom SQLAlchemy engine
engine = create_engine('postgresql://user:pass@localhost:5432/mydb')
upserter = PostgresqlUpsert(engine=engine, debug=True)

# Upsert with custom schema
affected_rows = upserter.upsert_dataframe(
    dataframe=large_df,
    table_name='products',
    schema='inventory'
)
```

### Data Type Handling

The library automatically handles pandas data type conversions:

```python
import pandas as pd
import numpy as np

# DataFrame with NaN values
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', None, 'David'],      # None values
    'score': [85.5, np.nan, 92.0, 88.1],         # NaN values
    'active': [True, False, None, True]           # Mixed types with None
})

# All NaN and None values are automatically converted to PostgreSQL NULL
upserter.upsert_dataframe(df, 'users')
# Result: NaN/None → NULL in PostgreSQL
```
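The NaN-to-NULL step can be illustrated with a small stdlib helper. The library's internal implementation may differ; `nan_to_none` is purely illustrative of the mapping applied to each cell before binding:

```python
import math

def nan_to_none(value):
    """Map float NaN values (as produced by pandas for missing data) to
    None, which database drivers bind as SQL NULL."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

row = {"id": 2, "name": "Bob", "score": float("nan"), "active": False}
clean = {k: nan_to_none(v) for k, v in row.items()}
print(clean)  # {'id': 2, 'name': 'Bob', 'score': None, 'active': False}
```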

## 🔍 How It Works


### Detailed Upsert Workflow

This library uses a robust multi-step approach for reliable, high-performance upserts:

1. **Temporary Table Staging**
   - A temporary table is created with the same structure as the target table, plus a `skip_reason` column.
   - All input DataFrame rows are bulk inserted into this table.

2. **Multi-Row Conflict Detection**
   - Each row in the temp table is checked for conflicts against the target table's constraints (PK, unique, composite unique).
   - Rows that would conflict with more than one target row are marked with `skip_reason = 'multiple_target_conflicts'`.

3. **Ranking and Deduplication**
   - For rows that only conflict with a single target row, the library checks for duplicate source rows (multiple input rows targeting the same record).
   - Only the last row for each target is kept; others are marked with `skip_reason = 'duplicate_source_rows'`.

4. **Constructing the Clean Rows Table**
   - A second temporary table (the "clean table") is created with the same structure as the target table.
   - All rows from the raw temp table where `skip_reason IS NULL` are copied into the clean table.

5. **Merging Clean Table into Target Table**
   - A single, atomic PostgreSQL `MERGE` statement is executed:
     - If a row in the clean table matches a row in the target table (by any constraint), it is updated.
     - If there is no match, the row is inserted.

6. **Fetching and Returning Conflicts**
   - After the merge, all rows from the raw temp table where `skip_reason IS NOT NULL` are fetched and returned to the user, along with the reason they were skipped.

This process ensures only valid, deduplicated rows are upserted, and all conflicts are tracked and reported for further review.
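Step 3's "keep the last source row per target" rule can be sketched in plain Python. The library performs this ranking in SQL over the temp table; `mark_duplicates` and the field names below are illustrative only:

```python
def mark_duplicates(rows, key):
    """Mark all but the last row sharing a conflict-key value with a
    skip_reason, mirroring the 'duplicate_source_rows' rule."""
    last_index = {}
    for i, row in enumerate(rows):
        last_index[row[key]] = i  # later rows overwrite earlier ones
    for i, row in enumerate(rows):
        if last_index[row[key]] != i:
            row["skip_reason"] = "duplicate_source_rows"
        else:
            row["skip_reason"] = None
    return rows

rows = [
    {"id": 1, "name": "Alice"},
    {"id": 1, "name": "Alicia"},  # same target row: only this one survives
    {"id": 2, "name": "Bob"},
]
marked = mark_duplicates(rows, "id")
print([r["skip_reason"] for r in marked])
# ['duplicate_source_rows', None, None]
```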


### Constraint Detection

The library automatically analyzes your target table to identify:
- **Primary key constraints**: Single or composite primary keys
- **Unique constraints**: Single column unique constraints
- **Composite unique constraints**: Multi-column unique constraints

All constraint types are handled simultaneously in a single operation.
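Constraint discovery of this kind can be sketched with stdlib SQLite pragmas so the example runs self-contained; the library itself inspects PostgreSQL system catalogs via SQLAlchemy, and `detect_constraints` below is an illustrative stand-in, not its actual API:

```python
import sqlite3

def detect_constraints(conn, table):
    """Collect primary-key columns and unique-index column lists for a
    table, using SQLite pragmas for demonstration purposes."""
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    pk = [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5]]
    uniques = []
    for row in conn.execute(f"PRAGMA index_list({table})"):
        index_name, is_unique = row[1], row[2]
        if is_unique:
            cols = [r[2] for r in conn.execute(f"PRAGMA index_info({index_name})")]
            uniques.append(cols)
    return {"primary_key": pk, "unique": uniques}

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE,
    doc_type TEXT, doc_number TEXT,
    UNIQUE (doc_type, doc_number))""")
info = detect_constraints(conn, "users")
print(info)
```

This reports the single-column primary key (`id`), the single-column unique constraint (`email`), and the composite unique constraint (`doc_type, doc_number`), all of which the upsert then honors in one operation.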


## 🚨 Pros, Cons & Considerations

### ✅ Advantages of Temporary Table + CTE + MERGE Approach

**Performance Benefits:**
- **Single Transaction**: Entire operation is atomic, no partial updates or race conditions
- **Bulk Operations**: High-performance bulk inserts into temporary tables
- **Efficient Joins**: PostgreSQL optimizes joins between temporary and main tables
- **Minimal Locking**: Temporary tables don't interfere with concurrent operations

**Reliability Benefits:**
- **Comprehensive Conflict Resolution**: Handles all constraint types simultaneously
- **Deterministic Results**: Same input always produces same output
- **Automatic Cleanup**: Temporary tables are automatically dropped
- **ACID Compliance**: Full transaction safety and rollback capability

**Data Integrity Benefits:**
- **Ambiguity Detection**: Automatically detects and skips problematic rows
- **Deduplication**: Handles duplicate input data intelligently
- **Constraint Validation**: PostgreSQL validates all constraints during MERGE

### ❌ Limitations and Trade-offs

**Resource Requirements:**
- **Memory Usage**: All input data is staged in temporary tables (memory-resident)
- **Temporary Space**: Requires sufficient temporary storage for staging tables
- **Single-threaded**: No parallel processing (traded for reliability and simplicity)

**PostgreSQL Specifics:**
- **Version Dependency**: MERGE statement requires PostgreSQL 15+
- **Session-based Temp Tables**: Temporary tables are tied to database sessions

**Privilege-related Limitations:**
- **Database Administrator**: May need DBA assistance to grant `TEMPORARY` privileges
- **Shared Hosting**: Some cloud providers restrict temporary table creation
- **Security Policies**: Corporate environments may restrict temporary table usage

**Scale Considerations:**
- **Large Dataset Handling**: Very large datasets (>1M rows) may require memory tuning
- **Transaction Duration**: Entire operation happens in one transaction (longer lock times)

### 🎯 Best Practices

**Memory Management:**
```python
# For large datasets, monitor memory usage
import pandas as pd

# Consider chunking very large datasets manually if needed
def chunk_dataframe(df, chunk_size=50000):
    for start in range(0, len(df), chunk_size):
        yield df[start:start + chunk_size]

# Process in manageable chunks
for chunk in chunk_dataframe(large_df):
    affected_rows = upserter.upsert_dataframe(chunk, 'target_table')
    print(f"Processed chunk: {affected_rows} rows")
```

**Performance Optimization:**
```python
# Ensure proper indexing on conflict columns
# CREATE INDEX idx_email ON target_table(email);
# CREATE INDEX idx_composite ON target_table(doc_type, doc_number);

# Use debug mode to monitor performance
upserter = PostgresqlUpsert(engine=engine, debug=True)
```

**Error Handling:**
```python
try:
    affected_rows = upserter.upsert_dataframe(df, 'users')
    logger.info(f"Successfully upserted {affected_rows} rows")
except ValueError as e:
    logger.error(f"Validation error: {e}")
except Exception as e:
    logger.error(f"Upsert failed: {e}")
    # Handle rollback - transaction is automatically rolled back
```

## 🤝 Contributing

We welcome contributions! Here's how to get started:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Make your changes** and add tests
4. **Run the test suite**: `pytest tests/ -v`
5. **Submit a pull request**


## 📝 License

This project is licensed under the GPL v3 License - see the [LICENSE](LICENSE) file for details.

## 🙋 Support

- **Issues**: [GitHub Issues](https://github.com/machado000/sqlalchemy-psql-upsert/issues)
- **Documentation**: Check the docstrings and test files for detailed usage examples
            
