pgsql-upserter


Namepgsql-upserter JSON
Version 0.9.1 PyPI version JSON
download
home_pagehttps://github.com/machado000/pgsql-upserter
SummaryPostgreSQL upsert engine using temp tables and automatic conflict resolution
upload_time2025-08-31 19:25:29
maintainerNone
docs_urlNone
authorJoao Brito
requires_python>=3.11
licenseMIT
keywords postgresql upsert database etl
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pgsql_upserter

[![PyPI version](https://img.shields.io/pypi/v/pgsql-upserter)](https://pypi.org/project/pgsql-upserter/)
[![License](https://img.shields.io/github/license/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/blob/main/LICENSE)
[![Issues](https://img.shields.io/github/issues/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/issues)
[![Last Commit](https://img.shields.io/github/last-commit/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/commits/main)

A powerful, production-ready PostgreSQL upsert utility with automatic schema introspection and intelligent conflict resolution. Perfect for serverless ETL pipelines and data integration workflows.

## 🚀 Key Features

- **Zero Configuration**: Automatic schema detection and column matching
- **Intelligent Conflict Resolution**: Automatically detects primary keys and unique constraints
- **Production Tested**: Handles deduplication, data validation, and error recovery
- **Flexible Input**: Supports both direct data (API responses) and CSV files

## 📦 Installation

```bash
pip install pgsql-upserter
```

## 🎯 Quick Start

### Serverless ETL (Recommended)

Perfect for AWS Lambda, Google Cloud Functions, or any API-driven ETL:

```python
from pgsql_upserter import execute_upsert_workflow, create_connection_from_env

# Your API response data (Facebook Ads, Google Ads, etc.)
api_data = [
    {
        'account_id': '123456789',
        'campaign_id': 'camp_001', 
        'impressions': 1000,
        'clicks': 50,
        'spend': 25.50,
        'date_start': '2025-08-31'
    }
    # ... more records
]

# One function call does everything!
connection = create_connection_from_env()
result = execute_upsert_workflow(
    connection=connection,
    data=api_data,  # Direct API data
    target_table='ads_metrics'
)

print(f"✅ {result.total_affected} rows processed")
print(f"📈 {result.rows_inserted} inserted, {result.rows_updated} updated")
```

### CSV File Processing

```python
# Automatic CSV processing
result = execute_upsert_workflow(
    connection=connection,
    data='path/to/data.csv',  # File path
    target_table='ads_metrics'
)
```

## 🔧 Environment Setup

Set your PostgreSQL connection via environment variables:

```bash
export PGHOST=your-host
export PGPORT=5432
export PGDATABASE=your-db
export PGUSER=your-user
export PGPASSWORD=your-password
```

Or use a connection string:
```bash
export DATABASE_URL=postgresql://user:pass@host:port/dbname
```

## 🧠 How It Works

1. **Schema Introspection**: Analyzes your table structure automatically
2. **Column Matching**: Maps your data columns to table columns
3. **Conflict Detection**: Finds primary keys and unique constraints  
4. **Data Deduplication**: Removes duplicates using conflict resolution strategy
5. **Intelligent Upsert**: Uses PostgreSQL's native `INSERT...ON CONFLICT`

## 🎯 Perfect For

- **API Data Ingestion**: Facebook Ads, Google Ads, LinkedIn Ads APIs
- **Serverless ETL**: AWS Lambda, Google Cloud Functions, Azure Functions
- **Data Warehousing**: Loading data into analytics databases
- **Real-time Sync**: Keeping databases in sync with external sources
- **Batch Processing**: Traditional CSV and file-based workflows

## 📊 Automatic Conflict Resolution

The library automatically chooses the best upsert strategy:

1. **Primary Key**: Uses table's primary key if available in data
2. **Unique Constraints**: Combines all unique constraints for conflict detection  
3. **Insert Only**: Falls back to simple insert if no conflicts possible

## 🔍 Advanced Usage

### Data Processing Before Upsert

```python
from pgsql_upserter import UpsertResult

# Read and process CSV data
csv_data = UpsertResult.read_csv_to_dict_list('data.csv')

# Filter or transform data
filtered_data = [row for row in csv_data if float(row.get('spend', 0)) > 10.0]

# Upsert processed data
result = execute_upsert_workflow(
    connection=connection,
    data=filtered_data,
    target_table='ads_metrics'
)
```

### Custom Connection

```python
import psycopg2
from pgsql_upserter import execute_upsert_workflow

connection = psycopg2.connect(
    host="localhost",
    database="mydb",
    user="user",
    password="password"
)

result = execute_upsert_workflow(
    connection=connection,
    data=your_data,
    target_table='your_table',
    schema_name='public'  # optional, defaults to 'public'
)
```

## 🛡️ Error Handling

The library provides comprehensive error handling and validation:

```python
from pgsql_upserter import execute_upsert_workflow, PgsqlUpserterError

try:
    result = execute_upsert_workflow(connection, data, 'my_table')
    print(f"Success: {result.total_affected} rows processed")
except PgsqlUpserterError as e:
    print(f"Upsert failed: {e}")
```

## 📋 Requirements

- Python 3.11+
- PostgreSQL 12+
- psycopg2-binary

## 🤝 Contributing

Issues and pull requests are welcome! Please see our contributing guidelines.

## 📄 License

MIT License - see LICENSE file for details.

## 🔗 Links

- [PyPI Package](https://pypi.org/project/pgsql-upserter/)
- [Source Code](https://github.com/machadoo000/pgsql-upserter)
- [Issues](https://github.com/machadoo000/pgsql-upserter/issues)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/machado000/pgsql-upserter",
    "name": "pgsql-upserter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "postgresql, upsert, database, etl",
    "author": "Joao Brito",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/11/ce/6e21fabac668ed6dbd75af8dcb5f770529b1f33502753717cd4a1c50992c/pgsql_upserter-0.9.1.tar.gz",
    "platform": null,
    "description": "# pgsql_upserter\n\n[![PyPI version](https://img.shields.io/pypi/v/pgsql-upserter)](https://pypi.org/project/pgsql-upserter/)\n[![License](https://img.shields.io/github/license/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/blob/main/LICENSE)\n[![Issues](https://img.shields.io/github/issues/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/issues)\n[![Last Commit](https://img.shields.io/github/last-commit/machado000/pgsql-upserter)](https://github.com/machado000/pgsql-upserter/commits/main)\n\nA powerful, production-ready PostgreSQL upsert utility with automatic schema introspection and intelligent conflict resolution. Perfect for serverless ETL pipelines and data integration workflows.\n\n## \ud83d\ude80 Key Features\n\n- **Zero Configuration**: Automatic schema detection and column matching\n- **Intelligent Conflict Resolution**: Automatically detects primary keys and unique constraints\n- **Production Tested**: Handles deduplication, data validation, and error recovery\n- **Flexible Input**: Supports both direct data (API responses) and CSV files\n\n## \ud83d\udce6 Installation\n\n```bash\npip install pgsql-upserter\n```\n\n## \ud83c\udfaf Quick Start\n\n### Serverless ETL (Recommended)\n\nPerfect for AWS Lambda, Google Cloud Functions, or any API-driven ETL:\n\n```python\nfrom pgsql_upserter import execute_upsert_workflow, create_connection_from_env\n\n# Your API response data (Facebook Ads, Google Ads, etc.)\napi_data = [\n    {\n        'account_id': '123456789',\n        'campaign_id': 'camp_001', \n        'impressions': 1000,\n        'clicks': 50,\n        'spend': 25.50,\n        'date_start': '2025-08-31'\n    }\n    # ... more records\n]\n\n# One function call does everything!\nconnection = create_connection_from_env()\nresult = execute_upsert_workflow(\n    connection=connection,\n    data=api_data,  # Direct API data\n    target_table='ads_metrics'\n)\n\nprint(f\"\u2705 {result.total_affected} rows processed\")\nprint(f\"\ud83d\udcc8 {result.rows_inserted} inserted, {result.rows_updated} updated\")\n```\n\n### CSV File Processing\n\n```python\n# Automatic CSV processing\nresult = execute_upsert_workflow(\n    connection=connection,\n    data='path/to/data.csv',  # File path\n    target_table='ads_metrics'\n)\n```\n\n## \ud83d\udd27 Environment Setup\n\nSet your PostgreSQL connection via environment variables:\n\n```bash\nexport PGHOST=your-host\nexport PGPORT=5432\nexport PGDATABASE=your-db\nexport PGUSER=your-user\nexport PGPASSWORD=your-password\n```\n\nOr use a connection string:\n```bash\nexport DATABASE_URL=postgresql://user:pass@host:port/dbname\n```\n\n## \ud83e\udde0 How It Works\n\n1. **Schema Introspection**: Analyzes your table structure automatically\n2. **Column Matching**: Maps your data columns to table columns\n3. **Conflict Detection**: Finds primary keys and unique constraints  \n4. **Data Deduplication**: Removes duplicates using conflict resolution strategy\n5. **Intelligent Upsert**: Uses PostgreSQL's native `INSERT...ON CONFLICT`\n\n## \ud83c\udfaf Perfect For\n\n- **API Data Ingestion**: Facebook Ads, Google Ads, LinkedIn Ads APIs\n- **Serverless ETL**: AWS Lambda, Google Cloud Functions, Azure Functions\n- **Data Warehousing**: Loading data into analytics databases\n- **Real-time Sync**: Keeping databases in sync with external sources\n- **Batch Processing**: Traditional CSV and file-based workflows\n\n## \ud83d\udcca Automatic Conflict Resolution\n\nThe library automatically chooses the best upsert strategy:\n\n1. **Primary Key**: Uses table's primary key if available in data\n2. **Unique Constraints**: Combines all unique constraints for conflict detection  \n3. **Insert Only**: Falls back to simple insert if no conflicts possible\n\n## \ud83d\udd0d Advanced Usage\n\n### Data Processing Before Upsert\n\n```python\nfrom pgsql_upserter import UpsertResult\n\n# Read and process CSV data\ncsv_data = UpsertResult.read_csv_to_dict_list('data.csv')\n\n# Filter or transform data\nfiltered_data = [row for row in csv_data if float(row.get('spend', 0)) > 10.0]\n\n# Upsert processed data\nresult = execute_upsert_workflow(\n    connection=connection,\n    data=filtered_data,\n    target_table='ads_metrics'\n)\n```\n\n### Custom Connection\n\n```python\nimport psycopg2\nfrom pgsql_upserter import execute_upsert_workflow\n\nconnection = psycopg2.connect(\n    host=\"localhost\",\n    database=\"mydb\",\n    user=\"user\",\n    password=\"password\"\n)\n\nresult = execute_upsert_workflow(\n    connection=connection,\n    data=your_data,\n    target_table='your_table',\n    schema_name='public'  # optional, defaults to 'public'\n)\n```\n\n## \ud83d\udee1\ufe0f Error Handling\n\nThe library provides comprehensive error handling and validation:\n\n```python\nfrom pgsql_upserter import execute_upsert_workflow, PgsqlUpserterError\n\ntry:\n    result = execute_upsert_workflow(connection, data, 'my_table')\n    print(f\"Success: {result.total_affected} rows processed\")\nexcept PgsqlUpserterError as e:\n    print(f\"Upsert failed: {e}\")\n```\n\n## \ud83d\udccb Requirements\n\n- Python 3.11+\n- PostgreSQL 12+\n- psycopg2-binary\n\n## \ud83e\udd1d Contributing\n\nIssues and pull requests are welcome! Please see our contributing guidelines.\n\n## \ud83d\udcc4 License\n\nMIT License - see LICENSE file for details.\n\n## \ud83d\udd17 Links\n\n- [PyPI Package](https://pypi.org/project/pgsql-upserter/)\n- [Source Code](https://github.com/machadoo000/pgsql-upserter)\n- [Issues](https://github.com/machadoo000/pgsql-upserter/issues)\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "PostgreSQL upsert engine using temp tables and automatic conflict resolution",
    "version": "0.9.1",
    "project_urls": {
        "Homepage": "https://github.com/machado000/pgsql-upserter",
        "Issues": "https://github.com/machado000/pgsql-upserter/issues"
    },
    "split_keywords": [
        "postgresql",
        " upsert",
        " database",
        " etl"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4312e4a27aaf7d9f849db65f9c86d431ea9b6e9411d01f00580a3215548c5888",
                "md5": "c4315ae623f3e81ff4c0ddccc14df875",
                "sha256": "2ee5750665d4f3303b5a698567c30e851b5fb017112be31ddbd8e4a3301cf504"
            },
            "downloads": -1,
            "filename": "pgsql_upserter-0.9.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c4315ae623f3e81ff4c0ddccc14df875",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 19327,
            "upload_time": "2025-08-31T19:25:27",
            "upload_time_iso_8601": "2025-08-31T19:25:27.785594Z",
            "url": "https://files.pythonhosted.org/packages/43/12/e4a27aaf7d9f849db65f9c86d431ea9b6e9411d01f00580a3215548c5888/pgsql_upserter-0.9.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "11ce6e21fabac668ed6dbd75af8dcb5f770529b1f33502753717cd4a1c50992c",
                "md5": "08548391a3b6df35df3bb05185e1ff48",
                "sha256": "a26444f4e1e5630d59c360183335dbf085dd015af9c043b380c19acc11d7963e"
            },
            "downloads": -1,
            "filename": "pgsql_upserter-0.9.1.tar.gz",
            "has_sig": false,
            "md5_digest": "08548391a3b6df35df3bb05185e1ff48",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 16408,
            "upload_time": "2025-08-31T19:25:29",
            "upload_time_iso_8601": "2025-08-31T19:25:29.163431Z",
            "url": "https://files.pythonhosted.org/packages/11/ce/6e21fabac668ed6dbd75af8dcb5f770529b1f33502753717cd4a1c50992c/pgsql_upserter-0.9.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-31 19:25:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "machado000",
    "github_project": "pgsql-upserter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pgsql-upserter"
}
        
Elapsed time: 1.22617s