pangres


* Name: pangres
* Version: 2.3.1
* Home page: https://github.com/ThibTrip/pangres
* Summary: Postgres insert update with pandas DataFrames.
* Upload time: 2021-06-15 12:37:58
* Author: Thibault Bétrémieux
* License: The Unlicense
* Keywords: pandas, postgres, mysql, sqlite
* Requirements: No requirements were recorded.

[![CircleCI](https://circleci.com/gh/ThibTrip/pangres.svg?style=svg&circle-token=3e39be6b969ed02b41d259c279da0d9e63751506)](https://circleci.com/gh/ThibTrip/pangres) [![codecov](https://codecov.io/gh/ThibTrip/pangres/branch/master/graph/badge.svg)](https://codecov.io/gh/ThibTrip/pangres) [![PyPI version](https://img.shields.io/pypi/v/pangres)](https://img.shields.io/pypi/v/pangres)

# pangres
![pangres logo](logo.png)

_Thanks to [freesvg.org](https://freesvg.org/) for the logo assets_

Upsert with pandas DataFrames (<code>ON CONFLICT DO NOTHING</code> or <code>ON CONFLICT DO UPDATE</code>) for PostgreSQL, MySQL, SQLite and potentially other databases behaving like SQLite (untested), with some additional optional features (see [Features](#Features)). Upserting can be done with **primary keys** or **unique keys**.
Pangres also handles the creation of nonexistent SQL tables and schemas.
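
To make the two upsert modes concrete, here is a minimal sketch in plain SQL using Python's built-in `sqlite3` module. It does not use pangres and is not how pangres is implemented; the table `users` and its columns are made up for the illustration, and it assumes the SQLite bundled with your Python supports the UPSERT syntax.

```python
import sqlite3

# In-memory database with a hypothetical table that has a primary key
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "old name"))

# ON CONFLICT DO NOTHING: the row with id=1 already exists, so it is left untouched
conn.execute(
    "INSERT INTO users (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
    (1, "ignored name"),
)
after_ignore = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

# ON CONFLICT DO UPDATE: the existing row is overwritten with the incoming values
conn.execute(
    "INSERT INTO users (id, name) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET name = excluded.name",
    (1, "new name"),
)
after_update = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

print(after_ignore, "|", after_update)
```

The first statement keeps the stored row, the second replaces it. For the actual pangres API, see the wiki linked in the Usage section.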

# Features

1. <i>(optional)</i> Automatic column creation (when a column exists in the DataFrame but not in the SQL table).
2. <i>(optional)</i> Automatic column type alteration for columns that are empty in the SQL table (except for SQLite where alteration is limited).
3. Creates the table if it is missing.
4. Creates missing schemas in Postgres (and potentially other databases that have a schema system).
5. JSON is supported (with pd.to_sql it does not work) with some exceptions (see [Gotchas and caveats](#Gotchas-and-caveats)).
6. Fast (except for SQLite where some help is needed).
7. Works even if the DataFrame does not contain all the columns defined in the SQL table.
8. SQL injection safe (schema, table and column names are escaped and values are given as parameters).
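
As an illustration of what "escaped names and parameterized values" means (feature 8), here is a small sketch with the standard library's `sqlite3`. This is not pangres' own code: `quote_identifier` is a hypothetical helper written for this example, and the hostile table name and value are invented.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Hypothetical helper: quote an SQL identifier with double quotes,
    doubling any embedded double quote (standard SQL quoting)."""
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")

# Deliberately hostile table name, column name and value
table = 'users"; DROP TABLE users; --'
column = "full name"
value = "Robert'); DROP TABLE students; --"

# Identifiers are escaped, values are passed as parameters (never formatted in)
conn.execute(f"CREATE TABLE {quote_identifier(table)} ({quote_identifier(column)} TEXT)")
conn.execute(
    f"INSERT INTO {quote_identifier(table)} ({quote_identifier(column)}) VALUES (?)",
    (value,),
)
stored = conn.execute(
    f"SELECT {quote_identifier(column)} FROM {quote_identifier(table)}"
).fetchone()[0]

print(stored == value)
```

The hostile strings end up stored as plain data and as a (strange) table name instead of being executed as SQL.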

# Tested with
* Python 3.7.3 and Python 3.8.0
* MySQL 5.7.29 using pymysql 0.9.3
* PostgreSQL 9.6.17 using psycopg2 2.8.4
* SQLite 3.28.0 using sqlite3 2.6.0

# Gotchas and caveats

## All flavors
1. We can't create JSON columns automatically, but we can insert JSON-like objects (list, dict) into existing JSON columns.

## Postgres

1. "%", ")" and "(" in column names will most likely cause errors with PostgreSQL (this is due to psycopg2 and also affects pd.to_sql). Use the function pangres.fix_psycopg2_bad_cols to "clean" the columns in the DataFrame. You'll also have to rename the columns in the SQL table accordingly (if the table already exists).
2. We only alter data types on empty columns. However, since we don't want to lose column information (e.g. constraints), we use true column alteration (instead of drop+create), so the old data type must be castable to the new data type. Postgres seems a bit restrictive in this regard even when the columns are empty (e.g. BOOLEAN to TIMESTAMP is impossible).
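
To show what "cleaning" such column names can look like, here is a rough sketch. It is not the implementation of pangres.fix_psycopg2_bad_cols (check the wiki for its real behavior and parameters); `clean_column_names` is a hypothetical helper that works on a plain list of names.

```python
import re


def clean_column_names(columns, replacement=""):
    """Hypothetical helper: replace '%', '(' and ')' in column names.

    Raises if the cleaning makes two column names identical.
    """
    cleaned = [re.sub(r"[%()]", replacement, str(col)) for col in columns]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"cleaning produced duplicate column names: {cleaned}")
    return cleaned


columns = ["price (EUR)", "growth %", "id"]
cleaned = clean_column_names(columns)
print(cleaned)
```

With a DataFrame you could assign the result back with `df.columns = clean_column_names(df.columns)`. Remember that an existing SQL table needs the same renaming.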

## SQLite
1. **SQLite must be version 3.24.0 or higher**! UPSERT syntax did not exist before.
2. Column type alteration is not possible for SQLite.
3. SQLite inserts can be up to 5 times slower than pd.to_sql, for reasons I have not identified yet. If you can help, please contact me!
4. Inserts with 1000 columns or more are not supported because of a limit of 999 parameters per query. One way to fix this would be to insert the columns progressively, but this seems quite tricky. If you know a better way, please contact me.
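
Two of these points can be checked or worked around from Python. The sketch below uses only the standard library and is not taken from pangres: the helper names are hypothetical, the version tuple reflects that UPSERT was introduced in SQLite 3.24.0, and the 999 limit is the figure quoted above (your SQLite build may allow more).

```python
import sqlite3

MIN_UPSERT_VERSION = (3, 24, 0)  # first SQLite release with UPSERT syntax
MAX_PARAMETERS = 999             # parameter limit per query quoted above


def supports_upsert() -> bool:
    """Check the SQLite library linked to Python (not the sqlite3 module version)."""
    return sqlite3.sqlite_version_info >= MIN_UPSERT_VERSION


def max_rows_per_statement(n_columns: int, max_parameters: int = MAX_PARAMETERS) -> int:
    """How many rows fit in one multi-row INSERT without exceeding the limit."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    if n_columns > max_parameters:
        raise ValueError(f"{n_columns} columns exceed the limit of {max_parameters} parameters")
    return max_parameters // n_columns


def chunk_rows(rows, n_columns):
    """Yield slices of rows small enough for one statement each."""
    size = max_rows_per_statement(n_columns)
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [tuple(range(10)) for _ in range(250)]  # 250 rows with 10 columns
chunk_sizes = [len(chunk) for chunk in chunk_rows(rows, n_columns=10)]
print(supports_upsert(), chunk_sizes)
```

Chunking by rows only helps while a single row fits within the limit, which is why tables with 1000 columns or more remain a problem.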

## MySQL

1. MySQL will often change the order of the primary keys in the SQL table when upserting (the MySQL counterparts of INSERT ... ON CONFLICT DO NOTHING/UPDATE). This seems to be the expected behavior, so there is nothing we can do about it, but please keep it in mind!
2. You may need to provide SQL dtypes. For example, if you have a primary key with text, you will need to provide a character length (e.g. VARCHAR(50)) because MySQL does not support indices/primary keys with flexible text length. pd.to_sql has the same issue.


# Notes

This is a library I had been using privately in production with very good results, so I decided to publish it.

Ideally such features will be integrated into pandas, since there is already a [PR on the way](https://github.com/pandas-dev/pandas/pull/29636). I would also like to give the option to add columns via another PR.

There is also [pandabase](https://github.com/notsambeck/pandabase), which does almost the same thing (plus lots of extra features), but my implementation is different.
Big thanks to pandabase and to the SQL part of pandas, which both helped a lot.

# Installation
```
pip install pangres
```
Additionally, depending on which database you want to work with, you will need to install the corresponding library (note that SQLite support is included in Python's standard library):

* Postgres
```
pip install psycopg2
```

* MySQL
```
pip install pymysql
```

# Usage
Head over to [pangres' wiki](https://github.com/ThibTrip/pangres/wiki)!

# Contributing

Pull requests/issues are welcome.

Note: I develop the library inside of Jupyter Lab using the [jupytext](https://github.com/mwouts/jupytext) extension.
I recommend using this extension for the best experience. It splits the code blocks within modules into cells, which makes interactive development easier.
If you wish, you can also use the provided environment (see the `environment.yml` file) inside of Jupyter Lab/Notebook thanks to [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels).

# Testing

You will need a SQLite, a MySQL and a Postgres database available for testing.

Clone pangres, then set your current working directory to the root of the cloned repository folder. Then use the commands below. You will have to replace the following variables in those commands:
* SQLITE_CONNECTION_STRING: replace with a SQLite SQLAlchemy connection string (e.g. "sqlite:///test.db")
* POSTGRES_CONNECTION_STRING: replace with a Postgres SQLAlchemy connection string (e.g. "postgresql://user:password@localhost:5432/database")
* PG_SCHEMA (optional): schema for Postgres, passed via `--pg_schema` (defaults to public; the command below uses `tests`)
* MYSQL_CONNECTION_STRING: replace with a MySQL SQLAlchemy connection string (e.g. "mysql+pymysql://user:password@localhost:3306/database")

```shell
# 1. Create and activate the build environment
conda env create -f environment.yml
conda activate pangres-dev
# 2. Install pangres in editable mode (changes are reflected upon reimporting)
pip install -e .
# 3. Run pytest
# -s prints stdout
# -v prints test parameters
# --cov=pangres shows coverage only for pangres
# --doctest-modules tests with doctest in all modules
# (the connection strings are quoted so that special characters survive the shell)
pytest -s -v pangres --cov=pangres --doctest-modules --sqlite_conn="$SQLITE_CONNECTION_STRING" --pg_conn="$POSTGRES_CONNECTION_STRING" --mysql_conn="$MYSQL_CONNECTION_STRING" --pg_schema=tests
```
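
If your database password contains characters such as `@`, `:` or `/`, it must be URL-encoded before it goes into a connection string. The sketch below builds such strings with the standard library only; `build_connection_string` is a hypothetical helper and the credentials are placeholders, not values used by the pangres test suite.

```python
from urllib.parse import quote_plus


def build_connection_string(dialect, user, password, host, port, database):
    """Hypothetical helper: assemble a SQLAlchemy-style database URL,
    URL-encoding the user and password so special characters are safe."""
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


pg_url = build_connection_string(
    "postgresql", "user", "p@ss:word/1", "localhost", 5432, "database"
)
mysql_url = build_connection_string(
    "mysql+pymysql", "user", "password", "localhost", 3306, "database"
)
print(pg_url)
print(mysql_url)
```

The resulting strings can then be exported as POSTGRES_CONNECTION_STRING and MYSQL_CONNECTION_STRING before running pytest.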
            
