pandashift


Namepandashift JSON
Version 1.0.3 PyPI version JSON
download
home_pageNone
SummaryModule for quick and easy access to redshift using pandas
upload_time2024-11-19 13:58:15
maintainerNone
docs_urlNone
authorNone
requires_python>=3.10
licenseNone
keywords redshift pandas analytics amazon
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pandashift

## Overview
Pandashift integrates Pandas with Amazon Redshift for smooth ETL processes and data manipulation. Specifically designed for Redshift in Python, it simplifies workflows by providing easy bulk inserts and automatically handling type conversions, including the ability to convert Pandas types to Redshift types. Data can be pulled from Redshift into Pandas using the function `read_query` for quick analysis and data processing in Python. Whether creating DataFrames from Redshift or performing ETL tasks, taking advantage of Pandas and Pandashift provides flexibility and efficiency for scalable data workflows.

## Installing

``` shell
pip install pandashift
```

## Usage
There are 2 ways work with this package

1. Setting up environment variables 
2. Passing credentials on every call

Below are examples of both approaches

### With environment variables set up

``` python
from pandashift import read_query, execute_query, load_db, create_table_from_df

# Read Data
df = read_query('SELECT * FROM public.test')

# Execute statement (aka table create/drop/etc)
execute_query('DROP public.test')

# Create the table from df
create_table_from_df(df, 'public.test')

# Loading dataframe
load_df(df, table_name ='public.test')
```

### No environment variables
``` python
from pandashift import read_query, execute_query, load_db, create_table_from_df

creds = {
        "host":"YOUR HOST",
        "port":"YOUR PORT",
        "dbname":"YOUR DATABASE",
        "user":"YOUR USER",
        "password":"YOUR PASSWORD"
        }

# Read Data
df = read_query('SELECT * FROM public.test',credentials = creds)

# Execute statement (aka table create/drop/etc)
execute_query('DROP public.test',credentials = creds)

# Create the table from df
create_table_from_df(df, 'public.test', credentials = creds)

# Loading dataframe
load_df(df, table_name = 'public.test',credentials = creds)
```

## Functions

### load_df


| Parameter             | Usage                                                                                                                           |
|-----------------------|---------------------------------------------------------------------------------------------------------------------------------|
| init_df               | The dataframe for loading                                                                                                       |
| table_name            | The table that you want to load the df to                                                                                       |
| credentials           | Credentials to use for connection if any                                                                                        |
| verify_column_names   | The checks that the dataframe column order matches the table column order, by default **True**                                  |
| empty_str_as_null     | This option will interpret empty string '' as NULL, by default  **True**                                                        |
| maximum_insert_length | Maximum length of the insert statement, alter this if you get error exceeding the max statement length, by default **16000000** |
| perform_analyze       | If this is true at the end of loading will run ANALYZE table, by default  **False**                                             |


### create_table_from_df

| Parameter             | Usage                                                                                                                           |
|-----------------------|---------------------------------------------------------------------------------------------------------------------------------|
| df                    | The dataframe for loading                                                                                                       |
| table_name            | The table that you want to load the df to                                                                                       |
| credentials           | Credentials to use for connection if any                                                                                        |
| sortkeys              | Specify desired sortkeys, by default no sortkeys are used|
| distkey               | Specify desired sortkeys, by default no distkeys are used|


Currently only the these datatypes are suppourted: 
* SMALLINT
* INTEGER
* BIGINT
* DECIMAL
* REAL
* DOUBLE
* TIMESTAMP
* DATE
* CHAR
* VARCHAR
* BOOLEAN


## Testing

In order to run local testing

1. Create an environment with required modules
``` bash
pip install -r requirements.txt
```

2. Run the following from the `/tests` directory
```
pytest
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pandashift",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "redshift, pandas, analytics, amazon",
    "author": null,
    "author_email": "Maksim Linichenko <maksim.linichenko@yahoo.com>",
    "download_url": "https://files.pythonhosted.org/packages/b9/0e/a11c97a31d31d4e37fd6c3d432303918ac399058c1d469475409abbf3022/pandashift-1.0.3.tar.gz",
    "platform": null,
    "description": "# pandashift\n\n## Overview\nPandashift integrates Pandas with Amazon Redshift for smooth ETL processes and data manipulation. Specifically designed for Redshift in Python, it simplifies workflows by providing easy bulk inserts and automatically handling type conversions, including the ability to convert Pandas types to Redshift types. Data can be pulled from Redshift into Pandas using the function `read_query` for quick analysis and data processing in Python. Whether creating DataFrames from Redshift or performing ETL tasks, taking advantage of Pandas and Pandashift provides flexibility and efficiency for scalable data workflows.\n\n## Installing\n\n``` shell\npip install pandashift\n```\n\n## Usage\nThere are 2 ways work with this package\n\n1. Setting up environment variables \n2. Passing credentials on every call\n\nBelow are examples of both approaches\n\n### With environment variables set up\n\n``` python\nfrom pandashift import read_query, execute_query, load_db, create_table_from_df\n\n# Read Data\ndf = read_query('SELECT * FROM public.test')\n\n# Execute statement (aka table create/drop/etc)\nexecute_query('DROP public.test')\n\n# Create the table from df\ncreate_table_from_df(df, 'public.test')\n\n# Loading dataframe\nload_df(df, table_name ='public.test')\n```\n\n### No environment variables\n``` python\nfrom pandashift import read_query, execute_query, load_db, create_table_from_df\n\ncreds = {\n        \"host\":\"YOUR HOST\",\n        \"port\":\"YOUR PORT\",\n        \"dbname\":\"YOUR DATABASE\",\n        \"user\":\"YOUR USER\",\n        \"password\":\"YOUR PASSWORD\"\n        }\n\n# Read Data\ndf = read_query('SELECT * FROM public.test',credentials = creds)\n\n# Execute statement (aka table create/drop/etc)\nexecute_query('DROP public.test',credentials = creds)\n\n# Create the table from df\ncreate_table_from_df(df, 'public.test', credentials = creds)\n\n# Loading dataframe\nload_df(df, table_name = 'public.test',credentials = creds)\n```\n\n## Functions\n\n### load_df\n\n\n| Parameter             | Usage                                                                                                                           |\n|-----------------------|---------------------------------------------------------------------------------------------------------------------------------|\n| init_df               | The dataframe for loading                                                                                                       |\n| table_name            | The table that you want to load the df to                                                                                       |\n| credentials           | Credentials to use for connection if any                                                                                        |\n| verify_column_names   | The checks that the dataframe column order matches the table column order, by default **True**                                  |\n| empty_str_as_null     | This option will interpret empty string '' as NULL, by default  **True**                                                        |\n| maximum_insert_length | Maximum length of the insert statement, alter this if you get error exceeding the max statement length, by default **16000000** |\n| perform_analyze       | If this is true at the end of loading will run ANALYZE table, by default  **False**                                             |\n\n\n### create_table_from_df\n\n| Parameter             | Usage                                                                                                                           |\n|-----------------------|---------------------------------------------------------------------------------------------------------------------------------|\n| df                    | The dataframe for loading                                                                                                       |\n| table_name            | The table that you want to load the df to                                                                                       |\n| credentials           | Credentials to use for connection if any                                                                                        |\n| sortkeys              | Specify desired sortkeys, by default no sortkeys are used|\n| distkey               | Specify desired sortkeys, by default no distkeys are used|\n\n\nCurrently only the these datatypes are suppourted: \n* SMALLINT\n* INTEGER\n* BIGINT\n* DECIMAL\n* REAL\n* DOUBLE\n* TIMESTAMP\n* DATE\n* CHAR\n* VARCHAR\n* BOOLEAN\n\n\n## Testing\n\nIn order to run local testing\n\n1. Create an environment with required modules\n``` bash\npip install -r requirements.txt\n```\n\n2. Run the following from the `/tests` directory\n```\npytest\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Module for quick and easy access to redshift using pandas",
    "version": "1.0.3",
    "project_urls": {
        "Homepage": "https://github.com/tumax7/pandashift",
        "Issues": "https://github.com/tumax7/pandashift/issues"
    },
    "split_keywords": [
        "redshift",
        " pandas",
        " analytics",
        " amazon"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3e7d60b14dde29f1dfe8ad82331db9201505b77ebe6a245b233d0e5a2f326ba8",
                "md5": "fd90aca97b5567d55d6dcce36089b96b",
                "sha256": "bb87abbd2ace9d02c2e0d8454bbb8e033ef10b148ee2c5f68d5a837c497d4ab8"
            },
            "downloads": -1,
            "filename": "pandashift-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fd90aca97b5567d55d6dcce36089b96b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 6981,
            "upload_time": "2024-11-19T13:58:13",
            "upload_time_iso_8601": "2024-11-19T13:58:13.845471Z",
            "url": "https://files.pythonhosted.org/packages/3e/7d/60b14dde29f1dfe8ad82331db9201505b77ebe6a245b233d0e5a2f326ba8/pandashift-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b90ea11c97a31d31d4e37fd6c3d432303918ac399058c1d469475409abbf3022",
                "md5": "86c1715fc30e2be800fccc16d46c0649",
                "sha256": "d7f4d1c0617c3d882665012d32c656d417e916be13cef2192ad8a2bbdbd7558f"
            },
            "downloads": -1,
            "filename": "pandashift-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "86c1715fc30e2be800fccc16d46c0649",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 8192,
            "upload_time": "2024-11-19T13:58:15",
            "upload_time_iso_8601": "2024-11-19T13:58:15.592179Z",
            "url": "https://files.pythonhosted.org/packages/b9/0e/a11c97a31d31d4e37fd6c3d432303918ac399058c1d469475409abbf3022/pandashift-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-19 13:58:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tumax7",
    "github_project": "pandashift",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "pandashift"
}
        
Elapsed time: 4.68682s