# pandas_scd
executing slowly changing dimension type 2 on pandas dataframes or parquet files
**pandas_scd arguments:**
- **src:** pandas dataframe with the source of the SCD
- **tgt:** pandas dataframe with the target of the SCD (target can be
empty)
- **cols_to_track:** list of columns to track changes (default is all
columns from the source table)
- **tz:** pytz time zone to use on start_ts and end_ts, default is None
(will use local time)
##### the return dataframe contain the entire target table with the new changes, ready for insert overwrite of the current target table
**parquet_scd arguments:**
- **src:** path to the source of the SCD
- **tgt:** path to the target of the SCD (target can be empty)
- **cols_to_track:** list of columns to track changes (default is all columns from the source table)
- **tz:** pytz time zone to use on start_ts and end_ts, default is None (will use local time)
##### there is no return value, the tgt path that was provided will be overwritten
## Installation
pip install scd2
## Getting started
*for working with pandas:*
from scd2 import SCD2
import pandas as df
tgt = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ["Clippers"], "start_ts": [datetime(2012, 1, 14, 3, 21, 34)], "end_ts": [None], "is_active": [True]})
src = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ['Suns']})
final_df = SCD2().pandas_scd2(src, tgt)
**pandas_scd2 will return a dataframe with the entire new targer**
tgt:
| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|--------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | | True |
src:
| first_name | last_name | team |
|------------|-----------|----------|
| Chris | Paul | Clippers |
final_df:
| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|---------------------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | 2018-01-01 00:00:00 | False |
| Chris | Paul | Suns | 2018-01-01 00:00:00 | | True |
*for working with parquet:*
src_parquet_path = '~/source.parquet'
tgt_parquet_path = '~/target.parquet'
SCD2().parquet_scd2(src, tgt)
**parquet_scd2 will overide the current target (tgt_parquet_path)**
**src:** pandas dataframe with the source of the SCD
**tgt:** pandas dataframe with the target of the SCD (target can be empty)
**cols_to_track:** list of columns to track changes (default is all columns from the source table)
**tz:** pytz time zone to use on start_ts and end_ts, default is None (will use local time)
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/liranc/pandas_scd",
"name": "scd2",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "scd,slowly changing dimension,type 2,pandas,parquet",
"author": "",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/64/fd/8d56e79dc8af624e133adf353385d59178929b26bd324cc1b474413ab36b/scd2-1.0.0.tar.gz",
"platform": null,
"description": "\n# pandas_scd\n\n \nexecuting slowly changing dimension type 2 on pandas dataframes or parquet files\n\n \n\n**pandas_scd arguments:**\n\n - **src:** pandas dataframe with the source of the SCD\n - **tgt:** pandas dataframe with the target of the SCD (target can be\n empty)\n \n - **cols_to_track:** list of columns to track changes (default is all \n columns from the source table)\n - **tz:** pytz time zone to use on start_ts and end_ts, default is None \n (will use local time)\n\n\n##### the return dataframe contain the entire target table with the new changes, ready for insert overwrite of the current target table\n\n\n\n\n\n**parquet_scd arguments:**\n\n - **src:** path to the source of the SCD\n - **tgt:** path to the target of the SCD (target can be empty)\n- **cols_to_track:** list of columns to track changes (default is all columns from the source table)\n- **tz:** pytz time zone to use on start_ts and end_ts, default is None (will use local time)\n##### there is no return value, the tgt path that was provided will be overwritten \n \n \n\n## Installation\n\n pip install scd2\n\n \n\n## Getting started\n\n*for working with pandas:* \n\n\tfrom scd2 import SCD2\n\timport pandas as df\t \n\n\ttgt = pd.DataFrame.from_dict({'first_name': [\"Chris\"], 'last_name': ['Paul'], 'team': [\"Clippers\"], \"start_ts\": [datetime(2012, 1, 14, 3, 21, 34)], \"end_ts\": [None], \"is_active\": [True]}) \n\n\tsrc = pd.DataFrame.from_dict({'first_name': [\"Chris\"], 'last_name': ['Paul'], 'team': ['Suns']})\n\n\tfinal_df = SCD2().pandas_scd2(src, tgt)\n\n**pandas_scd2 will return a dataframe with the entire new targer**\n \n \n \n\ntgt:\n\n| first_name | last_name | team | start_ts | end_ts | is_active |\n\n|------------|-----------|----------|---------------------|--------|-----------|\n\n| Chris | Paul | Clippers | 2012-01-14 03:21:34 | | True |\n\n \n \n \n\nsrc:\n\n \n\n| first_name | last_name | team |\n\n|------------|-----------|----------|\n\n| Chris | Paul | Clippers |\n\n \n \n \n \n\nfinal_df:\n\n \n\n| first_name | last_name | team | start_ts | end_ts | is_active |\n\n|------------|-----------|----------|---------------------|---------------------|-----------|\n\n| Chris | Paul | Clippers | 2012-01-14 03:21:34 | 2018-01-01 00:00:00 | False |\n\n| Chris | Paul | Suns | 2018-01-01 00:00:00 | | True |\n\n \n \n \n\n*for working with parquet:*\n\n\tsrc_parquet_path = '~/source.parquet'\n\n\ttgt_parquet_path = '~/target.parquet'\n\n\tSCD2().parquet_scd2(src, tgt)\n \n\n**parquet_scd2 will overide the current target (tgt_parquet_path)**\n\n\n \n\n**src:** pandas dataframe with the source of the SCD\n\n \n\n**tgt:** pandas dataframe with the target of the SCD (target can be empty)\n\n \n\n**cols_to_track:** list of columns to track changes (default is all columns from the source table)\n\n \n\n**tz:** pytz time zone to use on start_ts and end_ts, default is None (will use local time)\n",
"bugtrack_url": null,
"license": "",
"summary": "slowly changing dimension type 2 with pandas or parquet",
"version": "1.0.0",
"split_keywords": [
"scd",
"slowly changing dimension",
"type 2",
"pandas",
"parquet"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9b2346cc159ea064a6f27bf28891c32951e261ac2c3e8b021d19a75ddbc3b9ce",
"md5": "1e458a320dac16f82f298a92474e429e",
"sha256": "e5d3a58990fe8faa840ebaa786be48d50d3163ce9d3d4fffaef53fa5adc172a0"
},
"downloads": -1,
"filename": "scd2-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1e458a320dac16f82f298a92474e429e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 3363,
"upload_time": "2023-01-31T09:35:13",
"upload_time_iso_8601": "2023-01-31T09:35:13.596394Z",
"url": "https://files.pythonhosted.org/packages/9b/23/46cc159ea064a6f27bf28891c32951e261ac2c3e8b021d19a75ddbc3b9ce/scd2-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "64fd8d56e79dc8af624e133adf353385d59178929b26bd324cc1b474413ab36b",
"md5": "0864b5bf6890255623be88575672e9a2",
"sha256": "4cd133b1fb73852e3e3d58839fa8d2d531a9208ce2776e1232956fc3e1b8e58a"
},
"downloads": -1,
"filename": "scd2-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "0864b5bf6890255623be88575672e9a2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 3188,
"upload_time": "2023-01-31T09:35:15",
"upload_time_iso_8601": "2023-01-31T09:35:15.171066Z",
"url": "https://files.pythonhosted.org/packages/64/fd/8d56e79dc8af624e133adf353385d59178929b26bd324cc1b474413ab36b/scd2-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-31 09:35:15",
"github": false,
"gitlab": true,
"bitbucket": false,
"gitlab_user": "liranc",
"gitlab_project": "pandas_scd",
"lcname": "scd2"
}