# pandas_scd
Executes slowly changing dimension type 2 (SCD2) on pandas DataFrames.

Given a pandas DataFrame of the source table and a pandas DataFrame of the target table, `scd2` returns a pandas DataFrame containing the entire new target after SCD2 has been applied.
## Installation
Basic installation:

    pip install pandas-scd2
## Getting started
    from datetime import datetime
    import pandas as pd
    from pandas_scd import scd2

    # target: the current dimension table, including the SCD2 bookkeeping columns
    tgt = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ["Clippers"], "start_ts": [datetime(2012, 1, 14, 3, 21, 34)], "end_ts": [None], "is_active": [True]})
    # source: the latest snapshot, without the bookkeeping columns
    src = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ['Suns']})
    final_df = scd2(src, tgt)
tgt:
| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|--------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | | True |
src:
| first_name | last_name | team |
|------------|-----------|----------|
| Chris | Paul | Suns |
final_df:
| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|---------------------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | 2018-01-01 00:00:00 | False |
| Chris | Paul | Suns | 2018-01-01 00:00:00 | | True |
**src:** pandas DataFrame holding the source of the SCD.

**tgt:** pandas DataFrame holding the target of the SCD (may be empty).

**cols_to_track:** list of columns in which to track changes (default: all columns of the source table).

**tz:** pytz time zone applied to `start_ts` and `end_ts` (default: `None`, which uses local time).
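
To make the example above easier to follow, here is a minimal sketch of the kind of row comparison an SCD2 merge performs, written in plain pandas. It is not the `pandas-scd2` implementation. The function name `scd2_sketch` and its `now` argument are hypothetical, and it assumes the target already has `start_ts`, `end_ts` and `is_active` columns. Rows are matched on the tracked columns only, which is one plausible reading of `cols_to_track`.

```python
from datetime import datetime

import pandas as pd


def scd2_sketch(src, tgt, cols_to_track=None, now=None):
    """Illustrative SCD2 merge (hypothetical helper, not the library's code)."""
    # Default to every source column, mirroring the documented default.
    cols = list(cols_to_track) if cols_to_track is not None else list(src.columns)
    now = now if now is not None else datetime.now()

    tgt = tgt.copy()
    src_rows = list(src[cols].itertuples(index=False, name=None))
    tgt_rows = list(tgt[cols].itertuples(index=False, name=None))
    src_set = set(src_rows)

    # Close active target rows whose tracked values no longer appear in the source.
    active = tgt["is_active"].astype(bool)
    in_src = pd.Series([row in src_set for row in tgt_rows], index=tgt.index, dtype=bool)
    to_close = active & ~in_src
    tgt.loc[to_close, "end_ts"] = now
    tgt.loc[to_close, "is_active"] = False

    # Open a new active row for each source row without an active match.
    unchanged = {row for row, keep in zip(tgt_rows, active & in_src) if keep}
    is_new = [row not in unchanged for row in src_rows]
    new_rows = src[is_new].assign(start_ts=now, end_ts=None, is_active=True)

    return pd.concat([tgt, new_rows], ignore_index=True)


tgt = pd.DataFrame({
    "first_name": ["Chris"], "last_name": ["Paul"], "team": ["Clippers"],
    "start_ts": [datetime(2012, 1, 14, 3, 21, 34)], "end_ts": [None], "is_active": [True],
})
src = pd.DataFrame({"first_name": ["Chris"], "last_name": ["Paul"], "team": ["Suns"]})

# A fixed timestamp keeps the result reproducible.
final = scd2_sketch(src, tgt, now=datetime(2018, 1, 1))
print(final)
```

With the fixed `now` used here, the sketch should close the Clippers row and open a new active Suns row, matching the shape of `final_df` above. Passing the timestamp in explicitly, rather than reading the clock inside the function, is a design choice made for this sketch so that the result can be checked.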
## Raw data
{
"_id": null,
"home_page": "https://gitlab.com/liranc/pandas_scd",
"name": "pandas-scd2",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "scd,slowly changing dimension,pandas",
"author": "",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/72/8b/7c17627bac1ca20027423edfb23e44d3f70c1a317ab9ebe8688251dc67c7/pandas_scd2-1.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "slowly changing dimension with pandas",
"version": "1.0.3",
"split_keywords": [
"scd",
"slowly changing dimension",
"pandas"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3d305515e28d27571a72ffd01c850bc0be85dcb71d0fcd73fd423ac4e22cfe80",
"md5": "6641b15c173bb12b349844979444ba46",
"sha256": "ddcff10aeded8c6a37697d9cff8b08f07f0eb6e4fc1061f17794ffa847f1c0f6"
},
"downloads": -1,
"filename": "pandas_scd2-1.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6641b15c173bb12b349844979444ba46",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 3122,
"upload_time": "2023-01-29T12:40:19",
"upload_time_iso_8601": "2023-01-29T12:40:19.766621Z",
"url": "https://files.pythonhosted.org/packages/3d/30/5515e28d27571a72ffd01c850bc0be85dcb71d0fcd73fd423ac4e22cfe80/pandas_scd2-1.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "728b7c17627bac1ca20027423edfb23e44d3f70c1a317ab9ebe8688251dc67c7",
"md5": "bcb2456fccfd132072cae2e6891f096c",
"sha256": "48abee966b39b3799a08d381100bac3113b738a0181f37375fcddcbf63b88301"
},
"downloads": -1,
"filename": "pandas_scd2-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "bcb2456fccfd132072cae2e6891f096c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 2917,
"upload_time": "2023-01-29T12:40:21",
"upload_time_iso_8601": "2023-01-29T12:40:21.749317Z",
"url": "https://files.pythonhosted.org/packages/72/8b/7c17627bac1ca20027423edfb23e44d3f70c1a317ab9ebe8688251dc67c7/pandas_scd2-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-29 12:40:21",
"github": false,
"gitlab": true,
"bitbucket": false,
"gitlab_user": "liranc",
"gitlab_project": "pandas_scd",
"lcname": "pandas-scd2"
}