scd2


 - **Name:** scd2
 - **Version:** 1.0.0
 - **Home page:** https://gitlab.com/liranc/pandas_scd
 - **Summary:** slowly changing dimension type 2 with pandas or parquet
 - **Upload time:** 2023-01-31 09:35:15
 - **Requires Python:** >=3.8
 - **Keywords:** scd, slowly changing dimension, type 2, pandas, parquet
 - **Requirements:** No requirements were recorded.
            
# pandas_scd

  
Apply slowly changing dimension (SCD) type 2 logic to pandas DataFrames or Parquet files.

  

**pandas_scd2 arguments:**

 - **src:** pandas DataFrame holding the source of the SCD
 - **tgt:** pandas DataFrame holding the target of the SCD (the target can be empty)
 - **cols_to_track:** list of columns to track for changes (default: all columns from the source table)
 - **tz:** pytz time zone used for start_ts and end_ts (default: None, which uses local time)


##### The returned DataFrame contains the entire target table with the new changes applied, ready for an insert-overwrite of the current target table.
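
The effect of the **tz** argument can be pictured with a small sketch. It assumes the timestamps come from `datetime.now(tz)`; the helper name `current_ts` is hypothetical and the package's actual implementation may differ.

```python
from datetime import datetime

import pytz


def current_ts(tz=None):
    """Hypothetical helper: the timestamp written to start_ts / end_ts."""
    # With tz=None this is a naive datetime in local time;
    # with a pytz zone it is a timezone-aware datetime in that zone.
    return datetime.now(tz)


local_ts = current_ts()                      # naive, local time
utc_ts = current_ts(pytz.timezone("UTC"))    # aware, UTC
```

A naive and an aware timestamp cannot be compared directly in Python, so it is probably safest to use the same tz setting for every run against a given target.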





**parquet_scd2 arguments:**

 - **src:** path to the source of the SCD
 - **tgt:** path to the target of the SCD (the target can be empty)
 - **cols_to_track:** list of columns to track for changes (default: all columns from the source table)
 - **tz:** pytz time zone used for start_ts and end_ts (default: None, which uses local time)

##### There is no return value; the file at the provided tgt path is overwritten.
  
  

## Installation

    pip install scd2

  

## Getting started

*Working with pandas:*

	from datetime import datetime

	import pandas as pd
	from scd2 import SCD2

	# Existing target: one active row (end_ts is empty, is_active is True)
	tgt = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ["Clippers"], "start_ts": [datetime(2012, 1, 14, 3, 21, 34)], "end_ts": [None], "is_active": [True]})

	# Incoming source: the same person with a new team
	src = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ['Suns']})

	final_df = SCD2().pandas_scd2(src, tgt)

**pandas_scd2 returns a DataFrame containing the entire new target.**
  
  
  

tgt:

| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|--------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | | True |

  
  
  

src:

  

| first_name | last_name | team |
|------------|-----------|----------|
| Chris | Paul | Suns |

  
  
  
  

final_df:

  

| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|---------------------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | 2018-01-01 00:00:00 | False |
| Chris | Paul | Suns | 2018-01-01 00:00:00 | | True |
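
To make the transition from tgt to final_df concrete, here is a minimal plain-pandas sketch of an SCD type 2 merge. It is an illustration only, not the scd2 package's implementation: the function name `scd2_sketch` and the `key_cols` and `now` parameters are assumptions made for this example, and it assumes the target already has the start_ts, end_ts and is_active columns.

```python
from datetime import datetime

import pandas as pd


def scd2_sketch(src, tgt, key_cols, cols_to_track, now):
    """Illustrative SCD type 2 merge; not the scd2 package's implementation."""
    tgt = tgt.copy()
    active = tgt[tgt["is_active"]]

    # Pair every source row with the active target row sharing its key.
    joined = src.merge(
        active[key_cols + cols_to_track],
        on=key_cols,
        how="left",
        suffixes=("", "_old"),
        indicator=True,
    )
    unseen = joined["_merge"] == "left_only"  # key not in the target yet

    # A row counts as changed when any tracked column differs.
    changed = pd.Series(False, index=joined.index)
    for col in cols_to_track:
        changed |= ~unseen & (joined[col] != joined[f"{col}_old"])

    # Close the active target rows whose tracked values changed.
    tgt_keys = pd.MultiIndex.from_frame(tgt[key_cols])
    changed_keys = pd.MultiIndex.from_frame(joined.loc[changed, key_cols])
    to_close = tgt["is_active"].to_numpy() & tgt_keys.isin(changed_keys)
    tgt.loc[to_close, "end_ts"] = now
    tgt.loc[to_close, "is_active"] = False

    # Open a new active version for every unseen or changed source row.
    new_rows = joined.loc[unseen | changed, list(src.columns)].copy()
    new_rows["start_ts"] = now
    new_rows["end_ts"] = None
    new_rows["is_active"] = True

    return pd.concat([tgt, new_rows], ignore_index=True)


tgt = pd.DataFrame({
    "first_name": ["Chris"], "last_name": ["Paul"], "team": ["Clippers"],
    "start_ts": [datetime(2012, 1, 14, 3, 21, 34)],
    "end_ts": [None], "is_active": [True],
})
src = pd.DataFrame({"first_name": ["Chris"], "last_name": ["Paul"], "team": ["Suns"]})

result = scd2_sketch(
    src, tgt,
    key_cols=["first_name", "last_name"],
    cols_to_track=["team"],
    now=datetime(2018, 1, 1),
)
```

Passing `now` explicitly keeps the sketch deterministic and reproduces the 2018-01-01 timestamps shown in the table; a real run would presumably stamp the rows with the current time.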

  
  
  

*Working with Parquet:*

	from scd2 import SCD2

	src_parquet_path = '~/source.parquet'
	tgt_parquet_path = '~/target.parquet'

	# Both arguments are file paths, not DataFrames
	SCD2().parquet_scd2(src_parquet_path, tgt_parquet_path)


**parquet_scd2 overwrites the current target (tgt_parquet_path).**


 


            

        