# CADPR

* Name: CADPR
* Version: 0.0.33 (PyPI)
* Summary: Standardize and Automate processes
* Upload time: 2023-09-13 16:26:32


# CA

This package was developed informally for the Commercial Analytics Team.

Before trying to use this package, ensure that you have the proper access (this can be found under the "Usage" section below).

This is a first step toward developing a package that facilitates, standardizes, and automates repetitive tasks.


# Installation and Setup
<details><summary>See Information about Installation, Setup, and Running</summary>

<details><summary> Dependencies that will automatically be installed if not already satisfied:</summary>

* "wheel",
* "asn1crypto==1.5.1",
* "certifi==2022.12.7",
* "cffi==1.15.1",
* "charset-normalizer==2.1.1",
* "cryptography==39.0.1",
* "databricks==0.2",
* "databricks-sql==1.0.0",
* "databricks-sql-connector==2.2.1",
* "filelock==3.9.0",
* "gitdb==4.0.10",
* "GitPython==3.1.31",
* "greenlet==2.0.2",
* "idna==3.4",
* "jupyter-contrib-core==0.4.2",
* "jupyter-contrib-nbextensions==0.7.0",
* "jupyter-events==0.6.3",
* "jupyter-highlight-selected-word==0.2.0",
* "jupyter-nbextensions-configurator==0.6.1",
* "jupyter-ydoc==0.2.2",
* "jupyter_client==8.0.3",
* "jupyter_core==5.2.0",
* "jupyter_server==2.3.0",
* "jupyter_server_fileid==0.8.0",
* "jupyter_server_terminals==0.4.4",
* "jupyter_server_ydoc==0.6.1",
* "jupyterlab==3.6.1",
* "jupyterlab-pygments==0.2.2",
* "jupyterlab_server==2.19.0",
* "lz4==4.3.2",
* "numpy==1.23.4",
* "oauthlib==3.2.2",
* "oscrypto==1.3.0",
* "pandas==1.5.3",
* "pyarrow==10.0.1",
* "pycparser==2.21",
* "pycryptodomex==3.17",
* "PyJWT==2.6.0",
* "pyOpenSSL==23.0.0",
* "pystache==0.6.0",
* "python-dateutil==2.8.2",
* "pytz==2022.7.1",
* "requests==2.28.2",
* "six==1.16.0",
* "smmap==5.0.0",
* "snowflake-connector-python==3.0.0",
* "snowflake-sqlalchemy==1.4.6",
* "SQLAlchemy==1.4.46",
* "thrift==0.16.0",
* "typing_extensions==4.5.0",
* "urllib3==1.26.14",
* "xcrun==0.4",
* "configparser~=5.3.0"

</details>

## Installing and Setting up a New Environment (if you are new to Python, start here):

<details><summary>Installation and Setup with a New Environment</summary>

<details><summary>For Mac</summary>

### Note: This assumes that you already have Python 3.11.2 installed

<details><summary> How do I tell which version of Python I have?</summary>

1. Launch the Terminal by typing "Terminal" in the Launchpad search field or Spotlight

2. Enter the following command in the Terminal

```
python3 --version
```
and you should see this:
> Python 3.11.2

</details>

<details> <summary> To Install Python 3.11.2</summary>

1. Go to https://www.python.org/downloads/

2. Click on "Download Python 3.11.2"

3. Open the file and click through the installation steps accepting the defaults

</details>

<details><summary> When running for the first time, open the Terminal and run the following commands where you want the files to be kept:</summary>

```unix
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install NikeCA
```
* After running the commands above, restart the Terminal and proceed to "To open Jupyter Notebook after installation (Mac)"

</details></details>

</details>


## Installing without Setting up a New Environment:

<details><summary> pip Install Without Setting up a New Environment</summary>

Run the following to install:

```
$ python -m pip install NikeCA
```
</details>

## To open Jupyter Notebook after installation (Mac)

<details><summary> Navigate to the installation location in the terminal and run the following:</summary>

```unix
source venv/bin/activate
jupyter notebook
```
</details></details>



# Modules
### NikeCA

A module for interacting with the databases and performing analytics

<details><summary>Import</summary>

Run the following to import:

```
import NikeCA
```
</details>


## Classes:
<details><summary>Snowflake Class</summary>


## Snowflake:
Snowflake(username: str, warehouse: str, role: str, database: str = None, schema: str = None, table: str = None, column_name: str = None, col_and_or: str = 'AND', get_ex_val: bool = None, like_flag: bool = False, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv = None)

<details><summary> Import:</summary>

    from NikeCA import Snowflake

</details>

<details><summary>Parameters:</summary>

* username (str): The Snowflake account username


* warehouse (str): The Snowflake warehouse to use


* role (str): The Snowflake role to use


* database (str, optional, default=None): The Snowflake database to use


* schema (str, optional, default=None): The Snowflake schema to use


* table (str, optional, default=None): The Snowflake table to use


* column_name (str, optional, default=None): The name of the column to search


* col_and_or (str, optional, default='AND'): The AND/OR operator to use between search criteria


* get_ex_val (bool, optional, default=None): Whether to return exact matches only


* like_flag (bool, optional, default=False): Whether to use the LIKE operator for search criteria

</details>

## Methods:

<details>
<summary> snowflake_pull() - pulls snowflake data
</summary>

### snowflake_pull(
self, query: str, username: str | None = None, warehouse: str | None = None, database: str | None = None, role: str | None = None, sample_table: bool = False, sample_val: bool = False, table_sample: dict | None = None, dtypes_conv: Any = None

) -> DataFrame:

<details><summary>Dependencies:</summary>

* pandas
* snowflake.connector

</details>

<details><summary> Parameters:</summary>

* query (str): SQL query to run on Snowflake 
  * e.g. ```SELECT * FROM {}```


* username (str or None, default=None): Nike Snowflake Username 


* database (str or None, default=None): Name of the Database 


* warehouse (str or None, default=None): Name of the Warehouse 


* role (str or None, default=None): Name of the role under which you are running Snowflake 


* sample_table (bool, optional, Default=False): pull only 500 records from table


* sample_val (bool, optional, default=False)


* table_sample (dictionary, optional, default=None) 


* dtypes_conv (any, default=None)

</details>

#### return: pandas.DataFrame

Run the following in python to generate a sample query:


```
from NikeCA import Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf =  Snowflake(username=username, warehouse=warehouse, role=role, database=database)

query = 'SELECT TOP 2 * FROM  {}'

print(sf.snowflake_pull(query)) 
```

</details>

<details><summary>build_search_query() - Builds and returns a search query based on the specified parameters and instance variables
</summary>

### build_search_query(
self, inp_db: str | None = None, schema: str | None = None, table: str | None = None, column_name=None, like_flag: bool = False, col_and_or: str = 'AND'

) -> str

#### Dependencies - None

<details><summary> Parameters:</summary>

* inp_db (str or None, optional, default=None): The database name to search in. If not specified, search all databases
  

* schema (str or None, optional, default=None): The schema name to search in. If not specified, search all schemas


* table (str or None, optional, default=None): The table name to search for. If not specified, search all tables


* column_name(any, optional, default=None): The column name(s) to search for. If not specified, search all columns
  * If a list is provided, search for any columns that match any of the names in the list


* like_flag (bool, optional, default=False) 
  * If True, uses a SQL LIKE statement to search for columns that contain the specified column name(s)
    ```
    where_stmt + f"AND column_name like '{column_name}' "
    ```
  * If False, searches for exact matches to the specified column name(s)
    ```
    where_stmt + f"AND column_name = '{column_name}' "
    ```
    

* col_and_or (str: optional, default='AND'): If specified and column_name is a list, determines whether to search for columns that match all or any of 
the names in the list. Must be one of the following values: 'AND', 'and', 'OR', 'or'.

</details>
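The effect of `like_flag` can be illustrated with a small standalone sketch of the WHERE-clause fragment quoted above (this reimplements only that fragment for illustration, not the full method):

```python
def where_fragment(column_name: str, like_flag: bool, where_stmt: str = "") -> str:
    # Mirrors the documented fragment: LIKE for pattern matching,
    # equality for exact matching.
    if like_flag:
        return where_stmt + f"AND column_name like '{column_name}' "
    return where_stmt + f"AND column_name = '{column_name}' "

print(where_fragment('%SALES%', like_flag=True))
print(where_fragment('SALES_QTY', like_flag=False))
```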

#### return: string of the SQL Statement

#### Run the following in python to generate a sample query
```
from NikeCA import Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

print(sf.build_search_query(column_name='%***%', like_flag=True))
```

</details>


<details><summary>search_schema() - Search snowflake structure for specific schemas/tables/columns </summary>

### search_schema(
self, username=None, warehouse=None, database=None, role=None, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv=None, schema=None, table=None, column_name=None, col_and_or='and', get_ex_val=False, like_flag=False

)

Note: allows searching for tables/columns/etc. even without knowing the database, if database=None

<details><summary>Dependencies</summary>

* pandas
* snowflake.connector

</details>
 
<details><summary>Parameters</summary>

* username (str or None, default=None): Nike Snowflake Username 


* database (str or None, default=None): Name of the Database 


* warehouse (str or None, default=None): Name of the Warehouse 


* role (str or None, default=None): Name of the role under which you are running Snowflake 


* sample_table (bool, optional, Default=False): pull only 500 records from table


* sample_val (bool, optional, default=False)


* table_sample (dictionary, optional, default=None) 
  * Note: the following code is built into the module

        if table_sample is not None: 
             table_sample = {'db': None, 'schema': None, 'table': None, 'col': None}

* dtypes_conv (any, default=None)


* schema (str, default=None): Snowflake schema name from any database 


* table (str, default=None): Snowflake table name


* column_name (str, default=None): column name to filter


* col_and_or (str, default='and'): either 'and' or 'or'
  * will use in the where statement


* get_ex_val (bool, default=False)


* like_flag (bool, default=False): signifies whether the filter uses "column_name like " or "column_name = "

</details>

#### return: pandas.DataFrame

Run the following in python to generate a sample table:

    from NikeCA import Snowflake
    
    sf = Snowflake(username=<your username>, warehouse=<your warehouse>, 
         role=<your role>, database=<database you would like to search or none>)
    
    sf.column_name = '*****'
    sf.schema = '*****'
    sf.like_flag = True
    
    print(sf.search_schema())

</details>

<details><summary>snowflake_dependencies() - Searches the Snowflake database for instances where a table is referenced, excluding the reference in the table's own creation statement
</summary>


### snowflake_dependencies(

self, tables: str | list, username: str, warehouse: str, role: str, database: str | None = None, schema: str | list | None = None

) -> pandas.DataFrame:

Note: If the table's get_ddl() is empty, it will throw an error - I will fix this soon
 

<details><summary>Dependencies</summary>

* pandas
* snowflake.connector

</details>

<details><summary>Parameters</summary>

* tables (list | str, required): A list or string to search for in the database; it can be a table name or any string contained within the get_ddl() output


* username (str, default=self): Username for Snowflake


* warehouse (str, default=self): Name of the Snowflake warehouse


* role (str, default=self): Role for Snowflake


* database (str, required, default=self): database to search in


* schema (str | list | None, optional, default=self): Snowflake schema to search in
  * notes: filling this in can really speed up the query

</details>
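The core idea, finding objects whose DDL references a table while excluding the table's own creation statement, can be sketched in plain Python (the object names and DDL strings below are made up for illustration; the real method pulls get_ddl() output from Snowflake):

```python
def find_dependencies(tables, ddl_by_object):
    """Return (table, object) pairs where the object's DDL references the
    table and the object is not the table's own definition."""
    if isinstance(tables, str):
        tables = [tables]
    hits = []
    for obj_name, ddl in ddl_by_object.items():
        for table in tables:
            if table in ddl and obj_name != table:
                hits.append((table, obj_name))
    return hits

# Hypothetical DDL text keyed by object name
ddl_by_object = {
    "SALES_AGG": "CREATE VIEW SALES_AGG AS SELECT * FROM SALES_RAW",
    "SALES_RAW": "CREATE TABLE SALES_RAW (ID INT)",
}
print(find_dependencies("SALES_RAW", ddl_by_object))  # [('SALES_RAW', 'SALES_AGG')]
```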

#### return: pandas.DataFrame

Run the following in python to generate a sample table:

    from NikeCA import Snowflake
    
    username = <Your Username>
    warehouse = <The Name of the Warehouse>
    role = <Name of Your Role>
    database = <Name of the Database>
    
    sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)
    
    tables = ['***', '***']
    schema = '***'
    
    df = sf.snowflake_dependencies(tables=tables, schema=schema)
    
    print(df)

</details>


[//]: # (## optimize_tbl_mem&#40;&#41;:)

[//]: # (build a dictionary containing keys that reference column:datatype conversion with the purpose of optimizing memory )

[//]: # (after pulling data)

[//]: # ()
[//]: # (#### Dependencies)

[//]: # (* time)

[//]: # (* pandas)

[//]: # (* itertools)

[//]: # ()
[//]: # (#### Parameters:)

[//]: # ()
[//]: # (* username &#40;str or None, default=None&#41;: Nike Snowflake Username )

[//]: # (  * e.g. "USERNAME")

[//]: # ()
[//]: # ()
[//]: # (* database &#40;str or None, default=None&#41;: Name of the Database )

[//]: # (  * e.g. "NGP_DA_PROD")

[//]: # ()
[//]: # ()
[//]: # (* warehouse &#40;str or None, default=None&#41;: Name of the Warehouse )

[//]: # (  * e.g. "DA_DSM_SCANALYTICS_REPORTING_PROD")

[//]: # ()
[//]: # ()
[//]: # (* role &#40;str or None, default=None&#41;: Name of the role under which you are running Snowflake )

[//]: # (  * e.g. "DF_*****")

[//]: # ()
[//]: # ()
[//]: # (* schema &#40;str or None, default=None&#41;: Name of the schema that is being optimized)

[//]: # (  * e.g. "POS")

[//]: # ()
[//]: # ()
[//]: # (* table_name &#40;str or None, default=None&#41;: Name of the table to be optimized)

[//]: # (  * e.g. "TO_DATE_AGG_CHANNEL_CY")

[//]: # ()
[//]: # ()
[//]: # (* pull_all_cols &#40;bool, optional, default=True&#41;:)

[//]: # ()
[//]: # ()
[//]: # (* run_debugging &#40;bool, optional, default=False&#41;:)

[//]: # ()
[//]: # (                         )
[//]: # (* query &#40;any, default=None&#41;: query for the pull for the analyzation of the datatypes)

[//]: # ()
[//]: # (#### return )

[//]: # (* dictionary)

</details>


<details><summary>QA Class</summary>

## QA:

### Import

Run the following to import:

```
from NikeCA import QA
```

<details><summary>Parameters</summary>

* df (DataFrame)


* df2 (DataFrame, optional, default=None)


* ds1_nm (str, optional, default='Source #1')


* ds2_nm (str, optional, default='Source #2')


* case_sens (bool, optional, default=True)


* print_analysis (bool, optional, default=True)


* check_match_by (any, optional, default=None)


* breakdown_grain (any, optional, default=None)

</details>

## Methods

<details><summary>column_gap_analysis() - Compares 2 DataFrames and gives shape, size, matching columns, non-matching columns, coverages, and percentages
</summary>

## column_gap_analysis(
self, df2: pd.DataFrame = None, ds1_nm: str = 'Source #1', ds2_nm: str = 'Source #2', case_sens: bool = True, print_analysis: bool = True, check_match_by=None, breakdown_grain=None, df=None

)

<details><summary>Dependencies
</summary>

* "pandas==1.5.3",

</details>

<details><summary>Parameters</summary>

* df (DataFrame)


* df2 (DataFrame, optional, default=None)


* ds1_nm (str, optional, default='Source #1')


* ds2_nm (str, optional, default='Source #2')


* case_sens (bool, optional, default=True)


* print_analysis (bool, optional, default=True)


* check_match_by (any, optional, default=None)


* breakdown_grain (any, optional, default=None)

</details>

#### return: pandas.DataFrame
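The comparison reduces to set logic over the two frames' columns; a minimal standalone sketch of that idea (not the package's implementation, which also reports shape, size, and grain breakdowns):

```python
import pandas as pd

def column_gap(df1: pd.DataFrame, df2: pd.DataFrame, case_sens: bool = True):
    # Compare the column sets of two frames, optionally case-insensitively.
    cols1 = set(df1.columns if case_sens else df1.columns.str.upper())
    cols2 = set(df2.columns if case_sens else df2.columns.str.upper())
    return {
        "matching": sorted(cols1 & cols2),
        "only_in_df1": sorted(cols1 - cols2),
        "only_in_df2": sorted(cols2 - cols1),
        "coverage_pct": round(100 * len(cols1 & cols2) / len(cols1 | cols2), 1),
    }

df1 = pd.DataFrame(columns=["ID", "STORE", "QTY"])
df2 = pd.DataFrame(columns=["ID", "STORE", "REVENUE"])
print(column_gap(df1, df2))
```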

#### Run the following in python to generate a sample query
```
from NikeCA import QA, Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

df = sf.snowflake_pull(sf.build_search_query(column_name='%***%', like_flag=True))[['TABLE_CATALOG', 'TABLE_SCHEMA', 'COLUMN_NAME']]

df2 = sf.snowflake_pull(sf.build_search_query(column_name='%***%', schema='***', like_flag=True))

qa = QA(df=df, df2=df2)
print(qa.column_gap_analysis())
```

</details>

<details><summary>data_prfl_analysis() - Takes a pandas.DataFrame as an input and returns a pandas.DataFrame with information about it, such 
as a list of columns and data types, nulls, coverage percentage, unique values, etc.
</summary>

## data_prfl_analysis(
self, df: pd.DataFrame = None, ds_name: str = 'Data Source', sample_vals: int = 5, print_analysis: bool = True, show_pct_fmt: bool = True

)

### Still Under Development

<details><summary>Dependencies</summary>

* "pandas==1.5.3",

</details>

<details><summary>Parameters</summary>

* df (DataFrame): pandas.DataFrame to be analyzed


* ds_name (str, optional, default='Data Source'): name of the data source to be included in the output


* sample_vals (int, optional, default=5)


* print_analysis (bool, optional, default=True)


* show_pct_fmt (bool, optional, default=True): show_percentage_format

</details>

#### return: 
<details><summary>pandas.DataFrame with the following columns: </summary>

* 'DATA_SOURCE'
* 'COLUMN'
* 'COL_DATA_TYPE'
* 'TOTAL_ROWS'
* 'ROW_DTYPE_CT'
* 'PRIMARY_DTYPE_PCT'
* 'COVERAGE_PCT', 'NULL_PCT'
* 'DTYPE_ERROR_FLAG'
* 'NON_NULL_ROWS'
* 'NULL_VALUES'
* 'UNIQUE_VALUES'
* 'COL_VALUE_SAMPLE'
* 'NULL_VALUE_SAMPLE'

</details>
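Several of these statistics can be approximated directly with pandas; the sketch below mirrors a subset of the column names above, but it is illustrative only, not the package's code:

```python
import pandas as pd

def profile(df: pd.DataFrame, ds_name: str = "Data Source") -> pd.DataFrame:
    # Build one profile row per column: dtype, null counts, coverage, uniques.
    rows = []
    total = len(df)
    for col in df.columns:
        non_null = int(df[col].notna().sum())
        rows.append({
            "DATA_SOURCE": ds_name,
            "COLUMN": col,
            "COL_DATA_TYPE": str(df[col].dtype),
            "TOTAL_ROWS": total,
            "NON_NULL_ROWS": non_null,
            "NULL_VALUES": total - non_null,
            "COVERAGE_PCT": round(100 * non_null / total, 1) if total else 0.0,
            "UNIQUE_VALUES": int(df[col].nunique()),
        })
    return pd.DataFrame(rows)

df = pd.DataFrame({"ID": [1, 2, None], "NAME": ["a", "a", "b"]})
print(profile(df))
```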

```
from NikeCA import Snowflake, QA

sf = Snowflake(username=<username>, warehouse=<warehouse>, role=<role>, database=<database>)

df = sf.snowflake_pull("""SELECT TOP 200 * FROM ***""")

print(QA(df).data_prfl_analysis())
```

</details>

<details><summary>get_repo_list() - Get repository list for all repos in organization</summary>

## get_repo_list(

self, git_username: str = None, pat: str | None = None, org_name: str | None = None, repo_list_filename: str | None = None

)

  <details>
    <summary>Dependencies</summary>

* requests==2.28.2
* json5==0.9.10
    
  </details>
  <details>
    <summary>Parameters</summary>
      
  * git_username (str, default=self.__git_username): username for your GitHub account
  * pat (str, default=self.__pat): GitHub personal access token
    <details><summary>Steps to Setup pat (personal access token)</summary>
      
    * Ensure that you are logged in to GitHub
    * go to https://github.com/settings/tokens/new
    * fill out the information (Note, Expiration, select the scopes)
    * Click "Generate Token"
    * Make sure to copy this key (you will only see it once)
    </details>
  * org_name (str, default=self.__org_name): GitHub repository name
  * repo_list_filename (str, default=None): the file path for the repo list (falls back to 'repolist' when not specified)

  </details>
  
  #### return: None (saves the repository list to a file)
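Presumably this wraps the GitHub REST API's list-organization-repositories endpoint; the sketch below shows only how such a request would be assembled (the function name and values here are assumptions for illustration, not the package's code):

```python
def build_repo_list_request(org_name: str, pat: str, per_page: int = 100):
    """Assemble the GitHub 'list organization repositories' request.
    The actual HTTP call would be requests.get(url, headers=headers)."""
    url = f"https://api.github.com/orgs/{org_name}/repos?per_page={per_page}"
    headers = {
        # A classic personal access token is sent as a token authorization header
        "Authorization": f"token {pat}",
        "Accept": "application/vnd.github+json",
    }
    return url, headers

url, headers = build_repo_list_request("my-org", "<your PAT>")
print(url)
```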

</details>

</details>


<br>

<br>
<br>
<br>
<br>

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "CADPR",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/ab/8c/3aa040aa4fc91da6685f0d7ed211310ddc8413dc4a5600e4ca5e6163ca89/CADPR-0.0.33.tar.gz",
    "platform": null,
    "description": "\n\n# CA\n\nThis package was developed informally for the Commercial Analytics Team\n\nBefore trying to use this package ensure that you have the proper access (This can be found under the \"Usage\" Section below)\n\nThis is a start to see about developing package to facilitate, standardize, and automate repetitive tasks\n\n\n# Installation and Setup\n<details><summary>See Information about Installation, Setup, and Running</summary>\n\n<details><summary> Dependencies that will automatically be installed if not already satisfied:</summary>\n\n* \"wheel\",\n* \"asn1crypto==1.5.1\",\n* \"certifi==2022.12.7\",\n* \"cffi==1.15.1\",\n* \"charset-normalizer==2.1.1\",\n* \"cryptography==39.0.1\",\n* \"databricks==0.2\",\n* \"databricks-sql==1.0.0\",\n* \"databricks-sql-connector==2.2.1\",\n* \"filelock==3.9.0\",\n* \"gitdb==4.0.10\",\n* \"GitPython==3.1.31\",\n* \"greenlet==2.0.2\",\n* \"idna==3.4\",\n* \"jupyter-contrib-core==0.4.2\",\n* \"jupyter-contrib-nbextensions==0.7.0\",\n* \"jupyter-events==0.6.3\",\n* \"jupyter-highlight-selected-word==0.2.0\",\n* \"jupyter-nbextensions-configurator==0.6.1\",\n* \"jupyter-ydoc==0.2.2\",\n* \"jupyter_client==8.0.3\",\n* \"jupyter_core==5.2.0\",\n* \"jupyter_server==2.3.0\",\n* \"jupyter_server_fileid==0.8.0\",\n* \"jupyter_server_terminals==0.4.4\",\n* \"jupyter_server_ydoc==0.6.1\",\n* \"jupyterlab==3.6.1\",\n* \"jupyterlab-pygments==0.2.2\",\n* \"jupyterlab_server==2.19.0\",\n* \"lz4==4.3.2\",\n* \"numpy==1.23.4\",\n* \"oauthlib==3.2.2\",\n* \"oscrypto==1.3.0\",\n* \"pandas==1.5.3\",\n* \"pyarrow==10.0.1\",\n* \"pycparser==2.21\",\n* \"pycryptodomex==3.17\",\n* \"PyJWT==2.6.0\",\n* \"pyOpenSSL==23.0.0\",\n* \"pystache==0.6.0\",\n* \"python-dateutil==2.8.2\",\n* \"pytz==2022.7.1\",\n* \"requests==2.28.2\",\n* \"six==1.16.0\",\n* \"smmap==5.0.0\",\n* \"snowflake-connector-python==3.0.0\",\n* \"snowflake-sqlalchemy==1.4.6\",\n* \"SQLAlchemy==1.4.46\",\n* \"thrift==0.16.0\",\n* \"typing_extensions==4.5.0\",\n* 
\"urllib3==1.26.14\",\n* \"xcrun==0.4\",\n* \"configparser~=5.3.0\"\n\n</details>\n\n## Installing and Setting up a New Environment (if you are new to python start here):\n\n<details><summary>Installation and Setup with a New Environment</summary>\n\n<details><summary>For Mac</summary>\n\n### Note: This assumes that you already have Python 3.11.2 installed\n\n<details><summary> How do I tell which version of Python I have?</summary>\n\n1. Launch the Terminal by typing \"Terminal\" in the Launchpad search field or Spotlight\n\n2. Enter the following command in the Terminal\n\n```\npython3 --version\n```\nand you should see this:\n> Python 3.11.2\n\n</details>\n\n<details> <summary> To Install Python 3.11.2</summary>\n\n1. Go to https://www.python.org/downloads/\n\n2. Click on \"Download Python 3.11.2\"\n\n3. Open the file and click through the installation steps accepting the defaults\n\n</details>\n\n<details><summary> When running for the first time, open the Terminal and run the following commands where you want the files to be kept:</summary>\n\n```unix\npython3 -m venv venv\nsource venv/bin/activate\npip install --upgrade pip\npip install NikeCA\n```\n* After running the command above, restart the terminal and proceed to the \"To open Jupyter Notebook after installation (Mac)\"\n\n</details></details>\n\n</details>\n\n\n## Installing without Setting up a New Environment:\n\n<details><summary> pip Install Without Setting up a New Environment</summary>\n\nRun the following to install:\n\n```\n$ python pip install NikeCA\n```\n</details>\n\n## To open Jupyter Notebook after installation (Mac)\n\n<details><summary> Navigate to the installation location in the terminal and run the following:</summary>\n\n```unix\nsource venv/bin/activate\njupyter notebook\n```\n</details></details>\n\n\n\n# Modules\n### NikeCA\n\nA Module for interacting with the Databases and doing Analytics\n\n<details><summary>Import</summary>\n\nRun the following to import:\n\n```\nimport 
NikeCA\n```\n</details>\n\n\n## Classes:\n<details><summary>Snowflake Class</summary>\n\n\n## Snowflake:\nSnowflake(username: str, warehouse: str, role: str, database: str = None, schema: str = None, table: str = None, column_name: str = None, col_and_or: str = 'AND', get_ex_val: bool = None, like_flag: bool = False, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv = None)\n\n<details><summary> Import:</summary>\n\n    from NikeCA import Snowflake\n\n</details>\n\n<details><summary>Parameters:</summary>\n\n* username (str): The Snowflake account username\n\n\n* warehouse (str): The Snowflake warehouse to use\n\n\n* role (str): The Snowflake role to use\n\n\n* database (str, optional, default=None): The Snowflake database to use\n\n\n* schema (str, optional, default=None): The Snowflake schema to use\n\n\n* table (str, optional, default=None): The Snowflake table to use\n\n\n* column_name (str, optional, default=None): The name of the column to search\n\n\n* col_and_or (str, optional, default=None): The AND/OR operator to use between search criteria\n\n\n* get_ex_val (bool, optional, default=None): Whether to return exact matches only\n\n\n* like_flag (bool, optional, default=None): Whether to use the LIKE operator for search criteria\n\n</details>\n\n## Methods:\n\n<details>\n<summary> snowflake_pull() - pulls snowflake data\n</summary>\n\n### snowflake_pull(\nself, query: str, username: str | None = None, warehouse: str | None = None, database: str | None = None, role: str | None = None, sample_table: bool = False, sample_val: bool = False, table_sample: dict | None = None, dtypes_conv: Any = None\n\n) -> DataFrame:\n\n<details><summary>Dependencies:</summary>\n\n* pandas\n* snowflake.connector\n\n</details>\n\n<details><summary> Parameters:</summary>\n\n* query (str): SQL query to run on Snowflake \n  * e.g. 
```SELECT * FROM {}```\n\n\n* username (str or None, default=None): Nike Snowflake Username \n\n\n* database (str or None, default=None): Name of the Database \n\n\n* warehouse (str or None, default=None): Name of the Warehouse \n\n\n* role (str or None, default=None): Name of the role under which you are running Snowflake \n\n\n* sample_table (bool, optional, Default=False): pull only 500 records from table\n\n\n* sample_val (bool, optional, default=False)\n\n\n* table_sample (dictionary, optional, default=None) \n\n\n* dtypes_conv (any, default=None)\n\n</details>\n\n#### return: pandas.DataFrame\n\nRun the following in python to generate a sample query:\n\n\n```\nfrom NikeCA import Snowflake\n\nusername = <Your Username>\nwarehouse = <The Name of the Warehouse>\nrole = <Name of Your Role>\ndatabase = <Name of the Database>\n\nsf =  Snowflake(username=username, warehouse=warehouse, role=role, database=database)\n\nquery = 'SELECT TOP 2 * FROM  {}'\n\nprint(sf.snowflake_pull(query)) \n```\n\n</details>\n\n<details><summary>build_search_query() - Builds and returns a search query based on the specified parameters and instance variables\n</summary>\n\n### build_search_query(\nself, inp_db: str | None = None, schema: str | None = None, table: str | None = None, column_name=None, like_flag: bool = False, col_and_or: str = 'AND'\n\n) -> str\n\n#### Dependencies - None\n\n<details><summary> Parameters:</summary>\n\n* inp_db (str or None, optional, default=None): The database name to search in. If not specified, search all databases\n  \n\n* schema (str or None, optional, default=None): The schema name to search in. If not specified, search all schemas\n\n\n* table (str or None, optional, default=None): The table name to search for. If not specified, search all tables\n\n\n* column_name(any, optional, default=None): The column name(s) to search for. 
If not specified, search all columns\n  * If a list is provided, search for any columns that match any of the names in the list\n\n\n* like_flag (bool, optional, default=False) \n  * If True, uses a SQL LIKE statement to search for columns that contain the specified column name(s)\n    ```\n    f\"AND column_name like '{column_name}' \" if like_flag else where_stmt + f\"AND column_name = '{column_name}' \"\n    ```\n  * If False, searches for exact matches to the specified column name(s)\n    ```\n    f\"AND column_name like '{column_name}' \" if like_flag else where_stmt + f\"AND column_name = '{column_name}' \"\n    ```\n    \n\n* col_and_or (str: optional, default='AND'): If specified and column_name is a list, determines whether to search for columns that match all or any of \nthe names in the list. Must be one of the following values: 'AND', 'and', 'OR', 'or'.\n\n</details>\n\n#### return: string of the SQL Statement\n\n#### Run the following in python to generate a sample query\n```\nfrom NikeCA import Snowflake\n\nusername = <Your Username>\nwarehouse = <The Name of the Warehouse>\nrole = <Name of Your Role>\ndatabase = <Name of the Database>\n\nsf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)\n\nprint(sf.build_search_query(column_name='%***%', like_flag=True))\n```\n\n</details>\n\n\n<details><summary>search_schema() - Search snowflake structure for specific schemas/tables/columns </summary>\n\n### search_schema(\nself, username=None, warehouse=None, database=None, role=None, sample_table: bool = False, sample_val: bool = False, table_sample: dict = None, dtypes_conv=None, schema=None, table=None, column_name=None, col_and_or='and', get_ex_val=False, like_flag=False\n\n)\n\nNotes: Will allow to search for tables/cols/etc. 
even without knowing the db if database=None\n\n<details><summary>Dependencies</summary>\n\n* pandas\n* snowflake.connector\n\n</details>\n \n<details><summary>Parameters</summary>\n\n* username (str or None, default=None): Nike Snowflake Username \n\n\n* database (str or None, default=None): Name of the Database \n\n\n* warehouse (str or None, default=None): Name of the Warehouse \n\n\n* role (str or None, default=None): Name of the role under which you are running Snowflake \n\n\n* sample_table (bool, optional, Default=False): pull only 500 records from table\n\n\n* sample_val (bool, optional, default=False)\n\n\n* table_sample (dictionary, optional, default=None) \n  * Notes: The below code is built within the Module\n\n        if table_sample is not None: \n             table_sample = {'db': None, 'schema': None, 'table': None, 'col': None}\n\n* dtypes_conv (any, default=None)\n\n\n* schema (str, default=None): Snowflake schema name from any database \n\n\n* table (str, default=None): Snowflake table name\n\n\n* column_name (str, default=None): column name to filter\n\n\n* col_and_or (str, default='and'): either 'and' or 'or'\n  * will use in the where statement\n\n\n* get_ex_val (bool, default=False)\n\n\n* like_flag (bool, default=False): This signifies whether the \"column_name like \" or \"column_name = \"\n\n</details>\n\n#### return: pandas.Dataframe\n\nRun the following in python to generate a sample table:\n\n    from NikeCA import Snowflake\n    \n    sf = Snowflake(username=<your username>, warehouse=<your warehouse>, \n         role=<your role>, database=<database you would like to search or none>)\n    \n    sf.column_name = '*****'\n    sf.schema = '*****'\n    sf.like_flag = True\n    \n    print(sf.search_schema())\n\n</details>\n\n<details><summary>snowflake_dependencies() - Searches the snowflake database and finds instances where the table is referenced and where the reference is not in the actual creation of the table 
itself\n</summary>\n\n\n### snowflake_dependencies(\n\nself, tables: str | list, username: str, warehouse: str, role: str, database: str | None = None, schema: str | list | None = None\n\n) -> pandas.DataFrame:\n\nNote: If the table's get_ddl() is empty, it will throw an error - I will fix this soon\n \n\n<details><summary>Dependencies</summary>\n\n* pandas\n* snowflake.connector\n\n</details>\n\n<details><summary>Parameters</summary>\n\n* tables (list | str, required): This is a list or string to check for in the database could be a table name or anything contained within the get_ddl() string\n\n\n* username (str, default=self): Username for Snowflake\n\n\n* warehouse (str, default=self): Name of the Snowflake warehouse\n\n\n* role (str, default=self): Role for Snowflake\n\n\n* database (str, required, default=self): database to search in\n\n\n* schema (str | list | None, optional, default=self): Snowflake schema to search in\n  * notes: filling this in can really speed up the query\n\n</details>\n\n#### return: pandas.Dataframe\n\nRun the following in python to generate a sample table:\n\n    import pandas as pd\n    \n    username = \n    warehouse =\n    role = \n    database = \n    \n    sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)\n    \n    tables = ['***', '***']\n     \n    schema = '***'\n\n    df = sf.snowflake_dependencies(tables='***', schema=schema)\n    \n    print(df)\n\n</details>\n\n\n[//]: # (## optimize_tbl_mem&#40;&#41;:)\n\n[//]: # (build a dictionary containing keys that reference column:datatype conversion with the purpose of optimizing memory )\n\n[//]: # (after pulling data)\n\n[//]: # ()\n[//]: # (#### Dependencies)\n\n[//]: # (* time)\n\n[//]: # (* pandas)\n\n[//]: # (* itertools)\n\n[//]: # ()\n[//]: # (#### Parameters:)\n\n[//]: # ()\n[//]: # (* username &#40;str or None, default=None&#41;: Nike Snowflake Username )\n\n[//]: # (  * e.g. 
</details>

#

#

<details><summary>QA Class</summary>

## QA:

### Import

Run the following to import:

```
from NikeCA import QA
```

<details><summary>Parameters</summary>

* df (DataFrame)

* df2 (DataFrame, optional, default=None)

* ds1_nm (str, optional, default='Source #1')

* ds2_nm (str, optional, default='Source #2')

* case_sens (bool, optional, default=True)

* print_analysis (bool, optional, default=True)

* check_match_by (any, optional, default=None)

* breakdown_grain (any, optional, default=None)

</details>

## Methods

<details><summary>column_gap_analysis() - Compares 2 DataFrames and gives shape, size, matching columns, non-matching columns, coverages, and
percentages</summary>

## column_gap_analysis(
self, df2: pd.DataFrame = None, ds1_nm: str = 'Source #1', ds2_nm: str = 'Source #2', case_sens: bool = True, print_analysis: bool = True, check_match_by=None, breakdown_grain=None, df=None

)

<details><summary>Dependencies</summary>

* "pandas==1.5.3",

</details>

<details><summary>Parameters</summary>

* df (DataFrame)

* df2 (DataFrame, optional, default=None)

* ds1_nm (str, optional, default='Source #1')

* ds2_nm (str, optional, default='Source #2')

* case_sens (bool, optional, default=True)

* print_analysis (bool, optional, default=True)

* check_match_by (any, optional, default=None)

* breakdown_grain (any, optional, default=None)

</details>

#### return: pandas.DataFrame

#### Run the following in Python to generate a sample query
```
from NikeCA import QA, Snowflake

username = <Your Username>
warehouse = <The Name of the Warehouse>
role = <Name of Your Role>
database = <Name of the Database>

sf = Snowflake(username=username, warehouse=warehouse, role=role, database=database)

df = sf.snowflake_pull(sf.build_search_query(column_name='%***%', like_flag=True))[['TABLE_CATALOG', 'TABLE_SCHEMA', 'COLUMN_NAME']]

df2 = sf.snowflake_pull(sf.build_search_query(column_name='%***%', schema='***', like_flag=True))

qa = QA(df=df, df2=df2)
print(qa.column_gap_analysis())
```

</details>

<details><summary>data_prfl_analysis() - Takes a pandas.DataFrame as input and returns a pandas.DataFrame with information about the input, such as a list of columns and data types, nulls, coverage percentage, unique values, etc.</summary>

## data_prfl_analysis(
self, df: pd.DataFrame = None, ds_name: str = 'Data Source', sample_vals: int = 5, print_analysis: bool = True, show_pct_fmt: bool = True

)

### Still Under Development

<details><summary>Dependencies</summary>

* "pandas==1.5.3",

</details>

<details><summary>Parameters</summary>

* df (DataFrame): pandas.DataFrame to be analyzed

* ds_name (str, optional, default='Data Source'): name of the data source to include in the output

* sample_vals (int, optional, default=5)

* print_analysis (bool, optional, default=True)

* show_pct_fmt (bool, optional, default=True): show percentages in percentage format

</details>

#### return:
<details><summary>pandas.DataFrame with the following columns:</summary>

* 'DATA_SOURCE'
* 'COLUMN'
* 'COL_DATA_TYPE'
* 'TOTAL_ROWS'
* 'ROW_DTYPE_CT'
* 'PRIMARY_DTYPE_PCT'
* 'COVERAGE_PCT'
* 'NULL_PCT'
* 'DTYPE_ERROR_FLAG'
* 'NON_NULL_ROWS'
* 'NULL_VALUES'
* 'UNIQUE_VALUES'
* 'COL_VALUE_SAMPLE'
* 'NULL_VALUE_SAMPLE'

</details>

```
from NikeCA import Snowflake, QA

sf = Snowflake(username=<username>, warehouse=<warehouse>, role=<role>, database=<database>)

df = sf.snowflake_pull("""SELECT TOP 200 * FROM ***""")

print(QA(df).data_prfl_analysis())
```

</details>

<details><summary>get_repo_list() - Gets the repository list for all repos in an organization</summary>

## get_repo_list(

self, git_username: str = None, pat: str | None = None, org_name: str | None = None, repo_list_filename: str | None = None

)

  <details>
    <summary>Dependencies</summary>

* requests==2.28.2
* json5==0.9.10

  </details>
  <details>
    <summary>Parameters</summary>

  * git_username (str, default=self.__git_username): username for your GitHub account
  * pat (str, default=self.__pat): GitHub personal access token
    <details><summary>Steps to set up a pat (personal access token)</summary>

    * Ensure that you are logged in to GitHub
    * Go to https://github.com/settings/tokens/new
    * Fill out the information (note, expiration, and scopes)
    * Click "Generate token"
    * Copy the token right away (you will only see it once)
    </details>
  * org_name (str, default=self.__org_name): GitHub organization name
  * repo_list_filename (str, default='repolist'): the file path for the repo list

  </details>

  #### return: nothing, but the repository list is saved to a file

</details>

</details>

<br>

<br>
<br>
<br>
<br>
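The QA methods above boil down to standard pandas bookkeeping. As a rough illustration of the kind of per-column profile data_prfl_analysis() describes (column, dtype, row counts, null and coverage percentages, unique values), here is a minimal sketch in plain pandas - a hypothetical `profile` helper for intuition only, not the package's implementation and not its full column set:

```python
import pandas as pd

def profile(df: pd.DataFrame, ds_name: str = "Data Source") -> pd.DataFrame:
    """Per-column profile: dtype, row counts, null/coverage percentages, uniques."""
    total = len(df)
    rows = []
    for col in df.columns:
        nulls = int(df[col].isna().sum())
        rows.append({
            "DATA_SOURCE": ds_name,
            "COLUMN": col,
            "COL_DATA_TYPE": str(df[col].dtype),
            "TOTAL_ROWS": total,
            "NULL_VALUES": nulls,
            "NULL_PCT": round(100 * nulls / total, 2),
            "COVERAGE_PCT": round(100 * (total - nulls) / total, 2),
            "UNIQUE_VALUES": int(df[col].nunique()),  # nunique() ignores NaN
        })
    return pd.DataFrame(rows)

df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["x", "x", "y", None]})
print(profile(df))
```

Coverage and null percentages always sum to 100 per column, which is a quick sanity check on any profiling output.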
        