azdsdr


* Name: azdsdr
* Version: 1.230612.2
* Home page: https://github.com/xhinker/azdsdr
* Summary: This package provides functions and tools for accessing data in an easy way.
* Upload time: 2023-06-12 22:01:42
* Author: Andrew Zhu
* License: Apache License
* Keywords: ds data reader

# An all-in-one data reader and tools in Python - AZDSDR

[![PyPI version](https://badge.fury.io/py/azdsdr.svg)](https://badge.fury.io/py/azdsdr)

- [An all-in-one data reader and tools in Python - AZDSDR](#an-all-in-one-data-reader-and-tools-in-python---azdsdr)
	- [Installation](#installation)
		- [Potential installation errors and solutions](#potential-installation-errors-and-solutions)
	- [Use Kusto Reader](#use-kusto-reader)
		- [Azure CLI Authentication](#azure-cli-authentication)
		- [Run any Kusto query](#run-any-kusto-query)
		- [Show Kusto tables](#show-kusto-tables)
		- [Create an empty Kusto table from a CSV file](#create-an-empty-kusto-table-from-a-csv-file)
		- [Upload data to Kusto](#upload-data-to-kusto)
	- [Use Dremio Reader](#use-dremio-reader)
		- [Step 1. Install Dremio Connector](#step-1-install-dremio-connector)
		- [Step 2. Generate a Personal Access Token(PAT)](#step-2-generate-a-personal-access-tokenpat)
		- [Step 3. Configure driver](#step-3-configure-driver)
		- [Dremio Sample Query](#dremio-sample-query)
	- [Move data with functions from `Pipelines` class](#move-data-with-functions-from-pipelines-class)
		- [Export Kusto data to local csv file](#export-kusto-data-to-local-csv-file)
		- [Move Dremio data to Kusto](#move-dremio-data-to-kusto)
	- [Data Tools](#data-tools)
		- [`display_all` Display all dataframe rows](#display_all-display-all-dataframe-rows)
	- [Thanks](#thanks)
	- [Update Logs](#update-logs)
		- [Jan 24, 2024](#jan-24-2024)
		- [Jan 23, 2023](#jan-23-2023)
		- [Jan 18, 2023](#jan-18-2023)
		- [Jan 17, 2023](#jan-17-2023)
		- [Jan 10, 2023](#jan-10-2023)
		- [Dec 16, 2022](#dec-16-2022)
		- [Dec 10, 2022](#dec-10-2022)
		- [Dec 6, 2022](#dec-6-2022)

This package includes data readers that give DS users easy access to data. 

Covered data platforms:

* Kusto
* Azure Blob Storage (Samples coming soon)
* Dremio
* Microsoft Cosmos: not Azure Cosmos DB, but the Microsoft Cosmos that uses Scope, now also known as Azure Data Lake (samples coming soon) 

Platforms that may be covered in the future:

* Databricks/Spark
* Microsoft Synapse
* Delta Lake
* PostgreSQL
* Microsoft SQL Server
* SQLite

The package also includes functions in the `Pipelines` class to help move data between platforms: 

* Dremio to Kusto
* Kusto to CSV file

## Installation

The module has been tested with Python 3.10 and Python 3.9. Other versions (Python 3.6+) should also work. 

Use pip to install the package and all of its dependencies:

```
pip install -U azdsdr
```

The `-U` flag upgrades an existing installation to the newest version.

Alternatively, you can clone the repository and copy the `readers.py` file into your project folder.  

The pip installation also installs all of the following dependencies automatically:

* pandas
* pyodbc
* azure-cli
* azure-kusto-data
* azure-kusto-ingest
* azure-storage-blob
* matplotlib
* ipython
* ipykernel

If you are setting up a freshly installed OS, the all-in-one installation also saves you from installing each package one by one. 
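To confirm which of these dependencies actually ended up in your environment, a quick check can be sketched with the standard library's `importlib.metadata`. Note that this module needs Python 3.8+, a narrower range than the 3.6+ mentioned above. The `installed_versions` helper and the `DEPENDENCIES` list are illustrative only and are not part of azdsdr:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names copied from the dependency list above
DEPENDENCIES = [
    "pandas", "pyodbc", "azure-cli", "azure-kusto-data", "azure-kusto-ingest",
    "azure-storage-blob", "matplotlib", "ipython", "ipykernel",
]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(DEPENDENCIES).items():
        print(f"{name:20} {ver or 'NOT INSTALLED'}")
```

Running it as a script prints one line per package, with `NOT INSTALLED` next to anything pip did not install.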

### Potential installation errors and solutions

Most of the time, all dependencies install successfully without any manual intervention. However, you may still see error messages depending on your OS and Python version. 

1. Need elevated permission

	* Error message:
		```
		Error: Could not install packages due to an OSError: [Errno 13] Permission denied:...
		```
	* Solution:  
		Open a new Windows terminal window with administrator permission (right-click the terminal icon, then select "Run as administrator").

1. Failed to install `pyodbc`  

	This usually occurs on Linux and macOS. 

	* Error message  
  
		```
		Building wheel for pyodbc (setup.py) ... error
		```

	* Solution  

		Linux (Debian/Ubuntu): run this first:

		```bash
		sudo apt-get install unixodbc-dev
		```
		<https://github.com/mkleehammer/pyodbc/issues/276>

		macOS: run this first:
		```bash
		brew install unixodbc
		# Point the compiler at Homebrew's unixodbc without hard-coding its version
		export LDFLAGS="-L$(brew --prefix unixodbc)/lib"
		export CPPFLAGS="-I$(brew --prefix unixodbc)/include"
		```
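Once `pyodbc` installs, `pyodbc.drivers()` is a quick way to confirm that it imports and can reach the system's ODBC driver manager; it returns the names of the installed ODBC drivers. The wrapper below is a hypothetical helper, sketched so that it reports `None` instead of crashing when `pyodbc` (or the native ODBC library it loads) is missing:

```python
from typing import List, Optional


def list_odbc_drivers() -> Optional[List[str]]:
    """Return the ODBC driver names pyodbc can see, or None if pyodbc won't import."""
    try:
        # Raises ImportError if the wheel failed to build or libodbc is missing
        import pyodbc
    except ImportError:
        return None
    return list(pyodbc.drivers())


if __name__ == "__main__":
    drivers = list_odbc_drivers()
    print("pyodbc not importable" if drivers is None else drivers)
```

After you install the Dremio ODBC driver (see the Dremio Reader section), it should appear in this list.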


## Use Kusto Reader

### Azure CLI Authentication

Before running a Kusto query, sign in to Azure with AAD authentication:

```
az login
```

Azure generates an authentication refresh token and stores it on your local machine. This token is revoked after **90 days of inactivity**. 

For more details, read [Sign in with Azure CLI](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli).
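If a script should fail early when nobody has signed in yet, one option is to ask the Azure CLI itself: `az account show` exits with a non-zero status when there is no signed-in account. The helper below is a sketch built on that behavior; the function name is hypothetical and it is not part of azdsdr:

```python
import shutil
import subprocess


def azure_cli_logged_in(timeout: float = 30.0) -> bool:
    """Return True if the Azure CLI is installed and `az account show` succeeds."""
    az = shutil.which("az")
    if az is None:
        return False  # Azure CLI is not on PATH
    try:
        result = subprocess.run(
            [az, "account", "show", "--output", "none"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Non-zero exit status means no usable signed-in account
    return result.returncode == 0


if __name__ == "__main__":
    if not azure_cli_logged_in():
        raise SystemExit("Please run `az login` first.")
```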

After you authenticate successfully with AAD, you should be able to run the following code without any pop-up authentication request. The Kusto Reader has been tested on Windows 10 and also works on Linux and macOS. 

### Run any Kusto query

```python 
from azdsdr.readers import KustoReader

cluster = "https://help.kusto.windows.net"
db      = "Samples"
kr      = KustoReader(cluster=cluster,db=db)

kql     = "StormEvents | take 10"
r       = kr.run_kql(kql)
```

`run_kql` returns a pandas DataFrame, held here by `r`. The `kr` object is reused in the following samples.

Use `run_kql_all` to run several queries at once and get back multiple result sets:

```python
from IPython.display import display  # built into Jupyter; explicit import for scripts

kql = '''
StormEvents 
| take 10
;
StormEvents 
| summarize count()
'''
rs = kr.run_kql_all(kql=kql)
# Display each result set in turn
for r in rs:
    display(r)
```

### Show Kusto tables

List all tables:

```python
kr.list_tables()
```
![](README/2022-11-09-23-03-51.png)

List tables with folder keyword: 

```python
kr.list_tables(folder_name='Covid19')
```
![](README/2022-11-09-23-06-22.png)


### Create an empty Kusto table from a CSV file

Use this function before uploading CSV data to a Kusto table. Instead of manually creating a Kusto table that matches the CSV schema, it creates an empty Kusto table from the CSV file automatically. 

You can also specify the table's folder name. 

```python
kusto_table_name  = 'target_kusto_table'
folder_name       = 'target_kusto_folder'
csv_file_name     = 'local_csv_path'
kr.create_table_from_csv (
    kusto_table_name    = kusto_table_name
    ,csv_file_path      = csv_file_name
    ,kusto_folder       = folder_name
)
```
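To make the idea concrete, here is a sketch of how a Kusto schema *could* be inferred from a CSV file and turned into a `.create table` control command. This is an illustration only, not azdsdr's implementation: the type rules (`bool`, `long`, `real`, `datetime`, falling back to `string`) and the helper names `infer_kusto_schema` and `create_table_command` are assumptions, and the code only builds the command text without sending it to a cluster.

```python
import csv
import re
from datetime import datetime
from typing import Callable, List, Optional, Tuple


def _parses(parse: Callable[[str], object], value: str) -> bool:
    """True if `parse(value)` succeeds without a ValueError."""
    try:
        parse(value)
        return True
    except ValueError:
        return False


# Candidate Kusto types in priority order; the first one that accepts every
# non-empty value in a column wins, and "string" is the fallback.
_TYPE_CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("bool", lambda v: v.lower() in ("true", "false")),
    ("long", lambda v: _parses(int, v)),
    ("real", lambda v: _parses(float, v)),
    ("datetime", lambda v: _parses(datetime.fromisoformat, v)),
]

_PLAIN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def infer_kusto_schema(csv_path: str) -> List[Tuple[str, str]]:
    """Return (column name, Kusto type) pairs for a CSV file with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{csv_path} is empty")
        values: List[List[str]] = [[] for _ in header]
        for row in reader:
            for i, cell in enumerate(row[: len(header)]):
                if cell != "":  # empty cells become nulls and do not vote
                    values[i].append(cell)
    schema = []
    for name, column in zip(header, values):
        kusto_type = "string"
        if column:
            for candidate, accepts in _TYPE_CHECKS:
                if all(accepts(v) for v in column):
                    kusto_type = candidate
                    break
        schema.append((name, kusto_type))
    return schema


def create_table_command(
    table: str, schema: List[Tuple[str, str]], folder: Optional[str] = None
) -> str:
    """Build a `.create table` command; names are assumed free of single quotes."""
    def quote(name: str) -> str:
        return name if _PLAIN_NAME.match(name) else f"['{name}']"

    columns = ", ".join(f"{quote(n)}:{t}" for n, t in schema)
    command = f".create table {quote(table)} ({columns})"
    if folder:
        command += f' with (folder="{folder}")'
    return command
```

Reading every value of each column keeps the inference simple but can be slow on very large files; sampling the first few thousand rows is a common trade-off.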

### Upload data to Kusto

Before uploading data to Kusto, make sure a table with a matching schema exists to hold it. The easiest way is to use `create_table_from_csv`, described above, to create an empty table. 

To enable data ingestion (upload), initialize the `KustoReader` object with the additional `ingest_cluster_str` parameter, as in the sample below. Ask your admin or check your team's documentation to find the ingestion cluster URL. 

```python
from azdsdr.readers import KustoReader

cluster         = "https://help.kusto.windows.net"
ingest_cluster  = "https://help-ingest.kusto.windows.net"
db              = "Samples"
kr              = KustoReader(cluster=cluster,db=db,ingest_cluster_str=ingest_cluster)
```

If your data is in a pandas DataFrame (`your_df_data` below), you can save it as a CSV file first and then create an empty table with a matching schema from that file: 

```python
your_df_data.to_csv('temp.csv',index=False)

target_kusto_table  =  'upload_df_to_kusto_test'
kr.create_table_from_csv(
    kusto_table_name = target_kusto_table
    ,kusto_folder = 'test'
    ,csv_file_path = 'temp.csv'
)
print('create empty table done')
```

Then upload the pandas DataFrame to Kusto:

```python
target_kusto_table  = 'upload_df_to_kusto_test'  # the table created in the previous step
df_data             = your_df_data
kr.upload_df_to_kusto(
    target_table_name = target_kusto_table
    ,df_data          = df_data
)
# Check the data now in the target table
kr.check_table_data(target_table_name=target_kusto_table)
```

Upload a local CSV file to Kusto:

```python
target_kusto_table  = 'kusto_table_name'
csv_path            = 'csv_file.csv'
kr.upload_csv_to_kusto(
    target_table_name = target_kusto_table
    ,csv_path         = csv_path
)
```

Upload a CSV file stored in Azure Blob Storage to Kusto. This is the best and fastest way to upload large volumes of CSV data to a Kusto table. 

```python
target_kusto_table  = 'kusto_table_name'
blob_sas_url        = 'the SAS URL you generate from the Azure portal, Azure Storage Explorer, or azdsdr'
kr.upload_csv_from_blob(
    target_table_name   = target_kusto_table
    ,blob_sas_url       = blob_sas_url
)
```

I will cover how to generate `blob_sas_url` in the Azure Blob Reader section. [TODO]
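In the meantime, it may help to know what a usable SAS URL looks like. A blob SAS URL carries its signature in the `sig` query parameter, its expiry time in `se`, and its permissions in `sp`. The sketch below is a hypothetical helper (not an azdsdr function) that checks those parameters before you pass the URL to `upload_csv_from_blob`; it assumes ingestion needs read (`r`) permission on the blob:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def check_blob_sas_url(url: str) -> datetime:
    """Return the SAS expiry time, or raise ValueError if the URL looks unusable."""
    parts = urlparse(url)
    if parts.scheme != "https":
        raise ValueError("SAS URL should use https")
    params = parse_qs(parts.query)  # also URL-decodes values such as %3A
    for required in ("sig", "se"):
        if required not in params:
            raise ValueError(f"SAS URL is missing the '{required}' parameter")
    # Assumption: ingestion needs at least read permission on the blob
    if "sp" in params and "r" not in params["sp"][0]:
        raise ValueError("SAS token does not grant read ('r') permission")
    expiry = datetime.fromisoformat(params["se"][0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # treat date-only values as UTC
    if expiry <= datetime.now(timezone.utc):
        raise ValueError(f"SAS token expired at {expiry.isoformat()}")
    return expiry
```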

## Use Dremio Reader

### Step 1. Install Dremio Connector

You will need to install the Dremio ODBC driver first to use `DremioReader` from this package. 

**For Windows users**

Download the [dremio-connector](https://github.com/xhinker/azdsdr/tree/main/drivers) file from the `drivers` folder of the repository and install it. 


### Step 2. Generate a [Personal Access Token(PAT)](https://docs.dremio.com/cloud/security/authentication/personal-access-token/#creating-a-token)

- We recommend storing this personal access token in a safe location, such as a user environment variable on your local machine.  
- Start Menu -> “Edit Environment variables For Your Account”.  
- Click “New” under environment variables.  
- Enter a new variable with name “DREMIO_TOKEN” and set the value to the PAT you generated earlier.  

Note: you will need to sign out of your Windows account and sign in again for the new environment variable to take effect.
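The Dremio sample query below reads this token with `os.environ.get("DREMIO_TOKEN")`, which silently returns `None` when the variable is not visible to the current process (for example, before you sign out and back in). A small guard, sketched here as a hypothetical helper, turns that into a clear error:

```python
import os


def get_dremio_token(var_name: str = "DREMIO_TOKEN") -> str:
    """Return the PAT stored in the environment, or raise a descriptive error."""
    token = os.environ.get(var_name)
    if not token:  # catches both a missing variable and an empty value
        raise RuntimeError(
            f"Environment variable {var_name} is not set for this process. "
            "Create it as described above, then sign out and back in before retrying."
        )
    return token
```

`token = get_dremio_token()` can then stand in for the `os.environ.get(...)` line in the sample query.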

### Step 3. Configure driver
- Go to Start Menu -> “ODBC Data Sources (64-bit)”.
- Under User DSN, click “Add”.
- Add Dremio Connector.
- Configure as follows:
  - Set `Data Source Name` to **Dremio Connector**. 
  - Use your own account (for example, **user@example.com**) as the username.
  - Remember to replace the Dremio host with your own host string. 

![](README/2022-12-10-13-27-32.png)

- Click Ok/Save

**For Linux and Mac users**

You can download the driver from the [Dremio ODBC Driver](https://www.dremio.com/drivers/odbc/) page. It should work in theory, but this has not been tested yet. 
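Once the DSN exists, an ODBC client can refer to it by name in a connection string such as `DSN=Dremio Connector;UID=...;PWD=...`. The sketch below builds that string with ODBC quoting: values containing `;`, `{` or `}` are wrapped in braces, with any `}` doubled. It assumes the PAT is passed as the password (`PWD`); the helper name is hypothetical, and this is not necessarily how azdsdr's `DremioReader` connects internally.

```python
from typing import Mapping


def _odbc_value(value: str) -> str:
    """Brace-quote a connection-string value if it contains ; { or }."""
    if any(ch in value for ch in ";{}"):
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_odbc_connection_string(attrs: Mapping[str, str]) -> str:
    """Join key=value pairs with ';', quoting values where needed."""
    return ";".join(f"{key}={_odbc_value(val)}" for key, val in attrs.items())
```

With `pyodbc` installed, a string such as `build_odbc_connection_string({"DSN": "Dremio Connector", "UID": "user@example.com", "PWD": token})` can be passed to `pyodbc.connect(conn_str, autocommit=True)`.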

### Dremio Sample Query

```python
from azdsdr.readers import DremioReader
import os

username    = "name@host.com"
#token       = "token string"
token       = os.environ.get("DREMIO_TOKEN") 
dr          = DremioReader(username=username,token=token)

sql = '''
select 
    * 
from 
    [workspace].[folder].[tablename]
limit 10
'''
r = dr.run_sql(sql)
```

## Move data with functions from `Pipelines` class

### Export Kusto data to local csv file

[TODO]

When the exported data is very large (for example, more than 1 billion rows), Kusto exports it to several CSV files. This function automatically combines them into a single CSV file in the destination folder.
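A minimal sketch of that combining step, assuming every exported part starts with the same header row (for example, an export run with the `includeHeader="all"` option). `merge_csv_files` is a hypothetical name, not the `Pipelines` API. It streams rows one at a time, so memory use stays flat even for very large exports, and it raises an error if a part's header differs:

```python
import csv
from typing import List, Optional


def merge_csv_files(part_paths: List[str], target_path: str) -> int:
    """Concatenate CSV parts that share one header; return the data row count."""
    header: Optional[List[str]] = None
    rows_written = 0
    with open(target_path, "w", newline="", encoding="utf-8") as out_f:
        writer = csv.writer(out_f)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as in_f:
                reader = csv.reader(in_f)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty part files
                if header is None:
                    header = part_header
                    writer.writerow(header)  # write the header only once
                elif part_header != header:
                    raise ValueError(f"{path} has a different header: {part_header}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting the input paths, for example with `sorted(glob.glob("export_dir/*.csv"))`, keeps the output order deterministic.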

### Move Dremio data to Kusto 

[TODO]

## Data Tools

### `display_all` Display all dataframe rows

IPython's `display` shows only a limited number of DataFrame rows. This tool can display **all** rows or a **specified number** of top rows. 

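```python
import pandas as pd
from azdsdr.tools import pd_tools
display_all = pd_tools().display_all

# Prepare some sample data (any DataFrame works)
pd_data = pd.DataFrame({'id': range(100), 'value': [i * 2 for i in range(100)]})

# display all rows
display_all(pd_data)

# display the top 20 rows
display_all(pd_data,top=20)
```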
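If you want a similar effect in plain pandas, `pd.option_context` can lift the row and column limits temporarily. The sketch below shows one way to do it and is an assumption, not the `pd_tools` implementation; `render_all` is a hypothetical name. It returns the rendered text so it also works outside Jupyter (inside a notebook you would call `display(data)` within the same `with` block):

```python
from typing import Optional

import pandas as pd


def render_all(df: pd.DataFrame, top: Optional[int] = None) -> str:
    """Render every row (or the first `top` rows) without pandas truncation."""
    data = df if top is None else df.head(top)
    # The limits are lifted only inside this block; global options are restored after
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        return repr(data)
```

Using a context manager instead of `pd.set_option` keeps the change local, so later cells still get pandas' normal truncated output.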

## Thanks

The Dremio ODBC Reader solution originated with [KC Munnings](https://github.com/kcm117). All glory and credit belong to KC. 

--- 

## Update Logs

### Jan 24, 2024

* Add `bar1_chart` to `vis_tools` so that you can plot bar charts with the `vis_tools` class.

### Jan 23, 2023

* Add `Grid` and `XY Axes Label` options for one-line and two-line charts.

### Jan 18, 2023

* Add workarounds and solutions for potential installation errors. 

### Jan 17, 2023

* Add a `show_data_label` option to the `line1_chart` function of `vis_tools`. 
If `show_data_label=True` is specified, the chart shows each data point's value. 

### Jan 10, 2023

* Add a GUID to the temporary Cosmos script file and the temporary intermediate stream file to avoid collisions between temp files. 

### Dec 16, 2022

* Add function `get_table_schema` for `KustoReader`
* Add function `get_table_folder` for `KustoReader`

### Dec 10, 2022

* Update Dremio Reader configuration document and screenshot.

### Dec 6, 2022

* Add function `download_file_list` of class `AzureBlobReader` to download a list of CSV files with the same schema and merge them into one target CSV file.
* Add function `delete_blob_files` of class `AzureBlobReader` to delete a list of blob files.
* Add [usage sample code](https://github.com/xhinker/azdsdr/tree/main/usage_examples). 

            
