# mlplatformutils
<br />
**mlplatformutils package for observability and ML Pipeline Processing** <br />
<br />
**This framework supports Azure Machine Learning training pipelines across computes such as Azure Synapse Spark, virtual machine clusters, Azure Kubernetes clusters, and Azure Databricks. It supports reading and writing data from Azure Data Lake Gen2 in Parquet and Delta formats, Azure Data Explorer (Kusto), and Azure SQL DB instances. The framework supports both Python and Spark at scale. Spark writes with capabilities such as dynamic partition overwrites and repartitioning are fully supported. When reading from and writing to these sources, the framework integrates a built-in lineage framework that provides column-level lineage across systems on a scalable graph backed by the Azure Cosmos DB Gremlin graph service. This enables robust upstream dependency tracking and proactive alerting and eventing. All operations are supported over a Service Principal (Client Id, Client Secret) for applications and processing. The package also supports creating and managing computes and pip dependencies for an Azure Machine Learning workspace and its training definitions.**
## Description
<br />
**app_insights_logger** - Contains **telemetrylogger** Class with Functions to Manage and Log Telemetry into Azure Application Insights <br />
<br />
* trackEvent
* trackTrace
* trackException
* logEvent
* gather_event_details
<br />
**lineagegraph** - Contains **LineageGraph** Class with functions to manage Graph on Azure Cosmos DB enabled with Gremlin <br />
<br />
* add_vertex
* get_vertices
* is_vertex
* update_vertex
* insert_edges
* drop_vertex
* drop_edge
* query_graph
* update_lineage_graph
* connect_lineage_graph
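
To make the column-level lineage idea concrete, here is a minimal in-memory sketch of vertices, edges, and an upstream traversal. It is an illustration only: the `ToyLineageGraph` class, its method signatures, and the column names are hypothetical and are not taken from the package's `LineageGraph`, which keeps the graph in Azure Cosmos DB through Gremlin.

```python
from collections import defaultdict


class ToyLineageGraph:
    """Hypothetical in-memory stand-in for a lineage graph (not the package API)."""

    def __init__(self):
        self.vertices = {}                # vertex id -> properties
        self.upstream = defaultdict(set)  # vertex id -> ids it is derived from

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def insert_edge(self, source_id, target_id):
        # An edge source -> target means "target is derived from source".
        if source_id not in self.vertices or target_id not in self.vertices:
            raise KeyError("both vertices must exist before adding an edge")
        self.upstream[target_id].add(source_id)

    def upstream_of(self, vertex_id):
        """Return every vertex reachable by walking edges backwards."""
        seen, stack = set(), [vertex_id]
        while stack:
            for parent in self.upstream[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative column names only.
graph = ToyLineageGraph()
for column in ("raw.orders.amount", "curated.orders.amount_usd", "features.avg_amount_usd"):
    graph.add_vertex(column, kind="column")
graph.insert_edge("raw.orders.amount", "curated.orders.amount_usd")
graph.insert_edge("curated.orders.amount_usd", "features.avg_amount_usd")

upstream = graph.upstream_of("features.avg_amount_usd")
print(sorted(upstream))
```

Walking the edges backwards like this is what upstream dependency tracking amounts to: a change in a raw column can be traced to every feature derived from it.
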
**platformutils** - Contains platform utility functions to check and install dependencies and to check Azure ML Compute
* is_package_installed
* install_pip
* get_environment
* set_environment
* assert_amlcompute
* read_setup_ini
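
A check such as `is_package_installed` can be approximated with the standard library alone. The sketch below is an assumption about how such a check might work, not the package's actual implementation; the helper name `package_is_importable` is hypothetical, and it handles top-level module names only.

```python
import importlib.util


def package_is_importable(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no top-level module with this name exists.
    return importlib.util.find_spec(name) is not None


json_available = package_is_importable("json")
missing_available = package_is_importable("surely_not_a_real_package_xyz")
print(json_available, missing_available)
```
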
**sparkutils** - Contains functions to read data from sources such as (Azure Data Lake Gen2, Azure Data Explorer (Kusto), Azure Sql Server) and write (Azure Data Lake Gen2) while ensuring integrated Lineage Graph Logging.
* read_from_adls_gen2
* write_to_adls_gen2
* read_from_kusto
* read_from_azsql
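
The overview mentions dynamic partition overwrites for Spark writes. The plain-Python model below sketches only the semantics: in dynamic mode (in Spark, `spark.sql.sources.partitionOverwriteMode` set to `dynamic`), just the partitions present in the incoming data are replaced, while a static overwrite replaces everything. The function `overwrite_partitions` and the sample data are hypothetical and do not use Spark or this package.

```python
def overwrite_partitions(existing, incoming, dynamic):
    """Model an overwrite; both arguments map partition value -> list of rows."""
    if not dynamic:
        # Static overwrite: the whole table is replaced by the incoming data.
        return dict(incoming)
    merged = dict(existing)
    merged.update(incoming)  # only partitions present in `incoming` are replaced
    return merged


existing = {"2023-08-15": ["a"], "2023-08-16": ["b"]}
incoming = {"2023-08-16": ["b2"], "2023-08-17": ["c"]}

dynamic_result = overwrite_partitions(existing, incoming, dynamic=True)
static_result = overwrite_partitions(existing, incoming, dynamic=False)
print(dynamic_result)
print(static_result)
```
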
**sparkcoreutils** - Contains functions to read data from sources such as (Azure Data Lake Gen2, Azure Data Explorer (Kusto), Azure Sql Server) and write (Azure Data Lake Gen2) **without** integrated Lineage Graph Logging.
* read_from_adls_gen2
* write_to_adls_gen2
* read_from_kusto
* read_from_azsql
**pandasutils** - Contains functions to read data from Azure Data Lake Gen2 (from Delta Format or Parquet Format) into Pandas Dataframe without Spark while ensuring integrated Lineage Graph Logging.
* read_from_delta_as_pandas
* read_parquet_file_from_adlsgen2_as_pandas
* read_parquet_directory_from_adlsgen2_as_pandas
* write_pandas_as_parquet_file_to_adlsgen2
**pandascoreutils** - Contains functions to read data from Azure Data Lake Gen2 (from Delta Format or Parquet Format) into a Pandas DataFrame without Spark and **without** integrated Lineage Graph Logging.
* read_from_delta_as_pandas
* read_parquet_file_from_adlsgen2_as_pandas
* read_parquet_directory_from_adlsgen2_as_pandas
* write_pandas_as_parquet_file_to_adlsgen2
**freshnessutils** - Contains functions to add freshness details to an Azure Cosmos DB (NoSQL) document database. These freshness metrics help when evaluating SLAs and in downstream processing. It captures and provides model and training dataset freshness for both the most recent and historical processing runs.
* add_freshness
* upsert_freshness
* query_freshness
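
As a rough illustration of evaluating a freshness record against an SLA, consider the sketch below. The record fields (`asset`, `last_refreshed`) and the helper `is_fresh` are hypothetical; the actual document schema and function signatures in `freshnessutils` are not shown in this README.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_refreshed, sla, now=None):
    """Return True if the asset was refreshed within the SLA window."""
    now = now or datetime.now(timezone.utc)
    return now - last_refreshed <= sla


# Hypothetical freshness record, as it might be stored in a document database.
record = {
    "asset": "training_dataset",
    "last_refreshed": datetime(2023, 8, 16, 0, 0, tzinfo=timezone.utc),
}
check_time = datetime(2023, 8, 17, 6, 0, tzinfo=timezone.utc)  # 30 hours later

within_two_days = is_fresh(record["last_refreshed"], timedelta(days=2), now=check_time)
within_one_day = is_fresh(record["last_refreshed"], timedelta(days=1), now=check_time)
print(within_two_days, within_one_day)
```
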
### Examples
<br />
**from mlplatformutils.core.platformutils import is_package_installed** <br />
**print(is_package_installed("pandas"))** <br />
**from mlplatformutils.core.app_insights_logger import telemetrylogger** <br />
**from mlplatformutils.core.lineagegraph import LineageGraph** <br />
**from mlplatformutils.core.sparkutils import write_to_adls_gen2, read_from_adls_gen2** <br />
**from mlplatformutils.core.pandasutils import write_pandas_as_parquet_file_to_adlsgen2, read_parquet_directory_from_adlsgen2_as_pandas** <br />
**from mlplatformutils.core.sparkcoreutils import write_to_adls_gen2, read_from_adls_gen2** <br />
**from mlplatformutils.core.pandascoreutils import write_pandas_as_parquet_file_to_adlsgen2, read_parquet_directory_from_adlsgen2_as_pandas** <br />
**from mlplatformutils.core.freshnessutils import add_freshness, upsert_freshness, query_freshness** <br />
**import mlplatformutils.core.version as vr** <br />
**print(vr.\_\_version\_\_)** <br />
### Notes
<br />
When running this lineage package from a Jupyter Notebook, the three lines below help overcome the Jupyter **RuntimeError: Cannot run the event loop while another loop is running** <br />
**import asyncio** <br />
**import nest_asyncio** <br />
**nest_asyncio.apply()** <br />
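
For context, the error arises when code tries to drive an event loop while another loop is already running, which is the situation inside a Jupyter kernel. The standard-library sketch below reproduces the message outside Jupyter; the function names are illustrative, and it does not involve this package or `nest_asyncio`.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are already inside a running loop here, much like code in a Jupyter cell.
    loop = asyncio.new_event_loop()
    coro = inner()
    try:
        loop.run_until_complete(coro)  # refused while another loop is running
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return str(exc)
    finally:
        loop.close()


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that this kind of nested loop usage is permitted.
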
## Structure
<br />
.<br />
|-- LICENSE.txt<br />
|-- README.rst<br />
|-- setup.cfg<br />
|-- setup.py<br />
|-- src<br />
| |-- mlplatformutils<br />
| | |-- __init__.py<br />
| | |-- core<br />
| | |-- |-- __init__.py<br />
| | |-- |-- sparkcoreutils.py<br />
| | |-- |-- sparkutils.py<br />
| | |-- |-- platformutils.py<br />
| | |-- |-- pandascoreutils.py<br />
| | |-- |-- pandasutils.py<br />
| | |-- |-- lineagegraph.py<br />
| | |-- |-- freshnessutils.py<br />
| | |-- |-- app_insights_logger.py<br />
|-- tests<br />
| |-- __init__.py<br />
| |-- core<br />
| |-- |-- __init__.py<br />
| |-- |-- test_sparkcoreutils.py<br />
| |-- |-- test_sparkutils.py<br />
| |-- |-- test_platformutils.py<br />
| |-- |-- test_pandascoreutils.py<br />
| |-- |-- test_pandasutils.py<br />
| |-- |-- test_lineagegraph.py<br />
| |-- |-- test_freshnessutils.py<br />
| |-- |-- test_app_insights_logger.py<br />
<br />
## Instructions
<br />
Install twine - twine is a utility package used for publishing Python packages on PyPI <br />
**python -m pip install twine** <br />
Build Package - create the source distribution of the package <br />
**python setup.py sdist** <br />
Upload Package to PyPI <br />
**python -m twine upload dist/\*** <br />
Raw data
{
"_id": null,
"home_page": "",
"name": "mlplatformutils",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "mlplatformutils",
"author": "Keshav Singh",
"author_email": "keshav_singh@hotmail.com",
"download_url": "https://files.pythonhosted.org/packages/63/9e/b14eff079244cb6a6019dd2c4bad36efb0e31e509e8e24bad1de2048d987/mlplatformutils-0.9.5.17.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "",
"version": "0.9.5.17",
"project_urls": null,
"split_keywords": [
"mlplatformutils"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "639eb14eff079244cb6a6019dd2c4bad36efb0e31e509e8e24bad1de2048d987",
"md5": "6925a2100a215ad0138919f6b203ff3c",
"sha256": "2dcd54afb0b59dced6e1c565aa9d7ad9c23d552eaaf2b86e15817ea369dbc32c"
},
"downloads": -1,
"filename": "mlplatformutils-0.9.5.17.tar.gz",
"has_sig": false,
"md5_digest": "6925a2100a215ad0138919f6b203ff3c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14274,
"upload_time": "2023-08-17T00:07:46",
"upload_time_iso_8601": "2023-08-17T00:07:46.834701Z",
"url": "https://files.pythonhosted.org/packages/63/9e/b14eff079244cb6a6019dd2c4bad36efb0e31e509e8e24bad1de2048d987/mlplatformutils-0.9.5.17.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-17 00:07:46",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "mlplatformutils"
}