Introduction
------------
<img src="https://raw.githubusercontent.com/SAP-samples/hana-ml-samples/main/Python-API/portal/SAP_R_grad.jpg" align="left" alt="" width="120" />
Welcome to Python machine learning client for SAP HANA (hana-ml)!
This package enables Python data scientists to access SAP HANA data and build various machine learning models using the data directly in SAP HANA. This page provides an overview of hana-ml.
Overview
--------
Python machine learning client for SAP HANA consists of two main parts:
- SAP HANA DataFrame, which provides a set of methods for accessing and querying data in SAP HANA without bringing the data to the client.
- A set of machine learning APIs for developing machine learning models.
Specifically, machine learning APIs are composed of two packages:
- **PAL** package
The PAL package consists of a set of Python algorithms and functions that provide access to the machine learning capabilities of the SAP HANA Predictive Analysis Library (PAL). SAP HANA PAL functions cover a variety of machine learning algorithms for training a model; the trained model is then used for scoring.
- **APL** package
Automated Predictive Library (APL) package exposes the data mining capabilities of the Automated Analytics engine in SAP HANA through a set of
functions. These functions develop a predictive modeling process that analysts can use to answer simple questions on their customer datasets stored in SAP HANA.
In addition to SAP HANA DataFrame methods and machine learning API, hana-ml also offers the following features:
- Visualizers: a set of methods for visualizing datasets and models, for example eda (plot functions such as distribution, pie, and correlation plots), dataset_report (analyzes the dataset and generates a report in HTML format), model_debriefing (visualizes a tree model and explains the model output with Shapley values), and unified_report (an integrated dataset and model report for UnifiedClassification() and UnifiedRegression()).
- Model storage: offers a series of methods to save, list, load and delete models in SAP HANA. Models are saved into SAP HANA tables in a schema specified by the user.
- Text Mining: provides a series of functions, such as tf_analysis and text classification, that operate on the given documents.
- Spatial and Graph: introduces additional engines for analytics on geospatial data and on graph- or network-modeled data.
Please see [Python Machine Learning Client for SAP HANA Documentation](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.html) for more details of methods.
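The model storage feature can be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper name `save_and_reload` and the model name `WEATHER_RDT` are hypothetical, and `storage` is assumed to behave like `hana_ml.model_storage.ModelStorage`, which identifies a model by its `name` and `version` attributes.

```python
def save_and_reload(storage, model, name, version=1):
    """Save a fitted model under (name, version) and load it back.

    `storage` is expected to behave like hana_ml.model_storage.ModelStorage;
    the helper itself is hypothetical and not part of hana-ml.
    """
    # The storage identifies a model by these two attributes.
    model.name = name
    model.version = version
    storage.save_model(model=model)
    return storage.load_model(name, version)


# With a live connection `conn` and a fitted model `model`, usage might look like:
#   from hana_ml.model_storage import ModelStorage
#   storage = ModelStorage(connection_context=conn)
#   reloaded = save_and_reload(storage, model, "WEATHER_RDT")
#   print(storage.list_models())
#   storage.delete_model("WEATHER_RDT", 1)
```

Passing the storage object in keeps the helper free of connection details, so it can be exercised with a stand-in object before a real SAP HANA schema is involved.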
Prerequisites
-------------
hana-ml uses the SAP HANA Python Driver (hdbcli) to connect to SAP HANA. Install the driver and see the following reference:
- SAP HANA Python Driver: hdbcli. Please see [SAP HANA Client Interface Programming Reference](https://help.sap.com/docs/SAP_HANA_CLIENT/f1b440ded6144a54ada97ff95dac7adf/f3b8fabf34324302b123297cdbe710f0.html)
hana-ml uses SAP HANA PAL and SAP HANA APL for machine learning API. Please refer to the following information:
- SAP HANA PAL: Security **AFL__SYS_AFL_AFLPAL_EXECUTE** and **AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION** roles. See [SAP HANA Predictive Analysis Library](https://help.sap.com/viewer/2cfbc5cf2bc14f028cfbe2a2bba60a50/latest/en-US/253f2b552f55436ba1243ff0d7b374b3.html) for more information.
- SAP HANA APL 1905 or higher. This prerequisite applies only when using the APL package. Please see [SAP HANA Automated Predictive Library Developer Guide](https://help.sap.com/viewer/product/apl/latest/en-US) for more information.
Getting Started
---------------
Install via
$ pip install hana-ml
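After installing, it can help to confirm which version of the package the interpreter actually sees. The following is a minimal sketch using only the standard library; it assumes Python 3.8 or later (for `importlib.metadata`), and the helper name `installed_version` is hypothetical rather than part of hana-ml.

```python
from importlib import metadata


def installed_version(dist_name="hana-ml"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "hana-ml is not installed in this environment")
```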
Quick Start
-----------
In this section, we will show some simple statements and an example
to help you get familiar with the usage of hana-ml.
### Establish a connection to SAP HANA:
>>> from hana_ml import dataframe
>>> conn = dataframe.ConnectionContext(address="<hostname>",
port=<port>,
user="<username>",
password="<password>")
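To keep credentials out of scripts and notebooks, the connection arguments can be read from the environment. This is a sketch under stated assumptions: the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER` and `HANA_PASSWORD` and both helper names are invented for this example; only the `ConnectionContext` keyword arguments come from the snippet above.

```python
import os


def connection_settings(env=os.environ):
    """Collect ConnectionContext keyword arguments from environment variables.

    The variable names used here are assumptions made for this sketch,
    not names defined by hana-ml.
    """
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env["HANA_PORT"]),  # the port is passed as an integer
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


def open_connection(env=os.environ):
    """Create a ConnectionContext; hana_ml is imported lazily on purpose."""
    from hana_ml import dataframe
    return dataframe.ConnectionContext(**connection_settings(env))


# Typical use, closing the connection even if an error occurs:
#   conn = open_connection()
#   try:
#       ...  # work with conn
#   finally:
#       conn.close()
```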
### Create a SAP HANA DataFrame df that references a SAP HANA table:
>>> df = conn.table('MY_TABLE', schema='MY_SCHEMA').filter('COL3>5').select('COL1', 'COL2')
### Create a SAP HANA DataFrame from a SELECT statement:
>>> df = dataframe.DataFrame(conn, 'select * from MY_SCHEMA.MY_TABLE')
### Convert a SAP HANA DataFrame to a pandas DataFrame:
>>> pandas_df = df.collect()
### Convert a pandas DataFrame to a SAP HANA DataFrame:
>>> df = dataframe.create_dataframe_from_pandas(connection_context=conn,
pandas_df=pandas_df,
table_name='MY_TABLE',
force=True)
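As a slightly fuller sketch of the upload direction, the block below builds a small pandas DataFrame locally and writes it to SAP HANA. The six rows mirror the DATA_TBL_FIT preview shown in the classification example of this README; the table name `DATA_TBL_FIT_DEMO` and the helper name `upload_rows` are hypothetical.

```python
# Column layout and rows mirror the DATA_TBL_FIT preview in this README.
COLUMNS = ["ID", "OUTLOOK", "TEMP", "HUMIDITY", "WINDY", "CLASS"]
ROWS = [
    (1, "Sunny", 75, 70.0, "Yes", "Play"),
    (2, "Sunny", 80, 90.0, "Yes", "Do not Play"),
    (3, "Sunny", 85, 85.0, "No", "Do not Play"),
    (4, "Sunny", 72, 95.0, "No", "Do not Play"),
    (5, "Sunny", 69, 70.0, "No", "Play"),
    (6, "Overcast", 72, 90.0, "Yes", "Play"),
]


def upload_rows(conn, table_name="DATA_TBL_FIT_DEMO"):
    """Upload the sample rows; pandas and hana_ml are imported lazily."""
    import pandas as pd
    from hana_ml import dataframe

    pandas_df = pd.DataFrame(ROWS, columns=COLUMNS)
    # force=True drops an existing table of the same name first,
    # so use a table name that is safe to overwrite.
    return dataframe.create_dataframe_from_pandas(connection_context=conn,
                                                  pandas_df=pandas_df,
                                                  table_name=table_name,
                                                  force=True)
```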
### A Classification Example:
In this example, we build a `UnifiedClassification` model and display the dataset and the model with the UnifiedReport function.
Step 1: Import modules:
>>> from hana_ml import dataframe
>>> from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
>>> from hana_ml.visualizers.unified_report import UnifiedReport
Step 2: Create a ConnectionContext object:
>>> conn = dataframe.ConnectionContext('<url>', <port>, '<user>', '<password>')
Step 3: Create a SAP HANA DataFrame df_fit that points to the table "DATA_TBL_FIT" in SAP HANA:
>>> df_fit = conn.table("DATA_TBL_FIT")
Step 4: Inspect df_fit:
>>> df_fit.head(6).collect()
ID OUTLOOK TEMP HUMIDITY WINDY CLASS
1 Sunny 75 70.0 Yes Play
2 Sunny 80 90.0 Yes Do not Play
3 Sunny 85 85.0 No Do not Play
4 Sunny 72 95.0 No Do not Play
5 Sunny 69 70.0 No Play
6 Overcast 72 90.0 Yes Play
Step 5: Invoke the UnifiedReport function to display the dataset:
>>> UnifiedReport(df_fit).build().display()
<img src="https://raw.githubusercontent.com/SAP-samples/hana-ml-samples/main/Python-API/portal/datasetreport.PNG" align="left" alt="" width="2000" />
Step 6: Create a 'UnifiedClassification' instance and specify parameters:
>>> rdt_params = dict(random_state=2,
split_threshold=1e-7,
min_samples_leaf=1,
n_estimators=10,
max_depth=55)
>>> uc_rdt = UnifiedClassification(func = 'RandomDecisionTree', **rdt_params)
Step 7: Invoke the fit method and inspect one of the returned attributes, importance_:
>>> uc_rdt.fit(data=df_fit,
partition_method='stratified',
stratified_column='CLASS',
partition_random_state=2,
training_percent=0.7,
ntiles=2)
>>> print(uc_rdt.importance_.collect())
VARIABLE_NAME IMPORTANCE
OUTLOOK 0.191748
TEMP 0.418285
HUMIDITY 0.389968
WINDY 0.000000
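One way to read the importance table is to rank the features and check how the values add up. The sketch below works on the numbers printed above, copied into a plain dictionary. That the values sum to roughly 1 is an observation about this particular output, not a documented guarantee.

```python
# Values copied from the importance_ output above.
importance = {
    "OUTLOOK": 0.191748,
    "TEMP": 0.418285,
    "HUMIDITY": 0.389968,
    "WINDY": 0.000000,
}

# Rank features from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
total = sum(importance.values())

for name, value in ranked:
    print(f"{name:<10}{value:.6f}")
print(f"sum of importances: {total:.6f}")

# In a live session the same ranking could be done on the collected pandas
# DataFrame, for example:
#   uc_rdt.importance_.collect().sort_values("IMPORTANCE", ascending=False)
```

In this run TEMP and HUMIDITY carry most of the weight, while WINDY contributes nothing to the fitted trees.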
Step 8: View the 'UnifiedClassification' model report:
>>> UnifiedReport(uc_rdt).build().display()
<img src="https://raw.githubusercontent.com/SAP-samples/hana-ml-samples/main/Python-API/portal/modelreport.PNG" align="left" alt="" width="2000" />
Step 9: Create a SAP HANA DataFrame df_predict that points to the table "DATA_TBL_PREDICT":
>>> df_predict = conn.table("DATA_TBL_PREDICT")
Step 10: Preview df_predict:
>>> df_predict.collect()
ID OUTLOOK TEMP HUMIDITY WINDY
0 Overcast 75.0 70.0 Yes
1 Rain 78.0 70.0 Yes
2 Sunny 66.0 70.0 Yes
3 Sunny 69.0 70.0 Yes
4 Rain NaN 70.0 Yes
5 None 70.0 70.0 Yes
6 *** 70.0 70.0 Yes
Step 11: Invoke the predict method and inspect the result:
>>> result = uc_rdt.predict(df_predict, key = "ID", top_k_attributions=10)
>>> print(result.collect())
ID SCORE CONFIDENCE
0 Play 0.8
1 Play 1.0
2 Play 0.6
3 Play 1.0
4 Play 1.0
5 Do not Play 0.8
6 Play 0.8
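A quick way to review such a result is to tally the predicted classes and flag rows with low confidence. The sketch below uses the values printed above as plain tuples; the 0.7 threshold is an arbitrary choice made for illustration.

```python
from collections import Counter

# (ID, SCORE, CONFIDENCE) rows copied from the predict output above.
predictions = [
    (0, "Play", 0.8),
    (1, "Play", 1.0),
    (2, "Play", 0.6),
    (3, "Play", 1.0),
    (4, "Play", 1.0),
    (5, "Do not Play", 0.8),
    (6, "Play", 0.8),
]

THRESHOLD = 0.7  # hypothetical cut-off for "low confidence"

class_counts = Counter(score for _, score, _ in predictions)
low_confidence_ids = [row_id for row_id, _, conf in predictions if conf < THRESHOLD]

print(dict(class_counts))
print(low_confidence_ids)

# On the SAP HANA side, a comparable filter could be expressed on the
# DataFrame itself, for example: result.filter('CONFIDENCE < 0.7').collect()
```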
Step 12: Create a TreeModelDebriefing.shapley_explainer object and then invoke summary_plot() to explain the output of the 'UnifiedClassification' model:
>>> from hana_ml.visualizers.model_debriefing import TreeModelDebriefing
>>> features = ["OUTLOOK", "TEMP", "HUMIDITY", "WINDY"]
>>> shapley_explainer = TreeModelDebriefing.shapley_explainer(feature_data=df_predict.select(features),
reason_code_data=result.select('REASON_CODE'))
>>> shapley_explainer.summary_plot()
<img src="https://raw.githubusercontent.com/SAP-samples/hana-ml-samples/main/Python-API/portal/shap.png" align="left" alt="" width="2000" />
Step 13: Create a SAP HANA DataFrame df_score that points to the table "DATA_TBL_SCORE":
>>> df_score = conn.table("DATA_TBL_SCORE")
Step 14: Preview df_score:
>>> df_score.collect()
ID OUTLOOK TEMP HUMIDITY WINDY CLASS
0 Overcast 75.0 -10000.0 Yes Play
1 Rain 78.0 70.0 Yes Play
2 Sunny -10000.0 NaN Yes Do not Play
3 Sunny 69.0 70.0 Yes Do not Play
4 Rain NaN 70.0 Yes Play
5 None 70.0 70.0 Yes Do not Play
6 *** 70.0 70.0 Yes Play
Step 15: Invoke the score method and inspect the result:
>>> score_res = uc_rdt.score(data=df_score,
key='ID',
max_result_num=2,
ntiles=2,
attribution_method='tree-shap')[1].head(4)
>>> print(score_res.collect())
STAT_NAME STAT_VALUE CLASS_NAME
AUC 0.5102040816326531 None
RECALL 0 Do not Play
PRECISION 0 Do not Play
F1_SCORE 0 Do not Play
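The per-class statistics above follow the usual definitions of precision, recall and F1. The sketch below computes them in plain Python to show how a class can end up with all three at 0. The true labels are the CLASS column of df_score, but the predictions (every row predicted as "Play") are hypothetical and are not the model's actual output. Returning 0 when a denominator is 0 is a convention chosen for this sketch.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention for this sketch: a zero denominator yields 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# True labels from the CLASS column of df_score (IDs 0 to 6).
y_true = ["Play", "Play", "Do not Play", "Do not Play", "Play", "Do not Play", "Play"]
# Hypothetical predictions: every row predicted as "Play".
y_pred = ["Play"] * 7

for label in ("Play", "Do not Play"):
    print(label, precision_recall_f1(y_true, y_pred, label))
```

With these hypothetical predictions, "Do not Play" is never predicted, so it has no true positives and all three statistics are 0 for that class.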
Step 16: Close the connection to SAP HANA:
>>> conn.close()
Help
----
Please see [Python Machine Learning Client for SAP HANA Documentation](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.html) for more details of methods.
License
-------
The SAP HANA ML API is provided via the [SAP Developer License Agreement](https://tools.hana.ondemand.com/developer-license-3_2.txt).
By using this software, you agree that the following text is incorporated into the terms of the Developer Agreement:
If you are an existing SAP customer for On Premise software, your use of this current software is also covered by the
terms of your software license agreement with SAP, including the Use Rights, the current version of which can be found at:
https://www.sap.com/about/agreements/product-use-and-support-terms.html?tag=agreements:product-use-support-terms/on-premise-software/software-use-rights
Raw data
{
"_id": null,
"home_page": "https://www.sap.com/",
"name": "hana-ml",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.4",
"maintainer_email": null,
"keywords": "SAP HANA machine learning intelligent enterprise cloud PAL APL client",
"author": "SAP SE",
"author_email": null,
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "SAP DEVELOPER LICENSE AGREEMENT",
"summary": "Python Machine Learning Client for SAP HANA",
"version": "2.22.24110601",
"project_urls": {
"Documentation": "https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.html",
"Homepage": "https://www.sap.com/",
"Notebook Examples": "https://github.com/SAP-samples/hana-ml-samples/tree/main/Python-API/pal/notebooks",
"Report Issues": "https://github.com/SAP-samples/hana-ml-samples/issues"
},
"split_keywords": [
"sap",
"hana",
"machine",
"learning",
"intelligent",
"enterprise",
"cloud",
"pal",
"apl",
"client"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "256672d01ad4549a7e38419982ab56f901b1b85e792c61a819f7a14e64376a23",
"md5": "860fd86badad63089bd18528f1dcb442",
"sha256": "39dde6418bb30f9a5115d707af4d52f901db72da9a2f5e969820df12575b102b"
},
"downloads": -1,
"filename": "hana_ml-2.22.24110601-py3-none-any.whl",
"has_sig": false,
"md5_digest": "860fd86badad63089bd18528f1dcb442",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.4",
"size": 9587772,
"upload_time": "2024-11-05T08:27:00",
"upload_time_iso_8601": "2024-11-05T08:27:00.049931Z",
"url": "https://files.pythonhosted.org/packages/25/66/72d01ad4549a7e38419982ab56f901b1b85e792c61a819f7a14e64376a23/hana_ml-2.22.24110601-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-05 08:27:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SAP-samples",
"github_project": "hana-ml-samples",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "hana-ml"
}