# Fosforml
## Overview
The `fosforml` package is designed to facilitate the registration, management, and deployment of machine learning models with a focus on integration with Snowflake. It provides tools for managing datasets, model metadata, and the lifecycle of models within a Snowflake environment.
## Features
- Model Registration: Register models to the Snowflake Model Registry with detailed metadata, including descriptions, types, and dependencies.
- Dataset Management: Handle datasets within Snowflake, including creation, versioning, and deletion of dataset objects.
- Metadata Management: Update the model registry with descriptions and tags for better organization and retrieval.
- Snowflake Session Management: Manage Snowflake sessions for executing operations within the Snowflake environment.
## Installation
To install the `fosforml` package, ensure you have Python installed on your system and run the following command:
```shell
pip install fosforml
```
## Usage
Register a model with the Snowflake Model Registry using the `register_model` function. The function supports both Snowflake and Pandas dataframes, catering to different data handling preferences.
### Requirements
- **Snowflake DataFrame**: If you are using Snowflake as your data warehouse, you must provide a Snowflake DataFrame (`snowflake.snowpark.dataframe.DataFrame`) that includes the model feature names, labels, and output column names.
  - `snowflake_df`: Training Snowflake DataFrame with the required feature, label, and prediction columns.
- **Pandas DataFrame**: For users preferring local or in-memory data processing, you must provide the following as Pandas DataFrames (`pandas.DataFrame`):
  - `x_train`: Training data with feature columns.
  - `y_train`: Training data labels.
  - `x_test`: Test data with feature columns.
  - `y_test`: Test data labels.
  - `y_pred`: Predicted labels for the test data.
  - `y_prob`: Predicted class probabilities for the test data (classification problems).
- NumPy arrays are not accepted as input datasets when registering a model.
- `dataset_name`: Name of the dataset on which the model is trained.
- `dataset_source`: Name of the source from which the dataset is pulled or created.
- `source`: Name of the environment in which the model is being developed (e.g., Notebook/Experiment).
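The requirements above can be checked before calling `register_model`. The following is a minimal sketch of such a pre-flight check. The helper `missing_inputs` and the two lookup tables are hypothetical and not part of `fosforml`; the pairing of the `sklearn` flavour with the Pandas inputs is inferred from the examples in this README rather than confirmed by the library.

```python
# Hypothetical pre-flight check; these names are NOT part of fosforml.
COMMON_INPUTS = {"dataset_name", "dataset_source", "source"}
FLAVOUR_INPUTS = {
    "snowflake": {"snowflake_df"},
    "sklearn": {"x_train", "y_train", "x_test", "y_test", "y_pred"},
}


def missing_inputs(flavour, model_type, kwargs):
    """Return the sorted names of required inputs that are absent or None."""
    if flavour not in FLAVOUR_INPUTS:
        raise ValueError(f"unsupported flavour: {flavour!r}")
    required = COMMON_INPUTS | FLAVOUR_INPUTS[flavour]
    # y_prob is listed only for classification problems with Pandas inputs.
    if flavour == "sklearn" and model_type == "classification":
        required = required | {"y_prob"}
    return sorted(name for name in required if kwargs.get(name) is None)


# Example: two required inputs are missing for a Snowflake model.
print(missing_inputs(
    "snowflake",
    "classification",
    {"dataset_name": "HR_CHURN", "source": "Notebook"},
))
# prints ['dataset_source', 'snowflake_df']
```

An empty list means every input named in the requirements was supplied; it does not validate the contents of the DataFrames themselves.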
### Supported Model Flavors
Currently, the framework supports the following model flavors:
- **Snowflake Models (snowflake)**: Models that are directly integrated with Snowflake, leveraging Snowflake's data processing capabilities.
- **Scikit-Learn Models (sklearn)**: Models built using the Scikit-Learn library, a widely used library for machine learning in Python.
### Registering a Model
To register a model with the `fosforml` package, you need to provide the model object, session, and other relevant details such as the model name, description, and type.
#### For Snowflake Models:
```python
from fosforml import register_model
register_model(
    model_obj=pipeline,  # trained Snowflake ML pipeline, defined elsewhere
    session=session,
    name="MyModel",
    snowflake_df=pred_df,  # Snowflake DataFrame with feature, label, and prediction columns
    dataset_name="HR_CHURN",
    dataset_source="Dataset",
    source="Notebook",
    description="This is a Snowflake model",
    flavour="snowflake",
    model_type="classification",
    conda_dependencies=["scikit-learn==1.3.2"],
)
```
#### For Scikit-Learn Models:
```python
from fosforml import register_model
register_model(
    model_obj=model,  # trained scikit-learn estimator, defined elsewhere
    session=session,
    x_train=x_train,
    y_train=y_train,
    x_test=x_test,
    y_test=y_test,
    y_pred=y_pred,
    y_prob=y_prob,
    source="Notebook",
    dataset_name="HR_CHURN",
    dataset_source="InMemory",
    name="MyModel",
    description="This is a sklearn model",
    flavour="sklearn",
    model_type="classification",
    conda_dependencies=["scikit-learn==1.3.2"],
)
```
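Because NumPy arrays are not accepted, outputs such as predictions and class probabilities need to be wrapped in Pandas DataFrames before they are passed as `x_test`, `y_test`, `y_pred`, or `y_prob`. The sketch below shows one way to do that. The helper `to_frame`, the sample values, and the column names are illustrative assumptions, not something `fosforml` provides or requires.

```python
import numpy as np
import pandas as pd


def to_frame(values, columns):
    """Wrap array-like values in a pandas DataFrame with named columns."""
    array = np.asarray(values)
    if array.ndim == 1:
        # Treat a flat vector (labels, predictions) as a single column.
        array = array.reshape(-1, 1)
    if array.shape[1] != len(columns):
        raise ValueError("number of column names does not match the data")
    return pd.DataFrame(array, columns=list(columns))


# Illustrative data; the column names are hypothetical.
features = np.array([[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]])
labels = np.array([0, 0, 1])
probabilities = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])

x_test = to_frame(features, ["tenure", "overtime"])
y_test = to_frame(labels, ["left_company"])
y_prob = to_frame(probabilities, ["prob_0", "prob_1"])
print(type(y_prob).__name__, y_prob.shape)
```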
### Snowflake Session Management
The `SnowflakeSession` class is used to manage connections to Snowflake, facilitating the execution of operations within the Snowflake environment. It provides the following features:
- `session`: To get the Snowflake session object.
- `connection_params`: To get the Snowflake connection parameters.
```python
from fosforml.model_manager.snowflakesession import get_session, get_connection_params
session = get_session()
connection_params = get_connection_params()
```
### Retrieving Model History
The `ModelRegistry` class lets you inspect the history of the machine learning models stored in your environment. With it, you can retrieve a list of all models and their respective versions, which is useful for tracking how a model evolves and for managing its versions.
#### Initializing ModelRegistry
To begin, you need to initialize the `ModelRegistry` class with an active session and connection parameters. These parameters are essential for establishing a connection to your data storage environment, where your models and their metadata are stored.
```python
from fosforml.model_manager import ModelRegistry
registry = ModelRegistry(
    session=session,
    connection_params=connection_params,
)
```
#### Listing All Models
To obtain a list of all models stored in your environment, use the `list_models` method. This method returns a list of model names, providing a quick overview of the models you have.
```python
model_list = registry.list_models()
print("Models:", model_list)
```
#### Listing Model Versions
For more detail on a specific model's evolution, use the `list_model_versions` method. Given a model's name, it returns a list of all versions associated with that model, which makes it easy to track model updates and iterations.
```python
versions_list = registry.list_model_versions(model_name="YourModelName")
print("Versions:", versions_list)
```
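The two calls above can be combined to build an overview of every model and its versions. This is a minimal sketch: `collect_model_versions` is a hypothetical helper, and `StubRegistry` is a stand-in that exists only so the example runs without a Snowflake connection. With a real `ModelRegistry` instance, pass `registry` instead of the stub.

```python
def collect_model_versions(registry):
    """Map each model name to the versions the registry reports for it."""
    return {
        name: list(registry.list_model_versions(model_name=name))
        for name in registry.list_models()
    }


class StubRegistry:
    """Stand-in for ModelRegistry, used only to make this sketch runnable."""

    def __init__(self, data):
        self._data = data

    def list_models(self):
        return list(self._data)

    def list_model_versions(self, model_name):
        return self._data[model_name]


stub = StubRegistry({"MyModel": ["v1", "v2"], "OtherModel": ["v1"]})
history = collect_model_versions(stub)
print(history)
# prints {'MyModel': ['v1', 'v2'], 'OtherModel': ['v1']}
```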
### Managing Datasets with DatasetManager
The `DatasetManager` class is designed to facilitate the management of datasets associated with machine learning models in Snowflake. It allows for the creation, uploading, listing, deletion, and retrieval of datasets in a structured manner.
#### Initializing DatasetManager
To use `DatasetManager`, you need to initialize it with the model name, version, session, and connection parameters. The session and connection parameters ensure that `DatasetManager` can interact with the Snowflake environment where your datasets and models are stored.
```python
from fosforml.model_manager import DatasetManager
dataset_manager = DatasetManager(
    model_name="YourModelName",
    version_name="v1",
    session=session,
    connection_params=connection_params,
)
```
#### Uploading Datasets to a Specific Model Version
To upload datasets to a specific model version, use the following code:
```python
dataset_manager.upload_datasets(
    session=session,
    datasets={
        "x_train": snowflake_train_dataframe_,
        "x_test": snowflake_test_dataframe_,
    },
    # ... (remaining arguments are elided in the original)
)
```
#### Listing Datasets
To list all datasets associated with a specific model and version, use the `list_datasets` method. This method returns a list of dataset names that have been uploaded and registered under the specified model and version.
```python
datasets = dataset_manager.list_datasets()
print("Available datasets:", datasets)
```
#### Reading Datasets
The `DatasetManager` provides the `read_dataset` method for reading datasets. It returns a dataset either as a Pandas DataFrame or as a Snowflake DataFrame, depending on the `to_pandas` parameter.
##### To read as a Pandas DataFrame
To read a dataset as a Pandas DataFrame, set `to_pandas=True` as shown below:
```python
dataset_df = dataset_manager.read_dataset(dataset_name="x_train", to_pandas=True)
print(dataset_df.head())
```
##### To read as a Snowflake DataFrame
To read a dataset as a Snowflake DataFrame, set `to_pandas=False` as shown below:
```python
dataset_result = dataset_manager.read_dataset(dataset_name="x_train", to_pandas=False)
# show() prints the rows itself and returns None, so it is not wrapped in print().
dataset_result.show()
```
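To load every dataset registered for a model version in one pass, `list_datasets` and `read_dataset` can be combined as sketched below. `read_all_datasets` is a hypothetical helper, and `StubDatasetManager` is a stand-in used only so the example runs without Snowflake. With a real `DatasetManager`, pass `dataset_manager` instead of the stub.

```python
def read_all_datasets(dataset_manager, to_pandas=True):
    """Read every dataset listed for the manager's model version into a dict."""
    return {
        name: dataset_manager.read_dataset(dataset_name=name, to_pandas=to_pandas)
        for name in dataset_manager.list_datasets()
    }


class StubDatasetManager:
    """Stand-in for DatasetManager, used only to make this sketch runnable."""

    def __init__(self, tables):
        self._tables = tables

    def list_datasets(self):
        return list(self._tables)

    def read_dataset(self, dataset_name, to_pandas=True):
        return self._tables[dataset_name]


stub = StubDatasetManager({"x_train": [[1, 2], [3, 4]], "x_test": [[5, 6]]})
loaded = read_all_datasets(stub)
print(sorted(loaded))
# prints ['x_test', 'x_train']
```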
#### Deleting Datasets
To delete datasets associated with a specific model version, use the following code:
```python
dataset_manager.remove_datasets()
```
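Since deletion applies to the datasets of a whole model version, it can help to preview what is registered before removing it. The sketch below wraps `list_datasets` and `remove_datasets` in a hypothetical helper that only deletes when explicitly confirmed. `StubDatasetManager` is a stand-in so the example runs without Snowflake, and the assumption that `remove_datasets` removes everything `list_datasets` reports is inferred from the description above.

```python
def remove_datasets_with_preview(dataset_manager, confirmed=False):
    """Return the listed dataset names; delete them only when confirmed."""
    names = list(dataset_manager.list_datasets())
    if confirmed and names:
        dataset_manager.remove_datasets()
    return names


class StubDatasetManager:
    """Stand-in for DatasetManager, used only to make this sketch runnable."""

    def __init__(self, names):
        self._names = list(names)

    def list_datasets(self):
        return list(self._names)

    def remove_datasets(self):
        self._names.clear()


stub = StubDatasetManager(["x_train", "x_test"])
print(remove_datasets_with_preview(stub))                  # preview only
print(remove_datasets_with_preview(stub, confirmed=True))  # preview, then delete
print(stub.list_datasets())                                # nothing left in the stub
```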
## Dependencies
- pandas
- snowflake-ml-python
- requests
Ensure these dependencies are installed in your environment to use the `fosforml` package effectively.
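One way to confirm that the package and its listed dependencies are present is to query the installed distribution metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, and the names in `REQUIRED` are the distribution names listed above plus `fosforml` itself.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


REQUIRED = ["fosforml", "pandas", "snowflake-ml-python", "requests"]

report = {name: installed_version(name) for name in REQUIRED}
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```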
For issues and contributions, please refer to the project's [GitLab repository](https://gitlab.fosfor.com/fosfor-decision-cloud/intelligence/refract-sdk/-/tree/main/fosforml?ref_type=heads).
## Additional Resources
For further examples of registering models with `fosforml`, refer to the [`examples` folder](https://gitlab.fosfor.com/fosfor-decision-cloud/intelligence/refract-sdk/-/tree/main/fosforml/examples?ref_type=heads) in the project repository. It contains Jupyter notebooks that give step-by-step guidance on model registration and other operations.
Visit [www.fosfor.com](https://www.fosfor.com) for more information.
Raw data
{
"_id": null,
"home_page": "https://gitlab.fosfor.com/fosfor-decision-cloud/intelligence/refract-sdk.git",
"name": "fosforml",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "fosforml",
"author": "Mahesh Gadipea",
"author_email": "mahesh.gadipea@fosfor.com",
"download_url": "https://files.pythonhosted.org/packages/a8/9b/9585663fde3ed24ec50ed581320eb59110f79b2c97135b82b9291eb18a46/fosforml-1.1.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python package for registering machine learning models directly to the Snowflake Model Registry, leveraging Snowflake ML capabilities.",
"version": "1.1.9",
"project_urls": {
"Homepage": "https://gitlab.fosfor.com/fosfor-decision-cloud/intelligence/refract-sdk.git",
"Product": "https://www.fosfor.com/",
"Source": "https://gitlab.fosfor.com/fosfor-decision-cloud/intelligence/refract-sdk/-/tree/main/fosforml?ref_type=heads"
},
"split_keywords": [
"fosforml"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6bca0c86668b472708567e21a5416238df7ff61dd2b66b7a91d0c9ca7780951e",
"md5": "5ae62645862e5bd2692845df8d4fd2d8",
"sha256": "051140028629c95510285eee0242f9ffd920f748297b70f49588fbbcabdd12d7"
},
"downloads": -1,
"filename": "fosforml-1.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ae62645862e5bd2692845df8d4fd2d8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 42510,
"upload_time": "2024-11-27T06:58:05",
"upload_time_iso_8601": "2024-11-27T06:58:05.739160Z",
"url": "https://files.pythonhosted.org/packages/6b/ca/0c86668b472708567e21a5416238df7ff61dd2b66b7a91d0c9ca7780951e/fosforml-1.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a89b9585663fde3ed24ec50ed581320eb59110f79b2c97135b82b9291eb18a46",
"md5": "32c31be68edbad86a4ab4771bf8a2fb0",
"sha256": "a064f54f972e00bcfd3ea4c91c76dcd80e32687acd0ad13be8e489a0a36158a7"
},
"downloads": -1,
"filename": "fosforml-1.1.9.tar.gz",
"has_sig": false,
"md5_digest": "32c31be68edbad86a4ab4771bf8a2fb0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 39723,
"upload_time": "2024-11-27T06:58:09",
"upload_time_iso_8601": "2024-11-27T06:58:09.571041Z",
"url": "https://files.pythonhosted.org/packages/a8/9b/9585663fde3ed24ec50ed581320eb59110f79b2c97135b82b9291eb18a46/fosforml-1.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-27 06:58:09",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "fosforml"
}