hsfs


- Name: hsfs
- Version: 3.7.6
- Home page: https://github.com/logicalclocks/feature-store-api
- Summary: HSFS: An environment independent client to interact with the Hopsworks Featurestore
- Upload time: 2024-05-03 09:42:19
- Maintainer: None
- Docs URL: None
- Author: Hopsworks AB
- Requires Python: <3.13,>=3.8
- License: Apache License 2.0
- Keywords: hopsworks, feature store, spark, machine learning, mlops, dataops
- Requirements: No requirements were recorded.

# Hopsworks Feature Store

<p align="center">
  <a href="https://community.hopsworks.ai"><img
    src="https://img.shields.io/discourse/users?label=Hopsworks%20Community&server=https%3A%2F%2Fcommunity.hopsworks.ai"
    alt="Hopsworks Community"
  /></a>
    <a href="https://docs.hopsworks.ai"><img
    src="https://img.shields.io/badge/docs-HSFS-orange"
    alt="Hopsworks Feature Store Documentation"
  /></a>
  <a href="https://pypi.org/project/hsfs/"><img
    src="https://img.shields.io/pypi/v/hsfs?color=blue"
    alt="PyPiStatus"
  /></a>
  <a href="https://archiva.hops.works/#artifact/com.logicalclocks/hsfs"><img
    src="https://img.shields.io/badge/java-HSFS-green"
    alt="Scala/Java Artifacts"
  /></a>
  <a href="https://pepy.tech/project/hsfs/month"><img
    src="https://pepy.tech/badge/hsfs/month"
    alt="Downloads"
  /></a>
  <a href="https://github.com/psf/black"><img
    src="https://img.shields.io/badge/code%20style-black-000000.svg"
    alt="CodeStyle"
  /></a>
  <a><img
    src="https://img.shields.io/pypi/l/hsfs?color=green"
    alt="License"
  /></a>
</p>

HSFS is the library to interact with the Hopsworks Feature Store. The library makes creating new features, feature groups and training datasets easy.

The library is environment independent and can be used in two modes:

- Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings both for Python and JVM languages.

- Python mode: For data science jobs that explore the features available in the feature store, generate training datasets and feed them into a training pipeline. Python mode requires only a Python interpreter and can be used in Hopsworks from Python Jobs/Jupyter Kernels, in Amazon SageMaker or in KubeFlow.

The library automatically configures itself based on the environment in which it runs.
However, to connect from an external environment such as Databricks or AWS SageMaker,
additional connection information, such as host and port, is required. For more information about the setup from external environments, see the setup section.
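
As a rough illustration of what the additional connection information might look like, the sketch below collects it from environment variables and passes it to `hsfs.connection()`. The `HOPSWORKS_*` variable names and both helper functions are hypothetical, and the keyword arguments (`host`, `port`, `project`, `api_key_value`) are assumed to match the installed HSFS version; check the setup section for the authoritative list.

```python
import os


def build_connection_kwargs(env=None):
    """Collect connection arguments from environment variables.

    The HOPSWORKS_* names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    return {
        "host": env["HOPSWORKS_HOST"],
        # Assumption: fall back to 443 (HTTPS) when no port is given.
        "port": int(env.get("HOPSWORKS_PORT", "443")),
        "project": env["HOPSWORKS_PROJECT"],
        "api_key_value": env["HOPSWORKS_API_KEY"],
    }


def connect(env=None):
    """Open a connection and return the project's feature store handle."""
    # Imported lazily so build_connection_kwargs() works without hsfs installed.
    import hsfs

    connection = hsfs.connection(**build_connection_kwargs(env))
    return connection.get_feature_store()
```

Keeping the API key in the environment rather than in the notebook or job source avoids committing credentials by accident.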

## Getting Started On Hopsworks

Instantiate a connection and get the project feature store handler
```python
import hsfs

connection = hsfs.connection()
fs = connection.get_feature_store()
```

Create a new feature group
```python
fg = fs.create_feature_group("rain",
                        version=1,
                        description="Rain features",
                        primary_key=['date', 'location_id'],
                        online_enabled=True)

fg.save(dataframe)
```

Upsert new data into a feature group created with `time_travel_format="HUDI"`:
```python
fg.insert(upsert_df)
```

Retrieve the commit timeline metadata of a feature group created with `time_travel_format="HUDI"`:
```python
fg.commit_details()
```

Read a feature group as of a specific point in time:
```python
fg = fs.get_feature_group("rain", 1)
fg.read("2020-10-20 07:34:11").show()
```

Read updates that occurred between specified points in time:
```python
fg = fs.get_feature_group("rain", 1)
fg.read_changes("2020-10-20 07:31:38", "2020-10-20 07:34:11").show()
```
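
The time-travel calls above take timestamps as strings. A minimal sketch of producing such strings from `datetime` objects follows; it assumes the `YYYY-MM-DD HH:MM:SS` layout used in the examples above, and `as_hsfs_time` and `change_window` are hypothetical helper names, not part of HSFS.

```python
from datetime import datetime, timedelta

# Layout copied from the timestamp strings used in the examples above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def as_hsfs_time(moment):
    """Format a datetime the way the time-travel examples write it."""
    return moment.strftime(TIME_FORMAT)


def change_window(end, minutes):
    """Return (start, end) strings covering the `minutes` before `end`."""
    start = end - timedelta(minutes=minutes)
    return as_hsfs_time(start), as_hsfs_time(end)


# Possible usage with a feature group handle `fg`:
#   fg.read(as_hsfs_time(datetime(2020, 10, 20, 7, 34, 11))).show()
#   fg.read_changes(*change_window(datetime(2020, 10, 20, 7, 34, 11), 3)).show()
```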

Join features together
```python
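# rain_fg, temperature_fg and location_fg are feature groups
# retrieved beforehand, e.g. with fs.get_feature_group(...).
feature_join = (
    rain_fg.select_all()
    .join(temperature_fg.select_all(), on=["date", "location_id"])
    .join(location_fg.select_all())
)
feature_join.show(5)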
```

Join feature groups as of a specific point in time
```python
# Parentheses let the method chain span several lines.
feature_join = (
    rain_fg.select_all()
    .join(temperature_fg.select_all(), on=["date", "location_id"])
    .join(location_fg.select_all())
    .as_of("2020-10-31")
)
feature_join.show(5)
```

Join feature groups as of different points in time
```python
rain_fg_q = rain_fg.select_all().as_of("2020-10-20 07:41:43")
temperature_fg_q = temperature_fg.select_all().as_of("2020-10-20 07:32:33")
location_fg_q = location_fg.select_all().as_of("2020-10-20 07:33:08")
joined_features_q = rain_fg_q.join(temperature_fg_q).join(location_fg_q)
```

Use the query object to create a training dataset:
```python
td = fs.create_training_dataset("rain_dataset",
                                version=1,
                                data_format="tfrecords",
                                description="A test training dataset saved in TfRecords format",
                                splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

td.save(feature_join)
```
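
Split fractions such as the ones above are easy to mistype. The helper below is a minimal sketch of a sanity check run before creating the training dataset; `validate_splits` is a hypothetical function, and the rule that fractions must add up to 1 is an assumption made for this example, since this README does not state what HSFS itself enforces.

```python
import math


def validate_splits(splits):
    """Check that split fractions are in (0, 1] and add up to 1."""
    if not splits:
        raise ValueError("at least one split is required")
    for name, fraction in splits.items():
        if not 0 < fraction <= 1:
            raise ValueError(f"split {name!r} has invalid fraction {fraction}")
    total = sum(splits.values())
    # 0.7 + 0.2 + 0.1 is not exactly 1.0 in floating point, so compare
    # with a tolerance instead of using ==.
    if not math.isclose(total, 1.0):
        raise ValueError(f"split fractions sum to {total}, expected 1.0")
    return splits


splits = validate_splits({"train": 0.7, "test": 0.2, "validate": 0.1})
```

The validated dictionary can then be passed as the `splits` argument shown above.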

A short introduction to the Scala API:
```scala
import com.logicalclocks.hsfs._
val connection = HopsworksConnection.builder().build()
val fs = connection.getFeatureStore();
val attendances_features_fg = fs.getFeatureGroup("games_features", 1);
attendances_features_fg.show(1)
```

You can find more examples on how to use the library in our [hops-examples](https://github.com/logicalclocks/hops-examples) repository.

## Usage

Usage data is collected to improve the quality of the library. Collection is turned on by default if the backend
is "c.app.hopsworks.ai". To turn it off, use one of the following ways:
```python
# use environment variable
import os
os.environ["ENABLE_HOPSWORKS_USAGE"] = "false"

# use `disable_usage_logging`
import hsfs
hsfs.disable_usage_logging()
```

The source code can be found in `python/hsfs/usage.py`.

## Documentation

Documentation is available at [Hopsworks Feature Store Documentation](https://docs.hopsworks.ai/).

## Issues

For general questions about the usage of Hopsworks and the Feature Store please open a topic on [Hopsworks Community](https://community.hopsworks.ai/).

Please report any issues using [GitHub issue tracking](https://github.com/logicalclocks/feature-store-api/issues).


## Contributing

If you would like to contribute to this library, please see the [Contribution Guidelines](CONTRIBUTING.md).
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/logicalclocks/feature-store-api",
    "name": "hsfs",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.8",
    "maintainer_email": null,
    "keywords": "Hopsworks, Feature Store, Spark, Machine Learning, MLOps, DataOps",
    "author": "Hopsworks AB",
    "author_email": "moritz@logicalclocks.com",
    "download_url": "https://files.pythonhosted.org/packages/56/80/165c955b8db1e262d980823ee8b638b8486aea7a15c08d21a983f9adaa45/hsfs-3.7.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "HSFS: An environment independent client to interact with the Hopsworks Featurestore",
    "version": "3.7.6",
    "project_urls": {
        "Download": "https://github.com/logicalclocks/feature-store-api/releases/tag/3.7.6",
        "Homepage": "https://github.com/logicalclocks/feature-store-api"
    },
    "split_keywords": [
        "hopsworks",
        " feature store",
        " spark",
        " machine learning",
        " mlops",
        " dataops"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5680165c955b8db1e262d980823ee8b638b8486aea7a15c08d21a983f9adaa45",
                "md5": "2841804940e04ccfeef13a0a42cbec3c",
                "sha256": "8978c2267bd5dca9877af119ed54801fdac781872ac277f572b006b66c7b52c1"
            },
            "downloads": -1,
            "filename": "hsfs-3.7.6.tar.gz",
            "has_sig": false,
            "md5_digest": "2841804940e04ccfeef13a0a42cbec3c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.8",
            "size": 249803,
            "upload_time": "2024-05-03T09:42:19",
            "upload_time_iso_8601": "2024-05-03T09:42:19.613793Z",
            "url": "https://files.pythonhosted.org/packages/56/80/165c955b8db1e262d980823ee8b638b8486aea7a15c08d21a983f9adaa45/hsfs-3.7.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-03 09:42:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "logicalclocks",
    "github_project": "feature-store-api",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "hsfs"
}
        