# MLflow-XetHub
This plugin integrates XetHub with MLflow so that you can keep using your existing MLflow code to track experiments while storing artifacts on XetHub.
## Install plugin
Install the published version from PyPI (quote the argument so that shells such as zsh don't expand the brackets):
`pip install "mlflow[xethub]"`
Or clone this repo and install it locally to get the latest code:
```bash
git clone https://github.com/xetdata/MLflow-XetHub.git
cd MLflow-XetHub
pip install .
```
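To confirm the install worked, the sketch below lists the plugins registered under MLflow's `mlflow.artifact_repository` entry-point group, which is how MLflow artifact-store plugins advertise the URI schemes they handle. It assumes the plugin registers the scheme `xet` (inferred from the `xet://` URIs used in this README), requires Python 3.8+, and uses hypothetical helper names (`registered_artifact_schemes`, `has_artifact_plugin`).

```python
import sys
from importlib.metadata import entry_points

# Entry-point group that MLflow scans for artifact-repository plugins.
ARTIFACT_GROUP = "mlflow.artifact_repository"


def registered_artifact_schemes():
    """Return the URI schemes of all installed MLflow artifact-repository plugins."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=ARTIFACT_GROUP)
    else:
        # Before Python 3.10, entry_points() returns a dict keyed by group name.
        eps = entry_points().get(ARTIFACT_GROUP, [])
    # Each entry point's name is the URI scheme it handles.
    return {ep.name for ep in eps}


def has_artifact_plugin(scheme="xet"):
    """Assumption: the XetHub plugin registers itself under the "xet" scheme."""
    return scheme in registered_artifact_schemes()


if __name__ == "__main__":
    print("xet artifact plugin registered:", has_artifact_plugin())
```

Run it in the same environment as MLflow; `False` suggests the plugin is missing from that environment.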
## Authenticate with XetHub
If you haven't already, [create a XetHub account](https://xethub.com/assets/docs/getting-started/installation#create-a-xethub-account).
The plugin uses [PyXet](https://github.com/xetdata/pyxet) to access XetHub, so you need to authenticate with XetHub in one of the following two ways.
### Option 1: [Log in with Xet CLI](https://xethub.com/assets/docs/getting-started/installation#configure-authentication)
```bash
xet login --email <email address associated with account> --user <user name> --password <personal access token>
```
### Option 2: [Export Xet credentials as environment variables](https://pyxet.readthedocs.io/en/latest/#environment-variable)
```bash
export XET_USER_EMAIL=<email>
export XET_USER_NAME=<username>
export XET_USER_TOKEN=<personal_access_token>
```
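If you use Option 2, a quick check like the following can catch a missing variable before you start MLflow. It is a minimal sketch: it only verifies that the three variables above are set and non-empty, does not validate them against XetHub, and does not cover credentials stored by `xet login` (Option 1). The helper name `missing_xet_credentials` is hypothetical.

```python
import os

# The three variables from Option 2.
REQUIRED_VARS = ("XET_USER_EMAIL", "XET_USER_NAME", "XET_USER_TOKEN")


def missing_xet_credentials(env=None):
    """Return the names of required Xet variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_xet_credentials()
    if missing:
        print("Missing Xet credentials:", ", ".join(missing))
    else:
        print("All Xet credential variables are set.")
```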
## Create a XetHub repo to store your artifacts
Go to https://xethub.com/ and [create a new repo](https://xethub.com/assets/docs/workflows/clone-and-iterate#create-a-xet-repository) to store your MLflow artifacts.
Alternatively, [log in with the Xet CLI](#option-1-log-in-with-xet-cli) and run `xet repo make xet://<username>/<repo>` with either `--private` or `--public`.
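The repo is later addressed with URIs of the form `xet://<username>/<repo>/<branch>` in the MLflow server command. A minimal sketch of composing and sanity-checking such a URI, assuming each of the three parts is a single non-empty path segment (so branch names containing `/` are rejected; XetHub's actual naming rules are not covered here). `xet_uri` and `split_xet_uri` are hypothetical helpers.

```python
PREFIX = "xet://"


def xet_uri(user, repo, branch):
    """Build xet://<user>/<repo>/<branch> from its parts."""
    for part in (user, repo, branch):
        # Assumption: each part is one non-empty path segment.
        if not part or "/" in part:
            raise ValueError(f"invalid URI segment: {part!r}")
    return f"{PREFIX}{user}/{repo}/{branch}"


def split_xet_uri(uri):
    """Split xet://<user>/<repo>/<branch> back into (user, repo, branch)."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a xet:// URI: {uri!r}")
    parts = uri[len(PREFIX):].strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected xet://<user>/<repo>/<branch>, got {uri!r}")
    return tuple(parts)
```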
## Run your MLflow code as is
### Start the MLflow server with a XetHub repo as the artifact store
You don't need to modify your MLflow code. Once you start the MLflow server with the following command, the plugin automatically stores the artifacts logged by your MLflow runs in your XetHub repo:
```bash
mlflow server --backend-store-uri ./mlruns --artifacts-destination xet://<username>/<repo>/<branch> --default-artifact-root xet://<username>/<repo>/<branch>
```
This uses the local `mlruns` directory as the backend store for run metadata and XetHub as the [artifact store](https://mlflow.org/docs/latest/tracking.html#artifact-stores).
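If you prefer to launch the server from a Python script (for example, a notebook or a test harness), the command above can be assembled as an argument list and started with `subprocess`. This sketch reproduces only the flags shown above; `server_command`, `start_server`, and the example user, repo, and branch values are hypothetical.

```python
import shlex
import subprocess


def server_command(user, repo, branch, backend_store="./mlruns"):
    """Return the `mlflow server` argv shown above, pointed at one XetHub repo/branch."""
    artifact_uri = f"xet://{user}/{repo}/{branch}"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store,      # local store for run metadata
        "--artifacts-destination", artifact_uri,   # base location for artifacts the server proxies
        "--default-artifact-root", artifact_uri,   # artifact root for newly created experiments
    ]


def start_server(cmd):
    """Start the server in the background; call .terminate() on the result to stop it."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    cmd = server_command("alice", "mlflow-artifacts", "main")  # hypothetical values
    print(shlex.join(cmd))
```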
### Run MLflow experiment
*Experiments are logged to the `mlruns` directory in the working directory where the MLflow server was started, and the plugin must run in the same Python environment as MLflow.
So run your MLflow code from the same directory as the server, with MLflow and the plugin installed in the same environment.*
Using [MLflow's quickstart](https://docs.databricks.com/en/_extras/notebooks/source/mlflow/mlflow-quick-start-python.html) as an example:
```python
import os

import mlflow
import numpy as np
from mlflow import log_artifacts
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

with mlflow.start_run():
    # Automatically log parameters, metrics, and the model for supported libraries.
    mlflow.autolog()
    db = load_diabetes()

    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

    # Create and train models.
    rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
    rf.fit(X_train, y_train)

    # Use the model to make predictions on the test dataset.
    predictions = rf.predict(X_test)

    # Write the predictions to a local file.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/pred.txt", "w") as f:
        f.write(np.array2string(predictions))

    # Upload everything in outputs/ to the run's artifact store (XetHub here).
    log_artifacts("outputs")
```
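In the example above, `log_artifacts("outputs")` uploads the *contents* of `outputs/` to the root of the run's artifact directory, so `outputs/pred.txt` is stored as `pred.txt`. The sketch below predicts where those files should land, assuming MLflow's usual layout of `<default-artifact-root>/<experiment_id>/<run_id>/artifacts/` (experiment `0` is the Default experiment). `expected_artifact_uris` is a hypothetical helper; the real run ID is available as `mlflow.active_run().info.run_id` inside the run, and autologging adds its own files (such as the model) alongside these.

```python
from pathlib import Path


def expected_artifact_uris(artifact_root, experiment_id, run_id, local_dir):
    """Map each file under local_dir to the URI that log_artifacts(local_dir) should give it."""
    # Assumed layout: <root>/<experiment_id>/<run_id>/artifacts
    run_artifacts = f"{artifact_root.rstrip('/')}/{experiment_id}/{run_id}/artifacts"
    local = Path(local_dir)
    uris = {}
    for path in sorted(local.rglob("*")):
        if path.is_file():
            # Paths are relative to local_dir, so the "outputs/" prefix itself is dropped.
            rel = path.relative_to(local).as_posix()
            uris[str(path)] = f"{run_artifacts}/{rel}"
    return uris
```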
### Store artifacts on XetHub and visualize in MLflow UI
The artifacts will be automatically stored on XetHub under the specified repo and branch.
<img width="1720" alt="artifact_on_xethub" src="https://github.com/xetdata/Xet-MLflow/assets/22567795/fa5d4806-64b7-4d81-afde-1363175574d7">
The MLflow UI, served at `http://127.0.0.1:5000` by default (or on your own host), shows the same artifacts.
<img width="1728" alt="artifact_on_mlflow_ui" src="https://github.com/xetdata/Xet-MLflow/assets/22567795/1a43b60d-d92d-4d9d-bd7e-9a69bc2026eb">
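To check from a script that the server is up before opening the UI, a small standard-library sketch can request the tracking server's `/health` endpoint, which answers `OK` while the server is running. The default URL matches the one above; `server_is_up` is a hypothetical helper.

```python
from urllib.error import URLError
from urllib.request import urlopen


def server_is_up(base_url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if the MLflow server at base_url answers its /health endpoint."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error.
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", server_is_up())
```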
Raw data
{
"_id": null,
"home_page": "https://github.com/xetdata/Xet-MLflow",
"name": "mlflow-xethub",
"maintainer": "Kelton Zhang",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "kelton@xethub.com",
"keywords": "ai,collaboration,data-science,developer-tools,git,mlflow,machine-learning,reproducibility",
"author": "XetHub",
"author_email": "contact@xethub.com",
"download_url": "https://files.pythonhosted.org/packages/c4/71/0a33889c0756991c4106c87c338cdd00a123f7fb616701b753f0d9f3206b/mlflow-xethub-0.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "mlflow[xethub] is a mlflow plugin integrating XetHub with MLflow so that you can use existing MLflow code to track experiments but store artifacts to XetHub.",
"version": "0.0.0",
"project_urls": {
"Homepage": "https://github.com/xetdata/Xet-MLflow"
},
"split_keywords": [
"ai",
"collaboration",
"data-science",
"developer-tools",
"git",
"mlflow",
"machine-learning",
"reproducibility"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2820ea790e0add7a72b88fbfd5926efe9c6347bee4446c4e410d448efb555ef1",
"md5": "e19e13f75369160d937aec14cb1741bd",
"sha256": "b2a503fbf865d5386ef0344877161948103b74922a52ea2f86ff1dd1707e14fe"
},
"downloads": -1,
"filename": "mlflow_xethub-0.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e19e13f75369160d937aec14cb1741bd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 8000,
"upload_time": "2023-10-12T03:42:14",
"upload_time_iso_8601": "2023-10-12T03:42:14.113340Z",
"url": "https://files.pythonhosted.org/packages/28/20/ea790e0add7a72b88fbfd5926efe9c6347bee4446c4e410d448efb555ef1/mlflow_xethub-0.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c4710a33889c0756991c4106c87c338cdd00a123f7fb616701b753f0d9f3206b",
"md5": "f6f0897b76cb888cef5d1f12675e9351",
"sha256": "8358cd52ff8042e9d318d19f5d89a1cdfb6c13b2a069e06afd836e3ec0daad9c"
},
"downloads": -1,
"filename": "mlflow-xethub-0.0.0.tar.gz",
"has_sig": false,
"md5_digest": "f6f0897b76cb888cef5d1f12675e9351",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 10105,
"upload_time": "2023-10-12T03:42:15",
"upload_time_iso_8601": "2023-10-12T03:42:15.889383Z",
"url": "https://files.pythonhosted.org/packages/c4/71/0a33889c0756991c4106c87c338cdd00a123f7fb616701b753f0d9f3206b/mlflow-xethub-0.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-12 03:42:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "xetdata",
"github_project": "Xet-MLflow",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "mlflow-xethub"
}