<a href="https://tiledb.com"><img src="https://github.com/TileDB-Inc/TileDB/raw/dev/doc/source/_static/tiledb-logo_color_no_margin_@4x.png" alt="TileDB logo" width="400"></a>
[![TileDB-ML CI](https://github.com/TileDB-Inc/TileDB-ML/actions/workflows/ci.yml/badge.svg)](https://github.com/TileDB-Inc/TileDB-ML/actions/workflows/ci.yml)
![Coverage Badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/ktsitsi/2506f6c9d3375e2d636cf594340d11bf/raw/gistfile.json)
# TileDB-ML
TileDB-ML is the repository that contains all machine learning oriented functionality TileDB supports. In this repo, we explain how TileDB can be
used for machine learning oriented data management problems and outline the next steps we have in mind. We would first like to highlight our
perspective on how TileDB relates to general machine learning data management problems and how the TileDB engine could be the solution for
efficiently storing any kind of machine learning data, from raw images, text, audio, time series and SAR to features and machine learning models.
Before you proceed further, please take a quick look at our [Medium blog post](https://medium.com/tiledb/tiledb-as-the-data-engine-for-machine-learning-b48fb0e9b147),
which explains in detail how TileDB addresses many machine learning data format requirements and overcomes the drawbacks of the other
candidate formats. We also take this opportunity to solicit feedback and contributions from the community.
## Description
As mentioned above, this repository contains all machine learning oriented functionality TileDB supports. Specifically, code that
can (or will be able to):
* **Save** machine learning models as TileDB arrays. At the moment we provide implementations for saving TensorFlow Keras, PyTorch and Scikit-Learn models.
* **Load** machine learning models from TileDB arrays.
* **Read** features, in order to train machine learning models, from TileDB arrays directly into a machine learning framework's data API.
  We already [support](https://github.com/TileDB-Inc/TileDB-ML/blob/master/tiledb/ml/readers/) the TensorFlow and PyTorch
  data APIs through Python generators for dense and sparse TileDB arrays, and we are working on similar support for Scikit-Learn
  Pipelines, which will be out soon.
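
The generator-based reading mentioned above can be illustrated with a minimal sketch. This is not the TileDB-ML implementation: `batch_generator` is a hypothetical helper, and plain Python lists stand in for the feature and label arrays. It only shows the general pattern of yielding aligned slices so that a whole array never has to be held in one batch.

```python
from typing import Iterator, Sequence, Tuple


def batch_generator(
    features: Sequence, labels: Sequence, batch_size: int
) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield aligned (features, labels) slices of at most batch_size rows.

    Hypothetical helper for illustration; any sliceable sequence works,
    so an opened array object could take the place of the lists below.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        # Slicing past the end is safe: the last batch is simply shorter.
        yield features[start:stop], labels[start:stop]


# Plain lists stand in for a feature array and a label array.
x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
y = [0, 1, 0, 1, 0]
batches = list(batch_generator(x, y, batch_size=2))
```

A generator of this shape is the kind of object that `tf.data.Dataset.from_generator` or a `torch.utils.data.IterableDataset` can consume, which is why the pattern fits both frameworks.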
## Examples
[comment]: <> (## Structure)
[comment]: <> (At the moment we provide code for saving and loading models to and from TileDB arrays and for loading features from TileDB arrays )
[comment]: <> (into Tensorflow Data API. The corresponding implementations for model save/load, live in ``tiledb/ml/models`` folder. )
[comment]: <> (All implemented classes (``TensorflowKerasTileDBModel``, ``PyTorchTileDBModel``, ``SklearnTileDBModel`` ) )
[comment]: <> (inherit from base class (``TileDBModel``) and implement ``save()`` and ``load()`` functionality. )
[comment]: <> (In case you would like to contribute model save/load implementations)
[comment]: <> (that support other machine learning frameworks, please take a look at the current implementations and commit code accordingly. Please)
[comment]: <> (also read the contributing section below.)
We provide some detailed notebook examples on how to save and load machine learning models as TileDB arrays (also on TileDB-Cloud) and explain why
this is useful in order to create simple and flexible model registries with TileDB.
* [Example for Tensorflow Keras Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/tensorflow_keras_tiledb_models_example.ipynb)
* [Example for PyTorch Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/pytorch_tiledb_models_example.ipynb)
* [Example for Scikit-Learn Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/sklearn_tiledb_models_example.ipynb)
* [Example for Tensorflow Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/tensorflow_tiledb_cloud_ml_model_array.ipynb)
* [Example for PyTorch Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/pytorch_tiledb_cloud_ml_model_array.ipynb)
* [Example for Scikit-Learn Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/sklearn_tiledb_cloud_ml_model_array.ipynb)
We also provide detailed notebook examples on how to train TensorFlow and PyTorch models using our Data API support for dense TileDB arrays.
* [Example on training with Tensorflow and Dense TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/tensorflow_data_api_tiledb_dense.ipynb)
[comment]: <> (* [Example on training with Tensorflow and Sparse TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/tensorflow_data_api_tiledb_sparse.ipynb))
* [Example on training with PyTorch and Dense TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/pytorch_data_api_tiledb_dense.ipynb)
[comment]: <> (* [Example on training with PyTorch and Sparse TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/pytorch_data_api_tiledb_sparse.ipynb))
Finally, we also provide an [End-To-End example](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/serverless_training) on how to ingest data, train a PyTorch model and serve it with UDFs completely serverlessly on TileDB-Cloud.
## Installation
TileDB-ML can be installed:
### Quick Installation
- from source by cloning the [Git](https://github.com/TileDB-Inc/TileDB-ML) repository:

      git clone https://github.com/TileDB-Inc/TileDB-ML.git
      cd TileDB-ML

      # To install and check all frameworks. If you
      # use zsh, replace .[full] with .\[full\]
      pip install -e .[full]

      # To install and check Tensorflow only. If you
      # use zsh, replace .[tensorflow] with .\[tensorflow\]
      pip install -e .[tensorflow]

      # To install and check PyTorch only. If you
      # use zsh, replace .[pytorch] with .\[pytorch\]
      pip install -e .[pytorch]

      # To install and check Scikit-Learn only. If you
      # use zsh, replace .[sklearn] with .\[sklearn\]
      pip install -e .[sklearn]

      # To try any of the aforementioned machine learning frameworks
      # on TileDB-Cloud, use one of the following.
      pip install -e .[tensorflow_cloud]
      pip install -e .[pytorch_cloud]
      pip install -e .[sklearn_cloud]

- with pip from git:

      pip install git+https://github.com/TileDB-Inc/TileDB-ML.git@master

- from PyPI:
[comment]: <> (TileDB-ML is available from either [PyPI](https://test.pypi.org/project/tiledb-ml/0.1.2.2/) with ``pip``:)
```
pip install tiledb-ml
```
The above command installs only the base dependency of `tiledb-ml`, namely `tiledb`.
To install the integration for a specific framework, use:
```
pip install tiledb-ml[pytorch] # e.g., to install only the PyTorch integration
```
To install all the supported frameworks, use:
```
pip install tiledb-ml[full]
```
The above commands apply to the `bash` shell. If you use `zsh`, you need to
escape the bracket characters, for example:
```
pip install tiledb-ml\[pytorch\]
```
- You may run the test suite with:
```
python setup.py test
```
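
After installing, it can be useful to confirm which optional frameworks are actually importable in the current environment. The sketch below uses only the standard library. The mapping from extra names to import names is an assumption made for illustration (for example, the `pytorch` extra is assumed to provide the `torch` module), and `available_extras` is a hypothetical helper, not part of `tiledb-ml`.

```python
import importlib.util

# Extra name used in the install commands -> top-level module it is
# assumed to provide. This mapping is illustrative, not read from the package.
EXTRA_TO_MODULE = {
    "tensorflow": "tensorflow",
    "pytorch": "torch",
    "sklearn": "sklearn",
}


def available_extras() -> dict:
    """Return {extra_name: True/False} depending on whether the module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_TO_MODULE.items()
    }


status = available_extras()
for extra, installed in status.items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```

`importlib.util.find_spec` only locates the module without importing it, so the check stays fast even when a heavy framework is present.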
[comment]: <> (## Roadmap)
[comment]: <> (We are already working on the following:)
[comment]: <> ([comment]: <> (* C++ integration of TileDB with the Tensorflow Data API through [tensorflow-io](https://github.com/tensorflow/io).))
[comment]: <> (* Readers from TileDB arrays to other popular machine learning framework Data APIs.)
[comment]: <> (* Model save/load support for other popular machine learning frameworks like XGBoost and CatBoost.)
[comment]: <> (Our ultimate goal is ALL machine learning data, from raw data (text, images, audio), to features (Feature Store) and models (Model Registry), represented, stored and managed)
[comment]: <> (in one **Data Engine**, i.e, TileDB.)
## Note
Our current implementations are not optimal, and they do not yet fully support the aforementioned machine learning
frameworks. For example, model parts such as NumPy arrays are serialized with Pickle, which is far from optimal
because it is Python-only and insecure, as described [here](https://docs.python.org/3/library/pickle.html).
For now, we mainly demonstrate TileDB's universal data management ability and how elegantly it applies to
machine learning data of any kind. Optimizations will follow as soon as possible.
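
To make the Pickle concern concrete, the following sketch adapts the "restricting globals" pattern from the Python `pickle` documentation. It is not how TileDB-ML loads models; `RestrictedUnpickler`, `restricted_loads` and the sample `weights` dictionary are illustrative names. It shows that plain data round-trips without problems, while a payload that tries to resolve `os.system` is rejected before anything runs.

```python
import builtins
import io
import pickle

# Only these builtins may be resolved while unpickling (illustrative allow-list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global object.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and numbers need no globals, so they load normally.
weights = {"coef": [0.5, -1.25], "intercept": 0.1}
restored = restricted_loads(pickle.dumps(weights))

# A hand-written payload that would call os.system() under plain pickle.loads().
malicious = b"cos\nsystem\n(S'echo hello world'\ntR."
blocked = None
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as err:
    blocked = str(err)
```

An allow-list like this reduces the risk but does not make Pickle safe for untrusted input in general, which is consistent with the caveat above.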
In any case, note that the TileDB-ML repository is under development, and **the API is subject to change**.
## Contributing
We welcome all contributions! Please read the [contributing guidelines](https://github.com/TileDB-Inc/TileDB-ML/blob/master/CONTRIBUTING.md)
before submitting pull requests.
## Copyright
The TileDB-ML package is Copyright 2018-2021 TileDB, Inc.
## License
MIT
Raw data
{
"_id": null,
"home_page": "https://github.com/TileDB-Inc/TileDB-ML",
"name": "tiledb-ml",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "tiledb,ml",
"author": "TileDB, Inc.",
"author_email": "help@tiledb.io",
"download_url": "https://files.pythonhosted.org/packages/85/ac/ab38ee5bf7a9fd998bb91ed88a9be05bef6909330d5023bafaa1cfb88c36/tiledb-ml-0.9.6.tar.gz",
"platform": "any",
"description": "<a href=\"https://tiledb.com\"><img src=\"https://github.com/TileDB-Inc/TileDB/raw/dev/doc/source/_static/tiledb-logo_color_no_margin_@4x.png\" alt=\"TileDB logo\" width=\"400\"></a>\n\n[![TileDB-ML CI](https://github.com/TileDB-Inc/TileDB-ML/actions/workflows/ci.yml/badge.svg)](https://github.com/TileDB-Inc/TileDB-ML/actions/workflows/ci.yml)\n![Coverage Badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/ktsitsi/2506f6c9d3375e2d636cf594340d11bf/raw/gistfile.json)\n\n# TileDB-ML\n\nTileDB-ML is the repository that contains all machine learning oriented functionality TileDB supports. In this repo, we explain how someone can employ \nTileDB for machine learning oriented data management problems, and which are the next steps we have in mind. Here, we would firstly like to highlight our \nperspective on the relation of TileDB with general machine learning oriented data management problems and how TileDB engine could be the solution for \nefficiently storing any kind of machine learning data, i.e., from raw images, text, audio, time series and SAR to features and machine learning models. \nBefore you proceed further, please take a quick look on our [medium blog post](https://medium.com/tiledb/tiledb-as-the-data-engine-for-machine-learning-b48fb0e9b147), \nwhich targets to explain in great detail how TileDB addresses many machine learning data format requirements, overcoming the drawbacks of the other \ncandidate formats, and take this opportunity to solicit feedback and contributions from the community.\n\n## Description\n\nAs mentioned above, this repository contains all machine learning oriented functionality TileDB supports. Specifically, code that \ncan (or will be able to): \n\n* **Save** machine learning models as TileDB arrays (At the moment we provide implementations for saving Tensorflow Keras, PyTorch and Scikit-Learn models.)\n \n* **Load** machine learning models from TileDB arrays. 
\n\n* **Read** features, in order to train machine learning models, from TileDB arrays directly to machine learning framework's data APIs. \n We already [support](https://github.com/TileDB-Inc/TileDB-ML/blob/master/tiledb/ml/readers/) the Tensorflow and PyTorch\n data APIs with the use of Python generators for Dense and Sparse TileDB arrays, and we are similarly working on Scikit-Learn \n Pipelines which will be out soon.\n \n## Examples\n\n[comment]: <> (## Structure)\n[comment]: <> (At the moment we provide code for saving and loading models to and from TileDB arrays and for loading features from TileDB arrays )\n\n[comment]: <> (into Tensorflow Data API. The corresponding implementations for model save/load, live in ``tiledb/ml/models`` folder. )\n\n[comment]: <> (All implemented classes (``TensorflowKerasTileDBModel``, ``PyTorchTileDBModel``, ``SklearnTileDBModel`` ) )\n\n[comment]: <> (inherit from base class (``TileDBModel``) and implement ``save()`` and ``load()`` functionality. )\n\n[comment]: <> (In case you would like to contribute model save/load implementations)\n\n[comment]: <> (that support other machine learning frameworks, please take a look at the current implementations and commit code accordingly. 
Please)\n\n[comment]: <> (also read the contributing section below.)\n\nWe provide some detailed notebook examples on how to save and load machine learning models as TileDB arrays (also on TileDB-Cloud) and explain why \nthis is useful in order to create simple and flexible model registries with TileDB.\n\n* [Example for Tensorflow Keras Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/tensorflow_keras_tiledb_models_example.ipynb)\n* [Example for PyTorch Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/pytorch_tiledb_models_example.ipynb)\n* [Example for Scikit-Learn Models](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/models/sklearn_tiledb_models_example.ipynb)\n* [Example for Tensorflow Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/tensorflow_tiledb_cloud_ml_model_array.ipynb)\n* [Example for PyTorch Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/pytorch_tiledb_cloud_ml_model_array.ipynb)\n* [Example for Scikit-Learn Model on TileDB-Cloud](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/models/sklearn_tiledb_cloud_ml_model_array.ipynb)\n\n\nWe also provide detailed notebook examples on how to train Tensorflow and PyTorch models with the use of our Data APIs support for Dense TileDB arrays.\n\n* [Example on training wih Tensorflow and Dense TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/tensorflow_data_api_tiledb_dense.ipynb)\n\n[comment]: <> (* [Example on training wih Tensorflow and Sparse TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/tensorflow_data_api_tiledb_sparse.ipynb))\n* [Example on training wih PyTorch and Dense TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/pytorch_data_api_tiledb_dense.ipynb)\n\n[comment]: <> (* [Example on training wih PyTorch and Sparse 
TileDB arrays](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/readers/pytorch_data_api_tiledb_sparse.ipynb))\n\nFinally, we also provide an [End-To-End example](https://github.com/TileDB-Inc/TileDB-ML/blob/master/examples/cloud/serverless_training) on how to ingest data, train a PyTorch model and serve it with UDFs completely serverlessly on TileDB-Cloud.\n\n## Installation\n\nTileDB-ML can be installed:\n\n### Quick Installation\n\n- from source by cloning the [Git](https://github.com/TileDB-Inc/TileDB-ML) repository:\n\n git clone https://github.com/TileDB-Inc/TileDB-ML.git\n cd TileDB-ML\n \n # In case you want to install and check all frameworks. If you\n # use zsh replace .[full] with .\\[full\\]\n pip install -e .[full]\n\n # In case you want to install and check Tensorflow only. If you\n # use zsh replace .[tensorflow] with .\\[tensorflow\\]\n pip install -e .[tensorflow]\n\n # In case you want to install and check PyTorch only. If you\n # use zsh replace .[pytorch] with .\\[pytorch\\]\n pip install -e .[pytorch]\n\n # In case you want to install and check Scikit-Learn only. If you\n # use zsh replace .[sklearn] with .\\[sklearn\\]\n pip install -e .[sklearn] \n\n # In case you want to try any of the aforementioned machine learning framework\n # on TileDB-Cloud try one of the follwoing.\n pip install -e .[tensorflow_cloud]\n pip install -e .[pytorch_cloud]\n pip install -e .[sklearn_cloud]\n\n- with pip from git:\n\n pip install git+https://github.com/TileDB-Inc/TileDB-ML.git@master\n\n- from PyPi:\n\n[comment]: <> (TileDB-ML is available from either [PyPI](https://test.pypi.org/project/tiledb-ml/0.1.2.2/) with ``pip``:)\n\n ```\n pip install tiledb-ml\n ```\n The above command will just install the basic dependency of `tiledb-ml`, hence `tiledb`.\n In order to install the integration for a specific framework you need to use:\n \n ```\n pip install tiledb-ml[pytorch] # e.g. 
For checking only the Pytorch integration\n ```\n \n Checking all the supported frameworks you will need to use:\n\n ```\n pip install tiledb-ml[full]\n ```\n \n The above commands apply to `bash` shell in case you use `zsh` you will \n need to escape the `bracket` character like the following for example:\n \n ```\n pip install tiledb-ml\\[pytorch\\]\n ```\n \n- You may run the test suite with:\n ```\n python setup.py test\n ```\n\n[comment]: <> (## Roadmap)\n\n[comment]: <> (We are already working on the following:)\n\n[comment]: <> ([comment]: <> (* C++ integration of TileDB with the Tensorflow Data API through [tensorflow-io](https://github.com/tensorflow/io).))\n\n[comment]: <> (* Readers from TileDB arrays to other popular machine learning framework Data APIs.)\n\n[comment]: <> (* Model save/load support for other popular machine learning frameworks like XGBoost and CatBoost.)\n\n[comment]: <> (Our ultimate goal is ALL machine learning data, from raw data (text, images, audio), to features (Feature Store) and models (Model Registry), represented, stored and managed)\n\n[comment]: <> (in one **Data Engine**, i.e, TileDB.)\n\n\n## Note\n\nHere we would like to highlight that our current implementations are not optimal, and they don't support the aforementioned machine learning\nframeworks 100%, e.g., serialization of model parts like numpy arrays, takes place with Pickle (which is far from optimal)\nbecause of its ``Python Only`` nature and insecurity as described [here](https://docs.python.org/3/library/pickle.html).\nWe mainly show the universal data management ability of TileDB, and how elegantly applies in \nmachine learning data of any kind. Optimizations will follow as soon as possible.\n\nIn any case, note that the TileDB-ML repository is under development, and **the API is subject to change**.\n\n\n## Contributing\n\nWe welcome all contributions! 
Please read the [contributing guidelines](https://github.com/TileDB-Inc/TileDB-ML/blob/master/CONTRIBUTING.md) \nbefore submitting pull requests.\n\n## Copyright\n\nThe TileDB-ML package is Copyright 2018-2021 TileDB, Inc\n\n## License\n\nMIT\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Package supports all machine learning functionality for TileDB Embedded and TileDB Cloud",
"version": "0.9.6",
"project_urls": {
"Bug Tracker": "https://github.com/TileDB-Inc/TileDB-ML/issues",
"Documentation": "https://docs.tiledb.com",
"Homepage": "https://github.com/TileDB-Inc/TileDB-ML"
},
"split_keywords": [
"tiledb",
"ml"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "035a50b72647a3756418bad2a1e7575347364acd13030fb3cae88887f6a00dfd",
"md5": "3381dfd428fce9e02010e27714e27130",
"sha256": "98cfb4dc8c6eddecb3ce4cca44fe7d3c23672065cb99c234bfba9a68ed82ac88"
},
"downloads": -1,
"filename": "tiledb_ml-0.9.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3381dfd428fce9e02010e27714e27130",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 64425,
"upload_time": "2023-09-12T13:34:48",
"upload_time_iso_8601": "2023-09-12T13:34:48.513581Z",
"url": "https://files.pythonhosted.org/packages/03/5a/50b72647a3756418bad2a1e7575347364acd13030fb3cae88887f6a00dfd/tiledb_ml-0.9.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "85acab38ee5bf7a9fd998bb91ed88a9be05bef6909330d5023bafaa1cfb88c36",
"md5": "19a5f3d53b7404284e6350a4e27ae4a6",
"sha256": "debd0d82ea58a8b96d650c104ff045f22fefde5c55798a923aee9d93de2c2d14"
},
"downloads": -1,
"filename": "tiledb-ml-0.9.6.tar.gz",
"has_sig": false,
"md5_digest": "19a5f3d53b7404284e6350a4e27ae4a6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 167112,
"upload_time": "2023-09-12T13:34:50",
"upload_time_iso_8601": "2023-09-12T13:34:50.392250Z",
"url": "https://files.pythonhosted.org/packages/85/ac/ab38ee5bf7a9fd998bb91ed88a9be05bef6909330d5023bafaa1cfb88c36/tiledb-ml-0.9.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-12 13:34:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TileDB-Inc",
"github_project": "TileDB-ML",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "tiledb-ml"
}