vineyard-ml: Accelerating Data Science Pipelines
================================================
Vineyard has been tightly integrated with the data preprocessing pipelines in
widely-adopted machine learning frameworks like PyTorch, TensorFlow, and MXNet.
Shared objects in vineyard, e.g., `vineyard::Tensor`, `vineyard::DataFrame`,
`vineyard::Table`, etc., can be directly used as the inputs of the training
and inference tasks in these frameworks.
Examples
--------
### Datasets
The following example shows how a `DataFrame` in vineyard can be used as the input
of a PyTorch Dataset:
```python
import os
import numpy as np
import pandas as pd
import torch
import vineyard

# connect to vineyard, see also: https://v6d.io/notes/getting-started.html
client = vineyard.connect(os.environ['VINEYARD_IPC_SOCKET'])

# generate a dummy dataframe in vineyard
df = pd.DataFrame({
    # multi-dimensional array as a column
    'data': vineyard.data.dataframe.NDArrayArray(np.random.rand(1000, 10)),
    'label': np.random.rand(1000)
})
object_id = client.put(df)

# take it as a torch dataset
from vineyard.contrib.ml.torch import torch_context
with torch_context():
    # ds is a `torch.utils.data.TensorDataset`
    ds = client.get(object_id)

# or, you can use datapipes from torchdata
from vineyard.contrib.ml.torch import datapipe
pipe = datapipe(ds)

# use the datapipes in your training loop
for data, label in pipe:
    # do something
    pass
```
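Since the object fetched under `torch_context()` is described above as a `torch.utils.data.TensorDataset`, it should also work with the standard `DataLoader` for batching and shuffling. The sketch below is an illustration only: it assumes that behaviour and, so that it runs without a vineyard server, builds a stand-in `TensorDataset` with the same shapes as the dummy dataframe instead of calling `client.get(object_id)`. The batch size of 64 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for `ds = client.get(object_id)` from the example above
# (assumption: same layout, 1000 rows of 10 features plus one label each).
data = torch.rand(1000, 10)
label = torch.rand(1000)
ds = TensorDataset(data, label)

# Batch and shuffle the dataset with the regular PyTorch loader.
loader = DataLoader(ds, batch_size=64, shuffle=True)

# Collect the shape of every batch; a training step would go here instead.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
```

With 1000 rows and a batch size of 64, the loader yields 15 full batches and one final batch of 40 rows; pass `drop_last=True` if the training step needs uniform batch sizes.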
### PyTorch Modules
The following example shows how to use vineyard to share PyTorch modules between processes:
```python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import vineyard

# connect to vineyard, see also: https://v6d.io/notes/getting-started.html
client = vineyard.connect(os.environ['VINEYARD_IPC_SOCKET'])

# define a dummy model to store in vineyard
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()

# put the model into vineyard
from vineyard.contrib.ml.torch import torch_context
with torch_context():
    object_id = client.put(model)

# get the module state dict from vineyard and load it into a new model
model = Model()
with torch_context():
    state_dict = client.get(object_id)
model.load_state_dict(state_dict, assign=True)
```
Reference and Implementation
----------------------------
- [torch](https://github.com/v6d-io/v6d/blob/main/python/vineyard/contrib/ml/torch.py): including PyTorch datasets, torcharrow and torchdata.
- [tensorflow](https://github.com/v6d-io/v6d/blob/main/python/vineyard/contrib/ml/tensorflow.py)
- [mxnet](https://github.com/v6d-io/v6d/blob/main/python/vineyard/contrib/ml/mxnet.py)
For more details about vineyard itself, please refer to the [Vineyard](https://v6d.io) project.
Raw data
{
"_id": null,
"home_page": "https://v6d.io",
"name": "vineyard-ml",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "The vineyard team",
"author_email": "developers@v6d.io",
"download_url": "",
"platform": "POSIX",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Vineyard integration with machine learning frameworks",
"version": "0.21.3",
"project_urls": {
"Documentation": "https://v6d.io",
"Homepage": "https://v6d.io",
"Source": "https://github.com/v6d-io/v6d",
"Tracker": "https://github.com/v6d-io/v6d/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e169a42403cd74d28e9d7f8d0cf94d94e862251774d55f93a607faaee19dc47a",
"md5": "319c1a9b3ccb09ee8217f7f720c55179",
"sha256": "d54baf1fca761bc4945f396172c42ce2720c8308e9dd064e0bb318038d48f560"
},
"downloads": -1,
"filename": "vineyard_ml-0.21.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "319c1a9b3ccb09ee8217f7f720c55179",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 31544,
"upload_time": "2024-03-01T03:12:28",
"upload_time_iso_8601": "2024-03-01T03:12:28.600987Z",
"url": "https://files.pythonhosted.org/packages/e1/69/a42403cd74d28e9d7f8d0cf94d94e862251774d55f93a607faaee19dc47a/vineyard_ml-0.21.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-01 03:12:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "v6d-io",
"github_project": "v6d",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "vineyard-ml"
}