Name | detrain |
Version | 0.2.6 |
home_page | None |
Summary | A package for distributed training & model parallelism using Torch |
upload_time | 2024-05-12 09:44:21 |
maintainer | None |
docs_url | None |
author | A2N Finance |
requires_python | >=3.8 |
license | MIT |
keywords | torch, model parallelism, pipeline, tensor |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
==============
DeTrain
==============

Overview
--------

DeTrain is a Python package designed to train AI models using model parallelism methods. This package focuses on pipeline and tensor parallelism.

Installation
------------

You can install DeTrain using pip:

.. code-block:: sh

    pip install detrain

Usage
-----

Once installed, you can use DeTrain in your Python scripts like this:

.. code-block:: python

    import os
    import time

    import torch
    import torch.nn as nn
    import torch.optim as optim

    from detrain.ppl.args_util import get_args
    from detrain.ppl.worker import run_worker
    from detrain.ppl.dataset_util import get_torchvision_dataset
    from shards_model import NNShard1, NNShard2

    if __name__ == "__main__":
        # Parse CLI arguments and read the distributed environment variables.
        args = get_args()
        world_size = int(os.environ["WORLD_SIZE"])
        rank = int(os.environ["RANK"])
        epochs = int(args.epochs)
        batch_size = int(args.batch_size)
        lr = float(args.lr)

        # List the CUDA devices visible to this process.
        for i in range(torch.cuda.device_count()):
            print(torch.cuda.get_device_properties(i).name)

        devices = []
        workers = []
        shards = [NNShard1, NNShard2]

        # args.gpu is an underscore-separated flag per process, e.g. "0_1_1":
        # index 0 is the master; a 1 places that worker on CUDA, otherwise CPU.
        if args.gpu is not None:
            arr = args.gpu.split('_')
            for dv in range(len(arr)):
                if dv > 0:
                    workers.append(f"worker{dv}")
                    if int(arr[dv]) == 1:
                        devices.append("cuda:0")
                    else:
                        devices.append("cpu")

        # Define optimizer & loss_fn
        loss_fn = nn.CrossEntropyLoss()
        optimizer_class = optim.SGD

        # Dataloaders
        (train_dataloader, test_dataloader) = get_torchvision_dataset("MNIST", batch_size)

        print(f"World_size: {world_size}, Rank: {rank}")

        num_split = args.split_size
        tik = time.time()
        run_worker(
            rank,
            world_size,
            (
                args.split_size,
                workers,
                devices,
                shards
            ),
            train_dataloader,
            test_dataloader,
            loss_fn,
            optimizer_class,
            epochs,
            batch_size,
            lr
        )
        tok = time.time()
        print(f"number of splits = {num_split}, execution time = {tok - tik}")

For detailed examples, please visit the `DeTrain examples <https://github.com/a2nfinance/detrain-example>`_.
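
Note that the script reads ``WORLD_SIZE`` and ``RANK`` from the environment, so each process in the training group has to be launched with those variables set, along with any rendezvous settings (such as ``MASTER_ADDR`` and ``MASTER_PORT``) required by the underlying ``torch.distributed`` setup. A hypothetical launch for one process of a three-process group is sketched below; the exact option names accepted by ``get_args()`` are assumptions here and should be checked against the DeTrain examples repository.

.. code-block:: sh

    # Hypothetical launch for rank 0 of a 3-process group (flag names assumed).
    # Repeat on each node/process with RANK=0, 1, 2.
    MASTER_ADDR=127.0.0.1 MASTER_PORT=29500 WORLD_SIZE=3 RANK=0 \
        python train.py --epochs 3 --batch_size 64 --lr 0.05 --gpu 0_1_1 --split_size 4
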
Contributing
------------

Contributions are welcome! If you'd like to contribute to DeTrain, please follow these steps:

1. Fork the repository on GitHub.
2. Create a new branch.
3. Make your changes and commit them with clear descriptions.
4. Push your changes to your fork.
5. Submit a pull request.

Bug Reports and Feedback
------------------------

If you encounter any bugs or have feedback, please open an issue on the GitHub repository.

License
-------

DeTrain is licensed under the MIT License. See the LICENSE file for more information.
Raw data
{
"_id": null,
"home_page": null,
"name": "detrain",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "torch, model parallelism, pipeline, tensor",
"author": "A2N Finance",
"author_email": "Levi <levi@a2n.finance>, John <john@a2n.finance>",
"download_url": "https://files.pythonhosted.org/packages/a7/91/53324575d8608d11920d0b4e647c31ae16c51540af4387973771140dc81e/detrain-0.2.6.tar.gz",
"platform": null,
"description": "==============\nDeTrain\n==============\n\nOverview\n--------\n\nDeTrain is a Python package designed to train AI models using model parallelism methods. This package focuses on pipeline and tensor parallelism.\n\nInstallation\n------------\n\nYou can install DeTrain using pip:\n\n.. code-block:: sh\n\n pip install detrain\n\nUsage\n-----\n\nOnce installed, you can use DeTrain in your Python scripts like this:\n\n.. code-block:: python\n\n import torch.nn as nn\n import torch\n import time\n import os\n from detrain.ppl.args_util import get_args\n from detrain.ppl.worker import run_worker\n from detrain.ppl.dataset_util import get_torchvision_dataset\n from shards_model import NNShard1, NNShard2\n import torch.optim as optim\n\n if __name__==\"__main__\":\n args = get_args()\n # Get args\n world_size = int(os.environ[\"WORLD_SIZE\"])\n rank = int(os.environ[\"RANK\"])\n epochs = int(args.epochs)\n batch_size = int(args.batch_size)\n lr = float(args.lr)\n\n for i in range(torch.cuda.device_count()):\n print(torch.cuda.get_device_properties(i).name)\n\n devices = []\n workers = []\n shards = [NNShard1, NNShard2]\n # Check devices\n if (args.gpu is not None):\n arr = args.gpu.split('_')\n for dv in range(len(arr)):\n if dv > 0:\n workers.append(f\"worker{dv}\")\n if int(arr[dv]) == 1:\n devices.append(\"cuda:0\")\n else:\n devices.append(\"cpu\")\n\n # Define optimizer & loss_fn\n loss_fn = nn.CrossEntropyLoss()\n optimizer_class = optim.SGD\n \n # Dataloaders\n\n (train_dataloader, test_dataloader) = get_torchvision_dataset(\"MNIST\", batch_size)\n\n \n print(f\"World_size: {world_size}, Rank: {rank}\")\n \n num_split = 4\n tik = time.time()\n run_worker(\n rank, \n world_size, \n (\n args.split_size, \n workers,\n devices, \n shards\n ), \n train_dataloader, \n test_dataloader, \n loss_fn, \n optimizer_class, \n epochs, \n batch_size,\n lr\n )\n tok = time.time()\n print(f\"number of splits = {num_split}, execution time = {tok - tik}\")\n\nFor detailed examples, please visit the `DeTrain examples <https://github.com/a2nfinance/detrain-example>`_.\n\nContributing\n------------\n\nContributions are welcome! If you'd like to contribute to DeTrain, please follow these steps:\n\n1. Fork the repository on GitHub.\n2. Create a new branch.\n3. Make your changes and commit them with clear descriptions.\n4. Push your changes to your fork.\n5. Submit a pull request.\n\nBug Reports and Feedback\n------------------------\n\nIf you encounter any bugs or have feedback, please open an issue on the GitHub repository.\n\nLicense\n-------\n\nDeTrain is licensed under the MIT License. See the LICENSE file for more information.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A package for distributed training & model parallelism using Torch",
"version": "0.2.6",
"project_urls": null,
"split_keywords": [
"torch",
" model parallelism",
" pipeline",
" tensor"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "75c648a5fbef3722ceeb979b0c6991e3744540cbd94d4d3e05c1c0ea0737d1f1",
"md5": "210412fc3e9591bf883d090759f75b4a",
"sha256": "651bd12301a8011746797eef64a7a3a9c838f0d44b622f5e7aec8cdb0415a40e"
},
"downloads": -1,
"filename": "detrain-0.2.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "210412fc3e9591bf883d090759f75b4a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 16527,
"upload_time": "2024-05-12T09:44:19",
"upload_time_iso_8601": "2024-05-12T09:44:19.628113Z",
"url": "https://files.pythonhosted.org/packages/75/c6/48a5fbef3722ceeb979b0c6991e3744540cbd94d4d3e05c1c0ea0737d1f1/detrain-0.2.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a79153324575d8608d11920d0b4e647c31ae16c51540af4387973771140dc81e",
"md5": "498c36dd6f7947d8cd237529abe526cb",
"sha256": "6c88ffc326013aeeaa675aee365af335be7ee8f50799e7eeb66acf13e8b95df2"
},
"downloads": -1,
"filename": "detrain-0.2.6.tar.gz",
"has_sig": false,
"md5_digest": "498c36dd6f7947d8cd237529abe526cb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 15698,
"upload_time": "2024-05-12T09:44:21",
"upload_time_iso_8601": "2024-05-12T09:44:21.329381Z",
"url": "https://files.pythonhosted.org/packages/a7/91/53324575d8608d11920d0b4e647c31ae16c51540af4387973771140dc81e/detrain-0.2.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-12 09:44:21",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "detrain"
}