trainy

Name: trainy
Version: 0.1.3
Summary: Trainy: An observability tool for profiling PyTorch training on demand
Homepage: https://github.com/Trainy-ai/trainy
Author: Trainy Team
License: Apache 2.0
Upload time: 2023-07-19 09:15:00
Requirements: none recorded
            # Trainy on-demand profiler

<p align="center">
  <img height='100px' src="https://www.ocf.berkeley.edu/~asai/static/images/trainy.png">
</p>

![GitHub Repo stars](https://img.shields.io/github/stars/Trainy-ai/trainy?style=social)
[![](https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/TrainyAI)
[![](https://dcbadge.vercel.app/api/server/d67CMuKY5V)](https://discord.gg/d67CMuKY5V)

This is the trainy CLI and daemon for setting up on-demand tracing for PyTorch in pure Python, allowing you to extract traces in the middle of training.

## Installation

You can install either from PyPI or from source:

```
# install from pypi
pip install trainy

# install from source
git clone https://github.com/Trainy-ai/trainy
pip install -e trainy
```

## Quickstart

If you haven't already, set up Ray head and worker nodes. This can be configured to happen automatically using [SkyPilot](https://skypilot.readthedocs.io/en/latest/index.html) or Kubernetes.

```
# on the head node
$ ray start --head --port 6380

# on the worker nodes
$ ray start --address=${HEAD_IP}:6380
```

In your training code, initialize the trainy daemon before running your training loop.

```
import trainy

trainy.init()    # start the trainy tracing daemon
Trainer.train()  # your existing training loop
```

While your model is training, capture traces on all the nodes by running:

```
$ trainy trace --logdir ~/my-traces
```

This saves the traces for each process locally into `~/my-traces`. It's recommended
that you use a shared file system like NFS or an S3-backed store so that all of your
traces end up in the same place. An example of how to do this and scale it up on AWS
is under `examples/resnet_mnist`.
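Once every node has written its traces into the shared directory, a small helper can enumerate them for inspection. This is a hypothetical sketch, not part of trainy: it assumes one JSON file per process (PyTorch profiler Chrome traces are JSON, but trainy's exact filename scheme isn't specified here).

```python
# Hypothetical helper: list trace files collected under a shared logdir.
# Assumes one .json trace per process; the naming scheme is an assumption.
from pathlib import Path

def list_traces(logdir):
    """Return all .json trace files under logdir, sorted for stable output."""
    return sorted(str(p) for p in Path(logdir).expanduser().rglob("*.json"))
```

For example, `list_traces("~/my-traces")` would return the per-rank trace paths in a deterministic order, ready to load into a trace viewer.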

## How It Works

Trainy registers a hook into whatever PyTorch optimizer is present in your code
to count optimizer iterations, and registers the program with the Ray head node.
A separate HTTP server daemon thread runs concurrently, waiting for a trigger
POST request to start profiling.

## Need help?

We offer support for both setting up trainy and analyzing program traces. If you are interested,
please [email us](mailto:founders@trainy.ai).

            
