smartscheduler


Name: smartscheduler
Version: 0.0.4
Home page: https://github.com/AIRI-Institute/SmartScheduler
Summary: This package is designed to reduce CO2 emissions while training neural networks using Google Cloud.
Upload time: 2023-08-04 05:40:03
Author: Mikhail Tiutiulnikov
Requires Python: >=3.9,<4.0
License: MIT
Keywords: co2 emission, google cloud, pytorch

# SmartScheduler

+ [About SmartScheduler :clipboard:](#1)
+ [Installation :wrench:](#2)
+ [Usage example (Tutorial on training with MNIST) :computer:](#3)
+ [How to use package without Google Cloud](#4)
+ [Citing](#5)
+ [Feedback :envelope:](#6) 



## About SmartScheduler :clipboard: <a name="1"></a> 
This package is designed to reduce CO2 emissions while training neural network models. The main idea is to run the training process during selected time intervals on the Google Cloud servers with the lowest emissions. A neural network (TCN) trained on historical data from 13 zones predicts emissions 24 hours ahead.

Currently supported Google Cloud zones: 'southamerica-east1-b', 'northamerica-northeast2-b', 'europe-west6-b', 'europe-west3-b', 'europe-central2-b', 'europe-west1-b', 'europe-west8-a', 'northamerica-northeast1-b', 'europe-southwest1-c', 'europe-west2-b', 'europe-north1-b', 'europe-west9-b', 'europe-west4-b'.
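
The idea of picking the zone with the lowest predicted emissions can be illustrated with a minimal sketch. It is not the package's implementation: the function name `greenest_zone_per_hour`, the forecast layout (zone name mapped to a list of hourly values) and the numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def greenest_zone_per_hour(
    forecast: Dict[str, List[float]]
) -> List[Tuple[int, str, float]]:
    """For every forecast hour, return (hour, zone, value) for the zone
    with the lowest predicted emission value.

    `forecast` is a hypothetical layout: zone name -> hourly values.
    """
    horizons = {len(values) for values in forecast.values()}
    if len(horizons) != 1:
        raise ValueError("all zones must share the same forecast horizon")
    n_hours = horizons.pop()

    plan = []
    for hour in range(n_hours):
        # Pick the zone whose predicted value is smallest at this hour.
        zone = min(forecast, key=lambda z: forecast[z][hour])
        plan.append((hour, zone, forecast[zone][hour]))
    return plan


# Made-up values for two zones over three hours.
example = {
    "europe-north1-b": [80.0, 95.0, 120.0],
    "europe-west9-b": [90.0, 70.0, 60.0],
}
plan = greenest_zone_per_hour(example)
for hour, zone, value in plan:
    print(hour, zone, value)
```

In this toy example the plan starts in `europe-north1-b` and switches to `europe-west9-b` from the second hour on.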

## Installation <a name="2"></a> 
The package can be installed from PyPI:
```
pip install smartscheduler
```

## Usage example. Tutorial on training with MNIST <a name="3"></a>
### What you will need
- Google Cloud account
- ElectricityMaps account with free trial
- Master machine to control VMs 

### Step 1. Setting up master machine.
Make a project directory and create a venv (or conda env) in it. Then install the package:
```
python3 -m venv venv
source venv/bin/activate
pip install smartscheduler
```

### Step 2. Google cloud setup
Go to the [Google Cloud Console](https://console.cloud.google.com) and create a project. Select your project and click "Activate Cloud Shell" (top-right corner of the window). Run the following command:

```
gcloud auth application-default login
```

Download your `application_default_credentials.json` and place it in the project folder on the master machine.

You also need to set up an SSH key for the project. You can do it [here](https://console.cloud.google.com/compute/metadata/sshKeys).

### Step 3. Electricity Maps setup
Go to the [Electricity Maps website](https://api-portal.electricitymaps.com) and create an account. Apply for the free trial period and copy your API key (primary) into `electricitymaps_api.py`.


### Step 4. Creating VM
Create a VM in the "Compute Engine" section of Google Cloud. Select the configuration and OS (for this tutorial we used an E2-medium VM with a 25 GB disk and Ubuntu Minimal 22.10). Add your SSH key in the Security settings of the VM so that you can connect to it (in this example the VM user is named "scheduler"). We created the VM in the "northamerica-northeast1-b" zone.


You will probably need to install extra dependencies on the VM:
```
sudo apt update
sudo apt install python3.10-venv 
```

### Step 5. Create venv on VM
Connect to the VM over SSH and run the following commands:
```
python3 -m venv venv
source venv/bin/activate
pip install smartscheduler
pip install torchvision  # Installation needed for MNIST
```

### Step 6. Create folder for python scripts
```
mkdir scheduler_task
```

### Step 7. Edit vm_main.py for your purposes.
Download `vm_main.py` from GitHub (it can be found in the `examples` folder).
This is the main file and it contains all the training logic. Here you can choose which callbacks to use, as well as the model, the dataset and all the parameters.


### Step 8. Copy vm_main.py to VM
```
scp vm_main.py scheduler@your_ip:scheduler_task/
```


### Step 9. Run task on your Master machine
Download the example `master_machine_main.py` from the `examples` folder on our GitHub. Edit the VM info in the file (current IP address, zone, your project name, instance name).
After that you are ready to start the training!
```
python master_machine_main.py
```



### Example files details
Here we describe what happens in the example files `master_machine_main.py` and `vm_main.py` and how you can change them for your needs.

#### master_machine_main.py
This file essentially uses a single class, `Controller`. Its main function is to start training on a Google Cloud VM. It connects to the VM over SSH (so you have to pass several SSH parameters). It generates training intervals using `CO2Predictor` (a neural network that produces a 24-hour CO2 forecast for 13 regions) and `IntervalGenerator`, which processes the forecast. The class also uses the Google Cloud API to move the VM between zones so that training runs where CO2 emissions are lowest at the time.
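
To make the notion of training intervals more concrete, here is a minimal sketch of how an hourly forecast could be turned into `(start_time, end_time)` pairs. It is only an illustration under stated assumptions: the function `low_emission_intervals`, the fixed threshold rule and the sample values are hypothetical and are not taken from the package's `IntervalGenerator`.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def low_emission_intervals(
    forecast: Sequence[float], start: datetime, threshold: float
) -> List[Tuple[datetime, datetime]]:
    """Merge consecutive hours whose forecast is at or below `threshold`
    into (start_time, end_time) intervals.

    `forecast[i]` is the predicted value for the hour beginning at
    `start + i hours` (an assumed layout).
    """
    intervals = []
    run_start = None  # index of the first hour of the current low run
    for hour, value in enumerate(forecast):
        if value <= threshold and run_start is None:
            run_start = hour
        elif value > threshold and run_start is not None:
            intervals.append(
                (start + timedelta(hours=run_start), start + timedelta(hours=hour))
            )
            run_start = None
    if run_start is not None:
        # The forecast ended while still inside a low-emission run.
        intervals.append(
            (start + timedelta(hours=run_start),
             start + timedelta(hours=len(forecast)))
        )
    return intervals


# Made-up six-hour forecast starting at midnight.
start = datetime(2023, 8, 4, 0, 0)
forecast = [120, 90, 80, 130, 140, 85]
intervals = low_emission_intervals(forecast, start, threshold=100)
for begin, end in intervals:
    print(begin.isoformat(), "->", end.isoformat())
```

With these numbers the sketch yields two intervals, 01:00 to 03:00 and 05:00 to 06:00.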


#### vm_main.py
This file contains all the training logic. First it initializes a PyTorch model and PyTorch datasets. After that you can specify the callbacks you want to use during training (callbacks are implemented with Lightning Fabric, which is why you can't reuse PyTorch Lightning callbacks directly). An argument parser is needed to receive the training periods for the current zone (a list of tuples of the form `[(start_time, end_time)]`).
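
As a rough illustration of passing training periods on the command line, the sketch below parses a list of `(start_time, end_time)` tuples with `argparse`. The `--intervals` flag and the `START/END` ISO 8601 text format are assumptions made for this example; the actual `vm_main.py` may encode the periods differently.

```python
import argparse
from datetime import datetime
from typing import Tuple


def parse_interval(text: str) -> Tuple[datetime, datetime]:
    """Parse 'START/END' (ISO 8601 timestamps) into a (start, end) tuple."""
    start_text, sep, end_text = text.partition("/")
    if not sep:
        raise argparse.ArgumentTypeError(f"expected START/END, got {text!r}")
    try:
        start = datetime.fromisoformat(start_text)
        end = datetime.fromisoformat(end_text)
    except ValueError as exc:
        raise argparse.ArgumentTypeError(str(exc))
    if end <= start:
        raise argparse.ArgumentTypeError("interval end must be after its start")
    return (start, end)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Interval-aware training (sketch)")
    # Hypothetical flag: one or more START/END pairs.
    parser.add_argument(
        "--intervals",
        type=parse_interval,
        nargs="+",
        default=[],
        help="training periods as START/END pairs in ISO 8601 format",
    )
    return parser


args = build_parser().parse_args(
    [
        "--intervals",
        "2023-08-04T01:00/2023-08-04T03:00",
        "2023-08-04T05:00/2023-08-04T06:00",
    ]
)
print(args.intervals)
```

Validating each pair in the `type=` callback keeps malformed periods from reaching the training loop.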

The main part of this file is the `IntervalTrainer` class. It uses custom PyTorch logic so that training can be stopped and resumed on different VMs without losing any information. It even saves the current batch position, so if your model has a long epoch time you can start an epoch in one Google Cloud zone and continue it in another.
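
The stop-and-resume behaviour can be sketched without PyTorch. The example below is not `IntervalTrainer`: the `train` function, the JSON checkpoint layout and the `stop_after` parameter are hypothetical. It only shows the bookkeeping of epoch and batch position; a real checkpoint would presumably also hold model and optimizer state, and the data order would need to be reproducible for a mid-epoch resume to be exact.

```python
import json
import os
import tempfile
from typing import List, Optional, Tuple


def train(
    num_epochs: int,
    batches_per_epoch: int,
    checkpoint_path: str,
    stop_after: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Run a dummy training loop and return the (epoch, batch) steps done
    in this call. Resumes from `checkpoint_path` if it exists and stops
    after `stop_after` steps, saving the position of the next step."""
    state = {"epoch": 0, "batch": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    processed = []
    for epoch in range(state["epoch"], num_epochs):
        # Only the resumed epoch starts in the middle; later ones start at 0.
        first_batch = state["batch"] if epoch == state["epoch"] else 0
        for batch in range(first_batch, batches_per_epoch):
            if stop_after is not None and len(processed) >= stop_after:
                # The interval is over: record where to continue next time.
                with open(checkpoint_path, "w") as f:
                    json.dump({"epoch": epoch, "batch": batch}, f)
                return processed
            processed.append((epoch, batch))  # stand-in for one training step

    # All epochs finished: record that nothing is left to do.
    with open(checkpoint_path, "w") as f:
        json.dump({"epoch": num_epochs, "batch": 0}, f)
    return processed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    first_run = train(2, 3, path, stop_after=4)  # stops mid-epoch
    second_run = train(2, 3, path)               # e.g. on another VM
    print(first_run)
    print(second_run)
```

The second call picks up at the batch following the last one processed, so no step is repeated or skipped.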

You can of course modify `vm_main.py` as you wish. You will probably use your own dataset, in which case you need to upload it to the VM once and import it in `vm_main.py`.


## How to use package without Google Cloud <a name="4"></a>
If you want to use our scheduler without Google Cloud VMs and you are located in one of the available zones, you can use the `local_main.py` example and specify your electricitymaps zone in the code. The scheduler will then run training only during periods of minimal CO2 emission.

Available electricitymaps zones: "BR-CS", "CA-ON", "CH", "DE", "PL", "BE", "IT-NO", "CA-QC", "ES", "GB", "FI", "FR", "NL"
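
The two zone lists in this README appear to be given in the same order, and the pairing is consistent with the geographic locations of the Google Cloud regions. Under that assumption, the correspondence can be written down as a lookup table. This is an inferred sketch: the names `GCLOUD_TO_ELECTRICITYMAPS` and `electricitymaps_zone` are hypothetical, and the package may store the mapping differently.

```python
# Pairing inferred from the order of the two lists in this README
# (an assumption, not copied from the package source).
GCLOUD_TO_ELECTRICITYMAPS = {
    "southamerica-east1-b": "BR-CS",
    "northamerica-northeast2-b": "CA-ON",
    "europe-west6-b": "CH",
    "europe-west3-b": "DE",
    "europe-central2-b": "PL",
    "europe-west1-b": "BE",
    "europe-west8-a": "IT-NO",
    "northamerica-northeast1-b": "CA-QC",
    "europe-southwest1-c": "ES",
    "europe-west2-b": "GB",
    "europe-north1-b": "FI",
    "europe-west9-b": "FR",
    "europe-west4-b": "NL",
}


def electricitymaps_zone(gcloud_zone: str) -> str:
    """Return the electricitymaps zone for a supported Google Cloud zone."""
    try:
        return GCLOUD_TO_ELECTRICITYMAPS[gcloud_zone]
    except KeyError:
        raise ValueError(f"unsupported Google Cloud zone: {gcloud_zone!r}") from None


print(electricitymaps_zone("europe-west3-b"))
```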

## Citing <a name="5"></a>
Paper info


## Feedback <a name="6"></a>
email?
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/AIRI-Institute/SmartScheduler",
    "name": "smartscheduler",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "co2 emission,google cloud,pytorch",
    "author": "Mikhail Tiutiulnikov",
    "author_email": "tiutiulnikov@airi.net",
    "download_url": "https://files.pythonhosted.org/packages/7b/75/79d4ae662a0843151cb5e034f74c355de1231e745ce4a60e62829456b186/smartscheduler-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "This package is designed to reduce CO2 emissions while training neural networks using Google Cloud.",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://github.com/AIRI-Institute/SmartScheduler",
        "Repository": "https://github.com/AIRI-Institute/SmartScheduler"
    },
    "split_keywords": [
        "co2 emission",
        "google cloud",
        "pytorch"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c47b4d578cf2889ab5b398e4479eec9b0565f00750cd707e561b52e9b73cbb55",
                "md5": "129a1a06c154759def56ea77cdcbf9b9",
                "sha256": "a073e2f171d77c6ca3aec4764d976ce3d22f1c1fae09d97e1e68ed608879d090"
            },
            "downloads": -1,
            "filename": "smartscheduler-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "129a1a06c154759def56ea77cdcbf9b9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 17680026,
            "upload_time": "2023-08-04T05:39:58",
            "upload_time_iso_8601": "2023-08-04T05:39:58.893903Z",
            "url": "https://files.pythonhosted.org/packages/c4/7b/4d578cf2889ab5b398e4479eec9b0565f00750cd707e561b52e9b73cbb55/smartscheduler-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7b7579d4ae662a0843151cb5e034f74c355de1231e745ce4a60e62829456b186",
                "md5": "c28b45a19b35151dae37b5274b1b8f27",
                "sha256": "49b0db8ce993eb10394c6a5822e117700c85828690d65035cf93f99e6591035b"
            },
            "downloads": -1,
            "filename": "smartscheduler-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "c28b45a19b35151dae37b5274b1b8f27",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 17321288,
            "upload_time": "2023-08-04T05:40:03",
            "upload_time_iso_8601": "2023-08-04T05:40:03.811516Z",
            "url": "https://files.pythonhosted.org/packages/7b/75/79d4ae662a0843151cb5e034f74c355de1231e745ce4a60e62829456b186/smartscheduler-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-04 05:40:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AIRI-Institute",
    "github_project": "SmartScheduler",
    "github_not_found": true,
    "lcname": "smartscheduler"
}
        