# `pymops`: A multi-agent simulation-based optimization package for power scheduling
## About
Power scheduling is an NP-hard optimization problem characterized by high dimensionality, a combinatorial nature, and non-convex, non-smooth, discontinuous properties, together with multiple constraints spanning multiple periods.
- Two sequential tasks:
  - Unit Commitment
  - Load Dispatch:
    - Economic Load Dispatch
    - Environmental Load Dispatch
Power scheduling aims to determine an optimal load dispatch schedule that simultaneously minimizes different conflicting objectives, particularly economic costs and environmental emissions.
`pymops` is an open-source Python package developed for solving single- to tri-objective optimization in power scheduling problems. The package is built on a novel multi-agent simulation environment, where power-generating units are represented as agents. The agents are heterogeneous, each with multiple conflicting objectives. The scheduling dynamics are simulated using Markov Decision Processes (MDPs), which are used to train a deep reinforcement learning model for solving the optimization problem.
### Objective Function
The general multi-objective function is formulated by combining different conflicting objectives via a hybrid approach that uses both weighting hyperparameters and unit-specific cost-to-emission conversion factors:
$$\cal \Phi(C,E)=\sum\limits_{t=1}^{24}\sum\limits_{i=1}^n[\omega_0C_{ti}+\sum\limits_{h=1}^m\omega_h\eta_{ih}E_{ti}^{(h)}]$$
where $\eta_i$ denotes the unit-specific cost-to-emission conversion parameter, defined as $$\eta_i = \exp\left[\frac{\nabla C^{on}(p_i)/\nabla E^{on}(p_i)}{\max[\nabla C^{on}(p_i)/\nabla E^{on}(p_i);\forall i]-\min[\nabla C^{on}(p_i)/\nabla E^{on}(p_i);\forall i]}\right];\forall i$$
and $\omega_h,\ h=0,1,\dots,m$, represents the weight hyperparameter associated with objective $h$, with $\omega_0$ weighting the economic cost.
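As an illustration of how the weighted combination works, the sketch below evaluates $\Phi(C,E)$ from pre-computed cost and emission arrays; the array names and shapes are assumptions made for this example, not part of the package's API.

```python
import numpy as np

# Minimal sketch of the hybrid weighted objective (illustrative only):
#   C[t, i]    -> economic cost of unit i at hour t
#   E[h][t, i] -> emission of type h (e.g. CO2, SO2) of unit i at hour t
#   eta[i, h]  -> unit-specific cost-to-emission conversion factor
#   w          -> weights [w_0, w_1, ..., w_m]
def combined_objective(C, E, eta, w):
    total = w[0] * C.sum()
    for h, E_h in enumerate(E, start=1):
        total += w[h] * (eta[:, h - 1] * E_h).sum()
    return total
```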
**Economic Cost Functions**:
$$\cal C_{ti}=z_{ti}C^{on}(p_{ti})+z_{ti}(1-z_{t-1,i})C_{ti}^{su}+(1-z_{ti})z_{t-1,i}C_{ti}^{sd};\forall i,t$$
where
$$\cal C^{on}(p_{ti})=a_i^cp_{ti}^2+b_i^cp_{ti}+c_i^c+|d_i^c \sin[e^c_i(p_{i}^{min}+p_{ti})]|;\forall i,t$$
**Environmental Emission Functions**:
$$\cal E_{ti}=z_{ti}E^{on}(p_{ti})+z_{ti}(1-z_{t-1,i})E_{ti}^{su}+(1-z_{ti})z_{t-1,i}E_{ti}^{sd};\forall i,t$$
where
$$\cal E^{on}(p_{ti})=a_i^ep_{ti}^2+b_i^ep_{ti}+c_i^e+d_i^e\exp(e^e_ip_{ti});\forall i,t$$
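To make the shapes of these curves concrete, the sketch below evaluates the running cost and emission of a single unit at a given output level; all coefficient values are hypothetical and are not taken from the package's default unit profiles.

```python
import numpy as np

# Hypothetical coefficients for one unit (not taken from the default profiles).
a_c, b_c, c_c, d_c, e_c = 0.00048, 16.19, 1000.0, 450.0, 0.041   # cost curve
a_e, b_e, c_e, d_e, e_e = 0.0126, -1.355, 22.98, 0.018, 0.0045   # emission curve
p_min = 150.0  # minimum stable output (MW)

def running_cost(p):
    """Quadratic fuel cost plus the rectified-sine valve-point term."""
    return a_c * p**2 + b_c * p + c_c + abs(d_c * np.sin(e_c * (p_min + p)))

def running_emission(p):
    """Quadratic emission curve plus an exponential term in the output level."""
    return a_e * p**2 + b_e * p + c_e + d_e * np.exp(e_e * p)

p = 300.0  # MW
print(running_cost(p), running_emission(p))
```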
| Constraints | Specification |
| ---------------------------------------- | ---------------------------------------- |
| Minimum and maximum power capacities: | $\cal z_{ti}p_{i}^{min}\le p_{ti}\le z_{ti}p_{i}^{max}$ |
| Maximum ramp-down and ramp-up rates: | $\cal z_{t-1,i}p_{t-1,i}-z_{ti}p_{ti}\le p_{i}^{down}$ and $z_{ti}p_{ti}-z_{t-1,i}p_{t-1,i}\le p_{i}^{up}$ |
| Minimum operating (online/offline) durations: | $\cal tt_{ti}^{ON}\ge tt_{i}^{ON}$ and $tt_{ti}^{OFF}\ge tt_{i}^{OFF}$ |
| Power supply and demand balance: | $\cal \sum\limits_{i=1}^nz_{ti}p_{ti}=d_t$ |
| Minimum available reserve: | $\cal \sum\limits_{i=1}^nz_{ti}p_{ti}^{max}\ge (1+ r) d_t$ |
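A minimal sketch of how the two system-wide constraints (demand balance and minimum reserve) can be checked for one hour; the variable names are illustrative and not part of the package's API.

```python
import numpy as np

def check_system_constraints(z, p, p_max, demand, r=0.0, tol=1e-6):
    """Check the demand-balance and minimum-reserve constraints for one hour.

    z      : 0/1 commitment statuses of the n units
    p      : dispatched outputs of the n units (MW)
    p_max  : maximum capacities of the n units (MW)
    demand : system demand for the hour (MW)
    r      : spinning-reserve proportion (the SR argument of the environment)
    """
    balance_ok = abs(np.dot(z, p) - demand) <= tol
    reserve_ok = np.dot(z, p_max) >= (1.0 + r) * demand
    return balance_ok, reserve_ok
```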
### The Multi-Agent Reinforcement Learning (MARL) Framework
The MARL framework is specified in terms of states $\cal S$, actions $\cal A$, a transition (probability) function $\cal P$, and rewards $\cal R$.
- **Planning Horizon**: The scheduling horizon is a day divided into hours.
  - **Timestep/Period**: Each hour of the day is considered a timestep.
  - **Episode**: One full cycle of determining the unit commitments and load dispatches for a day.
- **Simulation Environment**: A custom MARL simulation environment, structurally similar to OpenAI Gym.
  - Supports mono-objective to tri-objective scheduling problems (cost, CO2 and SO2).
  - Ramp rate constraints and valve-point effects are taken into account.
- **Agents**: The generating units are represented as multiple agents.
  - The agents are heterogeneous (each has different generating-unit-specific characteristics).
  - Each agent has multiple conflicting objectives.
  - The agents are cooperative RL agents:
    - Agents collaborate to satisfy the demand at each period/timestep.
    - Agents also strive to minimize the multi-objective function over the entire planning horizon.
- **State Space**: Consists of the timestep, minimum and maximum capacities, operating (online/offline) durations, and the demand to be satisfied.
- **Action Space**: The commitment statuses (ON/OFF) of all agents.
- **Transition Function**: The probability of transitioning from the current state to the next state (no specific formula).
  - Agent decisions that violate any constraint are automatically corrected by the environment.
  - The environment also adjusts for both excesses and shortages of power supply.
- **Reward Function**: Agents receive a common reward: the inverse of the average of the normalized values of all objectives (see the sketch below).
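A minimal sketch of such a common reward, assuming the per-objective values have already been normalized; this illustrates the stated rule, not the environment's exact implementation.

```python
import numpy as np

def common_reward(normalized_objectives):
    """Inverse of the average of the normalized objective values (cost, emissions, ...)."""
    return 1.0 / np.mean(normalized_objectives)

# Example: normalized cost, CO2 and SO2 values for one timestep.
print(common_reward([0.42, 0.35, 0.51]))
```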
The MOPS dynamics can be simulated as a 4-tuple $\cal (S,A,P,R)$ MDP:
- The MDPs are the input to the deep RL model.
- The deep RL model predicts the decisions (actions) of the agents.
- The predicted actions are, in turn, input to the transition function of the environment.
## Installation
The simulation environment can be installed using `pip`:
```
pip install pymops
```
Or it can be cloned from the GitHub repository and installed:
```
git clone https://github.com/awolseid/pymops.git
cd pymops
pip install .
```
### Import package
```python
import pymops
from pymops.environ import SimEnv
```
### Create simulation environment
```
env = SimEnv(
supply_df = default_supply_df, # Units' profile dataframe
demand_df = default_demand_df, # Demands profile
SR = 0.0, # proportion of spinning reserve => [0, 1]
RR = "Yes", # Ramp rate => "yes" or (default "no" (=None))
VPE = None, # Valve point effects => "yes" or (default "no" (=None))
n_objs = None, # Objectives => "tri" for 3 or (default "bi" (=None) for bi-objective)
w = None, # Weight => [0, 1] for bi-objective, a list [0.2,0.3,0.5] for tri-objective
duplicates = None # Num of duplicates: duplicate units and adjust demands proportionally
)
```
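The `default_supply_df` and `default_demand_df` objects above stand for the unit-profile and demand-profile dataframes. One way to supply your own data is to load it with pandas; the file names below are hypothetical, and the column layout must match whatever the package's default profiles use.

```python
import pandas as pd

# Hypothetical CSV files holding the unit profiles and the hourly demand profile.
default_supply_df = pd.read_csv("units_profile.csv")
default_demand_df = pd.read_csv("hourly_demand.csv")
```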
#### Reset environment
```
initial_flat_state, initial_dict_state = env.reset()
```
#### Get current state
```
flat_state, dict_state = env.get_current_state()
```
#### Execute decision (action) of agents
```
import numpy as np

action_vec = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])  # one ON/OFF decision per agent
flat_next_state, reward, done, next_state_dict, dispatch_info = env.step(action_vec)
```
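For example, a whole episode can be rolled out by repeatedly stepping the environment until it signals completion; the random ON/OFF decisions below are purely illustrative (a trained model would supply the actions instead).

```python
import numpy as np

n_agents = 10  # matches the length of the example action vector above

flat_state, dict_state = env.reset()
done = False
total_reward = 0.0

while not done:
    # Random ON/OFF commitment decisions, one per agent
    # (in practice these come from the trained RL model).
    action_vec = np.random.randint(0, 2, size=n_agents)
    flat_state, reward, done, dict_state, dispatch_info = env.step(action_vec)
    total_reward += reward

print(total_reward)
```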
## Developing and training a (customized) model
### Import packages
```python
from pymops.define_dqn import DQNet
from pymops.madqn import DQNAgents
from pymops.replaymemory import ReplayMemory
from pymops.schedules import get_schedules
```
### Define model
```
model_0 = DQNet(env, 64)
print(model_0)
```
### Create RL agents instance
```
RL_agents = DQNAgents(
environ = env,
model = model_0,
epsilon_max = 1.0,
epsilon_min = 0.1,
epsilon_decay = 0.99,
lr = 0.001
)
```
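The `epsilon_max`, `epsilon_min`, and `epsilon_decay` arguments describe an epsilon-greedy exploration schedule. The sketch below shows the decay pattern these parameters typically imply; the exact schedule used inside `DQNAgents` may differ.

```python
# Typical multiplicative epsilon-greedy decay implied by the parameters above (illustrative).
epsilon_max, epsilon_min, epsilon_decay = 1.0, 0.1, 0.99

epsilon = epsilon_max
for episode in range(500):
    epsilon = max(epsilon_min, epsilon * epsilon_decay)  # never drops below epsilon_min

print(epsilon)  # exploration rate after 500 episodes
```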
### Replay memory
```
memory = ReplayMemory(environ = env, buffer_size = 64)
```
### Train model
```
training_results_df = RL_agents.train(memory = memory, batch_size = 64, num_episodes = 500)
```
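The returned dataframe summarizes the training run per episode; a quick way to inspect it (column names depend on the package version, so none are assumed here):

```python
# Inspect the per-episode training summary returned by train().
print(training_results_df.head())
print(training_results_df.describe())
```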
### Get schedule solutions
```
cost, emis, CO2, SO2, schedules_df = get_schedules(environ = env, trained_agents = RL_agents)
schedules_df
```
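The resulting schedule dataframe and the accompanying objective values can be kept for later analysis, for example:

```python
# Persist the obtained schedule and print the objective values of the solution.
schedules_df.to_csv("schedules_solution.csv", index=False)
print(f"Cost: {cost}, Emission: {emis}, CO2: {CO2}, SO2: {SO2}")
```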
### Contact Information
For questions, issues, suggestions, or collaboration opportunities, contact: awolseid@pukyong.ac.kr or youngk@pknu.ac.kr.
### Citation
Users should cite the following resources.
- Code Ocean Reproducible Capsule: https://codeocean.com/capsule/0242917/tree:
- **Ebrie, A.S.**; **Kim, Y.J.** (2023). pymops: *A multi-agent reinforcement learning simulation environment for multi-objective optimization in power scheduling* [Software Code]. https://doi.org/10.24433/CO.9235622.v1
- [Article](https://www.mdpi.com/1996-1073/16/16/5920) produced from the first version of the package:
- **Ebrie, A.S.**; **Paik, C.**; **Chung, Y.**; **Kim, Y.J.** (2023). *Environment-Friendly Power Scheduling Based on Deep Contextual Reinforcement Learning*. *Energies*, 16, 5920. https://doi.org/10.3390/en16165920.