CEEMDAN_LSTM
===
GitHub: https://github.com/FateMurphy/CEEMDAN_LSTM
Future work: CFS
## Background
CEEMDAN_LSTM is a Python module for decomposition-integration forecasting models based on EMD methods and LSTM. It aims to help beginners quickly build a decomposition-integration forecast using `CEEMDAN`, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise [(Torres et al. 2011)](https://ieeexplore.ieee.org/abstract/document/5947265/), and `LSTM`, the Long Short-Term Memory recurrent neural network [(Hochreiter and Schmidhuber, 1997)](https://ieeexplore.ieee.org/abstract/document/6795963). If you use or refer to the content of this module, please cite the paper: [(F. Zhou, Z. Huang, C. Zhang, Carbon price forecasting based on CEEMDAN and LSTM, Applied Energy, 2022, Volume 311, 118601, ISSN 0306-2619.)](https://doi.org/10.1016/j.apenergy.2022.118601).
### Flowchart

#### Note: because the entire series is decomposed before it is split into training and test sets, the results carry some look-ahead bias.
## Install
### (1) PyPI (recommended)
The quickest way to install the package is through pip.
```shell
pip install CEEMDAN_LSTM
```
### (2) From the package
Download the package `CEEMDAN_LSTM-1.2.tar.gz` by clicking `Code` -> `Download ZIP`. After unzipping, move the package wherever you like.
```shell
pip install <your file path>/CEEMDAN_LSTM-1.2.tar.gz
```
### (3) From source
If you want to modify the code, download the source and build the package yourself. The source is publicly available and hosted on GitHub: https://github.com/FateMurphy/CEEMDAN_LSTM. To download the code, either go to the source code page and click `Code` -> `Download ZIP`, or use the git command line.
After modifying the code, install the modified package from the command line:
```shell
python setup.py install
```
Alternatively, you can add the source folder to the import path for convenient modification, eg. `sys.path.append('<your file path>/')`, and then import the module.
## Import and quickly predict
```python
import CEEMDAN_LSTM as cl
cl.quick_keras_predict(data=None) # default dataset: sse_index.csv
```
#### Load dataset
```python
data = cl.load_dataset() # other built-in datasets, eg. sp500.csv hsi.csv ftse.csv nsdaq.csv n225.csv
# data = pd.read_csv(your_file_path + its_name + '.csv', header=0, index_col=['date'], parse_dates=['date'])
```
## Help and example
Use the functions below to get help. You can copy the code printed by `cl.show_keras_example()` to run a forecast and learn more about the module.
```python
cl.help()
cl.show_keras_example()
cl.show_keras_example_model()
cl.details_keras_predict(data=None)
```
## Start to Forecast
The examples below use the `keras_predictor()` class.
### Brief summary and forecast
```python
data = cl.load_dataset()
series = data['close'] # choose a DataFrame column
cl.statis_tests(series)
kr = cl.keras_predictor()
df_result = kr.hybrid_keras_predict(data=series, show=True, plot=True, save=True)
```
### 0. Statistical tests (not necessary)
The code will output the results of the ADF test, Ljung-Box Test, and Jarque-Bera Test, and plot ACF and PACF figures to evaluate stationarity, autocorrelation, and normality.
```python
cl.statis_tests(series=None)
```
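To give a feel for what two of these statistics measure, here is a minimal NumPy-only sketch of the Jarque-Bera and Ljung-Box Q statistics, written from their textbook definitions. It is an illustration, not the code that `cl.statis_tests` runs; the function names `jarque_bera_stat` and `ljung_box_q` are hypothetical.

```python
import numpy as np

def jarque_bera_stat(x):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / m2 ** 1.5   # sample skewness S
    kurt = np.mean(centered ** 4) / m2 ** 2     # sample kurtosis K
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic: n(n+2) * sum_k rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(centered[k:] * centered[:-k]) / denom  # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # roughly uncorrelated
walk = np.cumsum(noise)        # strongly autocorrelated random walk
print("noise:", jarque_bera_stat(noise), ljung_box_q(noise))
print("walk: ", jarque_bera_stat(walk), ljung_box_q(walk))
```

A Q value that is large relative to a chi-square distribution with `lags` degrees of freedom points to autocorrelation, and a large Jarque-Bera value relative to chi-square with 2 degrees of freedom points to non-normality.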
### 1. Declare the parameters
Note: when `PATH` is set, the required folders, including the figure and log folders, are created automatically.
```python
kr = cl.keras_predictor(PATH=None, FORECAST_HORIZONS=30, FORECAST_LENGTH=30, KERAS_MODEL='GRU',
                        DECOM_MODE='CEEMDAN', INTE_LIST='auto', REDECOM_LIST={'co-imf0':'ovmd'},
                        NEXT_DAY=False, DAY_AHEAD=1, NOR_METHOD='minmax', FIT_METHOD='add',
                        USE_TPU=False, **kwargs)  # **kwargs: optional Keras parameters
```
| HyperParameters | Description |
| :-----| :----- |
|PATH |the saving path of figures and logs, eg. 'D:/CEEMDAN_LSTM/'|
|FORECAST_HORIZONS |the length of each input row (see `x_train.shape`), i.e. the number of previous days used to predict today; also called Timestep, Forecast_horizons, or Sliding_windows_length in some papers|
|FORECAST_LENGTH |the number of days to forecast (the length of the test set)|
|KERAS_MODEL |the Keras model, eg. 'GRU', 'LSTM', 'DNN', 'BPNN', model = Sequential(), or load_model.|
|DECOM_MODE |the decomposition method, eg. 'EMD', 'EEMD', 'CEEMDAN', 'VMD', 'OVMD', 'SVMD'|
|INTE_LIST |the integration list, eg. 'auto', pd.DataFrame, (int) 3, (str) '233', (list) [0,0,1,1,1,2,2,2], ...|
|REDECOM_LIST |the re-decomposition list, eg. {'co-imf0':'vmd', 'co-imf1':'emd'}, pd.DataFrame|
|NEXT_DAY |set True to only predict the next out-of-sample value|
|DAY_AHEAD |the number of days ahead to forecast, eg. 0, 1, 2 (default int 1)|
|NOR_METHOD |the normalizing method, eg. 'minmax'-MinMaxScaler, 'std'-StandardScaler, otherwise without normalization|
|FIT_METHOD |the fitting method to stabilize the forecasting result (not necessarily useful), eg. 'add', 'ensemble' (the 'ensemble' FIT_METHOD currently has some errors; please use the default 'add' method)|
|USE_TPU |change Keras model to TPU model (for Google Colab)|
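
The roles of `FORECAST_HORIZONS`, `FORECAST_LENGTH`, and `DAY_AHEAD` can be illustrated with a small sliding-window sketch. This is an assumption about the general technique, not the package's internal code: the helper `make_windows` is hypothetical, and it only covers `day_ahead >= 1`.

```python
import numpy as np

def make_windows(series, forecast_horizons=30, day_ahead=1):
    """Turn a 1-D series into (X, y) pairs.

    Each row of X holds `forecast_horizons` consecutive values; y is the
    value `day_ahead` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = series.size - forecast_horizons - day_ahead + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    X = np.stack([series[i:i + forecast_horizons] for i in range(n_samples)])
    y = series[forecast_horizons + day_ahead - 1:]
    return X, y

series = np.arange(10)
X, y = make_windows(series, forecast_horizons=3, day_ahead=1)
print(X.shape, y.shape)   # one row per sample, one target per row

# Hold out the last `forecast_length` samples as the test set.
forecast_length = 2
X_train, X_test = X[:-forecast_length], X[-forecast_length:]
y_train, y_test = y[:-forecast_length], y[-forecast_length:]
```

With a window of 3, the first sample uses the values at positions 0 to 2 to predict position 3, and the last sample predicts the final value of the series.
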

| Keras Parameters | Description (more details refer to https://keras.io) |
| :-----| :----- |
|epochs |training epochs/iterations, eg. 30-1000|
|dropout |dropout rate of 3 dropout layers, eg. 0.2-0.5|
|units |the units of the network layers; the 3 layers are set to `4*units`, `2*units`, and `units`, eg. 4-32|
|activation |activation function, the same for all layers, eg. 'tanh', 'relu'|
|batch_size |training batch_size for parallel computing, eg. 4-128|
|shuffle |whether to randomly shuffle the training set during training, eg. True, False|
|verbose |report of the training process, eg. 0 not displayed, 1 detailed, 2 rough|
|valid_split |proportion of the training data used as the validation set, eg. 0.1-0.2|
|opt |network optimizer, eg. 'adam', 'sgd'|
|opt_lr |optimizer learning rate, eg. 0.001-0.1|
|opt_loss |loss function, eg. 'mse','mae','mape','hinge', refer to https://keras.io/zh/losses/.|
|opt_patience |patience of the adaptive learning rate, eg. 10-100|
|stop_patience |early stop patience, eg. 10-100|
### 2. Forecast
You can try the following forecasting methods. Note that `kr` is the predictor instance created in step 1; the calls below require it.
```python
df_result = kr.single_keras_predict(data, show=True, plot=True, save=True)
# df_result = kr.ensemble_keras_predict(data, show=True, plot=True, save=True)
# df_result = kr.respective_keras_predict(data, show=True, plot=True, save=True)
# df_result = kr.hybrid_keras_predict(data, show=True, plot=True, save=True)
# df_result = kr.multiple_keras_predict(data, show=True, plot=True, save=True)
```
| Forecast Method | Description |
| :-----| :----- |
|Single Method |Use Keras model to directly forecast with vector input|
|Ensemble Method |Use decomposition-integration Keras model to directly forecast with matrix input|
|Respective Method |Use decomposition-integration Keras model to forecast each IMF separately with vector input|
|Hybrid Method |Use the ensemble method to forecast high-frequency IMF and the respective method for other IMFs.|
|Multiple Method |Multiple runs of the above methods|
|Rolling Method |Rolling run of the above methods to avoid the look-ahead bias, at the cost of a very long run time|
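
The decomposition-integration idea behind the respective method can be sketched with NumPy alone. Everything here is a stand-in: the "IMFs" are synthetic sine waves rather than CEEMDAN output, the forecaster is a naive last-value rule rather than a Keras model, and the reading of an integration list such as `[0, 1, 1, 2]` as group labels is an assumption. The helper names `integrate` and `persistence_forecast` are hypothetical.

```python
import numpy as np

def integrate(imfs, inte_list):
    """Sum IMF rows that share a label into one co-IMF per label."""
    imfs = np.asarray(imfs, dtype=float)
    labels = np.asarray(inte_list)
    return np.stack([imfs[labels == g].sum(axis=0) for g in np.unique(labels)])

def persistence_forecast(component):
    """Placeholder model: predict the next value as the last observed one."""
    return component[-1]

t = np.arange(200)
# Stand-in "IMFs": in practice these would come from a decomposition such as CEEMDAN.
imfs = np.stack([
    np.sin(2 * np.pi * t / 5),    # high frequency
    np.sin(2 * np.pi * t / 20),   # medium frequency
    np.sin(2 * np.pi * t / 80),   # low frequency
    0.01 * t,                     # trend
])
series = imfs.sum(axis=0)

# Integration: merge the two middle components into one co-IMF.
co_imfs = integrate(imfs, inte_list=[0, 1, 1, 2])

# Respective idea: forecast each co-IMF separately, then add the forecasts.
next_value = sum(persistence_forecast(c) for c in co_imfs)
print(co_imfs.shape, next_value)
```

Because the co-IMFs sum back to the original series, adding the per-component forecasts gives a forecast of the series itself.
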
### 3. Validate
#### (1) Plot heatmap
You need to install `seaborn` first, and the input should be a 2D array.
```python
cl.plot_heatmap(data, corr_method='pearson', fig_path=None)
```
#### (2) Diebold-Mariano-Test (DM test)
The DM test outputs the DM test statistic and its p-value. For details, refer to https://github.com/johntwk/Diebold-Mariano-Test.
```python
rt = cl.dm_test(actual_lst, pred1_lst, pred2_lst, h=1, crit="MSE", power=2)
```
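For intuition, the statistic can be sketched in a few lines for the simplest case: one-step-ahead forecasts (`h=1`) compared under squared-error loss, with a normal approximation for the p-value. This is a simplified illustration, not the implementation behind `cl.dm_test`; implementations such as the one linked above may also handle `h > 1` and apply a small-sample correction. The name `dm_test_h1` is hypothetical.

```python
import math
import numpy as np

def dm_test_h1(actual, pred1, pred2):
    """Simplified Diebold-Mariano test for h=1 with squared-error loss."""
    actual, pred1, pred2 = (np.asarray(a, dtype=float) for a in (actual, pred1, pred2))
    d = (actual - pred1) ** 2 - (actual - pred2) ** 2   # loss differential
    n = d.size
    # Note: this divides by zero if the loss differential is constant.
    dm = d.mean() / math.sqrt(d.var(ddof=0) / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))       # two-sided, N(0, 1)
    return dm, p_value

rng = np.random.default_rng(0)
actual = rng.normal(size=200)
pred_good = actual + rng.normal(scale=0.1, size=200)   # small errors
pred_bad = actual + rng.normal(scale=1.0, size=200)    # large errors
dm, p = dm_test_h1(actual, pred_good, pred_bad)
print(dm, p)
```

A negative statistic means the first forecast has the lower average loss, and a small p-value means the difference is unlikely to be due to chance.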
### 4. Next-day Forecast
Set `NEXT_DAY=True`.
```python
kr = cl.keras_predictor(NEXT_DAY=True)
df_result = kr.hybrid_keras_predict(data, show=True, plot=True, save=True)
# df_result = kr.rolling_keras_predict(data, predict_method='single')
```
## Sklearn Forecast
You can try the following forecasting methods. Note that `sr` is the predictor instance created below; the calls require it.
```python
# SKLEARN_MODEL = LASSO, SVM, or LGB(LightGBM); OPTIMIZER = Bayes, GS(GridSearch)
sr = cl.sklearn_predictor(PATH=path, FORECAST_HORIZONS=30, FORECAST_LENGTH=30,
SKLEARN_MODEL='LASSO', OPTIMIZER='Bayes',
DECOM_MODE='OVMD', INTE_LIST='auto')
df_result = sr.single_sklearn_predict(data, show=True, plot=True, save=True)
# df_result = sr.respective_sklearn_predict(data, show=True, plot=True, save=True)
# df_result = sr.multiple_sklearn_predict(data, run_times=10, predict_method='single')
```
## Discussion
### 1. Look-ahead bias
Because the predictor decomposes the entire series before splitting it into training and test sets, the decomposed training data already contains information from the test period, which is a look-ahead bias. How to avoid this bias efficiently is still an open issue.
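The effect can be demonstrated with a toy decomposition. The sketch below uses a centered moving average as a stand-in for CEEMDAN (an assumption made purely for illustration, since a centered window also looks at future points) and compares decomposing the full series with decomposing the training part only. The helper `centered_trend` is hypothetical.

```python
import numpy as np

def centered_trend(x, window=11):
    """Toy decomposition: a centered moving average (uses past AND future points)."""
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=300))
split = 270   # the last 30 points form the test set

trend_full = centered_trend(series)                 # decomposed before splitting
trend_train_only = centered_trend(series[:split])   # decomposed on training data only

# Near the split, the full-series trend has already "seen" test values.
leak = np.abs(trend_full[:split] - trend_train_only).max()
print("largest difference inside the training range:", leak)
```

Far from the split the two trends agree; they differ only in the last few training points, which is exactly where test-set values enter the full-series decomposition.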
### 2. VMD decompose
The vmdpy module can only decompose time series of even length. When the series has an odd length, this module deletes the oldest data point. How best to adapt the VMD decomposition remains an open issue. Moreover, the choice of the parameter K matters for the VMD method, so I plan to add methods for choosing a suitable K, such as OVMD, REI, SampEn, and so on.
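The trimming rule described above amounts to the following minimal sketch. The helper name `trim_to_even` is hypothetical and is not taken from the package.

```python
import numpy as np

def trim_to_even(series):
    """Drop the oldest (first) point when the series has odd length."""
    series = np.asarray(series)
    return series[1:] if series.size % 2 else series

print(trim_to_even(np.arange(7)))   # odd length: the first point is dropped
print(trim_to_even(np.arange(8)))   # even length: returned unchanged
```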
### 3. Rolling forecasting
Rolling forecasting is time-consuming. For example, a prediction with a forecast length of 30 runs `hybrid_keras_predict()` 30 times, so I am not sure whether it is really effective.
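The cost comes from refitting once per forecast step, as the following walk-forward sketch shows. It is a generic illustration, not the package's rolling method: `rolling_forecast` and `last_value` are hypothetical names, and the naive last-value rule stands in for a full decompose-and-train run.

```python
import numpy as np

def rolling_forecast(series, forecast_length, fit_predict):
    """Walk forward: each forecast uses only the data known at that time."""
    series = np.asarray(series, dtype=float)
    start = series.size - forecast_length
    preds = []
    for step in range(forecast_length):
        history = series[:start + step]      # no future values included
        preds.append(fit_predict(history))   # one full "fit" per step
    return np.array(preds)

def last_value(history):
    """Placeholder for an expensive decompose-train-predict run."""
    return history[-1]

series = np.arange(100, dtype=float)
preds = rolling_forecast(series, forecast_length=30, fit_predict=last_value)
print(preds.size)   # one prediction per forecast step
```

With a real model in place of `last_value`, the total run time is roughly the single-run time multiplied by the forecast length.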
Raw data
{
"_id": null,
"home_page": "http://github.com/FateMurphy/CEEMDAN_LSTM",
"name": "CEEMDAN-LSTM",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "CEEMDAN, VMD, LSTM, decomposition, forecasting",
"author": "Feite Zhou",
"author_email": "jupiterzhou@foxmail.com",
"download_url": "https://files.pythonhosted.org/packages/7b/56/9f80ea8a8667ff1d2386570d7fcd129af534082b7031e633e45e5554c490/CEEMDAN_LSTM-1.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "CEEMDAN_LSTM is a Python project for decomposition-integration forecasting models based on EMD methods and LSTM.",
"version": "1.2.1",
"project_urls": {
"Homepage": "http://github.com/FateMurphy/CEEMDAN_LSTM"
},
"split_keywords": [
"ceemdan",
" vmd",
" lstm",
" decomposition",
" forecasting"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7b569f80ea8a8667ff1d2386570d7fcd129af534082b7031e633e45e5554c490",
"md5": "824a09b7874182c720b29f68bb505603",
"sha256": "1db5bb772e7dde7f393771f7800c964fe26ef6eeacd847754acf94e038a6c66a"
},
"downloads": -1,
"filename": "CEEMDAN_LSTM-1.2.1.tar.gz",
"has_sig": false,
"md5_digest": "824a09b7874182c720b29f68bb505603",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 383554,
"upload_time": "2024-07-08T05:29:50",
"upload_time_iso_8601": "2024-07-08T05:29:50.209086Z",
"url": "https://files.pythonhosted.org/packages/7b/56/9f80ea8a8667ff1d2386570d7fcd129af534082b7031e633e45e5554c490/CEEMDAN_LSTM-1.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-08 05:29:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "FateMurphy",
"github_project": "CEEMDAN_LSTM",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ceemdan-lstm"
}