nabqr

Name	nabqr
Version	0.0.46
home_page	https://github.com/bast0320/nabqr
Summary	NABQR is a method for sequential error-corrections tailored for wind power forecast in Denmark
upload_time	2025-02-13 10:07:56
maintainer	None
docs_url	None
author	Bastian S. Jørgensen
requires_python	>=3.10
license	MIT
keywords	nabqr energy quantile forecasting
requirements	icecream matplotlib numpy pandas
coveralls test coverage	No coveralls.

# NABQR

[![PyPI Version](https://img.shields.io/pypi/v/nabqr.svg)](https://pypi.python.org/pypi/nabqr)
[![Documentation Status](https://readthedocs.org/projects/nabqr/badge/?version=latest)](https://nabqr.readthedocs.io/en/latest/?version=latest)

- **Free software**: MIT license  
- **Documentation**: [NABQR Documentation](https://nabqr.readthedocs.io)

README for nabqr package
=======================

## Table of Contents
- [Introduction](#introduction)
- [Getting Started](#getting-started)
- [Main functions](#main-functions)
  - [Pipeline](#pipeline)
  - [Time-Adaptive Quantile Regression](#time-adaptive-quantile-regression)
  - [LSTM Network](#lstm-network)
- [Demonstration: Test file](#test-file)
- [Requirements](#requirements)
- [Credits/Copyright](#creditscopyright)

---

## Introduction

NABQR is a method for sequential error correction, tailored to wind power forecasting in Denmark.

The method is based on the paper: *Sequential methods for Error Corrections in Wind Power Forecasts*, with the following abstract:
> Wind power is a rapidly expanding renewable energy source and is set for continued growth in the future. This leads to parts of the world relying on an inherently volatile energy source.
> Efficient operation of such systems requires reliable probabilistic forecasts of future wind power production to better manage the uncertainty that wind power brings. These forecasts provide critical insights, enabling wind power producers and system operators to maximize the economic benefits of renewable energy while minimizing its potential adverse effects on grid stability.
> This study introduces sequential methods to correct errors in power production forecasts derived from numerical weather predictions. 
> We introduce Neural Adaptive Basis for (Time-Adaptive) Quantile Regression (NABQR), a novel approach that combines neural networks with Time-Adaptive Quantile Regression (TAQR) to enhance the accuracy of wind power production forecasts. 
> First, NABQR corrects power production ensembles using neural networks.
> Our study identifies Long Short-Term Memory networks as the most effective architecture for this purpose.
> Second, TAQR is applied to the corrected ensembles to obtain optimal median predictions along with quantile descriptions of the forecast density. 
> The method achieves substantial improvements upwards of 40% in mean absolute terms. Additionally, we explore the potential of this methodology for applications in energy trading.
> The method is available as an open-source Python package to support further research and applications in renewable energy forecasting.


- **API documentation**: [Main pipeline API reference](https://nabqr.readthedocs.io/en/latest/api.html#main-pipeline)

---

## Getting Started

### Installation
`pip install nabqr`

Then see the [Test file](#test-file) section for an example of how to use the package.

## Main functions
### Pipeline
```python
import nabqr as nq

nq.pipeline(
    X,
    y,
    name="TEST",
    training_size=0.8,
    epochs=100,
    timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48],
    **kwargs,
)
```

The pipeline trains an LSTM network to correct the provided ensembles.
It then runs the TAQR algorithm on the corrected ensembles to predict the observations, `y`, on the test set.

**Parameters:**

- **X**: `pd.DataFrame` or `np.array`, shape `(n_timesteps, n_ensembles)`
  - The ensemble data to be corrected.
- **y**: `pd.Series` or `np.array`, shape `(n_timesteps,)`
  - The observations to be predicted.
- **name**: `str`
  - The name of the dataset. It appears as `<data_source>` in the output file names.
- **training_size**: `float`
  - The proportion of the data to be used for training.
- **epochs**: `int`
  - The number of epochs to train the LSTM.
- **timesteps_for_lstm**: `list`
  - The timesteps to use for the LSTM.
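
Before calling the pipeline, it can help to confirm that the inputs have the documented shapes. The following is a minimal sketch using NumPy only; `check_inputs` is a hypothetical helper that is not part of nabqr, and the chronological split index is an assumption about how `training_size` is applied, not a statement about the package internals.

```python
import numpy as np


def check_inputs(X, y):
    """Return (X, y) as float arrays after verifying the documented shapes.

    Hypothetical helper for illustration; nabqr does not provide it.
    """
    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y, dtype=float)
    if X_arr.ndim != 2:
        raise ValueError("X must have shape (n_timesteps, n_ensembles)")
    if y_arr.shape != (X_arr.shape[0],):
        raise ValueError("y must have shape (n_timesteps,) matching the rows of X")
    return X_arr, y_arr


rng = np.random.default_rng(0)
n_timesteps, n_ensembles = 500, 10

# Made-up ensemble forecasts: one row per timestep, one column per member.
X = rng.normal(size=(n_timesteps, n_ensembles))
# Made-up observations aligned row by row with X.
y = X.mean(axis=1) + rng.normal(scale=0.1, size=n_timesteps)

X_arr, y_arr = check_inputs(X, y)

training_size = 0.8
# Assumption: the first 80% of the rows form the training set.
split = int(training_size * len(y_arr))
print(X_arr.shape, y_arr.shape, split)
```

Arrays that pass this check can be handed to `nq.pipeline(X, y, ...)` as shown above.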

**Output:**
The pipeline saves the following outputs and also returns them:

- **Corrected Ensembles**: 
  - File: `results_<today>_<data_source>_corrected_ensembles.csv`
  - A CSV file containing the corrected ensemble data.

- **TAQR Results**: 
  - File: `results_<today>_<data_source>_taqr_results.npy`
  - Contains the results from the Time-Adaptive Quantile Regression (TAQR).

- **Actuals Out of Sample**: 
  - File: `results_<today>_<data_source>_actuals_out_of_sample.npy`
  - Contains the out-of-sample observations, i.e. the actual values on the test set.

- **BETA Parameters**: 
  - File: `results_<today>_<data_source>_BETA_output.npy`
  - Contains the BETA parameters from the TAQR.

- **Ensembles**: 
  - Contains the original (uncorrected) ensembles; this is the fifth value returned by `pipeline`.


Note: `<today>` is the current date in the format `YYYY-MM-DD`, and `<data_source>` is the name of the dataset.
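
To locate the saved files afterwards, the naming pattern above can be reproduced with the standard library. This is a small sketch, not part of nabqr: `expected_output_files` is a hypothetical helper, and it builds file names only, since the directory the pipeline writes to is not stated here.

```python
import datetime as dt


def expected_output_files(data_source, today=None):
    """Build the documented output file names for a dataset name.

    Hypothetical helper; it mirrors the naming pattern only.
    """
    if today is None:
        today = dt.date.today()
    prefix = f"results_{today.strftime('%Y-%m-%d')}_{data_source}"
    return {
        "corrected_ensembles": f"{prefix}_corrected_ensembles.csv",
        "taqr_results": f"{prefix}_taqr_results.npy",
        "actuals_out_of_sample": f"{prefix}_actuals_out_of_sample.npy",
        "BETA_output": f"{prefix}_BETA_output.npy",
    }


files = expected_output_files("TEST", today=dt.date(2025, 2, 13))
print(files["corrected_ensembles"])
# results_2025-02-13_TEST_corrected_ensembles.csv
```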


### Time-Adaptive Quantile Regression
nabqr also includes a time-adaptive quantile regression model, which can be used independently of the pipeline.
```python
import nabqr as nq
```
```python
nq.run_taqr(corrected_ensembles, actuals, quantiles, n_init, n_full, n_in_X)
```

Runs TAQR with the corrected ensembles as the design matrix `X` and the actual values as the target `y`, for each of the given quantiles.

**Parameters:**

- **corrected_ensembles**: `np.array`, shape `(n_timesteps, n_ensembles)`
  - The corrected ensembles to run TAQR on.
- **actuals**: `np.array`, shape `(n_timesteps,)`
  - The actual values to run TAQR on.
- **quantiles**: `list`
  - The quantiles to run TAQR for.
- **n_init**: `int`
  - The number of initial timesteps to use for warm start.
- **n_full**: `int`
  - The total number of timesteps to run TAQR for.
- **n_in_X**: `int`
  - The number of timesteps to include in the design matrix.
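
For background, quantile regression in general fits a quantile level `q` by minimizing the pinball (quantile) loss. The sketch below illustrates that loss with NumPy and checks that the constant minimizing it is close to the empirical quantile. It does not reproduce TAQR's time-adaptive updating, and `pinball_loss` is an illustrative function, not a nabqr API.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss of predictions y_pred at quantile level q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


rng = np.random.default_rng(42)
y = rng.normal(loc=10.0, scale=2.0, size=5000)

q = 0.9
# Search over constant predictions; the best one approximates the 0.9 quantile.
candidates = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(y, c, q) for c in candidates]
best = candidates[int(np.argmin(losses))]
empirical = np.quantile(y, q)
print(best, empirical)
```

The two printed values should agree up to the resolution of the candidate grid.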

### LSTM Network
The LSTM Network is trained on the ensembles and the actual values.

The function `train_model_lstm` is used to train the LSTM Network and can be called directly from the `functions.py` module. 

To build the LSTM network we use the `QuantileRegressionLSTM` class.

For more information, please refer to the [documentation on ReadTheDocs](https://nabqr.readthedocs.io/en/latest/).
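
As a rough illustration of how a list such as `timesteps_for_lstm` could turn an ensemble matrix into LSTM input, the sketch below stacks lagged copies of `X`. Treating the entries as lags and zero-padding the start of the series are both assumptions made for this example; `build_lagged_inputs` is a hypothetical helper, and the package's own preprocessing may differ.

```python
import numpy as np


def build_lagged_inputs(X, timesteps):
    """Stack lagged copies of X into shape (n_timesteps, len(timesteps), n_ensembles).

    Entries that would reach before the start of the series stay zero.
    Hypothetical helper for illustration; not the nabqr implementation.
    """
    X = np.asarray(X, dtype=float)
    n_timesteps, n_ensembles = X.shape
    out = np.zeros((n_timesteps, len(timesteps), n_ensembles))
    for j, lag in enumerate(timesteps):
        if lag >= n_timesteps:
            continue  # lag longer than the series: nothing to copy
        # Row t of slot j holds X[t - lag], which exists only for t >= lag.
        out[lag:, j, :] = X[: n_timesteps - lag, :]
    return out


X = np.arange(12, dtype=float).reshape(6, 2)  # 6 timesteps, 2 ensemble members
windows = build_lagged_inputs(X, timesteps=[0, 1, 2])
print(windows.shape)  # (6, 3, 2)
```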


## Test file 
Here we introduce the function `simulate_correlated_ar1_process`, which can be used to simulate multivariate AR data. The function uses the `build_ar1_covariance` function to build the covariance matrix for the AR(1) process. The full demonstration can be run with:
```python
import nabqr as nq
nq.run_nabqr_pipeline(...)
# or
from nabqr import run_nabqr_pipeline
run_nabqr_pipeline(...)
```
The entire `run_nabqr_pipeline` function is provided below:
```python
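import datetime as dt

import matplotlib.pyplot as plt
import numpy as np

# pipeline, simulate_correlated_ar1_process, simulate_wind_power_sde,
# get_parameter_bounds, visualize_results and calculate_scores are defined
# elsewhere in the nabqr package.


def run_nabqr_pipeline(
    n_samples=2000,
    phi=0.995,
    sigma=8,
    offset_start=10,
    offset_end=500,
    offset_step=15,
    correlation=0.8,
    data_source="NABQR-TEST",
    training_size=0.7,
    epochs=20,
    timesteps=[0, 1, 2, 6, 12, 24],
    quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99],
    X=None,
    actuals=None,
    simulation_type="ar1",
    visualize=True,  # default not stated in the docstring; True is assumed here
    taqr_limit=5000,
    save_files=True,
):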
    """
    Run the complete NABQR pipeline, which may include data simulation, model training,
    and visualization. The user can either provide pre-computed inputs (X, actuals)
    or opt to simulate data if both are not provided.

    Parameters
    ----------
    n_samples : int, optional
        Number of time steps to simulate if no data provided, by default 2000.
    phi : float, optional
        AR(1) coefficient for simulation, by default 0.995.
    sigma : float, optional
        Standard deviation of noise for simulation, by default 8.
    offset_start : int, optional
        Start value for offset range, by default 10.
    offset_end : int, optional
        End value for offset range, by default 500.
    offset_step : int, optional
        Step size for offset range, by default 15.
    correlation : float, optional
        Base correlation between dimensions, by default 0.8.
    data_source : str, optional
        Identifier for the data source, by default "NABQR-TEST".
    training_size : float, optional
        Proportion of data to use for training, by default 0.7.
    epochs : int, optional
        Number of epochs for model training, by default 20.
    timesteps : list, optional
        List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].
    quantiles : list, optional
        List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].
    X : array-like, optional
        Pre-computed input features. If not provided along with `actuals`, the function
        will prompt to simulate data.
    actuals : array-like, optional
        Pre-computed actual target values. If not provided along with `X`, the function
        will prompt to simulate data.
    simulation_type : str, optional
        Type of simulation to use, by default "ar1". The alternative, "sde", uses a more advanced and more realistic SDE model.
    visualize : bool, optional
        Determines if any visual elements will be plotted to the screen or saved as figures.
    taqr_limit : int, optional
        The lookback limit for the TAQR model, by default 5000.
    save_files : bool, optional
        Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.

    Returns
    -------
    tuple
        A tuple containing:
        - corrected_ensembles: pd.DataFrame
            The corrected ensemble predictions.
        - taqr_results: list of numpy.ndarray
            The TAQR results.
        - actuals_output: list of numpy.ndarray
            The actual output values.
        - BETA_output: list of numpy.ndarray
            The BETA parameters.
        - scores: pd.DataFrame
            The scores for the predictions and original/corrected ensembles.

    Raises
    ------
    ValueError
        If only one of X and actuals is provided, if the user declines to simulate
        data when both are missing, or if simulation_type is not "ar1" or "sde".
    """
    # If both X and actuals are not provided, ask user if they want to simulate
    if X is None or actuals is None:
        if X is not None or actuals is not None:
            raise ValueError("Either provide both X and actuals, or none at all.")
        choice = (
            input(
                "X and actuals are not provided. Do you want to simulate data? (y/n): "
            )
            .strip()
            .lower()
        )
        if choice != "y":
            raise ValueError(
                "Data was not provided and simulation not approved. Terminating function."
            )

        # Generate offset and correlation matrix for simulation
        offset = np.arange(offset_start, offset_end, offset_step)
        m = len(offset)
        corr_matrix = correlation * np.ones((m, m)) + (1 - correlation) * np.eye(m)

        # Generate simulated data
        # Check if simulation_type is valid
        if simulation_type not in ["ar1", "sde"]:
            raise ValueError("Invalid simulation type. Please choose 'ar1' or 'sde'.")
        if simulation_type == "ar1":    
            X, actuals = simulate_correlated_ar1_process(
                n_samples, phi, sigma, m, corr_matrix, offset, smooth=5
            )
        elif simulation_type == "sde":
            initial_params = {
                    'X0': 0.6,
                    'theta': 0.77,
                    'kappa': 0.12,        # Slower mean reversion
                    'sigma_base': 1.05,  # Lower base volatility
                    'alpha': 0.57,       # Lower ARCH effect
                    'beta': 1.2,        # High persistence
                    'lambda_jump': 0.045, # Fewer jumps
                    'jump_mu': 0.0,     # Zero mean jump size
                    'jump_sigma': 0.1    # Moderate jump size variation
                }
            # Check that initial parameters are within bounds
            bounds = get_parameter_bounds()
            for param, value in initial_params.items():
                lower_bound, upper_bound = bounds[param]
                if not (lower_bound <= value <= upper_bound):
                    print(f"Initial parameter {param}={value} is out of bounds ({lower_bound}, {upper_bound})")
                    if value < lower_bound:
                        initial_params[param] = lower_bound
                    else:
                        initial_params[param] = upper_bound
            
            t, actuals, X = simulate_wind_power_sde(
                initial_params, T=n_samples, dt=1.0
            )



        # Plot the simulated data with X in shades of blue and actuals in bold black
        plt.figure(figsize=(10, 6))
        cmap = plt.cm.Blues
        num_series = X.shape[1] if X.ndim > 1 else 1
        colors = [cmap(i) for i in np.linspace(0.3, 1, num_series)]  # Shades of blue
        if num_series > 1:
            for i in range(num_series):
                plt.plot(X[:, i], color=colors[i], alpha=0.7)
        else:
            plt.plot(X, color=colors[0], alpha=0.7)
        plt.plot(actuals, color="black", linewidth=2, label="Actuals")
        plt.title("Simulated Data")
        plt.xlabel("Time")
        plt.ylabel("Value")
        plt.legend()
        plt.show()

    # Run the pipeline
    corrected_ensembles, taqr_results, actuals_output, BETA_output, X_ensembles = pipeline(
        X,
        actuals,
        data_source,
        training_size=training_size,
        epochs=epochs,
        timesteps_for_lstm=timesteps,
        quantiles_taqr=quantiles,
        limit=taqr_limit,
        save_files = save_files
    )

    # Get today's date for file naming
    today = dt.datetime.today().strftime("%Y-%m-%d")

    # Visualize results
    if visualize:
        visualize_results(actuals_output, taqr_results, f"{data_source} example")

    # Calculate scores
    scores = calculate_scores(
        actuals_output,
        taqr_results,
        X_ensembles,
        corrected_ensembles,
        quantiles,
        data_source,
        plot_reliability=True,
        visualize = visualize
    )

    return corrected_ensembles, taqr_results, actuals_output, BETA_output, scores

```

With the default arguments, the arrays in this test file have the following shapes:
```python
# actuals.shape        -> (n_samples,)   = (2000,)
# m                    -> len(np.arange(offset_start, offset_end, offset_step)) = 33
# simulated_data.shape -> (n_samples, m) = (2000, 33)
# len(quantiles_taqr)  -> 7
```
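
These shapes can be checked without running the full pipeline. The sketch below rebuilds the offset vector and correlation matrix exactly as `run_nabqr_pipeline` does, then simulates correlated AR(1) series with a simplified stand-in. `simulate_ar1_ensembles` is a hypothetical function written for this example; it is not the package's `simulate_correlated_ar1_process`, which also takes a `smooth` argument and returns the actuals.

```python
import numpy as np

offset_start, offset_end, offset_step = 10, 500, 15
correlation, phi, sigma, n_samples = 0.8, 0.995, 8, 2000

# Same construction as in run_nabqr_pipeline.
offset = np.arange(offset_start, offset_end, offset_step)
m = len(offset)
corr_matrix = correlation * np.ones((m, m)) + (1 - correlation) * np.eye(m)


def simulate_ar1_ensembles(n_samples, phi, sigma, corr_matrix, offset, seed=0):
    """Simulate m AR(1) series with correlated Gaussian noise, shifted by offset.

    Simplified stand-in for illustration only.
    """
    rng = np.random.default_rng(seed)
    m = len(offset)
    chol = np.linalg.cholesky(corr_matrix)  # corr_matrix == chol @ chol.T
    # Each row of noise has covariance sigma**2 * corr_matrix.
    noise = sigma * rng.standard_normal((n_samples, m)) @ chol.T
    X = np.zeros((n_samples, m))
    for t in range(1, n_samples):
        X[t] = phi * X[t - 1] + noise[t]
    return X + offset


simulated_data = simulate_ar1_ensembles(n_samples, phi, sigma, corr_matrix, offset)
print(m, simulated_data.shape)  # 33 (2000, 33)
```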

## Requirements

- Python 3.10 or later
- icecream, matplotlib, numpy<2.0.0, pandas, properscoring, rich, SciencePlots, scikit_learn, scipy, tensorflow, tensorflow_probability, torch, typer, sphinx_rtd_theme, myst_parser, tf_keras
- R with the following packages: quantreg, readr, SparseM (implicitly called)

## Credits/Copyright
Copyright © 2024 Technical University of Denmark

This version of the software was developed by Bastian Schmidt Jørgensen as a Research Assistant at the Department of Dynamical Systems, DTU Compute.


This package was partially created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [`audreyr/cookiecutter-pypackage`](https://github.com/audreyr/cookiecutter-pypackage) project template.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/bast0320/nabqr",
    "name": "nabqr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "nabqr, energy, quantile, forecasting",
    "author": "Bastian S. J\u00f8rgensen",
    "author_email": "bassc@dtu.dk",
    "download_url": "https://files.pythonhosted.org/packages/c6/24/8a53d54d47230fdb7fc15f9ded200f26894fb36731a025c19b92d20b1240/nabqr-0.0.46.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "NABQR is a method for sequential error-corrections tailored for wind power forecast in Denmark",
    "version": "0.0.46",
    "project_urls": {
        "Homepage": "https://github.com/bast0320/nabqr"
    },
    "split_keywords": [
        "nabqr",
        " energy",
        " quantile",
        " forecasting"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1f5273ad3703b62ae72130888c939a4d6cb8b521d36975aa2dbf6691e8c9a89d",
                "md5": "3bd39f42cee37295e7bcd23e7efc91b6",
                "sha256": "07a846eef277aab97f45ea9817e5f18ebe9223ae861a6d3e2138f2ebbda45180"
            },
            "downloads": -1,
            "filename": "nabqr-0.0.46-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3bd39f42cee37295e7bcd23e7efc91b6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 54699,
            "upload_time": "2025-02-13T10:07:54",
            "upload_time_iso_8601": "2025-02-13T10:07:54.294727Z",
            "url": "https://files.pythonhosted.org/packages/1f/52/73ad3703b62ae72130888c939a4d6cb8b521d36975aa2dbf6691e8c9a89d/nabqr-0.0.46-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c6248a53d54d47230fdb7fc15f9ded200f26894fb36731a025c19b92d20b1240",
                "md5": "153021f8b4c08e5a2c1912ad9d1f82f1",
                "sha256": "5f2c92ba95ce695f647de181891be495340953ce8ea56aabe24c24316462c0a2"
            },
            "downloads": -1,
            "filename": "nabqr-0.0.46.tar.gz",
            "has_sig": false,
            "md5_digest": "153021f8b4c08e5a2c1912ad9d1f82f1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 44508,
            "upload_time": "2025-02-13T10:07:56",
            "upload_time_iso_8601": "2025-02-13T10:07:56.398730Z",
            "url": "https://files.pythonhosted.org/packages/c6/24/8a53d54d47230fdb7fc15f9ded200f26894fb36731a025c19b92d20b1240/nabqr-0.0.46.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-13 10:07:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bast0320",
    "github_project": "nabqr",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "icecream",
            "specs": [
                [
                    ">=",
                    "2.1.3"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.8.4"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.26.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.2.3"
                ]
            ]
        }
    ],
    "lcname": "nabqr"
}
