# SPRT-TANDEM-PyTorch
This repository contains the official PyTorch implementation of __SPRT-TANDEM__ ([ICASSP2023](https://arxiv.org/abs/2302.09810), [ICML2021](http://proceedings.mlr.press/v139/miyagawa21a.html), and [ICLR2021](https://openreview.net/forum?id=Rhsu5qD36cL)). __SPRT-TANDEM__ is a neuroscience-inspired sequential density ratio estimation (SDRE) algorithm that estimates log-likelihood ratios of two or more hypotheses for fast and accurate sequential data classification. For an intuitive understanding, please refer to the [SPRT-TANDEM tutorial](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM_tutorial).


## Quickstart
1. To create a new SDRE dataset, run the [Generate_sequential_Gaussian_as_LMDB.ipynb](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/notebooks/Generate_sequential_Gaussian_as_LMDB.ipynb) notebook.
2. Edit the user-editable block of [config_definition.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/config/config_definition.py). Specify the path to the dataset file created in step 1. Other frequently used entries include SUBPROJECT_NAME_PREFIX (to tag your experiment) and EXP_PHASE (to specify whether you are trying, tuning, or running statistics; see Experiment Phases and Hyperparameter Tuning for details).
3. Execute [sprt_tandem_main.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/sprt_tandem_main.py).

## Tested Environment
```
python      3.8.10
torch       2.0.0
notebook    6.5.3
optuna      3.1.0
```
## Supported Network Architectures  
We support two major architectures for processing time series data: Long short-term memory (LSTM, [1]) and Transformer [2]. To avoid the likelihood ratio saturation problem and approach asymptotic optimality (for details, see [Ebihara+, ICASSP2023](https://arxiv.org/abs/2302.09810)), we developed two novel models based on these architectures: B2Bsqrt-TANDEM (based on LSTM) and TANDEMformer (based on Transformer).
### LSTM (B2Bsqrt-TANDEM, [ICASSP2023](https://arxiv.org/abs/2302.09810))  
The LSTM with the back-to-back square root (B2Bsqrt) activation function can be used by setting the following variables: 

- MODEL_BACKBONE: "LSTM"
- ACTIVATION_OUTPUT: "B2Bsqrt"  

Note that setting ACTIVATION_OUTPUT to "tanh" results in a vanilla LSTM. The B2Bsqrt function was introduced in the ICASSP2023 paper to avoid the likelihood ratio saturation problem in SDRE:

\begin{align}
f_{\mathrm{B2Bsqrt}}(x) := \mathrm{sign}(x)(\sqrt{\alpha+|x|}-\sqrt{\alpha})
\end{align}

where $\alpha$ is a hyperparameter.
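As a reference, below is a minimal PyTorch sketch of the B2Bsqrt activation as defined above. The function name and the default $\alpha$ are illustrative assumptions, not the repository's actual implementation.

```
import torch

def b2bsqrt(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Back-to-back square root: sign(x) * (sqrt(alpha + |x|) - sqrt(alpha)).
    # Unlike tanh, the range is unbounded (mitigating LLR saturation), while
    # f(0) = 0 and the gradient at the origin stays finite (1 / (2 * sqrt(alpha))).
    # alpha = 1.0 is an illustrative default, not the repository's setting.
    a = torch.tensor(alpha, dtype=x.dtype, device=x.device)
    return torch.sign(x) * (torch.sqrt(a + torch.abs(x)) - torch.sqrt(a))
```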


### Transformer (TANDEMformer, [ICASSP2023](https://arxiv.org/abs/2302.09810))  
The Transformer is equipped with the Normalized Summation Pooling (NSP) layer, which is incorporated by default.

Let $X_i^{(t, t+w)}$ be subtokens sampled with a sliding window of size $w \in [N]$, and let $Z_i^{(t, t+w)}:=\{z_i^{(s)}\}^{t+w}_{s=t}$ be the subtokens mixed with self-attention. Given the Markov order $N$, the NSP layer is defined as:

\begin{align}
\mathrm{NSP}(Z_i^{(t, t+w)}) := \sum_{s=t}^{t+w}\frac{z_i^{(s)}}{N+1}.
\end{align}

To use it, set the following variable:

- MODEL_BACKBONE: "Transformer"



## Supported Loss Functions for SDRE
SPRT-TANDEM is trained with both a loss for sequential density ratio estimation (SDRE) and the (multiplet-) cross-entropy loss ([ICLR2021](https://openreview.net/forum?id=Rhsu5qD36cL)). Two loss functions, the LSEL and the LLLR, are supported for SDRE. To choose between them, set the following variable:

- LLLR_VERSION: "LSEL" or "LLLR"

Additionally, adjust the values of PARAM_LLR_LOSS and PARAM_MULTIPLET_LOSS to balance the SDRE loss and the cross-entropy loss.
### Log-sum exponential loss (LSEL, [ICML2021](http://proceedings.mlr.press/v139/miyagawa21a.html))  

\begin{align}
\hat{L}_{\mathrm{LSEL}} (\theta; S) := \mathbb{E} \left[ \log \left( 1 + \sum_{l(\neq k)} e^{ -\hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta) } \right) \right]
\end{align}

### Loss for log-likelihood ratio estimation (LLLR, [ICLR2021](https://openreview.net/forum?id=Rhsu5qD36cL))  

\begin{align}
\hat{L}_{\mathrm{LLLR}} (\theta; S) := \mathbb{E} \left[ \left| y - \sigma\left( \hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta) \right) \right| \right]
\end{align}
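As an illustration, below is a hedged PyTorch sketch of the LSEL, assuming the convention that llr[i, k, l] holds $\hat{\lambda}_{k,l}(X_i^{(1,t)}; \theta)$ and that $k$ is the ground-truth class; the repository's actual loss implementation may differ. In training, this SDRE loss would then be combined with the multiplet cross-entropy loss, weighted by PARAM_LLR_LOSS and PARAM_MULTIPLET_LOSS.

```
import torch

def lsel_loss(llr: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # llr: (batch, K, K) with llr[i, k, l] = estimated LLR of class k vs. class l.
    # labels: (batch,) ground-truth class indices k.
    batch, num_classes, _ = llr.shape
    rows = llr[torch.arange(batch, device=llr.device), labels]  # (batch, K)
    # Exclude the diagonal l == k by setting it to -inf before exponentiation.
    mask = torch.nn.functional.one_hot(labels, num_classes).bool()
    neg = torch.where(mask, torch.full_like(rows, float("-inf")), -rows)
    zero = torch.zeros(batch, dtype=llr.dtype, device=llr.device)
    # log(1 + sum_{l != k} exp(-lambda_hat_{k,l})), computed stably.
    return torch.logaddexp(zero, torch.logsumexp(neg, dim=1)).mean()
```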

## Order N of Markov assumption
The Markov order $N$ determines the length of the sliding window that extracts a subset from the entire feature vector of a time series. $N$ is a convenient hyperparameter for incorporating prior knowledge of the time series. An optimal $N$ can be found either from the *specific time scale* of the data or through hyperparameter tuning. The specific time scale characterizes the data class: e.g., long temporal actions such as those in UCF101 have a long specific time scale, while spoofing attacks such as those in SiW have a short one (a single frame can contain sufficient information about the attack). Setting $N$ equal to the specific time scale usually works best. Alternatively, $N$ can be chosen objectively with a hyperparameter tuning algorithm such as Optuna, just like any other hyperparameter. Because $N$ only affects the temporal integrator after feature extraction, optimizing it is not computationally expensive.


The log-likelihood ratio is estimated from a subset of the feature vectors extracted using a sliding window of size $N$. This estimation is classification-based. Specifically, the temporal integrator is trained to output class logits, which are then used to update the log-likelihood ratio at each time step based on the TANDEM formula.
### TANDEM formula ([ICLR2021](https://openreview.net/forum?id=Rhsu5qD36cL))
\begin{align}
&\ \log \left(
\frac{p(x^{(1)},x^{(2)}, ..., x^{(t)}| y=1)}{p(x^{(1)},x^{(2)}, ..., x^{(t)}| y=0)}
\right)\nonumber \newline
= &\sum_{s=N+1}^{t} \log \left(
\frac{
p(y=1| x^{(s-N)}, ...,x^{(s)})
}{
p(y=0| x^{(s-N)}, ...,x^{(s)})
}
\right) - \sum_{s=N+2}^{t} \log \left(
\frac{
p(y=1| x^{(s-N)}, ...,x^{(s-1)})
}{
p(y=0| x^{(s-N)}, ...,x^{(s-1)})
}
\right) \nonumber \newline
& - \log\left( \frac{p(y=1)}{p(y=0)} \right)
\end{align}
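For concreteness, here is a minimal sketch of the binary (two-class) TANDEM formula, assuming the per-step posteriors have already been computed by the temporal integrator; the names and the flat default prior are illustrative assumptions.

```
import torch

def tandem_llr(post_full: torch.Tensor, post_drop: torch.Tensor,
               prior1: float = 0.5) -> torch.Tensor:
    # post_full: (T - N,)     p(y=1 | x^(s-N), ..., x^(s))   for s = N+1, ..., T
    # post_drop: (T - N - 1,) p(y=1 | x^(s-N), ..., x^(s-1)) for s = N+2, ..., T
    # prior1: class prior p(y=1); 0.5 (a flat prior) is an illustrative default.
    logit = lambda p: torch.log(p) - torch.log1p(-p)  # log( p / (1 - p) )
    prior_term = torch.log(torch.tensor(prior1)) - torch.log1p(torch.tensor(-prior1))
    # Sum of posterior log ratios over full windows, minus the overlap correction
    # from the one-frame-shorter windows, minus the prior log ratio.
    return logit(post_full).sum() - logit(post_drop).sum() - prior_term
```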

## Experiment Phases
EXP_PHASE must be set to one of the following:
- try: All the hyperparameters are fixed as defined in [config_definition.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/config/config_definition.py). Use it for debugging purposes.
- tuning: Enter hyperparameter tuning mode. Hyperparameters with corresponding search spaces will be overwritten with suggested parameters. See the Hyperparameter Tuning section for more details.
- stat: All the hyperparameters are fixed as defined in [config_definition.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/config/config_definition.py). Training is repeated NUM_TRIALS times to test reproducibility (e.g., to plot error bars or run a statistical test).
The subproject name will be suffixed with the EXP_PHASE to prevent contamination of results from different phases.

## Hyperparameter Tuning
Our project supports Optuna [3] for hyperparameter tuning. To begin, edit the following variables in the [config_definition.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/config/config_definition.py):
- EXP_PHASE: set as "tuning" to enter hyperparameter tuning mode.
- NUM_TRIALS: set an integer that specifies the number of hyperparameter sets to experiment with.
- PRUNER_NAME (optional): select a pruner supported by Optuna, or set it to "None."    
Also, set PRUNER_STARTUP_TRIALS, PRUNER_WARMUP_STEPS, and PRUNER_INTERVAL_STEPS. For details, see the [official Optuna docs](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.MedianPruner.html#optuna.pruners.MedianPruner).
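For reference, these entries plausibly map onto the arguments of Optuna's MedianPruner, sketched below; the repository's actual wiring may differ.

```
import optuna

# A hedged sketch: PRUNER_* config entries mapped onto MedianPruner arguments.
pruner = optuna.pruners.MedianPruner(
    n_startup_trials=5,  # cf. PRUNER_STARTUP_TRIALS: trials before pruning kicks in
    n_warmup_steps=0,    # cf. PRUNER_WARMUP_STEPS: steps to wait within each trial
    interval_steps=1,    # cf. PRUNER_INTERVAL_STEPS: how often to check for pruning
)
study = optuna.create_study(direction="minimize", pruner=pruner)
```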

Next, customize the hyperparameter search spaces, which are defined by variables with the prefix "SPACE_". For example, [config_definition.py](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/config/config_definition.py) contains an entry like this:

```
    "SPACE_ORDER_SPRT": {
        "PARAM_SPACE": "int",
        "LOW": 0,
        "HIGH": 5,  # 10
        "STEP": 1,
        "LOG": False,
    }
```
The above entry specifies the search space of the hyperparameter "ORDER_SPRT." The key "PARAM_SPACE" must be one of the following:  
 - float: use suggest_float to suggest a float in the range [LOW, HIGH], spaced by STEP. If LOG=True, the float is sampled from log space; in that case, set STEP=None.
 - int: use suggest_int to suggest an integer in the range [LOW, HIGH], spaced by STEP. STEP should be a divisor of the range; otherwise, HIGH will be automatically adjusted. If LOG=True, the integer is sampled from log space; in that case, set STEP=None.
 - categorical: use suggest_categorical to select one category from CATEGORY_SET. Note that if the parameter is ordered (e.g., 1, 2, 3, ... or 1.0, 0.1, 0.001, ...), it is advisable to use the float or int space instead, because suggest_categorical treats each category independently (see the dispatch sketch below).
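A minimal dispatch sketch under the assumptions above; the helper name and error handling are illustrative, not the repository's actual code.

```
import optuna

def suggest_from_space(trial: optuna.trial.Trial, name: str, space: dict):
    # Map a SPACE_-style entry onto the corresponding Optuna suggest call.
    kind = space["PARAM_SPACE"]
    if kind == "float":
        return trial.suggest_float(name, space["LOW"], space["HIGH"],
                                   step=space["STEP"], log=space["LOG"])
    if kind == "int":
        # suggest_int needs an integer step; with LOG=True the rules above
        # say STEP=None, which corresponds to step=1 here.
        step = space["STEP"] if space["STEP"] is not None else 1
        return trial.suggest_int(name, space["LOW"], space["HIGH"],
                                 step=step, log=space["LOG"])
    if kind == "categorical":
        return trial.suggest_categorical(name, space["CATEGORY_SET"])
    raise ValueError(f"unknown PARAM_SPACE: {kind}")

# e.g., the SPACE_ORDER_SPRT entry above would yield
# trial.suggest_int("ORDER_SPRT", 0, 5, step=1, log=False).
```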

For more information, please refer to the [official Optuna docs](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html).

In short, each entry whose name starts with "SPACE_" defines the search space of the hyperparameter named after the prefix (in the example above, "ORDER_SPRT"); the suggested value is assigned to that hyperparameter.

## Command-line Arguments  
Frequently used variables can be overridden by specifying command-line arguments.
```
options:
  -h, --help            show this help message and exit
  -g GPU, --gpu         set GPU, gpu number
  -t NUM_TRIALS, --num_trials 
                        set NUM_TRIALS, number of trials
  -i NUM_ITER, --num_iter 
                        set NUM_ITER, number of iterations
  -e EXP_PHASE, --exp_phase EXP_PHASE
                        phase of an experiment, "try," "tuning," or "stat"
  -m MODEL, --model MODEL
                        set model backbone, "LSTM", or "Transformer"
  -o OPTIMIZE, --optimize OPTIMIZE
                        set optimization target: "MABS", "MacRec", "ausat_confmx", or "ALL"
  -n NAME, --name NAME  set the subproject name
  --flip_memory_loading
                        set a boolean flag indicating whether to load onto memory

```
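For example, an illustrative invocation combining the flags above (the values are placeholders):

```
python sprt_tandem_main.py -g 0 -t 100 -e tuning -m Transformer -o MacRec -n my_experiment
```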
## Logging
Under the [logs](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/blob/main/logs) folder, you will see a subfolder like this:
```
{SUBPROJECT_SUFFIX}_offset{DATA_SEPARATION}_optim{OPTIMIZATION_TARGET}_{EXP_PHASE}
```
inside which the following four folders are created:
- Optuna_databases: Optuna .db files are stored here.
- TensorBoard_events: TensorBoard event files are saved here.
- checkpoints: trained parameters are saved as .pt files whenever the best optimization target value is updated.
- stdout_logs: standard output strings are saved as .log files.

The plot below shows an example image saved in a TensorBoard event file. Note that you can avoid saving figures by setting IS_SAVE_FIGURE=False.


Note that "Class $a$ vs. $b$ at $y=a$" indicates that the plotted LLR shows $\log{p(X|y=a) / p(X|y=b)}$, when the ground truth label is $y=a$. 

## Citation
___Please cite the original paper(s) if you use all or part of our code.___

```
# ICASSP2023
@inproceedings{saturation_problem,
  title =     {Toward Asymptotic Optimality: Sequential Unsupervised Regression of Density Ratio for Early Classification},
  author =    {Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
  year =      {2023},
}

# ICML2021
@inproceedings{MSPRT-TANDEM,
  title = 	  {The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization},
  author =    {Miyagawa, Taiki and Ebihara, Akinori F},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	  {7792--7804},
  year = 	  {2021},
  url = 	  {http://proceedings.mlr.press/v139/miyagawa21a.html}
}

# ICLR2021
@inproceedings{SPRT-TANDEM,
  title={Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy},
  author={Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=Rhsu5qD36cL}
}
```

## References
1. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
2. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017, vol. 30, pp. 5998–6008.
3. T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in KDD, 2019, pp. 2623–2631.

## Contacts
SPRT-TANDEM marks its 4th anniversary. What started as a small project has now become a huge undertaking that we never imagined. Due to its complexity, it is difficult for me to explain all the details in this README section. Please feel free to reach out to me anytime if you have any questions.
- email: aebihara[at]nec.com
- twitter: [@non_iid](https://twitter.com/non_iid)
- GitHub issues: see the link above or click [here](https://github.com/Akinori-F-Ebihara/SPRT-TANDEM-PyTorch/issues)



            
