# Astrotime
#### Machine learning methods for irregularly spaced time series
## Quick Start
For a quick start, this section documents the workflows and container usage; for additional
details, please read the remaining sections of this README. In summary, each workflow
(Sinusoid, Synthetic, and MIT) has a training, eval, and peakfinder script.
For the MIT dataset, the train step is replaced with finetune, because in that case training is
intended to start from the weights produced by the synthetic training. The peakfinder scripts run a simple (non-ML)
workflow that computes the frequency of the highest peak in the spectrum and returns the corresponding
period, which is used for comparison and evaluation of the ML workflows.
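As a rough, self-contained illustration of the peakfinder idea (not the project's peakfinder code): the sketch below assumes scipy is available and substitutes a Lomb-Scargle periodogram for the project's spectral projection, with a frequency range mirroring the configuration shown below (base frequency 0.025 over 9 octaves).
```python
import numpy as np
from scipy.signal import lombscargle

def peak_period(t, y, f_min=0.025, f_max=12.8, nfreq=4608):
    """Return the period corresponding to the highest peak of the periodogram."""
    freqs = np.geomspace(f_min, f_max, nfreq)                   # trial frequencies
    power = lombscargle(t, y - y.mean(), 2.0 * np.pi * freqs)   # expects angular frequencies
    return 1.0 / freqs[np.argmax(power)]

# Irregularly sampled noisy sinusoid with frequency 0.3 -> period ~ 1/0.3
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 500))
y = np.sin(2.0 * np.pi * 0.3 * t) + 0.1 * rng.normal(size=t.size)
print(peak_period(t, y))   # close to 3.33
```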
### Sinusoid Dataset Workflow
An example training run of the deep learning model:
```bash
PYTHONPATH=/explore/nobackup/people/jacaraba/development/astrotime python /explore/nobackup/people/jacaraba/development/astrotime/workflow/release/sinusoid/train.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/jacaraba/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc train.nepochs=10 data.batch_size=16
```
The options available for this workflow are listed below. To change the path to the data or any other setting,
override it from the CLI.
```bash
Singularity> python /explore/nobackup/people/jacaraba/development/astrotime/workflow/full/sinusoid/train.py -h
train is powered by Hydra.
== Configuration groups ==
Compose your configuration from those groups (group=option)
__legacy__: MIT_period, MIT_period.ce, MIT_period.octaves, MIT_period.octaves.pcross, MIT_period.synthetic, MIT_period.synthetic.folded, MIT_period.wp, baseline_cnn, desktop_period.analysis, desktop_period.octaves, progressive_MIT_period, sinusoid_period.baseline, sinusoid_period.baseline_small, sinusoid_period.poly, sinusoid_period.wp, sinusoid_period.wp_scaled, sinusoid_period.wp_small, sinusoid_period.wpk, sinusoid_period.wwz, sinusoid_period.wwz_small, synthetic_period_autocorr, synthetic_period_transformer, synthetic_period_transformer.classification, synthetic_period_transformer.regression, synthetic_transformer
__legacy__/data: MIT, MIT-1, MIT.csv, MIT.octaves, MIT.synthetic, MIT.synthetic.folded, astro_synthetic, astro_synthetic_autocorr, pcross.octaves, planet_crossing_generator, sinusoids.nc, sinusoids.npz, sinusoids_small.nc
__legacy__/model: relation_aware_transformer, transformer, transformer.classication, transformer.regression, wpk_cnn
__legacy__/transform: MIT.octaves, MIT.synthetic, MIT.synthetic.folded, ce-MIT, correlation, gp, value, wp, wp-MIT, wp-scaled, wpk, wwz
data: MIT, sinusoids, synthetic, synthetic.octave
model: cnn, cnn.classification, cnn.octave_regression, dense
platform: desktop1, explore
train: MIT_cnn, sinusoid_cnn, synthetic_cnn
transform: MIT, sinusoid, synthetic, synthetic.octave
== Config ==
Override anything in the config (foo.bar=value)
platform:
  project_root: /explore/nobackup/projects/ilab/data/astrotime
  gpu: 0
  log_level: info
train:
  optim: rms
  lr: 0.001
  nepochs: 5000
  refresh_state: false
  overwrite_log: true
  results_path: ${platform.project_root}/results
  weight_decay: 0.0
  mode: train
  base_freq: ${data.base_freq}
transform:
  sparsity: 0.0
  batch_size: ${data.batch_size}
  nfreq_oct: ${data.nfreq_oct}
  base_freq: ${data.base_freq}
  noctaves: ${data.noctaves}
  test_mode: ${data.test_mode}
  maxh: ${data.maxh}
  accumh: false
  decay_factor: 0.0
  subbatch_size: 4
  norm: std
  fold_octaves: false
data:
  source: sinusoid
  dataset_root: ${platform.project_root}/sinusoids/nc
  dataset_files: padded_sinusoids_*.nc
  cache_path: ${platform.project_root}/cache/data/synthetic
  dset_reduction: 1.0
  batch_size: 16
  nfreq_oct: 512
  base_freq: 0.025
  noctaves: 9
  test_mode: default
  file_size: 1000
  nfiles: 1000
  refresh: false
  maxh: 8
model:
  mtype: cnn.regression
  cnn_channels: 64
  dense_channels: 64
  out_channels: 1
  num_cnn_layers: 3
  num_blocks: 8
  pool_size: 2
  stride: 1
  kernel_size: 3
  cnn_expansion_factor: 4
  base_freq: ${data.base_freq}
  feature: 1
Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
```
Then run the peakfinder method:
```bash
PYTHONPATH=/explore/nobackup/people/jacaraba/development/astrotime python /explore/nobackup/people/jacaraba/development/astrotime/workflow/release/sinusoid/peakfinder.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/jacaraba/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc
```
Finally, evaluate these methods:
```bash
PYTHONPATH=/explore/nobackup/people/jacaraba/development/astrotime python /explore/nobackup/people/jacaraba/development/astrotime/workflow/release/sinusoid/eval.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/jacaraba/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc train.nepochs=10 data.batch_size=16
```
### Synthetic Dataset Workflow
An example training run of the deep learning model:
```bash
PYTHONPATH=/explore/nobackup/people/jacaraba/development/astrotime python /explore/nobackup/people/jacaraba/development/astrotime/workflow/release/synthetic/train.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/jacaraba/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc train.nepochs=10 data.batch_size=16
```
### MIT Dataset Workflow
```bash
```
## Project Description
This project contains the implementation of a time-aware neural network (TAN) and workflows for testing its performance on the task of predicting periods of the time series datasets provided by Brian Powell.
Three datasets have been provided by Brian Powell for test and evaluation:
* Synthetic Sinusoids (SS): A set of sinusoid time series with irregular time spacing.
* Synthetic Light Curves (SLC): A set of artificially generated time series imitating realistic light curves.
* MIT Light Curves (MIT-LC): A set of actual light curves provided by MIT.
### Spectral Projection
* This project utilizes a spectral projection as the first stage of data processing. The spectral coefficients represent the projection of a signal onto a set of basis functions,
implemented as a weighted inner product between the signal and the basis functions (evaluated at the time points). A good summary of the equations implemented in this project
can be found in the appendix of [Witt & Schumann (2005)](https://www.researchgate.net/publication/200033740_Holocene_climate_variability_on_millennial_scales_recorded_in_Greenland_ice_cores).
The spectral projection generates three features by computing weighted scalar products (equation A3) between the signal values and the sinusoid basis functions described by equation A5.
The magnitude of the projection is defined by equation A10. Further mathematical detail can be found in [Foster (1996)](https://articles.adsabs.harvard.edu/pdf/1996AJ....112.1709F).
* The frequency (f) space is scaled such that the density of f values is constant across octaves.
The f values are given by f[j] = f0 * pow(2, j/N), with j ranging over [0, N*M], where N is the number of f values per octave,
M is the number of octaves in the f range, and f0 is the lowest value in the f range. A simplified sketch of this grid and the projection follows.
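The snippet below is a minimal sketch of these two ideas: it builds the octave-scaled frequency grid from the formula above (defaults taken from the configuration shown earlier: base_freq 0.025, nfreq_oct 512, noctaves 9) and computes the magnitude of a projection onto sine/cosine basis functions at each frequency. Uniform weights are used as a simplification of the weighted products in Witt & Schumann (2005); this is not the project's transform code.
```python
import numpy as np

def octave_frequency_grid(f0=0.025, nfreq_oct=512, noctaves=9):
    """f[j] = f0 * 2**(j / N) for j in [0, N*M]: constant density of f values per octave."""
    j = np.arange(nfreq_oct * noctaves + 1)
    return f0 * 2.0 ** (j / nfreq_oct)

def projection_magnitude(t, y, freqs):
    """Simplified (uniform-weight) projection of the signal onto sin/cos at each frequency."""
    y = y - y.mean()
    mags = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        phase = 2.0 * np.pi * f * t
        a_cos = np.dot(y, np.cos(phase)) / len(t)
        a_sin = np.dot(y, np.sin(phase)) / len(t)
        mags[k] = np.hypot(a_cos, a_sin)
    return mags

# The magnitude peaks near the true frequency (0.4) of an irregularly sampled sinusoid.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 200.0, 800))
y = np.sin(2.0 * np.pi * 0.4 * t)
freqs = octave_frequency_grid()
print(freqs[np.argmax(projection_magnitude(t, y, freqs))])   # close to 0.4
```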
### Learning Model
* This project utilizes a convolutional neural network (CNN) with 24 layers. For each of the datasets, the input to the network is the spectral projection of each light curve (LC)
and the output is the frequency of a periodic component of the LC, trained using the target frequency provided in the dataset for each LC.
* The output layer of the network is dense, with an exponential activation function defined by the equation y = f0 * (pow(2, x) - 1), where f0 is the lowest value in the f range.
In order to account for the very large dynamic range of the target frequency spectrum, a custom loss function is used, defined by the equation
loss = abs( log2( (yn + f0) / (yt + f0) ) ), where yn is the network output and yt is the target frequency.
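The following is a minimal PyTorch rendering of the two equations above (the batch-mean reduction is an assumption); it is a sketch for clarity, not the project's training code.
```python
import torch

f0 = 0.025   # lowest value of the frequency range (data.base_freq in the configuration above)

def exp_activation(x: torch.Tensor) -> torch.Tensor:
    """Output-layer activation: y = f0 * (2**x - 1)."""
    return f0 * (torch.pow(2.0, x) - 1.0)

def octave_loss(yn: torch.Tensor, yt: torch.Tensor) -> torch.Tensor:
    """loss = | log2( (yn + f0) / (yt + f0) ) |, averaged over the batch."""
    return torch.abs(torch.log2((yn + f0) / (yt + f0))).mean()

# A prediction one octave above the target gives a loss of ~1, regardless of the target's scale.
yt = torch.tensor([0.4])
yn = 2.0 * yt + f0                            # (yn + f0) / (yt + f0) == 2
print(octave_loss(yn, yt))                    # tensor(1.)
print(exp_activation(torch.tensor([0.0])))    # tensor([0.]): x = 0 maps to frequency 0
```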
## Conda environment
* On Adapt load modules: gcc/12.1.0, nvidia/12.1
* If mamba is not available, install [miniforge](https://github.com/conda-forge/miniforge) (or load mamba module)
* Execute the following to set up a conda environment for astrotime:
### Torch Environment:
> * mamba create -n astrotime.pt ninja python=3.10
> * mamba activate astrotime.pt
> * pip install torch jupyterlab==4.0.13 ipywidgets==7.8.4 cuda-python jupyterlab_widgets ipykernel==6.29 ipympl ipython==8.26 xarray netCDF4 pygam wotan statsmodels transitleastsquares scikit-learn hydra-core rich
> * pip install diffusers lightkurve --upgrade
## Dataset Preparation
* This project utilizes three datasets (sinusoid, synthetic, and MIT) which are located in the **cfg.platform.project_root** directory. The project_root directory on explore is: **/explore/nobackup/projects/ilab/data/astrotime**.
* The raw sinusoid data can be found on explore at <project_root>/sinusoids/npz. The script **.workflow/util/npz2nc.py** has been used to convert the .npz files to netcdf files in the <project_root>/sinusoids/nc directory.
* The raw synthetic light curves are stored on explore at **/explore/nobackup/people/bppowel1/timehascome/**. The script **.workflow/util/npz2nc.py** has been used to convert the .npz files to netcdf files in the <project_root>/synthetic directory.
* The MIT light curves are stored in their original form at: **/explore/nobackup/people/bppowel1/mit_lcs/**. Methods in the class **astrotime.loaders.MIT.MITLoader** have been used to convert the light curve text files to netcdf files in the <project_root>/MIT directory.
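To sanity-check the converted data, the netcdf files can be opened with xarray (installed via the pip command above). A minimal sketch, assuming the explore paths listed in this section; the variables inside the files are not documented here, so it simply prints the contents of the first file:
```python
import glob
import xarray as xr

root = "/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc"
first_file = sorted(glob.glob(f"{root}/padded_sinusoids_*.nc"))[0]
ds = xr.open_dataset(first_file)   # uses the netCDF4 backend from the pip install
print(ds)                          # dimensions, coordinates, and data variables
```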
## Workflows
For each of the datasets (sinusoid, synthetic, and MIT), three ML workflows are provided:
* _train_ (**.workflow/train-baseline-cnn.py**): Runs the TAN training workflow.
* _eval_ (**.workflow/wavelet-synthesis-cnn.py**): Runs the TAN validation/test workflow.
* _peakfinder_ (**.workflow/wavelet-analysis-cnn.py**): Runs the peakfinder validation/test workflow.
The workflows save checkpoint files at the end of each epoch. By default the model is initialized with any existing checkpoint file at the beginning of script execution.
A workflow's checkpoints are named after its *version* parameter.
To execute the script with a new set of checkpoints (while keeping the old ones), create a new script with a different value of the *version* parameter
(and a new defaults hydra yaml file with the same name in the config dir). The second (ckp_version) argument to the _train_ method of the Trainer class is used for fine-tuning.
If this argument is specified, the training workflow is initialized with the checkpoint from that version, and all new checkpoints are saved
under the primary version of the workflow.
## Configuration
The workflows are configured using [hydra](https://hydra.cc/docs/intro/).
* All hydra yaml configuration files are found under **.config**.
* The workflow configurations can be modified at runtime as [supported by hydra](https://hydra.cc/docs/tutorials/basic/your_first_app/simple_cli/).
* For example, the following command runs the synthetic dataset training workflow on gpu 3 with random initialization (i.e. ignoring & overwriting any existing checkpoints):
> python workflow/synthetic/train.py platform.gpu=3 train.refresh_state=True
* To run validation (no training), execute:
> python workflow/synthetic/train.py train.mode=valid platform.gpu=0
### Configuration Parameters
Here is a partial list of configuration parameters with typical default values. Their values are configured in the hydra yaml files and reconfigurable on the command line:
    platform.project_root: "/explore/nobackup/projects/ilab/data/astrotime"   # Base directory for all saved files
    platform.gpu: 0                           # Index of gpu to execute on
    platform.log_level: "info"                # Log level: typically debug or info
    data.source: sinusoid                     # Dataset type (currently only sinusoid is supported)
    data.dataset_root: "${platform.project_root}/sinusoids/nc"   # Location of processed netcdf files
    data.dataset_files: "padded_sinusoids_*.nc"                  # Glob pattern for file names
    data.file_size: 1000                      # Number of sinusoids in a single nc file
    data.batch_size: 50                       # Batch size for training
    data.validation_fraction: 0.1             # Fraction of training dataset that is used for validation
    data.dset_reduction: 1.0                  # Fraction of the full dataset that is used for training/validation
    transform.nfeatures: 1                    # Number of features to be passed to the network
    transform.sparsity: 0.0                   # Fraction of observations to drop (randomly)
    model.cnn_channels: 64                    # Number of channels in first CNN layer
    model.dense_channels: 64                  # Number of channels in dense layer
    model.out_channels: 1                     # Number of network output channels
    model.num_cnn_layers: 3                   # Number of CNN layers in a CNN block
    model.num_blocks: 7                       # Number of CNN blocks in the network
    model.pool_size: 2                        # Max pool size for every block
    model.stride: 1                           # Stride value for every CNN layer
    model.kernel_size: 3                      # Kernel size for every CNN layer
    model.cnn_expansion_factor: 4             # Increase in the number of channels from one CNN layer to the next
    train.optim: rms                          # Optimizer
    train.lr: 1e-3                            # Learning rate
    train.nepochs: 5000                       # Training epochs
    train.refresh_state: False                # Start from random weights (ignore & overwrite existing checkpoints)
    train.overwrite_log: True                 # Start a new log file
    train.results_path: "${platform.project_root}/results"       # Checkpoint and log files are saved under this directory
    train.weight_decay: 0.0                   # Weight decay parameter for optimizer
    train.mode: train                         # Execution mode: 'train' or 'valid'
## Working from the container
In addition to the conda environment, the software can be run from
a container. This project provides a Docker container that can be converted
to Singularity or any other container engine based on the user's needs. The
instructions below are geared towards Singularity, since that is
the default engine available in the NCCS supercomputing facility.
### Container Download
To create a sandbox out of the container:
```bash
singularity build --sandbox /lscratch/$USER/container/astrotime docker://nasanccs/astrotime:latest
```
*Note: /lscratch is only available on gpu### nodes.*
An already downloaded version of this sandbox is available under:
```bash
/explore/nobackup/projects/ilab/containers/astrotime-latest
```
### Working from the container with a shell session
To get a shell session inside the container:
```bash
singularity shell -B $NOBACKUP,/explore/nobackup/projects,/explore/nobackup/people --nv /explore/nobackup/projects/ilab/containers/astrotime-latest
```
### Example training run
An example training run:
```bash
python /explore/nobackup/projects/ilab/ilab_testing/astrotime/workflow/baseline-cnn.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc
```
Expected training output files:
```bash
/explore/nobackup/projects/ilab/ilab_testing/astrotime/results/checkpoints/sinusoid_period.baseline.pt
/explore/nobackup/projects/ilab/ilab_testing/astrotime/results/checkpoints/sinusoid_period.baseline.backup.pt
```
An example validation run:
```bash
python /explore/nobackup/projects/ilab/ilab_testing/astrotime/workflow/baseline-cnn.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc train.mode=valid
```
Expected validation output:
```bash
Loading checkpoint from /explore/nobackup/projects/ilab/ilab_testing/astrotime/results/checkpoints/sinusoid_period.baseline.pt: epoch=122, batch=0
SignalTrainer[TSet.Validation]: 2000 batches, 1 epochs, nelements = 100000, device=cuda:0
Validation Loss: mean=0.021, median=0.021, range=(0.012 -> 0.043)
98.04user 8.85system 2:00.79elapsed 88%CPU (0avgtext+0avgdata 1080416maxresident)k
2059752inputs+1120outputs (1677major+582379minor)pagefaults 0swaps
```
### Submitting a Slurm job using the container (training example)
From gpulogin1:
```bash
sbatch --mem-per-cpu=10240 -G1 -c10 -t01:00:00 -J astrotime --wrap="time singularity exec -B $NOBACKUP,/explore/nobackup/projects,/explore/nobackup/people --nv /explore/nobackup/projects/ilab/containers/astrotime-latest python /explore/nobackup/projects/ilab/ilab_testing/astrotime/workflow/baseline-cnn.py platform.project_root=/explore/nobackup/projects/ilab/ilab_testing/astrotime data.dataset_root=/explore/nobackup/projects/ilab/data/astrotime/sinusoids/nc"
```
## References
- Foster, G. Wavelets for period analysis of unevenly sampled time series. The Astronomical Journal 112, 1709 (1996).
- Witt, A. & Schumann, A. Y. Holocene climate variability on millennial scales recorded in Greenland ice cores. Nonlinear Processes in Geophysics 12, 345–352 (2005).