# Time Series Test

*Statistical testing and plotting functions for time-series data in general, and data from cognitive-pupillometry and electroencephalography (EEG) experiments in particular. Based on linear mixed effects modeling (or regular multiple linear regression), crossvalidation, and cluster-based permutation testing.*

Sebastiaan Mathôt (@smathot) <br />
Copyright 2021 - 2023

[![Publish to PyPi](https://github.com/smathot/time_series_test/actions/workflows/publish-package.yaml/badge.svg)](https://github.com/smathot/time_series_test/actions/workflows/publish-package.yaml)
[![Tests](https://github.com/smathot/time_series_test/actions/workflows/run-unittests.yaml/badge.svg)](https://github.com/smathot/time_series_test/actions/workflows/run-unittests.yaml)


## Contents

- [Citation](#citation)
- [About](#about)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Usage](#usage)
- [Function reference](#function-reference)
- [License](#license)


## Citation

Mathôt, S., & Vilotijević, A. (2022). Methods in cognitive pupillometry: design, preprocessing, and analysis. *Behavior Research Methods*. <https://doi.org/10.1101/2022.02.23.481628>


## About

This library provides two main functions for statistical testing of time-series data: `lmer_crossvalidation_test()` and `lmer_permutation_test()`. For a detailed description, see the manuscript above; below is a short introduction to both functions with their respective advantages and disadvantages.


### When to use crossvalidation?

In general terms, `lmer_crossvalidation_test()` implements a statistical test for a specific-yet-common question when analyzing time-series data:

> Do one or more independent variables affect a continuously recorded dependent variable (a 'time series') at any point in time?

When to use this test:

- For time series consisting of only a single component, that is, when each independent variable has only a single effect on the time series. An example of this is the effect of stimulus intensity on pupil size, when presenting light flashes of different intensities.
- When you do not know a priori which time points to test.

When *not* to use this test:

- For time series that contain multiple components, that is, when each independent variable affects the time series in multiple ways that change over time. An example of this is the effect of visual attention on lateralized EEG recordings, where different EEG components emerge at different points in time.
- When you know a priori which time points to test.

More specifically, `lmer_crossvalidation_test()` locates and statistically tests effects in time-series data. It uses crossvalidation to identify the time points to test, and then uses a linear mixed effects model to perform the actual statistical test. The data is subdivided into a number of subsets (by default 4). One subset (the *test* set) is taken out of the full dataset, and a linear mixed effects model is conducted on each sample of the remaining data (the *training* set). The sample with the highest absolute z value in the training set is then used as the sample-to-be-tested for the test set. This procedure is repeated for all subsets of the data and for all fixed effects in the model. Finally, a single linear mixed effects model is conducted for each fixed effect on the samples that were thus identified.
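
As a rough illustration of the sample-selection (localizer) step, the sketch below uses ordinary least squares per sample instead of a mixed model and omits the final test; the data frame, signal array, and helper name are hypothetical and not part of this package.

```python
import numpy as np
import statsmodels.formula.api as smf


def select_test_samples(df, signal, formula, effect, n_splits=4):
    """For each held-out subset, return the sample index with the highest
    absolute test statistic in the corresponding training subset.

    df      -- pandas DataFrame with one row per trial (predictors only)
    signal  -- 2D numpy array of shape (n_trials, n_samples)
    formula -- right-hand side of the model, e.g. 'set_size * color_type'
    effect  -- name of the term to localize, e.g. 'set_size'
    """
    split = np.arange(len(df)) % n_splits            # interleaved splitting
    selected = {}
    for test_split in range(n_splits):
        train = split != test_split                  # rows of the training set
        best_stat, best_sample = 0.0, None
        for sample in range(signal.shape[1]):        # loop over time points
            d = df[train].copy()
            d['dv'] = signal[train, sample]
            stat = smf.ols('dv ~ ' + formula, data=d).fit().tvalues[effect]
            if abs(stat) > abs(best_stat):
                best_stat, best_sample = stat, sample
        selected[test_split] = best_sample
    # The package then fits a single (mixed) model to the held-out data at
    # these samples; that final step is omitted from this sketch.
    return selected
```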

This package also provides a `plot()` function to visualize time-series data and annotate the results of `lmer_crossvalidation_test()`.


### When to use `lmer_permutation_test()`?

`lmer_permutation_test()` implements a fairly standard cluster-based permutation test, which differs from most other implementations in that it relies on linear mixed-effects modeling to calculate the test statistics. As a result, this function tends to be extremely computationally intensive, but it should also be more sensitive than cluster-based permutation tests that are based on averaged data. Its main advantage compared to `lmer_crossvalidation_test()` is that it is also valid for data with multiple components, such as event-related potentials (ERPs).


### Can the tests also be based on regular multiple regression (instead of linear mixed effects modeling)?

Yes. If you pass `groups=None` to any of the functions, the analysis will be based on a regular multiple linear regression instead of linear mixed effects modeling.
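
A minimal sketch, reusing the `dm` dataset and the formula from the [Usage](#usage) section below:

```python
# Same analysis as the crossvalidation example in the Usage section, but
# without random effects: each model is a plain multiple linear regression
results_ols = tst.lmer_crossvalidation_test(dm, 'pupil ~ set_size * color_type',
                                            groups=None, winlen=5)
```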


## Installation

```
pip install time_series_test
```

## Dependencies

- [Python 3](https://www.python.org/)
- [datamatrix](https://pydatamatrix.eu/)
- [statsmodels](https://www.statsmodels.org/)
- [matplotlib](https://matplotlib.org/)


## Usage

We will use data from [Zhou, Lorist, and Mathôt (2021)](https://doi.org/10.1101/2021.11.23.469689). In brief, this is data from a visual-working-memory experiment in which participants memorized one or more colors (set size: 1, 2, 3, or 4) of two different types (color type: proto, nonproto) while pupil size was recorded during a 3 s retention interval.

This dataset contains the following columns:

- `pupil`, which is our dependent measure. It is a baseline-corrected pupil time series of 300 samples, recorded at 100 Hz
- `subject_nr`, which we will use as a random effect
- `set_size`, which we will use as a fixed effect
- `color_type`, which we will use as a fixed effect

First, load the dataset:



```python
from datamatrix import io
dm = io.readpickle('data/zhou_et_al_2021.pkl')
```
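
As a quick sanity check, you can inspect the structure of the dataset (this relies on the DataMatrix `column_names` and series `depth` properties; the exact output depends on the data file):

```python
print(dm.column_names)   # should include pupil, subject_nr, set_size, color_type
print(dm.pupil.depth)    # 300 samples, i.e. 3 s at 100 Hz
```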



The `plot()` function provides a convenient way to plot pupil size over time as a function of one or two factors, in this case set size and color type:



```python
import time_series_test as tst
from matplotlib import pyplot as plt

tst.plot(dm, dv='pupil', hue_factor='set_size', linestyle_factor='color_type')
plt.savefig('img/signal-plot-1.png')
```



![](https://github.com/smathot/time_series_test/raw/master/img/signal-plot-1.png)

From this plot, we can tell that there appear to be effects in the 1500 to 2000 ms interval. To test this, we could perform a linear mixed effects model on this interval, which corresponds to samples 150 to 200.
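
The conversion from time to sample indices follows directly from the 100 Hz sampling rate (10 ms per sample):

```python
sampling_rate = 100                            # Hz, as described above
start_sample = 1500 * sampling_rate // 1000    # 150
end_sample = 2000 * sampling_rate // 1000      # 200
```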

The model below uses mean pupil size during the 150 - 200 sample range as dependent measure, set size and color type as fixed effects, and a random by-subject intercept. In the more familiar notation of the R package `lme4`, this corresponds to `mean_pupil ~ set_size * color_type + (1 | subject_nr)`. (To use more complex random-effects structures, you can use the `re_formula` argument to `mixedlm()`.)



```python
from statsmodels.formula.api import mixedlm
from datamatrix import series as srs, NAN

# Average pupil size over the 150 - 200 sample range (1500 - 2000 ms)
dm.mean_pupil = srs.reduce(dm.pupil[:, 150:200])
# Keep only rows for which a mean pupil size could be computed
dm_valid_data = dm.mean_pupil != NAN
model = mixedlm(formula='mean_pupil ~ set_size * color_type',
                data=dm_valid_data, groups='subject_nr').fit()
print(model.summary())
```

__Output:__
``` .text
                    Mixed Linear Model Regression Results
=============================================================================
Model:                    MixedLM       Dependent Variable:       mean_pupil 
No. Observations:         7300          Method:                   REML       
No. Groups:               30            Scale:                    38610.3390 
Min. group size:          235           Log-Likelihood:           -48952.3998
Max. group size:          248           Converged:                Yes        
Mean group size:          243.3                                              
-----------------------------------------------------------------------------
                              Coef.   Std.Err.   z    P>|z|  [0.025   0.975] 
-----------------------------------------------------------------------------
Intercept                    -144.024   17.438 -8.259 0.000 -178.202 -109.846
color_type[T.proto]           -24.133   11.299 -2.136 0.033  -46.278   -1.987
set_size                       49.979    2.906 17.200 0.000   44.284   55.675
set_size:color_type[T.proto]   10.176    4.120  2.470 0.014    2.101   18.251
subject_nr Var               7217.423    9.882                               
=============================================================================

```



The model summary shows that, assuming an alpha level of .05, there are significant main effects of color type (z = -2.136, p = .033), set size (z = 17.2, p < .001), and a significant color-type by set-size interaction (z = 2.47, p = .014). However, we have selectively analyzed a sample range that we knew, based on a visual inspection of the data, to show these effects. This means that our analysis is circular: we have looked at the data to decide where to look! The `find()` function improves on this by splitting the data into training and test sets, as described under [About](#about), thus breaking the circularity.



```python
results = tst.find(dm, 'pupil ~ set_size * color_type',
                   groups='subject_nr', winlen=5)
```



The return value of `find()` is a `dict`, where keys are effect labels and values are named tuples with the following fields:

- `model`: a model as returned by `mixedlm().fit()`
- `samples`: a `set` with the sample indices that were used
- `p`: the p-value from the model
- `z`: the z-value from the model
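
For example, to inspect the result for a single effect (the effect labels match those shown in the `summarize()` output below):

```python
effect = results['set_size']
print(effect.samples)            # set of sample indices used for the final test
print(effect.z, effect.p)        # test statistics from the final model
print(effect.model.summary())    # full statsmodels summary
```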

The `summarize()` function is a convenient way to get the results in a human-readable format.



```python
print(tst.summarize(results))
```

__Output:__
``` .text
Intercept was tested at samples {95} → z = -13.1098, p = 2.892e-39
color_type[T.proto] was tested at samples {160, 170, 175} → z = -2.0949, p = 0.03618
set_size was tested at samples {185, 210, 195, 255} → z = 16.2437, p = 2.475e-59
set_size:color_type[T.proto] was tested at samples {165, 175} → z = 2.5767, p = 0.009974
```



We can pass the `results` to `plot()` to visualize the results:



```python
plt.clf()
tst.plot(dm, dv='pupil', hue_factor='set_size', linestyle_factor='color_type',
         results=results)
plt.savefig('img/signal-plot-2.png')
```



![](https://github.com/smathot/time_series_test/raw/master/img/signal-plot-2.png)


## Function reference

## <span style="color:purple">time\_series\_test.lmer\_crossvalidation\_test</span>_(dm, formula, groups, re\_formula=None, winlen=1, split=4, split\_method='interleaved', samples\_fe=True, samples\_re=True, localizer\_re=False, fit\_method=None, suppress\_convergence\_warnings=False, fit\_kwargs=None, \*\*kwargs)_

Fits a single linear mixed effects model to a time series, where the
to-be-tested samples are determined through crossvalidation.

This function uses `mixedlm()` from the `statsmodels` package. See the
statsmodels documentation for a more detailed explanation of the
parameters.

### Parameters

* **dm: DataMatrix**

  The dataset

* **formula: str**

  A formula that describes the dependent variable, which should be the
  name of a series column in `dm`, and the fixed effects, which should
  be regular (non-series) columns.

* **groups: str or None or list of str**

  The groups for the random effects, which should be regular (non-series)
  columns in `dm`. If `None` is specified, then all analyses are based
  on a regular multiple linear regression (instead of a linear mixed
  effects model).

* **re\_formula: str or None**

  A formula that describes the random effects, which should be regular
  (non-series) columns in `dm`.

* **winlen: int, optional**

  The number of samples that should be analyzed together, i.e. a 
  downsampling window to speed up the analysis.

* **split: int, optional**

  The number of splits that the analysis should be based on.

* **split\_method: str, optional**

  If 'interleaved', the data is split in a regular interleaved fashion,
  such that the first row goes to the first subset, the second row to the
  second subset, etc. If 'random', the data is split randomly in subsets.
  Interleaved splitting is deterministic (i.e. it results in the same
  outcome each time), but random splitting is not.

* **samples\_fe: bool, optional**

  Indicates whether sample indices are included as an additive factor
  to the fixed-effects formula. If all splits yielded the same sample
  index, this is ignored.

* **samples\_re: bool, optional**

  Indicates whether sample indices are included as an additive factor
  to the random-effects formula. If all splits yielded the same sample
  index, this is ignored.

* **localizer\_re: bool, optional**

  Indicates whether a random effects structure as specified using the
  `re_formula` keyword should also be used for the localizer models,
  or only for the final model.

* **fit\_kwargs: dict or None, optional**

  A `dict` that is passed as keyword arguments to `mixedlm().fit()`. For
  example, to use the Nelder-Mead optimizer as the fitting method, specify
  `fit_kwargs={'method': 'nm'}`.

* **fit\_method: str, list of str, or None, optional**

  Deprecated. Use `fit_kwargs` instead.

* **suppress\_convergence\_warnings: bool, optional**

  Installs a warning filter to suppress convergence (and other)
  warnings.

* **\*\*kwargs: dict, optional**

  Optional keywords to be passed to `mixedlm()`.

### Returns

* **_dict_**

  A dict where keys are effect labels, and values are named tuples
  of `model`, `samples`, `p`, and `z`.
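
A minimal usage sketch, reusing the `dm` dataset from the [Usage](#usage) section; the by-subject random slope for `set_size` is an illustrative choice, not a recommendation:

```python
import time_series_test as tst

results = tst.lmer_crossvalidation_test(
    dm, 'pupil ~ set_size * color_type', groups='subject_nr',
    re_formula='~ set_size',   # random by-subject slope for set_size
    winlen=5, split=4)
print(tst.summarize(results))
```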

## <span style="color:purple">time\_series\_test.lmer\_permutation\_test</span>_(dm, formula, groups, re\_formula=None, winlen=1, suppress\_convergence\_warnings=False, fit\_kwargs={}, iterations=1000, cluster\_p\_threshold=0.05, test\_intercept=False, \*\*kwargs)_

Performs a cluster-based permutation test based on sample-by-sample
linear-mixed-effects analyses. The permutation test identifies clusters
based on a p-value threshold and uses the absolute value of the summed
z values of each cluster as the test statistic.

If no clusters reach the threshold, the test is skipped right away. By
default, the intercept is ignored for this criterion, because the intercept
usually has significant clusters that are not of interest. You can change
this behavior with the `test_intercept` keyword.

*Warning:* This is generally an extremely time-consuming analysis because
it requires thousands of lmers to be run.

See `lmer_crossvalidation_test()` for an explanation of the arguments.

### Parameters

* **dm: DataMatrix**

* **formula: str**

* **groups: str**

* **re\_formula: str or None, optional**

* **winlen: int, optional**

* **suppress\_convergence\_warnings: bool, optional**

* **fit\_kwargs: dict, optional**

* **iterations: int, optional**

  The number of permutations to run.

* **cluster\_p\_threshold: float or None, optional**

  The maximum p-value for a sample to be considered part of a cluster.

* **test\_intercept: bool, optional**

  Indicates whether the intercept should be included when considering if
  there are any clusters, as described above.

* **\*\*kwargs: dict, optional**

### Returns

* **_dict_**

  A dict with effects as keys and lists of clusters defined by
  (start, end, z-sum, hit proportion) tuples. The p-value is
  1 - hit proportion.
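
A minimal usage sketch showing how the returned clusters can be read (the effect labels and cluster tuples follow the format described above):

```python
# Warning: this is slow, because it fits one mixed model per sample per permutation
clusters = tst.lmer_permutation_test(
    dm, 'pupil ~ set_size * color_type', groups='subject_nr',
    winlen=5, iterations=1000)
for effect, effect_clusters in clusters.items():
    for start, end, z_sum, hit_proportion in effect_clusters:
        print(f'{effect}: samples {start}-{end}, '
              f'z-sum = {z_sum:.2f}, p = {1 - hit_proportion:.4f}')
```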

## <span style="color:purple">time\_series\_test.lmer\_series</span>_(dm, formula, winlen=1, fit\_kwargs={}, \*\*kwargs)_

Performs a sample-by-sample linear-mixed-effects analysis. See
`lmer_crossvalidation_test()` for an explanation of the arguments.

### Parameters

* **dm: DataMatrix**

* **formula: str**

* **winlen: int, optional**

* **fit\_kwargs: dict, optional**

* **\*\*kwargs: dict, optional**

### Returns

* **_DataMatrix_**

  A DataMatrix with one row per effect, including the intercept, and
  four series columns with the same depth as the dependent measure
  specified in the formula:

  - `est`: the slope
  - `p`: the p value
  - `z`: the z value
  - `se`: the standard error
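
A minimal usage sketch (assuming that, as for the other functions, `groups` is forwarded to `mixedlm()` through `**kwargs`):

```python
lm = tst.lmer_series(dm, 'pupil ~ set_size * color_type',
                     groups='subject_nr', winlen=5)
# One row per effect; est, p, z, and se are series columns with the same
# depth as the pupil signal
print(lm)
```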

## <span style="color:purple">time\_series\_test.plot</span>_(dm, dv, hue\_factor, results=None, linestyle\_factor=None, hues=None, linestyles=None, alpha\_level=0.05, annotate\_intercept=False, annotation\_hues=None, annotation\_linestyle=':', legend\_kwargs=None, annotation\_legend\_kwargs=None)_

Visualizes a time series, where the signal is plotted as a function of
sample number on the x-axis. One fixed effect is indicated by the hue
(color) of the lines. An optional second fixed effect is indicated by the
linestyle. If the `results` parameter is used, significant effects are
annotated in the figure.

### Parameters

* **dm: DataMatrix**

  The dataset

* **dv: str**

  The name of the dependent variable, which should be a series column
  in `dm`.

* **hue\_factor: str**

  The name of a regular (non-series) column in `dm` that specifies the
  hue (color) of the lines.

* **results: dict, optional**

  A `results` dict as returned by `lmer_crossvalidation_test()`.

* **linestyle\_factor: str, optional**

  The name of a regular (non-series) column in `dm` that specifies the
  linestyle of the lines for a two-factor plot.

* **hues: str, list, or None, optional**

  The name of a matplotlib colormap or a list of hues to be used as line
  colors for the hue factor.

* **linestyles: list or None, optional**

  A list of linestyles to be used for the second factor.

* **alpha\_level: float, optional**

  The alpha level (maximum p value) to be used for annotating effects
  in the plot.

* **annotate\_intercept: bool, optional**

  Specifies whether the intercept should also be annotated along with
  the fixed effects.

* **annotation\_hues: str, list, or None, optional**

  The name of a matplotlib colormap or a list of hues to be used for the
  annotations if `results` is provided.

* **annotation\_linestyle: str, optional**

  The linestyle for the annotations.

* **legend\_kwargs: None or dict, optional**

  Optional keywords to be passed to `plt.legend()` for the factor legend.

* **annotation\_legend\_kwargs: None or dict, optional**

  Optional keywords to be passed to `plt.legend()` for the annotation
  legend.
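
A sketch of a customized two-factor plot (the colormap name, linestyles, and legend location are illustrative choices):

```python
tst.plot(dm, dv='pupil', hue_factor='set_size', linestyle_factor='color_type',
         results=results, hues='viridis', linestyles=['-', '--'],
         legend_kwargs={'loc': 'upper left'})
```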

## <span style="color:purple">time\_series\_test.summarize</span>_(results, detailed=False)_

Generates a string with a human-readable summary of a results `dict` as
returned by `lmer_crossvalidation_test()`.

### Parameters

* **results: dict**

  A `results` dict as returned by `lmer_crossvalidation_test()`.

* **detailed: bool, optional**

  Indicates whether model details should be included in the summary.

### Returns

* **_str_**
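
For example, to include the full model summaries in the output:

```python
print(tst.summarize(results, detailed=True))
```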


## License

`time_series_test` is licensed under the [GNU General Public License
v3](http://www.gnu.org/licenses/gpl-3.0.en.html).


            
