timemachines


Name: timemachines
Version: 0.1.0
Summary: Time series models represented as pure functions with SKATER convention.
Home page: https://github.com/microprediction/timemachines
Author: microprediction
License: MIT
Upload time: 2021-01-14 02:08:23
            # timemachines [![Build Status](https://travis-ci.com/microprediction/timemachines.svg?branch=main)](https://travis-ci.com/microprediction/timemachines) ![tests](https://github.com/microprediction/timemachines/workflows/tests/badge.svg) ![regression](https://github.com/microprediction/timemachines/workflows/regression/badge.svg)



This package is an experiment in a different approach to the representation of time series models. Here a time series model:

- takes the form of a *pure function* with a *skater* signature,
- that is a recipe for a *state machine*,
- where the intent is that the *caller*, not the *callee*, carries the state from one invocation to the next, and
- with the further, somewhat unusual convention that variables known in advance (*a*) and the full set of model hyper-parameters (*r*) are both squished down into their respective *scalar* arguments. 

The penultimate convention is for generality, and also has lambda-based deployments in mind. The last convention imposes, at design time, a consistent hyper-parameter space. This step may seem unnatural, but it facilitates comparisons of models and hyper-parameter optimizers in different settings. It is workable, we hope, with some space-filling curve conventions.   

### Want to discuss time series modeling standardization?

This isn't put forward as *the right way* to write time series packages - more a way of exposing their functionality. If you are interested in design thoughts for time series packages, consider participating in this thread: https://github.com/MaxBenChrist/awesome_time_series_in_python/issues/1. 

### A "skater" function 

Most time series packages use a complex combination of methods and data to represent a time series model, its fitting, and forecasting usage. But in this package a "model" is *merely a function*. We mean *function* in the mathematical sense.   

    x, s, w = f(   y:Union[float,[float]],               # Contemporaneously observed data, 
                                                         # ... including exogenous variables in y[1:], if any. 
                s=None,                                  # Prior state
                k:float=1,                               # Number of steps ahead to forecast. Typically integer. 
                a:float=None,                            # Variable(s) known in advance, or conditioning
                t:float=None,                            # Time of observation (epoch seconds)
                e:float=None,                            # Non-binding maximal computation time ("e for expiry"), in seconds
                r:float=None)                            # Hyper-parameters ("r" stands for hype(r)-pa(r)amete(r)s in R^n)
The function returns: 

                     -> float,                           # A point estimate, or anchor point, or theo
                        Any,                             # Posterior state, intended for safe keeping by the caller until the next invocation 
                        Any                              # Everything else (e.g. confidence intervals) not needed for the next invocation. 

(Yes, one might quibble with the purity given that the state s can be modified in place, but that is sensible in Python.)  
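
For concreteness, here is a minimal sketch of a skater that forecasts with a running mean of the target. It is purely illustrative, and makes no claim about the models actually shipped in this package:

    def running_mean_skater(y, s=None, k=1, a=None, t=None, e=None, r=None):
        """Illustrative skater: the k-step ahead point estimate is the running mean of y[0]."""
        if s is None:
            s = {'n': 0, 'mean': 0.0}                        # first call: caller passed s=None, so initialize
        if y is None:
            return None, s, None                             # "offline fitting" convention: nothing to fit here
        y0 = y[0] if isinstance(y, (list, tuple)) else y     # target is y[0] if y is a vector
        s['n'] += 1
        s['mean'] += (y0 - s['mean']) / s['n']               # update running mean in place
        return s['mean'], s, None                            # (point estimate, posterior state, everything else)

Note that it ignores k, a, t, e and r, which the conventions explicitly allow.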

### Skating forward

    def posteriors(f,ys):
        s = None
        xs = list()
        for y in ys: 
            x, s, _ = f(y,s)
            xs.append(x)
        return xs
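
Assuming the illustrative running_mean_skater sketched above, this helper can be exercised on synthetic data:

    import numpy as np

    ys = list(np.random.randn(500))              # synthetic univariate observations
    xs = posteriors(running_mean_skater, ys)     # one point estimate per observation
    assert len(xs) == len(ys)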

![](https://i.imgur.com/DkZvZRq.png)

Picture by [Joe Cook](https://www.instagram.com/joecooke_/?utm_medium=referral&utm_source=unsplash)


### Conventions: 

- The caller, not the callee, persists state from one invocation to the next
    - The format taken by state is determined by the callee, not caller
    - The caller passes s=None the first time
    - The function initializes state as necessary, and passes it back
    - The caller keeps the state and sends it back to the callee
    - State can be mutable for efficiency (e.g. it might be a long buffer) or not. 
    - Recall that Python is pass-by-object-reference. 
    - State should, ideally, be JSON-friendly. Use .tolist() on arrays.
    - State is not an invitation to sneak in additional arguments.

- Univariate or multivariate observation argument
     - If y is a vector, the target is the first element y[0]
     - The elements y[1:] are contemporaneous exogenous variables, *not known in advance*.  
     - Missing data is passed as np.nan, *not* None (see fitting below)

- Fitting:  
     - If y=None is passed, it is a suggestion to the callee to perform fitting, should that be necessary. 
     - Or some other offline periodic task. 
     - In this case the *e* argument takes on a slightly different interpretation, and should probably
     be considerably larger than usual. 
     - The callee should return x=None, as acknowledgement that it has recognized the "offline" convention.

- Variables known in advance, or conditioning variables:
     - Passed as *scalar* argument *a* in (0,1). 
     - See the discussion of space-filling curves below; this isn't really a huge restriction.  
     - Rationale: make it easier to design general purpose conditional prediction algorithms
     - Bear in mind many functions will ignore this argument, so we have little to lose here. 
     - Caller can deepcopy the state to effect multiple conditional predictions (see the sketch after this list).
     - Example: business day indicator
     - Example: size of a trade
     - Example: joystick button up 

- Parameter space:
     - Caller has a limited ability to suggest variation in parameters (or maybe hyper-parameters, since 
     many callees will fit parameters on the fly or when there is time).
     - This communication is squished into a single float *r* in (0,1). 
     - Arguably, this makes callees more canonical and, 
     - seriously, there are lots of real numbers, and 
     - the intent here is that the caller shouldn't need to know a lot about parameters.
     - This package provides some conventions for expanding to R^n using space-filling curves,
     - so that the callee's (hyper) parameter optimization can still exploit geometry, as you see fit. 

- Ordering of parameters in space-filling curve:
    - The most important variables should be listed first, as they vary more slowly. 
    - See picture below or video
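
A rough sketch of how a caller might exercise the fitting and conditioning conventions, using the illustrative running_mean_skater from above (which happens to ignore *a*, so the two scenarios coincide here; the point is the calling pattern, not the model):

    from copy import deepcopy

    s = None
    for y in [1.0, 1.5, 0.8]:
        x, s, _ = running_mean_skater(y, s, k=3, r=0.25)        # caller persists state; r fixed in (0,1)

    _, s, _ = running_mean_skater(None, s, e=60)                # y=None: suggestion to (re)fit, with a generous e

    x_up, _, _ = running_mean_skater(1.2, deepcopy(s), a=0.9)   # scenario 1: conditioning variable a high
    x_dn, _, _ = running_mean_skater(1.2, deepcopy(s), a=0.1)   # scenario 2: conditioning variable a low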

### Space-filling conventions for *a* and *r*

The script [demo_balanced_log_scale.py](https://github.com/microprediction/timemachines/blob/master/examples/demo_balanced_log_scale.py) illustrates the
quasi-logarithmic parameter mapping from r in (0,1) to R. 
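
The mapping itself lives in that script; purely as a sketch of the idea (the formula and constants below are illustrative assumptions, not the package's), a "balanced" quasi-logarithmic map sends r in (0,1) to a signed value, with resolution concentrated near zero and the endpoints reaching large magnitudes:

    import math

    def to_log_space(r: float, scale: float = 1000.0) -> float:
        """Illustrative only: map r in (0,1) monotonically onto (-scale, scale), finer near zero."""
        u = 2.0 * r - 1.0                                        # center on (-1, 1)
        return math.copysign(1.0, u) * math.expm1(abs(u) * math.log1p(scale))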

The script [demo_param_ordering.py](https://github.com/microprediction/timemachines/blob/master/examples/demo_param_ordering.py) illustrates
the mapping from r in (0,1) to R^n. Observe why the most important parameter should be listed first. It will vary
more smoothly as we vary r. 

[![IMAGE ALT TEXT](https://i.imgur.com/4F1oHXR.png)](https://vimeo.com/497113737 "Parameter importance")
Click to see video
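
Again only as a sketch of the convention (not the package's actual curve), one simple way to unfold a single r in (0,1) into two coordinates is to de-interleave the binary digits of r, so that the leading bits of r drive the first coordinate, which therefore varies more slowly than the second:

    def to_two_dim(r: float, digits: int = 16) -> tuple:
        """Illustrative only: de-interleave the binary expansion of r into two coordinates in (0,1)."""
        bits = [int(r * 2 ** (i + 1)) % 2 for i in range(2 * digits)]   # binary expansion of r
        u = sum(b / 2 ** (i + 1) for i, b in enumerate(bits[0::2]))     # even-position bits -> first coordinate
        v = sum(b / 2 ** (i + 1) for i, b in enumerate(bits[1::2]))     # odd-position bits  -> second coordinate
        return u, v

Listing the most important hyper-parameter first gives it the smoothest dependence on r, which is what the video illustrates.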


### FAQ:

Question 1. Why not have the model persist the state?

Answer 1. Go ahead:

       class Predictor:

           def __init__(self, f):
                self.f = f
                self.s = None                                   # state starts empty, as with s=None

           def __call__(self, y, k=1, a=None, t=None, e=None):
                x, self.s, _ = self.f(y=y, s=self.s, k=k, a=a, t=t, e=e)
                return x

or write a decorator (a sketch follows below). However:
- We have lambda patterns in mind
- The caller has more control in this setup (e.g. for multiple conditional forecasts)
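
A hedged sketch of such a decorator, hiding the state in a closure (an illustration, not part of this package):

    import functools

    def with_state(f):
        """Illustrative decorator: the wrapper, rather than the caller, holds the skater state."""
        s = None
        @functools.wraps(f)
        def wrapped(y, **kwargs):
            nonlocal s
            x, s, _ = f(y, s=s, **kwargs)
            return x
        return wrapped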

Question 2. Why do it in this bare-bones manner, with squished parameter spaces?  

Answer 2. The intent is to produce lambda-friendly models, but also to enable comparison, combination and search across models. That is made possible by:
- A *reasonable* (we hope) way to map the most important hyper-parameter choices into a single scalar,
- Which imposes some geometric discipline on the hyper-parameter space (e.g. most important first), and
- Makes search possible across packages which have *entirely different conventions* and hyper-parameter spaces. 


Observe that this package wraps *some* partial functionality of a few time series prediction libraries. Those libraries could hardly be further removed from the approach above, in that they:
 - Use pandas dataframes
 - Bundle data with prediction logic
 - Rely on column naming conventions 
 - Require 10-20 lines of setup code before a prediction can be made
 - Require tracing into the code to infer intent
 - Use conventions such as '5min' which not everyone agrees on 

This package should *not* be viewed as an attempt to wrap most of the functionality of these packages. If you 
have patterns in mind that match them, and you are confident of their performance, you are best served to 
use them directly. 

### Scope and limitations
The simple interface is not well suited to problems where exogenous data comes and goes. 
You might consider a dictionary interface instead, as with the river package. 
It is also not well suited to fixed-horizon forecasting when the data isn't sampled very regularly. 
Nor is it well suited to prediction of multiple time series whose sampling occurs irregularly. 
Ordinal values can be kludged into the parameter space and action argument, but purely categorical variables not so much. And finally, if you
don't like the idea of hyper-parameters lying in R^n, or don't see any obvious embedding, this might 
not be for you. 

### Yes, we're keen to receive PR's
If you'd like to contribute to this standardizing and benchmarking effort, here are some ideas:

- See the [list of popular time series packages](https://www.microprediction.com/blog/popular-timeseries-packages) ranked by download popularity. 
- Think about the most important hyper-parameters.
- Consider "warming up" the mapping (0,1)->hyper-params by testing on real data. There is a [tutorial](https://www.microprediction.com/python-3) on retrieving live data, or use the [real data](https://pypi.org/project/realdata/) package, if that's simpler.
- The [comparison of hyper-parameter optimization packages](https://www.microprediction.com/blog/optimize) might also be helpful.  

If you are the maintainer of a time series package, we'd love your feedback, and if you take the time to submit a PR here, do yourself a favor and also enable "supporting" on your repo. 

### Deployment

Some of these models are used as intermediate steps in the creation of distributional forecasts, at [microprediction.org](https://www.microprediction.org). 



            
