unsupervised-multimodal-trajectory-modeling

Name: unsupervised-multimodal-trajectory-modeling
Version: 2024.2.1
Summary: trains mixtures of state space models with expectation maximization
Author: Michael C. Burkhart <mcb93@cam.ac.uk>
Repository: https://github.com/burkh4rt/Unsupervised-Multimodal-Trajectory-Modeling
Upload time: 2024-03-13 15:24:12
Requires Python: >=3.10
License: MIT License, (c) 2023-2024 Michael C. Burkhart
Keywords: unsupervised clustering, trajectory clustering, mixture models, state space models, machine learning

# Unsupervised Multimodal Trajectory Modeling

[![DOI](https://zenodo.org/badge/692068384.svg)](https://zenodo.org/badge/latestdoi/692068384)

We propose and validate a mixture of state space models to perform unsupervised
clustering of short trajectories. Within the state space framework, we let
expensive-to-gather biomarkers correspond to hidden states and readily
obtainable cognitive metrics correspond to measurements. Upon training with
expectation maximization, we find that our clusters stratify persons according
to clinical outcome. Furthermore, we can effectively predict on held-out
trajectories using cognitive metrics alone. Our approach accommodates missing
data through model marginalization and generalizes across research and clinical
cohorts.

### Data format

We consider a training dataset

$$
\mathcal{D} = \{(x_{1:T}^{i}, z_{1:T}^{i}) \}_{1 \leq i \leq n_d}
$$

consisting of $n_d$ sequences of states and observations paired in time. We
denote the states $z_{1:T}^{i} = (z_1^i, z_2^i, \dotsc, z_T^i)$, where
$z_t^i \in \mathbb{R}^d$ is the state at time $t$ for the $i$-th instance,
and the measurements $x_{1:T}^{i} = (x_1^i, x_2^i, \dotsc, x_T^i)$, where
$x_t^i \in \mathbb{R}^\ell$ is the observation at time $t$ for the $i$-th
instance. For the purposes of this code, we adopt the convention that
collections of time-delineated sequences of vectors are stored as 3-tensors,
where the first dimension spans time $1 \leq t \leq T$, the second dimension
spans instances $1 \leq i \leq n_d$ (these will almost always correspond to an
individual or participant), and the third dimension spans the components of
each state or observation vector (and so has dimension either $d$ or $\ell$).
We accommodate trajectories of differing lengths by padding to the longest
trajectory in the dataset, appending `np.nan` values to shorter trajectories.
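
For concreteness, here is a minimal sketch (assuming `numpy` and hypothetical
raw data) of packing trajectories of unequal lengths into the
time-by-instance-by-component layout described above:

```python
import numpy as np

def stack_trajectories(trajs: list[np.ndarray]) -> np.ndarray:
    """Pack a list of (T_i, dim) arrays into a (T, n_d, dim) tensor,
    padding shorter trajectories with np.nan up to the longest length T."""
    T = max(traj.shape[0] for traj in trajs)
    dim = trajs[0].shape[1]
    out = np.full((T, len(trajs), dim), np.nan)
    for i, traj in enumerate(trajs):
        out[: traj.shape[0], i, :] = traj
    return out

# two observation trajectories of lengths 3 and 2, each with two features
x = stack_trajectories(
    [np.arange(6.0).reshape(3, 2), np.arange(4.0).reshape(2, 2)]
)
assert x.shape == (3, 2, 2) and np.isnan(x[2, 1]).all()
```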

### Model specification

We adopt a mixture of state space models for the data:

<img src="figure1.png" 
    alt="plate notation for mixture of state space models" 
    style="max-width:700px;width:100%">

given explicitly by:

$$
p(z^i_{1:T}, x^i_{1:T})
  = \sum_{c=1}^{n_c} \pi_{c} \, \delta_{\{c = c^i\}}
  \bigg( p(z_1^i | c) \prod_{t=2}^T p(z_t^i | z_{t-1}^i, c)
  \prod_{t=1}^T p(x_t^i | z_t^i, c) \bigg).
$$

Each individual $i$ is independently assigned to a cluster $c^i = c$ with
probability $\pi_{c}$; conditional on this cluster assignment, their initial
state $z_1^i$ is drawn according to $p(z_1^i | c)$, with each subsequent state
$z_t^i$, $2 \leq t \leq T$, drawn in turn from the cluster-specific
_state model_ $p(z_t^i | z_{t-1}^i, c)$, which depends on the previous state.
At each point in time, we obtain an observation $x_t^i$ from the
cluster-specific _measurement model_ $p(x_t^i | z_t^i, c)$, which depends on
the current state. In what follows, we assume both the state and measurement
models are stationary for each cluster, i.e. independent of $t$. In
particular, for a given individual, the relationship between the state and
measurement should not change over time.

In our main framework, inspired by the work of Chiappa and Barber[^1], we
additionally assume that the cluster-specific state initialisation is
Gaussian, i.e. $p(z_1^i | c) = \eta_d(z_1^i; m_c, S_c)$, and that the
cluster-specific state and measurement models are linear Gaussian, i.e.
$p(z_t^i | z_{t-1}^i, c) = \eta_d(z_t^i; z_{t-1}^i A_c, \Gamma_c)$ and
$p(x_t^i | z_t^i, c) = \eta_\ell(x_t^i; z_t^i H_c, \Lambda_c)$, where
$\eta_d(\cdot\,; \mu, \Sigma)$ denotes the $d$-dimensional multivariate
Gaussian density with mean $\mu$ and covariance $\Sigma$, yielding:

$$
p(z^i_{1:T}, x^i_{1:T})
  = \sum_{c=1}^{n_c} \pi_{c} \, \delta_{\{c = c^i\}}
  \bigg( \eta_d(z_1^i; m_c, S_c)
  \prod_{t=2}^T \eta_d(z_t^i; z_{t-1}^i A_c, \Gamma_c)
  \prod_{t=1}^T \eta_\ell(x_t^i; z_t^i H_c, \Lambda_c) \bigg).
$$
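
To make the linear-Gaussian specification concrete, the following sketch
simulates a single trajectory from the state and measurement models above for
one cluster (the parameter values here are made up for illustration and are
not part of the package):

```python
import numpy as np

rng = np.random.default_rng(42)
T, d, ell = 5, 2, 3  # horizon, state dimension, observation dimension

# hypothetical cluster-specific parameters (row-vector convention, z_t A_c)
m_c, S_c = np.zeros(d), np.eye(d)                # initial state mean / cov
A_c, Gamma_c = 0.9 * np.eye(d), 0.1 * np.eye(d)  # state model
H_c = rng.standard_normal((d, ell))              # measurement matrix
Lambda_c = 0.2 * np.eye(ell)                     # measurement noise cov

z, x = np.empty((T, d)), np.empty((T, ell))
z[0] = rng.multivariate_normal(m_c, S_c)              # z_1 ~ eta_d(m_c, S_c)
x[0] = rng.multivariate_normal(z[0] @ H_c, Lambda_c)  # x_1 | z_1
for t in range(1, T):
    z[t] = rng.multivariate_normal(z[t - 1] @ A_c, Gamma_c)  # state model
    x[t] = rng.multivariate_normal(z[t] @ H_c, Lambda_c)     # measurement model
```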

In particular, we assume that the variables we are modeling are continuous and
change over time. To train a model of this form, we take a dataset
$\mathcal{D}$ and an arbitrary initial set of cluster assignments $c^i$ (as
these are also latent/hidden from us) and iteratively perform E and M steps
(from which EM[^2] gets its name):

- [**E**] Expectation step: given the current model, we assign each data
  instance $(z^i_{1:T}, x^i_{1:T})$ to the cluster to which it is most likely
  to belong under the current model.
- [**M**] Maximization step: given the current cluster assignments, we compute
  the sample-level cluster assignment probabilities (the $\pi_c$) and the
  optimal cluster-specific parameters.

Optimization completes after a fixed (large) number of steps or when no data
instances change their cluster assignment at a given iteration.
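
The loop below is a minimal numpy sketch of this hard-assignment EM procedure.
It is illustrative only, not the package's actual API: `fit_cluster_params`
and `loglik` are hypothetical stand-ins for the cluster-specific M-step and
the per-trajectory log-likelihood.

```python
import numpy as np

def cluster_em(x, z, n_clusters, fit_cluster_params, loglik, max_iter=500):
    """Alternate cluster-wise fits (M) with reassignment of each
    trajectory to its most likely cluster (E).
    x, z: (T, n_d, ell) and (T, n_d, d) tensors as described above;
    loglik(x, z, params) returns a length-n_d vector of log-likelihoods."""
    n_d = x.shape[1]
    rng = np.random.default_rng(0)
    assign = rng.integers(n_clusters, size=n_d)  # arbitrary initialisation
    for _ in range(max_iter):
        # M step: mixture weights and per-cluster parameters
        pi = np.bincount(assign, minlength=n_clusters) / n_d
        params = [
            fit_cluster_params(x[:, assign == c], z[:, assign == c])
            for c in range(n_clusters)
        ]
        # E step: reassign each instance to its most likely cluster
        scores = np.stack(
            [np.log(pi[c]) + loglik(x, z, params[c]) for c in range(n_clusters)]
        )  # shape (n_clusters, n_d)
        new_assign = scores.argmax(axis=0)
        if np.array_equal(new_assign, assign):  # no assignments changed
            break
        assign = new_assign
    return pi, params, assign
```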

### Adapting the code for your own use

A typical workflow is described at:
[https://github.com/burkh4rt/Unsupervised-Trajectory-Clustering-Starter](https://github.com/burkh4rt/Unsupervised-Trajectory-Clustering-Starter)

### Caveats & Troubleshooting

Some efforts have been made to automatically handle edge cases. For a given
training run, if any cluster becomes too small (fewer than 3 members), training
terminates. In order to learn a model, we make assumptions about our training
data as described above. While our approach seems to be robust to some types of
model misspecification, we have encountered training issues with the following
problems:

1. Extreme outliers. An extreme outlier tends to form its own cluster (which
   is problematic). In many cases this is due to a typo or failed data
   cleaning (i.e. an upstream problem). Generating histograms of each feature
   is one way to recognise this problem; see `flag_outliers` in the sketch
   below.
2. Discrete / static features. Including discrete data violates our Gaussian
   assumptions. If we learn a cluster in which every trajectory has the same
   value for one of the states or observations at a given time step, then we
   are prone to estimating a singular covariance structure for that cluster,
   which yields numerical instabilities. Adding a small amount of noise to
   discrete features, as in `jitter_discrete` below, may mitigate this to
   some extent.

Another easily violated assumption is the stationarity of the measurement
model.
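
As a quick illustration of both mitigations, the hypothetical helpers below
(not part of the package) flag extreme outliers feature-by-feature and add a
small amount of noise to discrete columns before training:

```python
import numpy as np

def flag_outliers(x, k=6.0):
    """Flag entries more than k robust standard deviations from the
    feature-wise median (computed over time and instances, ignoring NaNs)."""
    flat = x.reshape(-1, x.shape[-1])               # (T * n_d, dim)
    med = np.nanmedian(flat, axis=0)
    mad = np.nanmedian(np.abs(flat - med), axis=0)  # robust spread estimate
    return np.abs(x - med) > k * 1.4826 * (mad + 1e-12)

def jitter_discrete(x, cols, scale=1e-3, seed=0):
    """Add small Gaussian noise to discrete feature columns to avoid
    singular cluster covariance estimates."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    x[..., cols] += scale * rng.standard_normal(x[..., cols].shape)
    return x
```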

[^1]:
    S. Chiappa and D. Barber. _Dirichlet Mixtures of Bayesian Linear Gaussian
    State-Space Models: a Variational Approach._ Tech. rep. 161. Max Planck
    Institute for Biological Cybernetics, 2007.

[^2]:
    A. Dempster, N. Laird, and D. B. Rubin. _Maximum Likelihood from  
    Incomplete Data via the EM Algorithm._ J. Roy. Stat. Soc. Ser. B (Stat.
    Methodol.) 39.1 (1977), pp. 1–38.

<!--
rm dist/*
isort --profile black .
black .
prettier --write --print-width 79 --prose-wrap always **/*.md
python3 -m build
twine upload -s  -r pypi dist/*
# twine upload -r testpypi dist/*
-->

            
