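MSig
====

MSig is a statistical framework for evaluating the significance of multivariate time series motifs of arbitrary multivariate order (that is, motifs spanning any number of variables), accommodating different variable types such as integer, real-valued, and categorical variables.
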
Highlights
----------

- **Pattern probability**: Estimates the probability of a motif occurring by chance under a null model.
- **Significance**: Computes a motif's significance from a binomial tail probability, assessing whether its recurrence in the time series is unlikely under the null model.

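
The two quantities above can be illustrated with a small, self-contained sketch. It is not MSig's implementation: the helpers ``empirical_value_prob``, ``pattern_probability`` and ``binomial_tail`` are hypothetical, and the sketch assumes a simple empirical null model in which every observation is an independent draw from its variable's observed marginal distribution, variables are independent of each other, numeric values match within ±δ, and categorical values match only when identical. Under these assumptions the pattern probability P is a product of per-value match frequencies, and the significance is the upper binomial tail P(X ≥ k) for X ~ Binomial(N, P), where k is the number of observed matches and N the number of candidate start positions.

.. code-block:: python

   from math import comb


   def empirical_value_prob(values, target, delta):
       """Fraction of observations in `values` that match `target`.

       Assumption of this sketch: strings must match exactly, numbers
       match when they lie within +/- delta of the target.
       """
       if isinstance(target, str):
           hits = sum(v == target for v in values)
       else:
           hits = sum(abs(v - target) <= delta for v in values)
       return hits / len(values)


   def pattern_probability(series, pattern, deltas):
       """Null probability of a multivariate pattern, assuming independent
       variables and independent time points (product of per-value probabilities)."""
       prob = 1.0
       for values, row, delta in zip(series, pattern, deltas):
           for target in row:
               prob *= empirical_value_prob(values, target, delta)
       return prob


   def binomial_tail(k, n_trials, p):
       """Upper tail P(X >= k) for X ~ Binomial(n_trials, p)."""
       return sum(
           comb(n_trials, i) * p**i * (1 - p) ** (n_trials - i)
           for i in range(k, n_trials + 1)
       )


   # Example data from the Usage section (first, second and fourth variables).
   ts1 = [1, 3, 3, 5, 5, 2, 3, 3, 5, 5, 3, 3, 5, 4, 4]
   ts2 = [4.3, 4.5, 2.6, 3.0, 3.0, 1.7, 4.9, 2.9, 3.3, 1.9, 4.9, 2.5, 3.1, 1.8, 0.3]
   ts4 = ["T", "L", "T", "Z", "Z", "T", "L", "T", "Z", "T", "L", "T", "Z", "L", "L"]

   # Motif starting at index 1, length 3; delta = 0.5 on the real-valued series only.
   pattern = [[3, 3, 5], [4.5, 2.6, 3.0], ["L", "T", "Z"]]
   prob = pattern_probability([ts1, ts2, ts4], pattern, deltas=[0, 0.5, 0])

   n, p_len, k = len(ts1), 3, 3       # series length, motif length, observed matches
   pvalue = binomial_tail(k, n - p_len + 1, prob)
   print(prob, pvalue)

A very small null probability combined with three observed matches yields a small upper-tail probability, which is what flags the motif as unlikely to recur by chance under these assumptions.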
Installation
------------

You can install the package using pip:

.. code-block:: bash

   pip install msig

Usage
-----

The example below builds a small multivariate time series with mixed variable types, fits an empirical null model, and evaluates the statistical significance of a motif:

.. code-block:: python

   from MSig import Motif, NullModel
   import numpy as np

   # Example data: four time series of length 15 (integer, real-valued and two categorical)
   ts1 = [1, 3, 3, 5, 5, 2, 3, 3, 5, 5, 3, 3, 5, 4, 4]
   ts2 = [4.3, 4.5, 2.6, 3.0, 3.0, 1.7, 4.9, 2.9, 3.3, 1.9, 4.9, 2.5, 3.1, 1.8, 0.3]
   ts3 = ["A", "D", "B", "D", "A", "A", "A", "C", "C", "B", "D", "D", "C", "A", "A"]
   ts4 = ["T", "L", "T", "Z", "Z", "T", "L", "T", "Z", "T", "L", "T", "Z", "L", "L"]
   data = np.stack([
       np.asarray(ts1, dtype=int),
       np.asarray(ts2, dtype=float),
       np.asarray(ts3, dtype=str),
       np.asarray(ts4, dtype=str),
   ])
   m, n = data.shape  # m = 4 variables, n = 15 time points

   # Create the null model
   model = NullModel(data, dtypes=[int, float, str, str], model="empirical")

   # Motif of length p = 3 spanning the first, second and fourth variables,
   # with three matches (at indices 1, 6 and 10) under a maximum deviation
   # threshold of δ = 0.5 on the second variable.
   variables = np.array([0, 1, 3])
   motif_subsequence = data[variables, 1:4]
   print(motif_subsequence)

   # Obtain the null probability of the motif
   # (deviation thresholds: δ = 0.5 for the second variable, 0 for the others)
   motif = Motif(motif_subsequence, variables, np.array([0, 0.5, 0, 0]), n_matches=3)
   probability = motif.set_pattern_probability(model, vars_indep=True)

   # Calculate the significance of the motif
   p = motif_subsequence.shape[1]  # length of the motif
   max_possible_matches = n - p + 1  # number of candidate start positions
   pvalue = motif.set_significance(max_possible_matches, data_n_variables=m, idd_correction=False)
   print(probability, pvalue)

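
The example states that the motif has three matches, at indices 1, 6 and 10. The sketch below checks this by sliding a window of length p = 3 over the first, second and fourth series and comparing every position with the motif. It is an illustration, not MSig code: ``values_match`` and ``find_matches`` are hypothetical helpers, and the sketch assumes that a numeric value matches when it lies within its variable's threshold δ and a categorical value matches only when it is identical.

.. code-block:: python

   def values_match(a, b, delta):
       """True if a and b match: exactly for strings, within +/- delta for numbers."""
       if isinstance(a, str) or isinstance(b, str):
           return a == b
       return abs(a - b) <= delta


   def find_matches(series, deltas, start, length):
       """Start indices of all windows matching the window that begins at `start`."""
       n = len(series[0])
       reference = [s[start:start + length] for s in series]
       return [
           i
           for i in range(n - length + 1)
           if all(
               values_match(a, b, delta)
               for s, ref, delta in zip(series, reference, deltas)
               for a, b in zip(s[i:i + length], ref)
           )
       ]


   ts1 = [1, 3, 3, 5, 5, 2, 3, 3, 5, 5, 3, 3, 5, 4, 4]
   ts2 = [4.3, 4.5, 2.6, 3.0, 3.0, 1.7, 4.9, 2.9, 3.3, 1.9, 4.9, 2.5, 3.1, 1.8, 0.3]
   ts4 = ["T", "L", "T", "Z", "Z", "T", "L", "T", "Z", "T", "L", "T", "Z", "L", "L"]

   p = 3
   matches = find_matches([ts1, ts2, ts4], deltas=[0, 0.5, 0], start=1, length=p)
   print(matches)           # [1, 6, 10]
   print(len(ts1) - p + 1)  # 13 candidate start positions

The window count ``n - p + 1 = 13`` is the same quantity the example passes as ``max_possible_matches``; the match at index 1 is the motif's own occurrence.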
Authors
-------

- **Miguel G. Silva** - `<https://github.com/MiguelGarcaoSilva>`_
- **Rui Henriques** - `<https://web.ist.utl.pt/rmch>`_
- **Sara C. Madeira** - `<https://saracmadeira.wordpress.com>`_

Acknowledgements
----------------

This work was partially funded by Fundação para a Ciência e a Tecnologia (FCT) through the LASIGE Research Unit, UIDB/00408/2020 (https://doi.org/10.54499/UIDB/00408/2020) and UIDP/00408/2020 (https://doi.org/10.54499/UIDP/00408/2020), INESC-ID Pluriannual, UIDB/50021/2020 (https://doi.org/10.54499/UIDB/50021/2020), and a PhD research scholarship UIBD/153086/2022 to Miguel G. Silva.

How to cite
-----------

If you use this method in your research, please cite the accompanying paper (available soon).

License
-------

This project is licensed under the MIT License; see the `LICENSE <LICENSE>`_ file for details.