vlmc 0.2.0

Summary: Variable Length Markov Chain
Upload time: 2023-06-07 16:09:20
Requires Python: >=3.10

# Variable Length Markov Chain (VLMC)

[![Downloads](https://pepy.tech/badge/vlmc)](https://pepy.tech/project/vlmc) 
[![PyPI version](https://badge.fury.io/py/vlmc.svg)](https://pypi.org/project/vlmc/)

Implementation of Variable Length Markov Chains (VLMC) for Python.
Suffix tree building is done top-down using the [Peres-Shield](https://link.springer.com/chapter/10.1007/11557067_24) order estimation method.
It is written in Rust with Python Bindings.

##### Contents
  - [Installation](#installation)
    - [Compilation from source](#compilation-from-source)
  - [Usage](#usage)
    - [`fit`](#fit)
    - [`get_suffix`](#get_suffix)
    - [`get_counts`](#get_counts)
    - [`get_distribution`](#get_distribution)
    - [`get_contexts`](#get_contexts)
  - [TODO](#todo)


## Installation

Pre-built packages for many Linux, Windows, and macOS systems are available
on [PyPI](https://pypi.org/project/vlmc/) and can be installed with:

```sh
pip install vlmc
```
On uncommon architectures, you may need to
[install Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) before running `pip install vlmc`.

### Compilation from source

To compile from source, you will need to [install Rust/Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) and [maturin](https://github.com/PyO3/maturin#maturin) for the Python bindings.
Maturin is best used within a Python virtual environment:

```sh
# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/antonio-leitao/vlmc.git
cd vlmc
# build and install the package:
maturin develop --release
```

## Usage

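```python
import vlmc

max_depth = 3      # example value: longest subsequence to consider
alphabet_size = 4  # example value: must exceed the largest symbol in the data
tree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)
```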
Parameters:
- `max_depth`: Maximum depth of the tree. Subsequences longer than `max_depth` are neither considered nor counted.
- `alphabet_size`: Total number of symbols in the alphabet. It must be larger than the highest integer in the data; otherwise runtime errors occur.
- `n_jobs`: Number of subprocesses to spawn when running the VLMC. Use `-1` to use all available processes.
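Because `alphabet_size` has to exceed the largest symbol in the data, one option is to derive it from the data before building the tree. The helper below is a minimal sketch in plain Python; `infer_alphabet_size` is a hypothetical name and is not part of `vlmc`.

```python
def infer_alphabet_size(data):
    """Return the smallest alphabet size that covers every symbol in `data`.

    `data` is a list of integer sequences; symbols are assumed to be >= 0.
    """
    symbols = [symbol for sequence in data for symbol in sequence]
    if not symbols:
        raise ValueError("data contains no symbols")
    if min(symbols) < 0:
        raise ValueError("symbols must be non-negative integers")
    # Symbols run from 0 to max(symbols), so the alphabet needs max + 1 entries.
    return max(symbols) + 1


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
alphabet_size = infer_alphabet_size(data)
```

The result can then be passed as the `alphabet_size` argument of `vlmc.VLMC`.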

### `fit`

> **Note**
> The `fit` method returns `None`, not `self`. This is by design, so that the Rust object is not exposed to Python.

```python
data = [
  [1,2,3],
  [2,3],
  [1,0,1],
  [2]
]

tree.fit(data)
```

Arguments:
- `data`: List of lists containing sequences of discrete values to fit on. Values are assumed to be integers from `0` to `alphabet_size - 1`. The list is expected to be two-dimensional.
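If the raw sequences are not already integers, they need to be encoded first. The following is a minimal sketch, assuming symbols are hashable; `encode_sequences` is a hypothetical helper and not part of `vlmc`.

```python
def encode_sequences(sequences):
    """Map arbitrary hashable symbols to integers 0..n-1, in order of first appearance."""
    symbol_to_id = {}
    encoded = []
    for sequence in sequences:
        row = []
        for symbol in sequence:
            if symbol not in symbol_to_id:
                # Assign the next unused integer to a symbol seen for the first time.
                symbol_to_id[symbol] = len(symbol_to_id)
            row.append(symbol_to_id[symbol])
        encoded.append(row)
    return encoded, symbol_to_id


raw = [["a", "b", "c"], ["b", "c"], ["a", "d", "a"], ["b"]]
data, symbol_to_id = encode_sequences(raw)
alphabet_size = len(symbol_to_id)
```

Here `data` is a two-dimensional list of integers, and `alphabet_size` is larger than every encoded symbol, which matches the requirements stated above.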

### `get_suffix`
Given a sequence, returns the longest suffix that is present in the VLMC.

```python
suffix = tree.get_suffix(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables.

Returns:
- `suffix` : longest suffix of sequence that is present in the VLMC. 
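To make the idea of a longest matching suffix concrete, here is a plain-Python sketch that does not use `vlmc`. It assumes contexts are stored as a set of tuples, that the empty tuple stands for the root, and that an empty list is returned when nothing else matches; the library's actual behaviour in that last case is not documented here.

```python
def longest_suffix(sequence, contexts):
    """Return the longest suffix of `sequence` that appears in `contexts`.

    `contexts` is a set of tuples (a hypothetical stand-in for the tree nodes).
    """
    # Try the full sequence first, then drop symbols from the front one by one.
    for start in range(len(sequence) + 1):
        candidate = tuple(sequence[start:])
        if candidate in contexts:
            return list(candidate)
    return []


contexts = {(), (1,), (2,), (1, 2)}
suffix = longest_suffix([0, 1, 2], contexts)
```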

### `get_counts`
Gets the total number of occurrences of a given sequence of integers.
Raises a `KeyError` if the sequence is not a tree node. Consider calling `get_suffix` first to make sure you have a tree node.

```python
counts = tree.get_counts(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables.

Returns:
- `counts`: number of occurrences of the sequence, as an integer.
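As an illustration of what such counts represent, the sketch below counts every contiguous subsequence of up to `max_depth` symbols in plain Python. It is an assumption-laden stand-in: `count_subsequences` is a hypothetical name, and the library may count or prune nodes differently.

```python
from collections import Counter


def count_subsequences(data, max_depth):
    """Count every contiguous subsequence of length 1..max_depth in `data`."""
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence)):
            # Do not run past the end of the sequence or beyond max_depth.
            longest = min(max_depth, len(sequence) - start)
            for length in range(1, longest + 1):
                counts[tuple(sequence[start:start + length])] += 1
    return counts


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
counts = count_subsequences(data, max_depth=2)
```

Subsequences longer than `max_depth`, such as `(1, 2, 3)` here, never enter the table.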

### `get_distribution`
Gets the vector of probabilities over the entire alphabet for the given sequence.
Raises a `KeyError` if the sequence is not a tree node. Consider calling `get_suffix` first to make sure you have a tree node.

```python
probabilities = tree.get_distribution(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables. 

Returns:
- `probabilities` : list of floats representing the probability of observing a specific state (index) as the next symbol.
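The shape of the returned vector can be illustrated with a plain-Python estimate of the next-symbol probabilities from raw counts. This is only a sketch: `empirical_distribution` is a hypothetical helper, it uses unsmoothed relative frequencies, and the library may compute its probabilities differently.

```python
from collections import Counter


def empirical_distribution(data, context, alphabet_size):
    """Estimate P(next symbol | context) from raw counts in `data`."""
    context = list(context)
    k = len(context)
    following = Counter()
    for sequence in data:
        # Only positions that still have a symbol after the context are useful.
        for i in range(len(sequence) - k):
            if sequence[i:i + k] == context:
                following[sequence[i + k]] += 1
    total = sum(following.values())
    if total == 0:
        # Mirrors the documented KeyError for sequences without a tree node.
        raise KeyError(tuple(context))
    # Index s of the result holds the probability of observing symbol s next.
    return [following[s] / total for s in range(alphabet_size)]


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
probabilities = empirical_distribution(data, [1], alphabet_size=4)
```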

### `get_contexts`

```python
contexts = tree.get_contexts()
```
Returns:
- `contexts`: list of relevant contexts according to the Peres-Shield tree pruning method. Contexts are ordered by length.

## TODO
### Parallelization
After experimentation, the most promising approach to parallelization is to create a separate hashmap for each subsequence length.
The hashmaps are then joined from longest to shortest.
The hashmap at `max_depth + 1` can be discarded afterwards.
This could be very fast, depending on the merging algorithm.
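One possible reading of this plan is sketched below in plain Python. It is not the library's implementation: `count_length` and `build_transitions` are hypothetical names, and the join step (treating the last symbol of each subsequence as the next symbol after its prefix) is an assumption about what merging means. The thread pool only shows that each length can be counted independently; it is not meant as a performance claim.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_length(data, length):
    """Count all contiguous subsequences of exactly `length` symbols."""
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence) - length + 1):
            counts[tuple(sequence[start:start + length])] += 1
    return counts


def build_transitions(data, max_depth):
    """Build one hashmap per length, then join them from longest to shortest."""
    lengths = range(1, max_depth + 2)  # includes max_depth + 1
    with ThreadPoolExecutor() as pool:
        # Each length is counted independently of the others.
        results = pool.map(lambda k: count_length(data, k), lengths)
        per_length = dict(zip(lengths, results))

    transitions = {}
    for length in sorted(per_length, reverse=True):
        for subsequence, count in per_length[length].items():
            # The prefix is the context; the last symbol is what followed it.
            context, symbol = subsequence[:-1], subsequence[-1]
            transitions.setdefault(context, Counter())[symbol] += count
    # The map for max_depth + 1 only fed the deepest contexts and is dropped here.
    return transitions


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
transitions = build_transitions(data, max_depth=1)
```

In this sketch the resulting contexts have lengths `0` to `max_depth`, and the counts at length `max_depth + 1` are used only to fill in the next-symbol counts of the deepest contexts.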



            

Author: António Leitão <aleitao@novaims.unl.pt>
Repository: https://github.com/antonio-leitao/vlmc