| Field | Value |
| --- | --- |
| Name | vlmc |
| Version | 0.2.0 |
| home_page | None |
| Summary | Variable Length Markov Chain |
| upload_time | 2023-06-07 16:09:20 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Variable Length Markov Model (VLMC)
[![Downloads](https://pepy.tech/badge/vlmc)](https://pepy.tech/project/vlmc)
[![PyPI version](https://badge.fury.io/py/vlmc.svg)](https://pypi.org/project/vlmc/)
Implementation of Variable Length Markov Chains (VLMC) for Python.
Suffix tree building is done top-down using the [Peres-Shield](https://link.springer.com/chapter/10.1007/11557067_24) order estimation method.
The library is written in Rust with Python bindings.
##### Contents
- [Installation](#installation)
  * [Compilation from source](#compilation-from-source)
- [Usage](#usage)
  - [`fit`](#fit)
  - [`get_suffix`](#get_suffix)
  - [`get_counts`](#get_counts)
  - [`get_distribution`](#get_distribution)
  - [`get_contexts`](#get_contexts)
- [Future](#todo)
## Installation
Pre-built packages for many Linux, Windows, and macOS systems are available
on [PyPI](https://pypi.org/project/vlmc/) and can be installed with:
```sh
pip install vlmc
```
On uncommon architectures, you may need to first
[install Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) before running `pip install vlmc`.
### Compilation from source
To compile from source, you need to [install Rust/Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) and [maturin](https://github.com/PyO3/maturin#maturin) for the Python bindings.
Maturin is best used within a Python virtual environment:
```sh
# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/antonio-leitao/vlmc.git
cd vlmc
# build and install the package:
maturin develop --release
```
## Usage
```python
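import vlmc

max_depth = 3      # example value: longest subsequence to track
alphabet_size = 4  # example value: symbols are the integers 0..3
tree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)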
```
Parameters:
- `max_depth`: Maximum depth of the tree. Subsequences longer than `max_depth` are neither considered nor counted.
- `alphabet_size`: Total number of symbols in the alphabet. This number must be greater than the highest integer found in the data; otherwise runtime errors will occur.
- `n_jobs`: Number of subprocesses to spawn when running the VLMC. Use `-1` to use all available processes.
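
Since `alphabet_size` must exceed the highest integer in the data, one way to pick it is to scan the sequences first. The following is a minimal sketch; `infer_alphabet_size` is a hypothetical helper and not part of the `vlmc` package.

```python
def infer_alphabet_size(data):
    """Return the smallest alphabet_size that covers every symbol in `data`.

    Hypothetical helper (not part of vlmc). Assumes `data` is a non-empty
    list of lists of non-negative integers.
    """
    # Symbols run from 0 to the highest value seen, so the alphabet
    # needs highest + 1 entries.
    highest = max(symbol for sequence in data for symbol in sequence)
    return highest + 1


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
alphabet_size = infer_alphabet_size(data)
print(alphabet_size)  # 4
```

The result can then be passed as the second argument to `vlmc.VLMC`.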
### `fit`
> **Note**
> The `fit` method returns `None`, not `self`. This is by design, so that the Rust object is not exposed to Python.
```python
data = [
    [1, 2, 3],
    [2, 3],
    [1, 0, 1],
    [2],
]
tree.fit(data)
```
Arguments:
- `data`: List of lists containing sequences of discrete values to fit on. Values are assumed to be integers from `0` to `alphabet_size - 1`. The list is expected to be two-dimensional.
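
If the raw data uses labels rather than integers, it needs to be encoded before fitting. The sketch below shows one way to do that in plain Python; `encode_sequences` is a hypothetical helper, not something `vlmc` provides.

```python
def encode_sequences(raw_sequences):
    """Map arbitrary hashable symbols to integers 0..k-1.

    Hypothetical helper (not part of vlmc). Ids are assigned in order of
    first appearance. Returns the encoded sequences and the symbol-to-id map.
    """
    symbol_to_id = {}
    encoded = []
    for sequence in raw_sequences:
        # setdefault assigns the next free id the first time a symbol is seen.
        encoded.append(
            [symbol_to_id.setdefault(symbol, len(symbol_to_id)) for symbol in sequence]
        )
    return encoded, symbol_to_id


raw = [["a", "b", "c"], ["b", "c"], ["a", "d", "a"]]
data, symbol_to_id = encode_sequences(raw)
print(data)               # [[0, 1, 2], [1, 2], [0, 3, 0]]
print(len(symbol_to_id))  # 4, a valid alphabet_size for this data
```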
### `get_suffix`
Given a sequence, returns the longest suffix that is present in the VLMC.
```python
suffix = tree.get_suffix(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables.
Returns:
- `suffix` : longest suffix of sequence that is present in the VLMC.
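
To illustrate what "longest suffix present in the VLMC" means, here is a pure-Python sketch of the lookup. It is not the library's Rust implementation; `longest_known_suffix` and the `known_contexts` set are hypothetical, and returning an empty list when nothing matches is an assumption.

```python
def longest_known_suffix(sequence, known_contexts):
    """Return the longest suffix of `sequence` found in `known_contexts`.

    Illustrative only: `known_contexts` is a set of tuples standing in for
    the tree nodes. Falls back to the empty context if nothing matches.
    """
    # Try the full sequence first, then drop symbols from the front.
    for start in range(len(sequence) + 1):
        candidate = tuple(sequence[start:])
        if candidate in known_contexts:
            return list(candidate)
    return []


known_contexts = {(3,), (2, 3)}
print(longest_known_suffix([1, 2, 3], known_contexts))  # [2, 3]
print(longest_known_suffix([0, 1], known_contexts))     # []
```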
### `get_counts`
Gets the total number of occurrences of a given sequence of integers.
Raises a `KeyError` if the sequence is not a tree node. Consider calling `get_suffix` first to make sure you query a tree node.
```python
counts = tree.get_counts(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables.
Returns:
- `counts` : integer
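
As a rough mental model of what such counts represent, the sketch below tallies every contiguous subsequence of up to `max_depth` symbols in plain Python. This is an assumption about the counting convention, not the library's implementation; `count_subsequences` is a hypothetical name.

```python
from collections import Counter


def count_subsequences(data, max_depth):
    """Count contiguous subsequences of length 1..max_depth in `data`.

    Illustrative only; the counts kept by vlmc may follow a different
    convention.
    """
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence)):
            for length in range(1, max_depth + 1):
                if start + length > len(sequence):
                    break  # the window would run past the end of the sequence
                counts[tuple(sequence[start:start + length])] += 1
    return counts


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
counts = count_subsequences(data, max_depth=2)
print(counts[(2, 3)])     # 2
print(counts[(1, 2, 3)])  # 0, longer than max_depth so never counted
```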
### `get_distribution`
Gets the vector of next-symbol probabilities over the entire alphabet for the given sequence.
Raises a `KeyError` if the sequence is not a tree node. Consider calling `get_suffix` first to make sure you query a tree node.
```python
probabilities = tree.get_distribution(sequence)
```
Arguments:
- `sequence`: list of integers representing a sequence of discrete variables.
Returns:
- `probabilities` : list of floats representing the probability of observing a specific state (index) as the next symbol.
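
The advice to go through `get_suffix` first can be wrapped in a small helper. The sketch below relies only on the two methods documented above; `predict_next` is a hypothetical name, and how the library handles a sequence with no matching suffix is not covered here.

```python
def predict_next(tree, sequence):
    """Return (most likely next symbol, its probability) for `sequence`.

    `tree` is any fitted object exposing `get_suffix` and `get_distribution`
    as documented above. Hypothetical helper, not part of vlmc.
    """
    # Reduce the sequence to a tree node first to avoid the KeyError.
    context = tree.get_suffix(sequence)
    probabilities = tree.get_distribution(context)
    # The list index is the symbol, so take the index with the highest value.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, probabilities[best]
```

With a fitted model, `predict_next(tree, [1, 2])` would return the most probable next symbol together with its probability.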
### `get_contexts`
```python
contexts = tree.get_contexts()
```
Returns:
- `contexts`: list of relevant contexts according to the Peres-Shield tree pruning method. Contexts are ordered by length.
## TODO
### Parallelization
After experimentation, the most promising idea for parallelization is to build a separate hashmap for each subsequence length.
The hashmaps are then joined from longest to shortest.
The hashmap at `max_depth + 1` can be discarded afterwards.
This could be very fast, depending on the merging algorithm.
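
One possible reading of this idea is sketched below in Python. It is an interpretation of the note, not the planned Rust design. Each length is counted independently, and the counts for length `k + 1` supply the next-symbol counts for contexts of length `k`, which is why the `max_depth + 1` map is needed only during the join. All function names are hypothetical, and in CPython the thread pool mainly illustrates that the per-length jobs are independent rather than giving a real speedup.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def count_length(data, length):
    """Count every contiguous subsequence of exactly `length` symbols."""
    counts = Counter()
    for sequence in data:
        # The range is empty when the sequence is shorter than `length`.
        for start in range(len(sequence) - length + 1):
            counts[tuple(sequence[start:start + length])] += 1
    return counts


def build_next_symbol_counts(data, max_depth, n_workers=4):
    """Build {context: Counter(next_symbol -> count)} from per-length maps."""
    lengths = list(range(1, max_depth + 2))  # includes max_depth + 1
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # One independent counting job per subsequence length.
        maps = pool.map(lambda k: count_length(data, k), lengths)
        per_length = dict(zip(lengths, maps))

    next_counts = defaultdict(Counter)
    # Join from longest to shortest; each map is dropped once merged.
    for length in sorted(per_length, reverse=True):
        for subsequence, count in per_length.pop(length).items():
            context, symbol = subsequence[:-1], subsequence[-1]
            next_counts[context][symbol] += count
    return dict(next_counts)


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
table = build_next_symbol_counts(data, max_depth=2)
print(table[(2,)][3])  # 2, the symbol 3 follows the context (2,) twice
```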
Raw data
{
"_id": null,
"home_page": null,
"name": "vlmc",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Ant\u00f3nio Leit\u00e3o <aleitao@novaims.unl.pt>",
"download_url": "https://files.pythonhosted.org/packages/d5/87/a4b3b677d61af6392e747b9a005c5143e42388114318e31b31de17ebfbdd/vlmc-0.2.0.tar.gz",
"platform": null,
"description": "# Variable Length Markov Model (VLMC)\n\n[![Downloads](https://pepy.tech/badge/vlmc)](https://pepy.tech/project/vlmc) \n[![PyPI version](https://badge.fury.io/py/vlmc.svg)](https://pypi.org/project/vlmc/)\n\nImplementation of Variable Length Markov Chains (VLMC) for Python.\nSuffix tree building is done top-down using the [Peres-Shield](https://link.springer.com/chapter/10.1007/11557067_24) order estimation method.\nIt is written in Rust with Python Bindings.\n\n##### Contents\n - [Installation](#installation)\n * [Compiling from source](#compilation-from-source) \n - [Usage](#usage)\n - [`fit`](#fit)\n - [`suffix`](#get_suffix)\n - [`counts`](#get_counts)\n - [`distribution`](#get_distribution)\n - [`contexts`](#get_contexts)\n - [Future](#todo)\n\n\n## Installation\n\nPre-built packages for many Linux, Windows, and OSX systems are available\nin [PyPI](https://pypi.org/project/vlmc/) and can be installed with:\n\n```sh\npip install vlmc\n```\nOn uncommon architectures, you may need to first\n[install Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) before running `pip install vlmc`.\n### Compilation from source\n\nIn order to compile from source you will need to [install Rust/Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) and [maturin](https://github.com/PyO3/maturin#maturin) for the python bindings.\nMaturin is best used within a Python virtual environment:\n\n```sh\n# activate your desired virtual environment first, then:\npip install maturin\ngit clone https://github.com/antonio-leitao/vlmc.git\ncd vlmc\n# build and install the package:\nmaturin develop --release\n```\n\n# Usage\n\n```python\nimport vlmc\ntree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)\n```\nParameters:\n- `max_depth`: Maximum depth of tree. Subsequences whose length exceed the `max_depth` will not be considered nor counted. \n- `alphabet_size`: Total number of symbols in the alphabet. 
This number has to be bigger than the highest integer encountered, else it will cause runtime errors. \n- `n_jobs`: Number of subprocesses to spawn when running the vlmc. Choose `-1` for using all available processes. \n\n### `fit`\n\n> **Note**\n> fit method returns `None` and not `self`. This is by design as to not expose the rust object to python.\n\n```python\ndata = [\n [1,2,3],\n [2,3],\n [1,0,1],\n [2]\n]\n\ntree.fit(data)\n```\n\nArguments:\n- `data`: List of lists containing sequences of discrete values to fit on. Values are assumed to be integers form `0` to `alphabet_size`. List is expected to be two dimensional.\n\n### `get_suffix`\nGiven a sequence, returns the longest suffix that is present in the VLMC.\n\n```python\nsuffix = tree.get_suffix(sequence)\n```\nArguments:\n- `sequence`: list of integers representing a sequence of discrete varaibles. \n\nReturns:\n- `suffix` : longest suffix of sequence that is present in the VLMC. \n\n### `get_counts`\nGets the total number of occurences of a given sequence of integers.\nWill throw a `KeyError` if the sequence is not a tree node. Consider using `get_suffix` to make sure to get a tree node.\n\n```python\ncounts = tree.get_counts(sequence)\n```\nArguments:\n- `sequence`: list of integers representing a sequence of discrete varaibles.\n \nReturns:\n- `counts` : integer \n\n### `get_distribution`\nGets the vector of probabilities over the entire alphabet for the given sequence.\nWill throw a `KeyError` if the sequence is not a tree node. Consider using `get_suffix` to make sure to get a tree node.\n\n```python\nprobabilities = tree.get_distribution(sequence)\n```\nArguments:\n- `sequence`: list of integers representing a sequence of discrete variables. 
\n\nReturns:\n- `probabilities` : list of floats representing the probability of observing a specific state (index) as the next symbol.\n\n### `get_contexts`\n\n```python\ncontexts = tree.get_contexts()\n```\nReturns:\n- `contexts`: list of relevant contexts according to the Peres-Shield tree prunning method. Contexts are ordered by length.\n\n# TODO\n### Paralelization\nAfter experimentation the best possible idea for paralelization would be to create different hashmaps for each sunsequence length.\nHashmaps are then joined from longest to smallest.\nThe hashmap at `max_depth + 1` can be discarded after.\nCould be very fast depending on merging algo.\n\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Variable Length Markov Chain",
"version": "0.2.0",
"project_urls": {
"homepage": "https://github.com/antonio-leitao/vlmc",
"repository": "https://github.com/antonio-leitao/vlmc"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8344794f063f438a7d1f12bdb9543c3d8160625c2493beca1ff24c667891a7fb",
"md5": "a6f50696ea26e2984ed0e901689bad5b",
"sha256": "486559533902402343611df85c7dfb3c90b26a974747424ca6e4bcf68d20ed0a"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "a6f50696ea26e2984ed0e901689bad5b",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 274477,
"upload_time": "2023-06-07T16:08:58",
"upload_time_iso_8601": "2023-06-07T16:08:58.059952Z",
"url": "https://files.pythonhosted.org/packages/83/44/794f063f438a7d1f12bdb9543c3d8160625c2493beca1ff24c667891a7fb/vlmc-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e1ce97e9cab2f06a580b7735ea023844c77deee3dd33aafd0e19dac7acf4f9ca",
"md5": "521e23505698cb002ac68e723cbb97c9",
"sha256": "c7d0a4bc012de1fcb3b52077cdb573afd828ffbbdf9f43dbcbcebbbf30286148"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "521e23505698cb002ac68e723cbb97c9",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 263740,
"upload_time": "2023-06-07T16:09:00",
"upload_time_iso_8601": "2023-06-07T16:09:00.807317Z",
"url": "https://files.pythonhosted.org/packages/e1/ce/97e9cab2f06a580b7735ea023844c77deee3dd33aafd0e19dac7acf4f9ca/vlmc-0.2.0-cp37-abi3-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "02c5b0c1c67e7ec79deacdb0b085f6d20d90380490debfbb0f3681ea93a11980",
"md5": "892687e4ac3e18d52f91f53640bc3f27",
"sha256": "3c344f2482017f07c83d550b3288fec59f5133500edf76e5ede036aa0bcaed06"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "892687e4ac3e18d52f91f53640bc3f27",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1091786,
"upload_time": "2023-06-07T16:09:02",
"upload_time_iso_8601": "2023-06-07T16:09:02.955566Z",
"url": "https://files.pythonhosted.org/packages/02/c5/b0c1c67e7ec79deacdb0b085f6d20d90380490debfbb0f3681ea93a11980/vlmc-0.2.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a8c6cea5ce14c90cee5739f6e259e10cf90ddd6f9b672892a9ce73ad819b884b",
"md5": "41e6857c1f40a6daaf9cba6614544e2f",
"sha256": "b1288e1fc720ba68339baf89dd5de3d29a8bc72e0fcc136b350078b3d16d58e4"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"has_sig": false,
"md5_digest": "41e6857c1f40a6daaf9cba6614544e2f",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1098823,
"upload_time": "2023-06-07T16:09:05",
"upload_time_iso_8601": "2023-06-07T16:09:05.403720Z",
"url": "https://files.pythonhosted.org/packages/a8/c6/cea5ce14c90cee5739f6e259e10cf90ddd6f9b672892a9ce73ad819b884b/vlmc-0.2.0-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "00e79752ce24dced4bfd24a7de953286dd7b6c92e8122a6be94de372bfcfdbd9",
"md5": "8ee302a0d5b8393d9811102d47120a0a",
"sha256": "959fd2d844dabfd649efc9b35f89947a48dba506188fd8797ebd97cc892589be"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl",
"has_sig": false,
"md5_digest": "8ee302a0d5b8393d9811102d47120a0a",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1214686,
"upload_time": "2023-06-07T16:09:07",
"upload_time_iso_8601": "2023-06-07T16:09:07.805487Z",
"url": "https://files.pythonhosted.org/packages/00/e7/9752ce24dced4bfd24a7de953286dd7b6c92e8122a6be94de372bfcfdbd9/vlmc-0.2.0-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0e0be75f2c3cc1def7dff941d33ee477de2fb1a0fe7bfcfd9b0fb8b17ad7f664",
"md5": "bd363d38a59170dac5175a262a811153",
"sha256": "f840b7909afb3df6b3c1d424c6994339169106a2a8ebba3e3ece47d5fa6995ad"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl",
"has_sig": false,
"md5_digest": "bd363d38a59170dac5175a262a811153",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1270370,
"upload_time": "2023-06-07T16:09:09",
"upload_time_iso_8601": "2023-06-07T16:09:09.817274Z",
"url": "https://files.pythonhosted.org/packages/0e/0b/e75f2c3cc1def7dff941d33ee477de2fb1a0fe7bfcfd9b0fb8b17ad7f664/vlmc-0.2.0-cp37-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "09319d1b3860036c09f0ccb52d67f786f7a6877ee7f7eed10c8a9fc2a99432f9",
"md5": "09e7e391bef3c6a6d946194dae6e9f5d",
"sha256": "d811af5a679980e64a8b90214a55f1538f710660ac9dfeaa802aede631871828"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "09e7e391bef3c6a6d946194dae6e9f5d",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1101380,
"upload_time": "2023-06-07T16:09:12",
"upload_time_iso_8601": "2023-06-07T16:09:12.043460Z",
"url": "https://files.pythonhosted.org/packages/09/31/9d1b3860036c09f0ccb52d67f786f7a6877ee7f7eed10c8a9fc2a99432f9/vlmc-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "174dd28ad76ab2ca0626d37bac27d62c7b5777cdb03c535f307f9bc718277c5e",
"md5": "98ff3a83dc31c11635de09931d7c30ec",
"sha256": "97ed9dbd341a47250eabb863984faa279639a492e95d4f75fe727684c26b2dbb"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.whl",
"has_sig": false,
"md5_digest": "98ff3a83dc31c11635de09931d7c30ec",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 1122731,
"upload_time": "2023-06-07T16:09:14",
"upload_time_iso_8601": "2023-06-07T16:09:14.606009Z",
"url": "https://files.pythonhosted.org/packages/17/4d/d28ad76ab2ca0626d37bac27d62c7b5777cdb03c535f307f9bc718277c5e/vlmc-0.2.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "25c422d7acbf9420ee022f7326dad25b45a06c43fca8af0828c7b740248a540e",
"md5": "ac66fdec8007146f540414701cd69b53",
"sha256": "28a0b37965c5732858493e0eaf5b062ae596045f706edc12cf802d0c9ec418f8"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-win32.whl",
"has_sig": false,
"md5_digest": "ac66fdec8007146f540414701cd69b53",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 143140,
"upload_time": "2023-06-07T16:09:16",
"upload_time_iso_8601": "2023-06-07T16:09:16.577201Z",
"url": "https://files.pythonhosted.org/packages/25/c4/22d7acbf9420ee022f7326dad25b45a06c43fca8af0828c7b740248a540e/vlmc-0.2.0-cp37-abi3-win32.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e07a7cbc207ae0c656e3a8a14d95ee496568a32dfdd0aad980183eaeca038201",
"md5": "1e24cc42683d8f0fd364c1a3cbcd33f9",
"sha256": "1333ba076280e4d66e25b052b198f807edb86f41848e28e201f4aff4b50f5d5f"
},
"downloads": -1,
"filename": "vlmc-0.2.0-cp37-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "1e24cc42683d8f0fd364c1a3cbcd33f9",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.10",
"size": 144633,
"upload_time": "2023-06-07T16:09:18",
"upload_time_iso_8601": "2023-06-07T16:09:18.448610Z",
"url": "https://files.pythonhosted.org/packages/e0/7a/7cbc207ae0c656e3a8a14d95ee496568a32dfdd0aad980183eaeca038201/vlmc-0.2.0-cp37-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d587a4b3b677d61af6392e747b9a005c5143e42388114318e31b31de17ebfbdd",
"md5": "ef7248b77f63a7220539ba3be1681a44",
"sha256": "97b3543787fb608fe18ddd3b4c806efe8fb486fc1066461d4e03f5b4d4889571"
},
"downloads": -1,
"filename": "vlmc-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "ef7248b77f63a7220539ba3be1681a44",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 8852,
"upload_time": "2023-06-07T16:09:20",
"upload_time_iso_8601": "2023-06-07T16:09:20.145816Z",
"url": "https://files.pythonhosted.org/packages/d5/87/a4b3b677d61af6392e747b9a005c5143e42388114318e31b31de17ebfbdd/vlmc-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-07 16:09:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "antonio-leitao",
"github_project": "vlmc",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "vlmc"
}