gsp-python


- Name: gsp-python
- Version: 0.0.10
- Summary: GSP Python implementation
- Upload time: 2023-08-11 15:29:38
- Author: Slocon
- Keywords: python, gsp, data mining, sequential pattern mining, sequence mining
- VCS: https://github.com/Slocon00/GSP-python
            
# GSP-python [![](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/)

## A Python implementation of the Generalized Sequential Patterns (GSP) algorithm for sequential pattern mining

This project implements the Generalized Sequential Patterns (GSP) algorithm to find frequent sequences within a given
dataset. This implementation includes parameters for the _mingap_, _maxgap_, and _maxspan_ time constraints.

The project also features a simple dataset generator.

---

## Installation

The package can be installed via pip:

```
python3 -m pip install gsp_python
```

---

## Usage

The GSP algorithm and the dataset generator can be executed either from the command line or by importing the package modules in a script.

---

### From the command line

To run the GSP algorithm:

```
python3 -m gsp_python GSP infile outfile minsup -t maxgap mingap maxspan
```
Where:

- `infile`: specifies the path of the file containing the dataset from which sequences are mined. The file must be a text file in which events are separated by spaces, each element is terminated by ' -1', and each data-sequence is terminated by ' -2'.
- `outfile`: specifies the path of the output file to which the result is written. It contains all frequent sequences found, each paired with its support count.
- `minsup`: specifies the minimum support threshold used during execution.
- `-t maxgap mingap maxspan` (optional): specifies the _maxgap_, _mingap_, and _maxspan_ values used during execution. If not specified, the default values of _inf_, 0, and _inf_ will be used instead.

For more information about additional optional arguments, type:

```
python3 -m gsp_python GSP -h
```
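To make the input format concrete, the following is a minimal sketch of a parser for it. It is not the package's own loader: the function name `parse_sequences` is hypothetical, and the sketch assumes that tokens are separated by any whitespace, so line breaks between data-sequences are optional.

```python
def parse_sequences(text: str) -> list[list[list[str]]]:
    """Parse text in which ' -1' ends an element and ' -2' ends a data-sequence."""
    dataset = []   # all data-sequences read so far
    sequence = []  # elements of the data-sequence being read
    element = []   # events of the element being read
    for token in text.split():
        if token == "-1":
            # End of the current element.
            sequence.append(element)
            element = []
        elif token == "-2":
            # End of the current data-sequence.
            dataset.append(sequence)
            sequence = []
        else:
            element.append(token)
    return dataset


# Two data-sequences: <(a b)(c)> and <(b)(a c)(d)>.
sample = "a b -1 c -1 -2\nb -1 a c -1 d -1 -2\n"
parsed = parse_sequences(sample)
print(parsed)
```

Here the first line of `sample` describes a data-sequence with two elements, the first holding events `a` and `b`, the second holding event `c`.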

---

To generate a random dataset:

```
python3 -m gsp_python DatasetGen outfile size nevents maxevents avgelems
```
Where:

- `outfile`: specifies the path of the output file to which the dataset is written. The format is the same as the one accepted as input by the algorithm above.
- `size`: specifies the number of data-sequences.
- `nevents`: specifies the number of unique events.
- `maxevents`: specifies the maximum number of events per element.
- `avgelems`: specifies the average number of elements per data-sequence.

For more information about additional optional arguments, type:

```
python3 -m gsp_python DatasetGen -h
```

---

### From within a script

To run the GSP algorithm, use `gsp_python.gsp.GSP()` to create and initialize a `GSP` object, providing the required arguments; then, call its `run_gsp()` method to execute the algorithm (the result is returned as a list of tuples, each pairing a sequence with its support count).

An example is given below:

```python
from gsp_python.gsp import load_ds
from gsp_python.gsp import GSP

dataset, dict1, dict2 = load_ds("path/to/file.txt")

algo_gsp = GSP(dataset, minsup=0.3, mingap=1, maxgap=2, maxspan=5)
output = algo_gsp.run_gsp()
```

The `load_ds()` function loads the dataset contained in the file at the specified path (provided that it follows the format explained above), converting all events to integers. Besides the dataset, it returns a dictionary (here assigned to `dict1`) that can be used to convert each integer back to the corresponding event.
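As an illustration of converting mined sequences back to their original events, here is a minimal sketch. The data are hand-made stand-ins, not actual output of `run_gsp()` or `load_ds()`. The sketch assumes that each sequence is a list of elements, each element a list of integers, and that the dictionary maps integers to event labels; `decode_sequence` is a hypothetical helper, not part of the package.

```python
def decode_sequence(sequence, int_to_event):
    """Replace every integer event in a sequence with its original label."""
    return [[int_to_event[event] for event in element] for element in sequence]


# Stand-in for the integer-to-event dictionary (assumed shape).
int_to_event = {0: "login", 1: "search", 2: "purchase"}

# Stand-in for a result list: (sequence, support count) pairs (assumed shape).
output = [([[0], [1]], 12), ([[0], [1, 2]], 7)]

decoded = [(decode_sequence(seq, int_to_event), support) for seq, support in output]
for seq, support in decoded:
    print(seq, support)
```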

---

To generate a random dataset, use `gsp_python.dataset_gen.DatasetGenerator()` to create and initialize a `DatasetGenerator` object, providing the required arguments; then, call its `generate_sequence_dataset()` method to generate a dataset (returned as a `list[list[list[int]]]`).

An example is given below:

```python
from gsp_python.dataset_gen import DatasetGenerator

algo_dsgen = DatasetGenerator(size=100, nevents=8, maxevents=4, avgelems=16)
dataset = algo_dsgen.generate_sequence_dataset()
```
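A nested list of this shape can be turned into the text format accepted by the algorithm. The following is a minimal sketch, not the package's own writer: `format_dataset` is a hypothetical helper, and placing one data-sequence per line is an assumption made for readability.

```python
def format_dataset(dataset: list[list[list[int]]]) -> str:
    """Serialize a dataset: ' -1' ends each element, ' -2' ends each data-sequence."""
    lines = []
    for sequence in dataset:
        tokens = []
        for element in sequence:
            tokens.extend(str(event) for event in element)
            tokens.append("-1")  # element terminator
        tokens.append("-2")      # data-sequence terminator
        lines.append(" ".join(tokens))
    return "\n".join(lines) + "\n"


# Hand-made example with the same shape as a generated dataset.
example = [[[1, 2], [3]], [[2], [1, 3], [4]]]
text = format_dataset(example)
print(text, end="")
```

The resulting string can be written to a file with the usual `open(path, "w")` and `write()` calls.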

            
